IBM is testing ways to improve natural language processing

In four papers, IBM researchers describe ways to improve natural language processing. These include new semantic parsing techniques, a method for integrating incomplete knowledge bases with text corpora, and a tool that brings in experts to refine interpretable, rules-based systems.

Salim Roukos, senior manager at IBM Research, says that the natural language processing systems of large companies often face challenges caused by several factors. These include heterogeneous silos of information, incomplete data, and the need to train accurate models on small amounts of data, VentureBeat reports.

"We explore multiple themes to address these challenges and improve natural language processing for enterprise purposes," says Roukos.

AMR

The first study focused on Abstract Meaning Representation (AMR), a data structure that allows sentences with similar meanings to share the same representation.
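To make the idea concrete, here is a minimal sketch of how two differently worded sentences could share one AMR-style graph. The graph is written by hand as a set of (source, role, target) triples, and the sentence lookup is a hypothetical stand-in for a real parser. None of the names below come from the IBM paper.

```python
# Illustrative only: a hand-built graph, not the output of a real AMR parser.
# The graph is stored as a set of (source, role, target) triples.
BOY_WANTS_TO_GO = frozenset({
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-01"),
    ("w", "ARG0", "b"),  # the boy is the one who wants
    ("w", "ARG1", "g"),  # what is wanted is the going event
    ("g", "ARG0", "b"),  # the boy is also the one who goes
})

# Hypothetical lookup standing in for a parser. The assumption is that a
# parser would map both wordings to the same graph.
SENTENCE_TO_AMR = {
    "The boy wants to go.": BOY_WANTS_TO_GO,
    "Going is what the boy wants.": BOY_WANTS_TO_GO,
}


def same_meaning(sentence_a: str, sentence_b: str) -> bool:
    """Return True when both sentences map to an identical graph."""
    return SENTENCE_TO_AMR[sentence_a] == SENTENCE_TO_AMR[sentence_b]


shared = same_meaning("The boy wants to go.", "Going is what the boy wants.")
```

Storing the graph as a set of triples makes equality a simple set comparison, which is the property the article describes: the surface wording differs, the representation does not.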

In this research, the scientists used reinforcement learning, an artificial intelligence (AI) training technique that uses rewards to steer a software policy toward certain goals.

In this way, the authors of the study raised the semantic accuracy of a target graph to 75.5 percent. The previous best was 74.4 percent.
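The article does not say how accuracy was measured or how the reward was defined. As a rough illustration of a graph-level score that could serve as a reward signal, the sketch below computes an F1 over matching triples between a predicted and a gold graph. It assumes the variable names in both graphs are already aligned, which a full metric such as Smatch does not assume. The function name and example graphs are hypothetical.

```python
def triple_f1(predicted: set, gold: set) -> float:
    """F1 over exactly matching triples; a simplified stand-in for a graph metric."""
    if not predicted or not gold:
        return 0.0
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold graph as (source, role, target) triples.
gold_graph = {
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-01"),
    ("w", "ARG0", "b"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),
}

# Hypothetical parser output that misses one edge of the gold graph.
predicted_graph = gold_graph - {("g", "ARG0", "b")}

# In a reinforcement learning setup, a score like this could be fed back
# as the reward for the sequence of parser actions that built the graph.
reward = triple_f1(predicted_graph, gold_graph)
```

Here every predicted triple is correct but one gold triple is missing, so the score lands below 1.0 and the policy would receive a smaller reward than for a perfect parse.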

Multiple knowledge bases

In another paper, a different IBM team describes a query approach that unifies semantic parsing across multiple knowledge bases. The technique exploits structural similarities between query programs to search different knowledge bases.

That work ties in with research by yet another team, in which IBM scientists studied how incomplete knowledge bases can be combined with a text corpus.

In their view, this approach can produce better answers to questions that neither the knowledge base nor any individual document fully addresses.

HEIDL

In the last paper, the researchers describe a tool called Human-in-the-loop linguistic Expressions with Deep Learning (HEIDL). The tool ranks machine-generated expressions by precision and recall.
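A minimal sketch of ranking candidate rules by precision and recall might look like the following. The rule names, counts and tie-breaking order are invented for illustration and do not reflect HEIDL's actual interface or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleStats:
    """Counts for one candidate rule, measured against labeled sentences."""
    name: str
    true_pos: int
    false_pos: int
    false_neg: int

    @property
    def precision(self) -> float:
        denom = self.true_pos + self.false_pos
        return self.true_pos / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.true_pos + self.false_neg
        return self.true_pos / denom if denom else 0.0


def rank_rules(rules: list) -> list:
    """Sort by precision first, then recall, highest first (an assumed ordering)."""
    return sorted(rules, key=lambda r: (r.precision, r.recall), reverse=True)


# Hypothetical candidate rules for a "termination" clause label.
candidates = [
    RuleStats("mentions_terminate", true_pos=40, false_pos=10, false_neg=20),
    RuleStats("mentions_notice_period", true_pos=18, false_pos=2, false_neg=42),
    RuleStats("mentions_payment_due", true_pos=30, false_pos=30, false_neg=30),
]

ranked_names = [rule.name for rule in rank_rules(candidates)]
```

Sorting on precision first favors rules that rarely mislabel a sentence. An expert reviewing the ranked list could then trade some precision for recall by picking a few complementary rules.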

In one of the experiments, IBM lawyers went through 20,000 sentences from nearly 150 contracts and annotated those related to important clauses, such as termination, communication and payments. HEIDL then analysed them to provide high-level insights.

Using the tool, a team of data scientists identified an average of seven rules that automatically labeled the contracts in about half an hour. According to the scientists, doing this by hand would have taken a week or more.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.
