
In four papers, IBM researchers describe ways to improve natural language processing. These include new semantic parsing techniques, a method for integrating incomplete knowledge bases with text corpora, and a tool that recruits experts to refine interpretable, rules-based systems.

Salim Roukos, senior manager at IBM Research, says that the natural language processing systems of large companies often face challenges on several fronts: heterogeneous silos of information, incomplete data, and the need to train accurate models on small amounts of data, VentureBeat writes.

"We explore multiple themes to address these challenges and improve natural language processing for enterprise purposes," Roukos says.


The first study focuses on abstract meaning representation (AMR), a graph-based data structure in which sentences with the same meaning share the same representation.
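AMR itself is a rooted, labelled graph. As a rough illustration of the idea (a toy sketch, not IBM's parser), the classic example "The boy wants to go" and a paraphrase can map to one and the same graph:

```python
# Rough illustration of the AMR idea (not IBM's implementation): a rooted,
# labelled graph in which paraphrases share one representation. Both
# "The boy wants to go" and "The boy desires to go" reduce to a want-01
# frame whose ARG0 (the boy) also fills the ARG0 role of the embedded
# go-02 frame (a re-entrant node).

amr = {
    "root": "w",
    "nodes": {"w": "want-01", "b": "boy", "g": "go-02"},
    "edges": [("w", "ARG0", "b"),   # the boy is the wanter
              ("w", "ARG1", "g"),   # going is what is wanted
              ("g", "ARG0", "b")],  # the boy is also the goer (re-entrancy)
}

def parse(sentence):
    # Stand-in for a real AMR parser: map two paraphrases to the same graph.
    paraphrases = {"The boy wants to go.", "The boy desires to go."}
    return amr if sentence in paraphrases else None

print(parse("The boy wants to go.") == parse("The boy desires to go."))  # True
```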

In the research, the scientists used reinforcement learning, an artificial intelligence (AI) training technique that uses rewards to guide a software policy towards certain goals.

In this way, the authors of the study raised the semantic accuracy of the generated graphs to 75.5 percent, up from the previous best of 74.4 percent.
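As a toy sketch of that reward-driven training (a minimal, hypothetical policy-gradient example, not IBM's actual setup), imagine a policy choosing between two candidate parses, with a stand-in semantic-accuracy score as the reward:

```python
import math

# Toy policy-gradient sketch (hypothetical, not IBM's setup): a policy
# chooses between two candidate parses A and B; the reward is a stand-in
# for a semantic-accuracy score, and the gradient update pushes the
# policy towards the parse that earns the higher reward.

theta = 0.0                     # single logit: P(choose parse A) = sigmoid(theta)
reward_a, reward_b = 1.0, 0.0   # stand-in rewards for each parse
lr = 0.5

def p_a(t):
    return 1.0 / (1.0 + math.exp(-t))

for _ in range(100):
    sigma = p_a(theta)
    # expected REINFORCE gradient: E[reward * d log pi / d theta]
    grad = reward_a * sigma * (1 - sigma) - reward_b * (1 - sigma) * sigma
    theta += lr * grad

print(p_a(theta) > 0.9)  # True: the policy now strongly prefers parse A
```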

Multiple knowledge bases

Another IBM team describes in a paper an approach that unifies semantic parsing of queries across multiple knowledge bases. The technique exploits the structural similarity between query programs to search through different knowledge bases.
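As a loose illustration of that idea (hypothetical data and names, not IBM's technique), a single query program whose structure stays the same can be reused across two knowledge bases that differ only in their relation names:

```python
# Loose illustration (not IBM's method): one query program reused across
# two structurally similar knowledge bases. The program's shape is
# identical; only the relation name differs per knowledge base.

kb_movies = {("directed", "Nolan", "Inception"), ("directed", "Nolan", "Tenet")}
kb_papers = {("authored", "Hinton", "Backprop"), ("authored", "Hinton", "Capsules")}

def works_by(kb, relation, person):
    # Shared query structure: find all objects linked to `person`
    # through `relation`, regardless of which knowledge base is queried.
    return sorted(obj for (rel, subj, obj) in kb if rel == relation and subj == person)

print(works_by(kb_movies, "directed", "Nolan"))   # ['Inception', 'Tenet']
print(works_by(kb_papers, "authored", "Hinton"))  # ['Backprop', 'Capsules']
```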

That work is in line with that of yet another team, in which IBM scientists studied how incomplete knowledge bases can be combined with a body of text.

In their view, this approach can lead to better answers to questions that are not fully covered by the knowledge base or by individual documents alone.


In the last paper, the researchers describe a tool called Human-in-the-loop linguistic Expressions with Deep Learning (HEIDL), which ranks machine-generated expressions by precision and recall.

In one of the experiments, IBM lawyers annotated sentences related to important clauses, such as termination, communication and payments, in 20,000 sentences drawn from nearly 150 contracts. HEIDL then analysed the annotations to provide high-level insights.

A team of data scientists used these insights to identify an average of seven rules that automatically labelled the contracts, in about half an hour. According to the scientists, doing this by hand would have taken a week or more.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.