Scientists at the MIT-IBM Watson AI Lab are working on an approach intended to remove a number of long-standing obstacles in the design of artificial intelligence (AI) models. The resulting programs should be able to learn about the world by observing it.
People can learn a lot about an environment just by looking at it, such as the colour of an object or the distance between two objects. For AI systems this is far more complicated. With advances in image recognition, language understanding and symbolic program execution, however, it should become possible.
The problem
IBM and MIT are working together to close this gap. Dario Gil, vice president of AI and IBM Q at IBM Research, explained the approach to VentureBeat using an example.
Imagine having a picture of a scene that shows a collection of objects that you need to classify and describe. A deep learning solution to this problem must first be trained on thousands of sample questions, and that model can get confused by variations on the same questions.
The problem has to be divided into several parts. There is a visual recognition challenge, there is a question whose words have to be understood, and then you have to reason logically to arrive at the answer.
The approach
The symbolic reasoning approach, recently described in a paper by MIT, IBM and DeepMind, is intended to handle this better. It uses a neurosymbolic concept learner and a model programmed to understand concepts such as objects and spatial relations in text.
One component of the approach is run on a set of scenes containing objects, while another component learns to map natural language questions to answers. The resulting framework can answer new questions about different scenes by recognizing the visual concepts mentioned in the questions, which makes it very scalable.
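To make the idea of symbolic program execution more concrete, here is a minimal Python sketch. It is not the code from the paper. The scene, the operation names (filter, unique, left_of, query, count) and the two programs are hypothetical and written by hand. In the real system, the scene description would come from the visual component and the program from the component that reads the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """One object in a scene, as a perception component might describe it."""
    colour: str
    shape: str
    x: float  # horizontal position; a smaller value means further left


# Hypothetical, hand-written scene standing in for the output of the visual component.
SCENE = [
    SceneObject("red", "cube", 1.0),
    SceneObject("blue", "sphere", 2.0),
    SceneObject("green", "cylinder", 3.0),
    SceneObject("red", "sphere", 4.0),
]


def execute(program, scene):
    """Run a list of symbolic steps against a scene and return the answer."""
    current = list(scene)
    for op, *args in program:
        if op == "filter":
            # Keep only the objects whose attribute has the given value.
            attribute, value = args
            current = [o for o in current if getattr(o, attribute) == value]
        elif op == "unique":
            # Expect exactly one object and unwrap it.
            if len(current) != 1:
                raise ValueError(f"expected one object, found {len(current)}")
            current = current[0]
        elif op == "left_of":
            # All objects in the scene that lie to the left of the current one.
            current = [o for o in scene if o.x < current.x]
        elif op == "query":
            # Read one attribute of the current object.
            current = getattr(current, args[0])
        elif op == "count":
            current = len(current)
        else:
            raise ValueError(f"unknown operation: {op}")
    return current


# "What colour is the sphere to the left of the cylinder?" (hand-written program)
PROGRAM_COLOUR = [
    ("filter", "shape", "cylinder"),
    ("unique",),
    ("left_of",),
    ("filter", "shape", "sphere"),
    ("unique",),
    ("query", "colour"),
]

# "How many red objects are there?" (hand-written program)
PROGRAM_COUNT = [("filter", "colour", "red"), ("count",)]

print(execute(PROGRAM_COLOUR, SCENE))  # blue
print(execute(PROGRAM_COUNT, SCENE))   # 2
```

Because the reasoning step is an explicit program over named concepts, a rephrased question only needs to map to the same short program. The executor itself does not have to be retrained.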
Moreover, less data is needed than with deep learning approaches. According to Gil, it is possible to achieve the same accuracy with 1 percent of the training data. That is good news for the 99.99 percent of companies that do not have large amounts of labelled data.