Microsoft’s computer vision AI beats humans at captioning images

As part of its efforts to use AI to improve quality of life for the visually impaired, Microsoft has built an AI system that can automatically caption images and, according to the company, outperforms humans at the task. This kind of technology may one day be used to automatically caption images shared online so that people who rely on assistive tools can know what is being displayed.

Computer vision plays an important role in modern systems by giving machines the capability to view, interpret and comprehend what they see.

Computer vision is a key technology in many autonomous vehicles and has spread to other applications, such as medical imaging and the rapid sorting and organization of images.

Advanced outcomes

In a newly published study, Microsoft researchers detailed how they developed the AI system, which generates high-quality captions. They call it Visual Vocabulary (VIVO). It is a pre-training method that learns a visual vocabulary from a dataset of paired image-tag data.

According to the findings, the system achieves new state-of-the-art results that exceed human performance.

Microsoft notes that alternative text captions are missing from many images on social media and websites. This technology could be the inclusivity tool that helps close that gap.

Ongoing work

The company previously introduced a product built on computer vision specifically for blind users, called Seeing AI. It is a camera app that audibly describes nearby physical objects, reads printed text and currency, and recognizes colours and other common items.

The Seeing AI app can read image captions. However, many images are published without them.

These captions could be added automatically by AI that can see and report, in real time, what it is looking at.