Microsoft's computer vision AI beats humans at captioning images

As part of its efforts to use AI to improve quality of life for people with visual impairments, Microsoft has built an AI system that captions images automatically and outperforms humans at the task. This kind of technology may one day be used to caption images shared online automatically, so that people who are blind or have low vision can know what an image shows.

Computer vision plays an important role in modern systems by giving machines the ability to see, interpret and understand images.

Computer vision is a key technology in many autonomous vehicles, and it has spread to other areas such as the rapid sorting and organization of images and medical imaging.

Advanced outcomes

In a newly published study, Microsoft researchers detailed how they developed the AI system, which generates high-quality captions. They call it Visual Vocabulary (VIVO) pre-training: a model that learns a visual vocabulary from a dataset of images paired with tags.

The findings show that the system achieved new state-of-the-art results, producing captions that exceeded those written by humans.

Microsoft notes that alternative text (alt text) captions are missing for many images on social media and websites. This technology could be the inclusivity tool we have been looking for.

Ongoing work

The company previously introduced a product built on computer vision specifically for blind people, called Seeing AI. It is a camera app that audibly describes nearby objects, reads printed text and currency, and recognizes colours and other common items.

The Seeing AI app can read image captions; however, many images online do not include them.

These captions could be added automatically by an AI that can look at an image and report, in real time, what it shows.
