Scientists at IBM Research have developed an artificial intelligence (AI) model that can generate diverse, creative, and convincing captions for photographs. The scientists describe the model in a paper presented at the Conference on Computer Vision and Pattern Recognition.
To build the system, the researchers had to solve common problems with automatic caption-generation systems, VentureBeat writes. Such systems often produce sentences that are syntactically correct but homogeneous, unnatural, and semantically irrelevant.
IBM’s scientists addressed this with an ‘attention captioning model’, which lets the caption generator draw on fragments of the scene it is observing to compose sentences. At each step of caption generation, the AI model can choose between visual cues from the image and textual cues from the previous step.
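The step-wise choice between visual and textual cues can be illustrated with a toy decoding step. This is a hedged sketch, not IBM's actual code: the function name, dimensions, and gating mechanism are all assumptions made for illustration.

```python
import numpy as np

def decode_step(visual_ctx, prev_word_emb, gate_weights):
    """Blend visual and textual cues for one caption-generation step.

    A learned gate in (0, 1) decides the mix: close to 1 means rely on
    the image-region features, close to 0 means rely on the embedding
    of the previously generated word.
    """
    features = np.concatenate([visual_ctx, prev_word_emb])
    gate = 1.0 / (1.0 + np.exp(-gate_weights @ features))  # sigmoid
    return gate * visual_ctx + (1.0 - gate) * prev_word_emb

rng = np.random.default_rng(0)
visual = rng.normal(size=4)    # pooled features of one scene fragment (toy)
textual = rng.normal(size=4)   # embedding of the previous word (toy)
w = rng.normal(size=8)         # gate parameters (random here, learned in practice)
step_input = decode_step(visual, textual, w)
print(step_input.shape)  # (4,)
```

Because the gate is a scalar between 0 and 1, the output is always an interpolation between the two cue vectors; a real model would learn the gate parameters jointly with the rest of the decoder.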
To ensure that the generated captions do not sound robotic, the team used a generative adversarial network (GAN) to train the model. A GAN is a two-part neural network consisting of a generator that produces samples and a discriminator that tries to distinguish generated samples from real-world ones. A co-attention discriminator scores how natural the sentences are via a model that aligns pixel-level scene features with the generated words.
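The co-attention idea can be sketched as a scoring function that matches image-region features against word embeddings. This is a minimal illustration under assumed details (the function name, dot-product similarity, and max-pooling rule are not from the paper):

```python
import numpy as np

def co_attention_score(regions, words):
    """Toy co-attention discriminator score.

    regions: (R, d) array of image-region features.
    words:   (W, d) array of word embeddings for a caption.
    Returns a scalar in (0, 1); higher means the caption's words align
    better with some region of the image under this toy scoring rule.
    """
    sim = regions @ words.T                 # (R, W) pairwise similarities
    attended = sim.max(axis=0).mean()       # best-matching region per word
    return 1.0 / (1.0 + np.exp(-attended))  # squash to a probability

rng = np.random.default_rng(1)
regions = rng.normal(size=(3, 5))           # three toy image regions
caption = rng.normal(size=(2, 5))           # two toy word embeddings
score = co_attention_score(regions, caption)
print(score)
```

In an adversarial setup, the generator would be trained to push this score up for its own captions while the discriminator is trained to keep real captions scoring higher than generated ones.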
Another common problem in such systems is bias. For example, a model can overfit to a specific dataset, after which it cannot handle scenes where familiar objects appear in contexts the model has never seen.
To detect this, IBM Research built a diagnostic tool. The researchers proposed a test corpus of images with captions designed so that a model's poor performance on it indicates overfitting to the training dataset.
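The diagnostic idea can be illustrated with a toy example: evaluate a model on object/context combinations deliberately absent from training, so that a sharp performance drop exposes reliance on memorized co-occurrences. The pairs and the "model" below are invented for illustration only.

```python
# Object/context pairs the toy model saw during training (assumed data).
train_pairs = {("dog", "park"), ("surfboard", "beach"), ("cat", "sofa")}

# Diagnostic test set: known objects, but in unseen contexts.
test_pairs = [("dog", "beach"), ("surfboard", "park"), ("cat", "park")]

def biased_model_recognizes(obj, ctx, memorized=train_pairs):
    """A deliberately biased toy model that only 'recognizes' an object
    when it appears in a context memorized from training."""
    return (obj, ctx) in memorized

hits = sum(biased_model_recognizes(o, c) for o, c in test_pairs)
print(f"novel-context accuracy: {hits}/{len(test_pairs)}")  # 0/3 signals bias
```

A model that had learned the objects themselves, rather than their typical surroundings, would score well on such a corpus; the toy model above scores zero, which is exactly the failure the diagnostic is designed to surface.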
The final model was evaluated by workers on Amazon's Mechanical Turk, who were shown both real and generated examples. They had to indicate which captions were generated by the AI model and rate how well each caption described the corresponding image. The researchers state that their model performed "well", and believe it could be the beginning of powerful new computer vision systems.