
Google has unveiled its first vision-language-action (VLA) model. The model lets a robot teach itself actions from text and images on the internet, so less time is spent training the robot, because the underlying model learns in much the same way humans do.

Google’s new VLA model, called RT-2, removes much of the complexity of training foundation models for robots. RT-2 trains itself on text and images from the internet, which allows a robot to perform actions it was never explicitly trained on. “In other words, RT-2 can speak robot,” writes Vincent Vanhoucke, head of robotics at Google DeepMind.
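To give a rough idea of what “speaking robot” means in practice: a VLA model treats low-level robot commands as just another string of tokens it can output, alongside ordinary language. The sketch below is purely illustrative and is not Google’s code; the eight-token layout loosely follows the action representation described in the RT-2 paper, while the `RobotAction` class and `decode_action` helper are assumptions made for this example.

```python
# Illustrative sketch only: RT-2 is reported to emit robot actions as text
# tokens, so a hypothetical decoder might turn the model's output string
# into a structured command. The exact token layout here is an assumption.
from dataclasses import dataclass


@dataclass
class RobotAction:
    terminate: bool            # True = end the episode
    delta_xyz: tuple           # discretized end-effector translation
    delta_rpy: tuple           # discretized end-effector rotation
    gripper: int               # discretized gripper opening


def decode_action(token_string: str) -> RobotAction:
    """Parse a space-separated string of 8 integer tokens into an action."""
    tokens = [int(t) for t in token_string.split()]
    if len(tokens) != 8:
        raise ValueError(f"expected 8 action tokens, got {len(tokens)}")
    return RobotAction(
        terminate=bool(tokens[0]),
        delta_xyz=tuple(tokens[1:4]),
        delta_rpy=tuple(tokens[4:7]),
        gripper=tokens[7],
    )


if __name__ == "__main__":
    # A vision-language model prompted with a camera image and the
    # instruction "pick up the apple" might answer with a token string
    # like this instead of a sentence.
    print(decode_action("0 128 91 241 5 101 127 217"))
```

Because the actions come out as text, the same model can answer a question in words one moment and drive a robot arm the next, which is what lets web-scale knowledge carry over into physical behaviour.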

More complex than a language model

According to Vanhoucke, training a language model is much simpler. “Their training is not just about, let’s say, learning everything there is to know about an apple: how it grows, its physical properties, or even that one supposedly landed on Sir Isaac Newton’s head.” A robot must also turn that information into actions and associations: “A robot must be able to recognize an apple in context, distinguish it from a red ball, understand what it looks like and, above all, know how to pick it up.”

RT-2 is said to be capable of exactly that. It does not succeed in every situation, but it handled 62 percent of the “new” scenarios it was tested on, performing actions it was never taught and doing roughly twice as well as its predecessor, RT-1. The technical paper on RT-2 adds a small caveat, however: the researchers note that the robot cannot perform entirely new motions, only new variations and combinations of the actions it has already learned.

“Although there is still an enormous amount of work to be done to enable helpful robots in human-centered environments, RT-2 shows us that an exciting future for robotics is within reach,” Vanhoucke concludes.

Tip: Google wants to outsource application development to autonomous robots