Last week, Google DeepMind made the Interactions API available as a public beta. The new API represents a fundamental change in how developers work with AI models: from a stateless to a stateful architecture with server-side context management. With this move, Google follows the path that OpenAI embarked on in March 2025 with its Responses API.
Over the past two years, developers have worked with generative AI through a so-called ‘completion’ model: you send a prompt, you get an answer, and that is the end of the transaction. For follow-up questions, the entire conversation history has to be sent along each time, which is why the approach is called stateless. This architecture does not work well for complex AI agents that need to use various tools, keep track of extensive context, and engage in extended reasoning to arrive at the best solution.
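To make the contrast concrete, here is a minimal sketch of the stateless pattern using the google-genai Python SDK (the model name and prompts are illustrative): every turn resends the full history, and all bookkeeping is the caller’s responsibility.

```python
# Stateless 'completion' pattern: the client owns the conversation state and
# resends the entire history with every request.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

history = []  # all bookkeeping lives on the client side

def ask(prompt: str) -> str:
    history.append({"role": "user", "parts": [{"text": prompt}]})
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=history,  # every previous turn travels over the wire again
    )
    history.append({"role": "model", "parts": [{"text": response.text}]})
    return response.text

ask("What does server-side state mean?")
ask("Why does it matter for agents?")  # this request carries both turns
```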
The new Interactions API aims to solve this by supporting server-side state. Developers no longer need to manage and resend the entire conversation history. Instead, they send a so-called previous_interaction_id, which Google uses to retrieve the conversation history, including earlier tool results and model outputs, from its servers.
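What a stateful call could look like is sketched below. This is illustrative only: the previous_interaction_id field is named in Google’s announcement, but the endpoint URL, payload shape, and model identifier are assumptions, not the documented API.

```python
# Stateful pattern: send only the new turn plus a pointer to the previous one.
# Hypothetical sketch: endpoint and payload fields other than
# previous_interaction_id are assumed, not taken from Google's documentation.
import os
import requests

API = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

first = requests.post(API, headers=HEADERS, json={
    "model": "gemini-3-pro-preview",  # illustrative model id
    "input": "Summarize this 40-page report.",
}).json()

# The follow-up carries no history; the server rehydrates it from the id,
# including earlier tool results and model outputs.
follow_up = requests.post(API, headers=HEADERS, json={
    "model": "gemini-3-pro-preview",
    "input": "Now extract the action items.",
    "previous_interaction_id": first["id"],
}).json()
```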
Background execution for long-running tasks
Another major change is background execution. As soon as you work with complex workflows that take more than a few minutes to complete, timeout errors become a risk. A standard web request is typically limited to somewhere between 60 and 600 seconds, depending on the web server configuration. A process or agent that has to search through many web pages or analyze reports will quickly run into HTTP timeouts.
The Interactions API lets developers start an agent with background=true. The connection is closed immediately, and the result can be retrieved later.
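In practice, that could look like the polling sketch below. Again, only the background flag itself comes from the announcement; the endpoint, status field, and status values are hypothetical.

```python
# Background execution: kick off a long-running agent, disconnect, poll later.
# Hypothetical sketch: only the background flag comes from the announcement;
# the endpoint, status field, and status values are assumed.
import os
import time
import requests

API = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

start = requests.post(API, headers=HEADERS, json={
    "model": "gemini-3-pro-preview",
    "input": "Compare the security reports of 30 vendors.",
    "background": True,  # return immediately instead of holding the connection
}).json()

# Poll until the task leaves the in-progress state (status values assumed).
interaction = requests.get(f"{API}/{start['id']}", headers=HEADERS).json()
while interaction.get("status") == "in_progress":
    time.sleep(10)
    interaction = requests.get(f"{API}/{start['id']}", headers=HEADERS).json()

print(interaction)
```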
“Models are becoming systems and over time, might even become agents themselves,” wrote DeepMind’s Ali Çevik and Philipp Schmid in the official blog post. “Trying to force these capabilities into generateContent would have resulted in an overly complex and fragile API.”
Google versus OpenAI: transparency or efficiency?
Google is choosing a similar path to OpenAI, but with its own twist. Both companies are moving away from stateless designs to make context more readily available, yet the routes they have chosen differ considerably.
OpenAI’s Responses API introduced Compaction, a feature that compresses the conversation history: it keeps only the final outputs and discards intermediate tool outputs and reasoning chains. This improves token efficiency but creates a black box that hides the model’s previous reasoning.
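For comparison, this is what OpenAI’s stateful chaining looks like in the official openai Python SDK, using the previous_response_id that Compaction builds on (a minimal sketch; the model name and prompts are illustrative):

```python
# For comparison: OpenAI's server-side state via previous_response_id.
# Minimal sketch using the openai Python SDK; model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

first = client.responses.create(
    model="gpt-4.1",
    input="Summarize this 40-page report.",
)

# The follow-up references the stored response instead of resending history.
follow_up = client.responses.create(
    model="gpt-4.1",
    input="Now extract the action items.",
    previous_response_id=first.id,
)
print(follow_up.output_text)
```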
Google’s Interactions API, on the other hand, keeps the entire history available and composable. The data model allows developers to debug, manipulate, stream, and reason about messages. Google prioritizes transparency and full searchability over compression.
Native MCP support and available models
Google also embraces the open ecosystem by providing native support for the Model Context Protocol (MCP). Gemini models can directly invoke external tools hosted on remote servers without developers writing glue code, which makes it easy to pull in external information and enrich the model’s context.
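A request with a remote MCP server attached could look roughly like the sketch below. The tools payload shape, field names, and URL are assumptions for illustration; only the native MCP support itself comes from the announcement.

```python
# Native MCP support: point the request at a remote MCP server instead of
# writing tool-calling glue code yourself.
# Hypothetical sketch: the tools payload shape and field names are assumed.
import os
import requests

API = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

resp = requests.post(API, headers=HEADERS, json={
    "model": "gemini-3-pro-preview",
    "input": "Which open tickets are assigned to me?",
    "tools": [{
        "type": "mcp_server",                  # hypothetical field name
        "url": "https://mcp.example.com/sse",  # example remote MCP server
    }],
}).json()
print(resp)
```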
The Interactions API is now available in public beta via Google AI Studio. It supports the full spectrum of Google’s latest generation of models: Gemini 3.0 (Gemini 3 Pro Preview), Gemini 2.5 (Flash, Flash-Lite, and Pro), and the Deep Research Preview agent.
The pricing structure remains the same: the standard per-model rates for input and output tokens apply. What differs is how long the interaction history is retained. The free tier has a retention period of 1 day, while the paid tier keeps interactions available for 55 days.