Google makes breakthrough in efficiency for AI agents

Google researchers, in collaboration with the University of California, Santa Barbara, have developed a new framework that helps AI agents use computing power and tools more efficiently. 

This is according to a recent paper on arXiv, as reported by VentureBeat. The research focuses on a growing problem in agentic AI: how to scale the use of external tools without costs and latency becoming unmanageable.

Test-time scaling for AI agents is increasingly shifting from longer reasoning traces to controlling tool calls. In many practical applications, such as web search and document analysis, the number of external actions determines how deep an agent can dig. Each tool call lengthens the context, increases token consumption, and incurs additional API costs. For companies, these costs can add up quickly.

The researchers note that allocating more budget to an agent does not always lead to better performance. According to the authors, many agents lack any awareness of their available resources. They follow a single lead for too long, spend dozens of tool calls on a seemingly relevant direction, and only discover late in the process that it is a dead end. As a result, extra computing budget is consumed without any significant gain in quality.

As a first step, the researchers introduce Budget Tracker, a simple module that continuously informs the agent about the remaining budget. This approach works entirely at the prompt level and does not require retraining. The agent receives explicit signals about resource usage and can adjust its strategy accordingly. In Google’s implementation, the tracker also includes guidelines that indicate which behavior is appropriate for different budget levels.
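
To make the idea concrete, here is a minimal sketch of what such a prompt-level tracker could look like, assuming a simple counter over tool calls. The class name, the thresholds, and the guidance strings are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class BudgetTracker:
    """Illustrative prompt-level budget tracker.

    Counts tool calls against a fixed budget and produces a status
    string that can be injected into the agent's prompt each turn.
    Thresholds and guidance text are hypothetical.
    """
    max_tool_calls: int
    used_tool_calls: int = 0

    def record_call(self) -> None:
        self.used_tool_calls += 1

    def remaining(self) -> int:
        return self.max_tool_calls - self.used_tool_calls

    def status_prompt(self) -> str:
        # Explicit resource signal plus behavioral guidance keyed to
        # the fraction of budget left, per the article's description.
        frac = self.remaining() / self.max_tool_calls
        if frac > 0.5:
            guidance = "Budget is ample: explore several leads."
        elif frac > 0.2:
            guidance = "Budget is limited: focus on the most promising lead."
        else:
            guidance = "Budget is nearly exhausted: verify and finalize an answer."
        return (
            f"[Budget] {self.used_tool_calls}/{self.max_tool_calls} tool calls used "
            f"({self.remaining()} remaining). {guidance}"
        )
```

In use, the status string would simply be appended to the agent's context before each decision, which is what keeps the method entirely at the prompt level, with no retraining involved.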

Experiments with search agents that follow a ReAct-style loop show that this approach is effective. The paper reports that Budget Tracker can reduce the number of search calls by more than 40 percent and browse calls by almost 20 percent, while total costs fall by more than 30 percent. At the same time, performance keeps improving at higher budgets, where conventional agents tend to plateau.

Verification and budget awareness in a single iterative process

In addition to this lightweight solution, the paper describes a more comprehensive framework called Budget Aware Test-time Scaling, or BATS. BATS combines planning, verification, and budget awareness in a single iterative process. The agent dynamically adjusts its behavior based on the remaining budget and decides whether to continue its investigation or change course.
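
The article does not spell out the loop's mechanics, but a plausible reading is a plan-act-verify cycle conditioned on the tracker's status. The sketch below is an illustration under that assumption; the `agent` and `verifier` interfaces, the pivot heuristic, and all method names are hypothetical, not BATS's actual API:

```python
def bats_loop(agent, task, tracker, verifier):
    """Sketch of a budget-aware plan/act/verify loop in the spirit of BATS.

    `agent` and `verifier` are hypothetical duck-typed objects; `tracker`
    is the BudgetTracker sketched above. The control flow is inferred
    from the article, not taken from the paper.
    """
    plan = agent.make_plan(task, tracker.status_prompt())
    while tracker.remaining() > 0:
        action = agent.next_action(plan, tracker.status_prompt())
        observation = agent.execute(action)   # e.g. a search or browse call
        tracker.record_call()
        candidate = agent.draft_answer(task, observation)
        if verifier.accept(task, candidate):  # verification gate
            return candidate
        # Budget-conditioned decision: keep digging or change course.
        if agent.lead_looks_dead(observation) or tracker.remaining() < 3:
            plan = agent.replan(task, tracker.status_prompt())
    return agent.best_effort_answer(task)
```

The key design point the article highlights is that verification and the remaining budget feed into the same decision, so the agent can abandon a dead-end lead early instead of burning the rest of its budget on it.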

Tests on benchmarks such as BrowseComp and HLE-Search, with Gemini 2.5 Pro as the underlying model, show that BATS achieves higher accuracy at lower cost than existing methods.