Agentic AI relies on a multitude of technological innovations. One of the most notable is Retrieval-Augmented Generation (RAG), the technique that lets AI systems draw insights from business data. As is often the case, innovation is iterative: the new ‘Instructed Retriever’ architecture finds up to 70 percent more relevant information than RAG alone can achieve. How does that work, exactly?
The Instructed Retriever (IR) was conceived by a team at Databricks, which describes it in detail and positions it as an extension of RAG, with several ways to integrate IR to steer agentic AI in the right direction. The simplest (and fastest) way to run RAG is to give an AI model free rein to parse business data, a task at which these models far outpace humans. At the same time, the researchers see many shortcomings in such a basic setup. Above all, LLMs often simply do not adhere to user instructions. Beyond that rather fundamental problem, AI models regularly fail to grasp the context of their sources, especially when it comes to highly domain-specific data. In addition, off-the-shelf models are unable to reason about their output before sending it to the user.
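To make that baseline concrete, below is a minimal sketch of what such a ‘naive’ RAG loop looks like. It is illustrative only: the embed, search, and generate callables are hypothetical stand-ins for an embedding model, a vector store, and an LLM, not any particular library’s API.

```python
# Minimal sketch of a 'naive' RAG loop: embed the query, fetch the nearest
# chunks, and hand everything to the LLM in a single prompt. The embed,
# search, and generate arguments are hypothetical stand-ins, not a real API.

def naive_rag(query: str, embed, search, generate, k: int = 5) -> str:
    # 1. Retrieval: nearest-neighbour search over embedded business documents.
    query_vector = embed(query)
    chunks = search(query_vector, top_k=k)  # assumed to return text snippets

    # 2. Generation: the model gets free rein over whatever was retrieved.
    # Nothing here enforces that the chunks are relevant, that the model
    # follows the user's instructions, or that it reasons before answering.
    context = "\n\n".join(chunks)
    prompt = f"Answer using the context below.\n\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```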
One improvement to RAG implementations was the use of multi-step agents, which allow AI models to reason about their output before it is shown to the user. However, the AI agent still lacks an understanding of the context, and the speed and efficiency of RAG are a thing of the past. In all, this is merely a band-aid solution that adds complexity without improving consistency. Databricks posits that the Instructed Retriever remedies most, but not all, of these shortcomings: every one of the aforementioned RAG problems can be mitigated, just not all at once.
A good listener
For the end user, the Instructed Retriever should be practically invisible. As with previous RAG setups, users enter a query and receive an answer from an AI chatbot. Under the surface, however, much more is going on than simply linking an AI model to company data. The Instructed Retriever acts as a tool for an agent, or as part of a static workflow that triggers on every query. Where standard RAG lets the system specifications (instructions, examples of ‘good’ answers, available metadata) merely influence the query, IR uses them to set the rules for both retrieval (the data search) and generation (answering the user’s question).
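A hypothetical sketch of that division of labour is shown below: the system specification governs both stages instead of merely being prepended to the user’s query. The names and structure are illustrative, not Databricks’ actual API.

```python
# Hypothetical sketch: the system specification constrains both retrieval and
# generation, instead of merely being prepended to the user's query. All names
# here are illustrative, not Databricks' actual API.
from dataclasses import dataclass, field

@dataclass
class SystemSpec:
    instructions: str                  # e.g. "only report audited figures"
    good_answers: list[str] = field(default_factory=list)     # example answers
    metadata_fields: list[str] = field(default_factory=list)  # e.g. ["year", "division"]

def instructed_retrieve_and_answer(query: str, spec: SystemSpec,
                                   retrieve, generate) -> str:
    # Stage 1: retrieval obeys the spec's rules and available metadata,
    # not just similarity to the raw query.
    results = retrieve(query, metadata=spec.metadata_fields,
                       rules=spec.instructions)
    # Stage 2: generation is held to the same instructions and examples.
    return generate(query, context=results,
                    instructions=spec.instructions, examples=spec.good_answers)
```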
Although Databricks still speaks of ‘reasoning’ here, the Instructed Retriever amounts to more than a series of ‘reasoning steps’. The AI model is deliberately constrained: it not only has to consider which data it could look up, but at the architectural level it can only retrieve relevant information. That is the theory, anyway; a prompt that is open to interpretation may still yield an unsatisfactory answer. IR just dramatically reduces the chance of that happening.
In order to interpret the user’s query properly, the Instructed Retriever must take various elements of the system specifications into account. The IR first splits the query into its components (such as “year,” “division,” and “revenue” when the user requests the revenue of a specific division in a specific year), then ranks the data by relevance, and finally translates the user’s natural language into a technically correct database query (from “this year” to "WHERE date BETWEEN '2026-01-01' AND '2026-12-31'", for example).
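The sketch below illustrates the first and last of those steps under heavy simplification: structured fields are extracted from a natural-language revenue question and translated into a SQL predicate. The regex parsing and the sales table are hypothetical stand-ins; a real system would use a small model for the decomposition, not pattern matching.

```python
# Illustrative sketch of the decomposition step: split a natural-language
# request into structured fields, then translate them into SQL. The regex
# parsing and the 'sales' table are hypothetical; a real Instructed Retriever
# would use a small model for this, not pattern matching.
import re

def decompose(query: str) -> dict:
    # Toy extraction of "year" and "division" from a revenue question.
    year = re.search(r"\b(20\d{2})\b", query)
    division = re.search(r"\bdivision\s+(\w+)", query, re.IGNORECASE)
    return {
        "metric": "revenue" if "revenue" in query.lower() else None,
        "year": int(year.group(1)) if year else None,
        "division": division.group(1) if division else None,
    }

def to_sql(fields: dict) -> str:
    # Translate the structured fields into a technically correct WHERE clause.
    clauses = []
    if fields["year"]:
        clauses.append(
            f"date BETWEEN '{fields['year']}-01-01' AND '{fields['year']}-12-31'"
        )
    if fields["division"]:
        clauses.append(f"division = '{fields['division']}'")
    return f"SELECT SUM(revenue) FROM sales WHERE {' AND '.join(clauses)}"

print(to_sql(decompose("What was the revenue of division EMEA in 2026?")))
# SELECT SUM(revenue) FROM sales
#   WHERE date BETWEEN '2026-01-01' AND '2026-12-31' AND division = 'EMEA'
```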
The stumbling block of plain language
The great promise of AI is that far less technical expertise is needed to consult complex systems than ever before. In theory, a business manager without IT knowledge can now find the annual turnover of a particular division all by themselves. However, that promise faces a major hurdle: AI models on their own lack the accuracy and consistency to provide reliable answers from backend infrastructure. They need help, or guidance if you will, and evolutionary steps on top of RAG were the first way to offer it. Nevertheless, the Instructed Retriever shows how incomplete those earlier solutions actually were. Translating natural language into domain-specific queries appears to be something that must happen at the architectural level, not something to leave to AI models becoming better thinkers. Benchmarks show that the improvement is significant: the Instructed Retriever performs up to 70 percent better than traditional RAG implementations.
Notably, the Instructed Retriever does not always outperform a traditional RAG setup. GPT-5.2 and Claude 4.5 Sonnet score higher on the new StaRK-Instruct and StaRK-Amazon benchmarks, for example. It should be noted, however, that this pits the Instructed Retriever, at only 4 billion parameters, against two huge LLMs with at least hundreds of billions of them. That is a world of difference in efficiency, because the models from OpenAI and Anthropic and Databricks’ IR ultimately achieve very similar benchmark scores: IR reaches roughly 90 to 95 percent of what GPT-5.2 and Claude achieve.
Conclusion: complexity can still beat simplicity
Based on Databricks’ results, we can conclude that ‘naive’ RAG should be a thing of the past for many organizations. The Instructed Retriever is available within Agent Bricks, and we expect Databricks’ competitors will soon have to follow suit with similar concepts. Speed is no excuse for the shortcomings of basic RAG implementations, and the more complex design of IR proves its value with up to 70 percent better results.
It turns out that AI can actually overcome fundamental limitations at a certain scale, but that scale is gigantic. Such triumphs for generic LLMs only occur when they are sized for multiple data center GPUs, and even then they barely beat a tool (the Instructed Retriever) that runs fine on a regular CPU with its comparatively tiny 4 billion parameters. A rough rule of thumb is that 1GB of RAM is needed per billion parameters (assuming roughly one byte per weight), which puts IR at around 4GB: so small that it is a rounding error in compute costs. The fact that Databricks has devised an architecture with so few parameters for managing queries deserves praise and saves a lot of clock cycles the world over. The decrease in API costs for AI and the increased accuracy will quickly pay back the investment in an IR application. The days of naive RAG appear to be over.
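As a back-of-the-envelope illustration of that rule of thumb (the 400-billion-parameter figure for a frontier LLM below is an illustrative round number, not a published specification):

```python
# Back-of-the-envelope arithmetic for the ~1GB-per-billion-parameters rule of
# thumb, which assumes roughly one byte (8 bits) per weight. The frontier-LLM
# size is an illustrative round number, not a published figure.
GB_PER_BILLION_PARAMS = 1

for name, billions in [("Instructed Retriever", 4),
                       ("Frontier LLM (illustrative)", 400)]:
    print(f"{name}: {billions}B parameters -> "
          f"~{billions * GB_PER_BILLION_PARAMS} GB RAM")
# Prints:
#   Instructed Retriever: 4B parameters -> ~4 GB RAM
#   Frontier LLM (illustrative): 400B parameters -> ~400 GB RAM
# ~4 GB is ordinary-CPU territory; ~400 GB demands multiple data center GPUs.
```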
Read also: Databricks working on Series L round for valuation of $134 billion