Chris Wright: Metal-to-agent is the foundation of scalable enterprise AI

Red Hat is adapting its platform strategy to a changing enterprise AI market. Exploding token consumption by AI agents demands more control over infrastructure, and Red Hat positions its open source “metal-to-agent” approach as the answer. We spoke with CTO Chris Wright about this vision and the flexibility it is meant to deliver.

The AI market is caught in a remarkable paradox. On the one hand, the price per token is dropping spectacularly, with the cost of generating or processing data via LLMs decreasing by 75 to 90 percent per year. This makes the technology fundamentally more accessible than before. On the other hand, token consumption within enterprise environments is rising by more than 500 percent per year. This explosive volume growth not only neutralizes the price declines, but also creates an acute and growing need for more efficient infrastructure.
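A quick back-of-the-envelope calculation shows how these two trends interact. The sketch below is an illustration only: the function names are hypothetical, and it assumes that "up 500 percent" means six times the previous volume, which the article does not spell out.

```python
def spend_multiplier(price_drop: float, volume_growth: float) -> float:
    """Year-over-year change in total token spend.

    price_drop: fractional decline in price per token (0.75 = 75% cheaper).
    volume_growth: fractional increase in tokens consumed (5.0 = +500%).
    """
    return (1.0 - price_drop) * (1.0 + volume_growth)


def break_even_growth(price_drop: float) -> float:
    """Volume growth at which total spend stays exactly flat."""
    return price_drop / (1.0 - price_drop)


# Figures quoted in the article; the 6x reading of "+500%" is an assumption.
for drop in (0.75, 0.90):
    m = spend_multiplier(drop, 5.0)
    print(f"price -{drop:.0%}, volume +500% -> spend x{m:.2f}")
```

Under this reading, a 75 percent price drop combined with 500 percent volume growth still leaves the total bill about 1.5 times higher. At a 90 percent price drop, volume would have to grow by roughly 900 percent before spend rises, which is where the multipliers from reasoning models and agents come into play.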

The rise of advanced reasoning models in particular has driven up token usage. These models consume 10 to 20 times more tokens than standard models, because they internally generate complex trade-offs and chains of logic before formulating an answer. When organizations move to autonomous AI agents that continuously monitor, plan, and execute business tasks, consumption increases by yet another factor of five. This explains why the current approach to enterprise AI, which primarily relies on external APIs to proprietary frontier models, becomes financially and operationally unsustainable in the long term. “To be successful in the token economy, you have to transition from being merely a token consumer to actually becoming a token provider,” Wright explains.
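To put the multipliers above in perspective, the following minimal sketch combines them. It assumes the two factors compound multiplicatively, which the phrase "yet another factor of five" suggests but the article does not state outright; the baseline of 1,000 tokens per task is purely hypothetical.

```python
def agentic_token_factor(reasoning_factor: float, agent_factor: float = 5.0) -> float:
    """Combined token multiplier relative to a standard model answering once.

    Assumes the reasoning and agent factors multiply (an interpretation).
    """
    return reasoning_factor * agent_factor


BASELINE_TOKENS = 1_000  # hypothetical tokens per task for a standard model

for reasoning_factor in (10, 20):
    factor = agentic_token_factor(reasoning_factor)
    print(
        f"reasoning x{reasoning_factor}, agents x5 -> x{factor:.0f} "
        f"({BASELINE_TOKENS * factor:,.0f} tokens per task)"
    )
```

On these assumptions, an agentic workload built on reasoning models would consume roughly 50 to 100 times the tokens of a single standard-model call for the same task.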

According to Wright, organizations that own their AI infrastructure and run self-hosted models will ultimately emerge as the winners of this transition. Real flexibility arises when a company has full control over which model is used for which specific task, without the downsides of vendor lock-in and without unpredictable variable costs. This requires a fundamentally different technological foundation in the form of a platform that bridges the gap between the raw compute power of the hardware and the abstract logic of autonomous AI agents.

Enterprise AI through a combined offering

Red Hat’s answer to this challenge is Red Hat AI Enterprise, an integrated platform that is internally described as a “metal-to-agent” stack. “Metal-to-agent means from the lowest hardware level of the stack up to the agent itself: all the software, the infrastructure layer, the inferencing services, the model services, and the agent services,” Wright explains. Red Hat AI Enterprise is intended to bring together the chain of hardware suppliers, model developers, and application builders into a single open source architecture.

To properly understand how the portfolio is structured, it is essential to distinguish between the core products. On the one hand, there is the standalone product Red Hat AI Inference, which is specifically designed for efficiently running models as containers and can be deployed flexibly across a variety of Kubernetes or Linux environments. On the other hand, there is the broader Red Hat AI Enterprise. This platform includes the exact same inferencing foundation, but adds crucial enterprise capabilities on top. “That includes guardrails, security, red teaming, the Models-as-a-Service capabilities and AgentOps. All of that comes together in AI Enterprise, so it is a broader offering,” says Wright.

The full stack is composed of five tightly connected layers that flow seamlessly into one another. This layered approach is outlined below.

The lower layers

At the base of this layered architecture lies Red Hat Enterprise Linux as a highly stable operating system, which is directly coupled with Red Hat OpenShift as the overarching Kubernetes platform. Within this infrastructure layer, Red Hat addresses a number of persistent operational bottlenecks. Strict network isolation lets administrators define exactly which systems and data sources a specific AI component is allowed to access. At the same time, advanced GPU sharing ensures that costly hardware is used optimally. Instead of an expensive GPU sitting idle while it waits for a specific task, OpenShift automatically and dynamically divides the available compute power based on current demand.

In the inferencing layer, the vLLM project is particularly crucial for Red Hat, which is the largest contributor to this open source project. In a relatively short time, vLLM has grown into the de facto industry standard for implementing, managing, and maintaining LLMs. Because virtually every major new model update on the market is already optimized for vLLM on its launch day, organizations are assured of immediate compatibility with the latest technologies. On top of this foundation, Red Hat has developed the distributed inferencing framework llm-d. This orchestration layer analyzes incoming queries and routes them optimally across the available servers. In just one year, optimizations within this framework have resulted in a threefold increase in token throughput and a tenfold reduction in the time between submitting a question and the start of the answer. In addition, this layer stabilizes response times, which is important for online services that must operate within strict service level objectives.
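The idea of routing queries across inference servers can be illustrated with a deliberately simple sketch. This is not llm-d's actual scheduling logic, which the article does not describe; it is a generic least-loaded router, and every name in it (`Server`, `estimate_tokens`, `route`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    pending_tokens: int = 0  # rough measure of work already queued


def estimate_tokens(prompt: str) -> int:
    # Crude assumption: about one token per whitespace-separated word.
    return max(1, len(prompt.split()))


def route(servers: list[Server], prompt: str) -> Server:
    """Send the prompt to the server with the least queued work."""
    target = min(servers, key=lambda s: s.pending_tokens)
    target.pending_tokens += estimate_tokens(prompt)
    return target


servers = [Server("gpu-a", pending_tokens=10), Server("gpu-b", pending_tokens=3)]
first = route(servers, "summarize the quarterly sales report for the board")
second = route(servers, "translate this sentence")
print(first.name, second.name)
```

A production router would take far more into account, such as accelerator memory, cache state, and latency targets, but the sketch shows why a routing layer that sees all traffic can even out load and stabilize response times.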

Model services and the AI gateway

Once the underlying inferencing layer is solid, the challenge shifts to exposing AI models internally in a secure and efficient way. Red Hat AI Enterprise centralizes this complex process via Models-as-a-Service (MaaS). This turns AI models into shared resources accessible through API endpoints. MaaS includes an AI gateway, a component that acts as a single control point for all model interactions within the organization. Through this gateway, IT administrators can configure detailed token quotas, manage specific access rights per team, and assign priorities to business-critical applications. This effectively prevents a small experimental project from accidentally consuming all available GPU capacity and thereby jeopardizing regular business operations.
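How per-team quotas, access rights, and priorities might fit together can be sketched in a few lines. This is a conceptual illustration under stated assumptions, not the Red Hat AI gateway's API: the class names, fields, and team names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamPolicy:
    token_quota: int  # tokens allowed per accounting period
    priority: int     # lower number = more business critical
    used: int = 0


class Gateway:
    """Hypothetical gateway that enforces quotas and orders teams by priority."""

    def __init__(self, policies: dict[str, TeamPolicy]):
        self.policies = policies

    def admit(self, team: str, tokens: int) -> bool:
        policy = self.policies.get(team)
        if policy is None:
            return False  # no access rights configured for this team
        if policy.used + tokens > policy.token_quota:
            return False  # request would exceed the team's quota
        policy.used += tokens
        return True

    def by_priority(self, teams: list[str]) -> list[str]:
        # Known teams only, most business-critical first.
        known = [t for t in teams if t in self.policies]
        return sorted(known, key=lambda t: self.policies[t].priority)


gateway = Gateway({
    "erp-assistant": TeamPolicy(token_quota=1_000_000, priority=0),
    "lab-experiment": TeamPolicy(token_quota=50_000, priority=5),
})
print(gateway.admit("lab-experiment", 40_000))  # within quota
print(gateway.admit("lab-experiment", 20_000))  # would exceed quota
```

The point of the design is that the experimental team runs into its own ceiling long before it can starve the business-critical workload of capacity.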

In line with this, Red Hat also has a validated models program, which validates the most relevant and stable open-weight and open source-licensed models. Examples of models that have gone through this program include IBM Granite and Mistral. All models are thoroughly validated by Red Hat engineers and optimized for maximum speed and efficiency on the supported enterprise infrastructure. Data services linked to these models support techniques such as enriching AI answers with external business data and fine-tuning models for highly specific internal workflows.

From models to autonomous AI agents

The top of the metal-to-agent stack is occupied by the agent services. AI agents have now become the undisputed core of modern enterprise AI strategies and have long since left the purely experimental stage behind. “We are rapidly approaching the point where it is entirely normal for large companies to run thousands or even tens of thousands of specific agents simultaneously to optimize their processes,” says Wright. This scale-up in turn brings operational and strategic challenges. First, the required compute capacity rises steeply, as these agents reason in continuous loops, query external systems, and plan actions. Second, there is the risk of an uncontrolled proliferation of tools and frameworks brought in by different departments, also known as agent sprawl.

Red Hat’s philosophy in this regard is “bring your own agents”, but under the strict condition that this is centrally facilitated and monitored by AgentOps. This management layer enables organizations to transform the chaos of experimental proliferation into a safe and highly controllable model. Every agent receives a verified digital identity within this system, can be equipped with precise version control, and is subjected to automated security tests to proactively mitigate potential risks. To ensure that IT teams always maintain full visibility into processes, the platform relies heavily on the open standard OpenTelemetry. As a result, data streams for logging and tracing flow continuously and without interruption through the entire chain, from the hardware level up to the final visible actions of the autonomous agent.

The AI Factory

Red Hat adheres to a strictly hardware-agnostic approach with its “any accelerator” philosophy. The platform therefore supports accelerators from major providers such as Nvidia, AMD, and Intel. At the same time, it consciously opts for additional collaborations with these parties. Nvidia currently stands out as the market leader, and the latest AI Factory is built on its hardware. “Our platform remains hardware agnostic at its core,” Wright explains. “But when there is a joint stack with Nvidia, the accelerator that many companies prefer, then of course it runs on that hardware.” Within this joint stack, Nvidia contributes the Nvidia Inference Microservices, or NIMs. Each NIM contains an inference engine, a model, and an API for access. These can be LLMs, but also computer vision models or physics models. The AI Factory covers all five layers of the metal-to-agent approach.

There is also native support for the latest Nvidia Blackwell GPUs. As a logical and safe addition, Red Hat functions as the trusted execution layer of the stack. This ensures that the complex decisions and actions initiated by autonomous AI agents can be automatically executed on the underlying IT infrastructure in a direct and cryptographically secure way.

Agentic AI in production

For organizations that truly want to run AI at scale in production, Wright identifies several challenges that need to be overcome. The very first step is defining the intended use case. “When building an agent that does something useful for the company, you need to think carefully about your use case. Identifying a use case will define the data requirements. It is important to have reliable access to that data and a way to integrate that data securely into your agentic workflow,” says Wright.

When mapping out data requirements, it must be clear which existing systems need to be connected in order to feed essential context into the AI agent. Consider an old ERP system that contains a significant amount of business information. How can that sensitive corporate data safely and compliantly flow into the agentic workflow? “Just like in application development, you can build something quickly on your laptop, but you will deploy it in a production environment with security measures, observability capabilities, and scalability. One of the challenges is therefore to move from a simple development environment to a production environment,” Wright explains.

Acceleration of open source

According to Red Hat’s CTO, the necessity of a flexible and open AI infrastructure is further underscored by the extreme pace at which open source alternatives are currently being developed worldwide. The gap between large proprietary models and equivalent open source alternatives is shrinking rapidly. Where it once took eight months before Meta’s open Llama 2 model managed to approach the level of the very first ChatGPT models, DeepSeek-R1 followed barely five months after the launch of OpenAI o1. The trend is clear: open source responds faster with every iteration.

This acceleration makes an enterprise AI strategy based on long-term vendor lock-in with a single provider far too risky. The AI market is moving many times faster than the typically sluggish contract cycles of large companies. Only by choosing an open and broadly layered platform can organizations truly prepare themselves for the technological future.
