Red Hat unlocks what’s next with Model-as-a-Service and AgentOps

Red Hat unlocks what’s next with Model-as-a-Service and AgentOps

At Red Hat Summit, the overarching theme is “unlock what’s next.” This naturally encompasses topics like virtualization and hybrid cloud, but certainly also AI. In the realm of artificial intelligence, the open-source company is stepping on the gas even harder, and that has now resulted in Red Hat AI 3.4. The new version aims to bridge the gap between AI experimentation and large-scale production deployment.

Red Hat CEO Matt Hicks stated during the keynote that AI is the biggest technological turning point ever. He sees it as a greater change than open source, Linux, and the public cloud. This is because AI transcends traditional IT and impacts the entire business. Previous disruptive technological advancements were primarily an IT matter.

Whenever such a technological shift occurs, the debate flares up over whether companies need to rebuild everything from scratch to remain competitive. Hicks is clear on this: that is not the reality. “In every previous inflection point, regardless of the desire or the energy that is put into trying to rebuild everything, there has always been a balance left for enterprises.” That balance is what companies ultimately arrive at, no matter what is attempted.

Red Hat aims to facilitate precisely that balance with its platforms. Companies must leverage AI to the fullest while ensuring that what keeps their business running today continues to work. Red Hat AI 3.4 is designed to be a key asset in this regard, by supporting both developers who build models and infrastructure administrators who run them—two groups that, in practice, still frequently work at cross-purposes.

Model-as-a-Service as a central element

Model-as-a-Service (MaaS) plays a key role in this release. Through the Red Hat AI Gateway, platform administrators gain a central, secure interface to manage model access, track usage, and enforce policies. Developers access models via standard OpenAI-compatible APIs, so they don’t need a different approach for every environment. Unified governance thus applies to both internal and external models.

Under the hood, vLLM powers the inference, an area in which Red Hat has significant expertise due to its earlier acquisition of Neural Magic. This is complemented by the distributed inference framework llm-d for scalable deployment in Kubernetes environments. Also new in this release is the general availability of speculative decoding. According to Red Hat, this technique increases response speed by a factor of two to three, with minimal loss of quality. This also reduces the cost per interaction. Furthermore, vLLM now supports CPU-based infrastructure, offering an option for smaller language models.

Tip: Chris Wright: AI Needs Model, Accelerator, and Cloud Flexibility

Managing agents from development to production

AI agents are driving the growing demand for inference capacity. Red Hat addresses this with a new AgentOps toolkit that manages agents throughout their entire lifecycle. This includes integrated tracing of LLM calls, tool calls, and reasoning steps, as well as cryptographic identity management via SPIFFE/SPIRE. The latter replaces static, hard-coded keys with short-lived tokens and links every action to a verified identity.

To manage tools for agents, Red Hat is introducing an MCP server catalog and an associated MCP gateway. These provide runtime access to MCP-based tools. The new tracing capabilities are built on MLflow, which becomes generally available in version 3.4 as a core platform component. MLflow also provides experiment tracking and artifact management for both generative and predictive AI applications.

In addition to tracing, Red Hat is introducing an Evaluation Hub. This is a framework-agnostic control plane for evaluating LLMs, AI applications, and agents for quality, accuracy, and risk. It replaces fragmented testing methods with a single integrated approach.

Automated security testing and prompt management

For security validation, Red Hat integrates automated red teaming directly into the development cycle. The technology behind this feature comes from Chatterbox Labs, which Red Hat previously acquired. The platform uses Garak to scan models and agentic systems for risks such as jailbreaks, prompt injections, and bias. For runtime security, there is also Nvidia NeMo Guardrails.

Another new feature is Prompt Lab and Registry. This is a central repository for prompts as full-fledged data assets. It provides both developers and administrators with a single source of truth for the inputs that drive models and agents.

Expanded Collaboration with Nvidia

Red Hat is also deepening its collaboration with Nvidia. The Red Hat AI Factory with Nvidia, announced earlier this year as an integrated solution combining Red Hat AI Enterprise with Nvidia AI Enterprise, is gaining new capabilities. Notably, it now integrates with OpenShell, an open-source project founded by Nvidia to provide agents with a sandboxed runtime. As a contributor to this project, Red Hat is involved in standardizing agent management in hybrid cloud environments.

In addition, Red Hat Enterprise Linux for Nvidia 26.01 is now generally available, with day-zero support for Nvidia Blackwell. Both companies are already working on support for the upcoming Nvidia Vera Rubin architecture. Nvidia Run:ai, now part of Nvidia AI Enterprise, is also available to AI Factory customers.

To accelerate deployment, Red Hat and Nvidia offer jointly validated blueprints and AI quickstarts. These focus on use cases such as Model-as-a-Service, Enterprise RAG & RAFT, and semantic search on enterprise data.

“The agentic era represents an evolution of our platform from running traditional applications to powering intelligent, autonomous systems,” concludes Joe Fernandes, vice president and general manager of the AI Business Unit at Red Hat. “We are defining the open standard for how the enterprise executes AI.”

Red Hat AI 3.4 and the updated Red Hat AI Factory with Nvidia will be available later this month.

Tip: Red Hat lays the groundwork for AI inference: Server and llm-d project