Red Hat is launching a revamped version of its AI platform. Red Hat AI 3 is designed to help organizations move AI workloads from proof-of-concept to production more efficiently. The platform focuses primarily on inference, the execution phase of enterprise AI.
Research by the Massachusetts Institute of Technology shows that approximately 95 percent of organizations see no measurable financial return on the roughly $40 billion (€34.4 billion) spent on enterprise AI applications. For many companies, moving from AI experiments to actual production remains a major challenge.
Red Hat AI 3, which includes Red Hat AI Inference Server, RHEL AI, and Red Hat OpenShift AI, aims to bridge this gap by providing a consistent, uniform experience. “With Red Hat AI 3, we are providing an enterprise-grade, open source platform that minimizes these hurdles,” says Joe Fernandes, vice president and general manager of Red Hat’s AI Business Unit. The platform builds on the vLLM and llm-d community projects.
Scalability and cost control
Within that inference focus, Red Hat OpenShift AI 3.0 introduces llm-d, which runs large language models natively on Kubernetes. This approach combines intelligent distributed inference with the proven value of Kubernetes orchestration.
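Since llm-d builds on the vLLM inference engine, a minimal single-node vLLM sketch gives a rough feel for the serving layer involved. The model name and prompt below are illustrative placeholders; on OpenShift AI, llm-d would run this engine distributed across Kubernetes pods rather than in a local script.

```python
# Minimal single-node sketch of the vLLM engine that llm-d builds on.
# Model name and prompt are illustrative placeholders; a production
# llm-d deployment distributes this workload across Kubernetes pods.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any compatible model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize what disaggregated serving means."], params)
for output in outputs:
    print(output.outputs[0].text)
```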
To maximize hardware acceleration, the technology leverages open-source components, including the Kubernetes Gateway API Inference Extension, Nvidia Dynamo (NIXL) KV Transfer Library, and the DeepEP Mixture of Experts communication library. This enables organizations to reduce costs and improve response times through smart model scheduling and disaggregated serving.
The platform also offers operational simplicity with prescribed “Well-lit Paths” that streamline the rollout of models at scale. Cross-platform support provides flexibility in deploying LLM inference on different hardware accelerators, including Nvidia and AMD.
A platform for collaboration
Red Hat AI 3 delivers new capabilities for teams working on generative AI solutions. Model-as-a-Service (MaaS) functionality builds on distributed inference and lets IT teams act as their own MaaS providers, serving common models to the organization from a central point.
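In practice, consuming such a centrally hosted model could look like the following sketch, which uses the standard OpenAI Python client against an internal endpoint. The URL, token, and model name are hypothetical placeholders, not Red Hat APIs.

```python
# Sketch: calling a centrally hosted model over an OpenAI-compatible API,
# as an internal MaaS consumer might. Endpoint URL, token, and model name
# are hypothetical placeholders, not actual Red Hat AI 3 values.
from openai import OpenAI

client = OpenAI(
    base_url="https://models.internal.example.com/v1",  # hypothetical MaaS endpoint
    api_key="YOUR_INTERNAL_TOKEN",                      # placeholder credential
)

response = client.chat.completions.create(
    model="granite-3-8b-instruct",  # illustrative model name
    messages=[{"role": "user", "content": "Draft a release note for our API."}],
)
print(response.choices[0].message.content)
```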
The new AI hub enables platform engineers to discover, deploy, and manage foundational AI assets. It provides a curated catalog of models, including validated and optimized gen AI models.
For AI engineers, there will be a Gen AI studio, a hands-on environment for interacting with models and quickly prototyping new gen AI applications. Its built-in playground offers an interactive, stateless environment for experimenting with models.
Prepared for AI agents
Red Hat is positioning itself for the rise of AI agents. These autonomous workflows will place heavy demands on inference capabilities. The OpenShift AI 3.0 platform lays the foundation for scalable agentic AI systems.
The company is introducing a Unified API layer based on Llama Stack, which aligns the platform with industry standards such as OpenAI-compatible LLM interface protocols. Red Hat also embraces the Model Context Protocol (MCP), which streamlines how AI models interact with external tools.
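As a rough sketch of what MCP tool integration looks like, the following uses the open-source MCP Python SDK to expose a single tool that an MCP-capable model client can call. The tool itself is a made-up example, and Red Hat's own MCP integration may differ.

```python
# Sketch of an MCP server exposing one tool, using the open-source MCP
# Python SDK (pip install mcp). The tool is a made-up example; it is not
# part of Red Hat AI 3 itself.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-tools")

@mcp.tool()
def lookup_stock(sku: str) -> str:
    """Return the stock level for a SKU (stubbed for illustration)."""
    return f"SKU {sku}: 42 units in stock"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for an MCP-capable client
```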
In addition, there will be a modular, extensible toolkit for model customization, built on existing InstructLab functionality. It provides specialized Python libraries that give developers greater flexibility and control.
Red Hat AI 3 is designed to help organizations move AI initiatives out of the experimental phase.