Red Hat is launching a revamped version of its AI platform. Red Hat AI 3 is designed to help organizations move AI workloads from proof-of-concept to production more efficiently. The platform focuses primarily on inference, the execution phase of enterprise AI.
Research by the Massachusetts Institute of Technology shows that approximately 95 percent of organizations see no measurable financial return on the roughly $40 billion (€34.4 billion) spent on enterprise AI applications. For many companies, moving from AI experiments to actual production remains a major challenge.
Red Hat AI 3, which includes Red Hat AI Inference Server, RHEL AI, and Red Hat OpenShift AI, aims to bridge this gap by providing a consistent, uniform experience. “With Red Hat AI 3, we are providing an enterprise-grade, open source platform that minimizes these hurdles,” says Joe Fernandes, vice president and general manager of Red Hat’s AI Business Unit. The platform builds on the vLLM and llm-d community projects.
Scalability and cost control
Within that inference focus, Red Hat OpenShift AI 3.0 introduces llm-d, which runs large language models natively on Kubernetes. This approach combines intelligent distributed inference with the proven value of Kubernetes orchestration.
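Since llm-d builds on the vLLM inference engine, a minimal single-node vLLM sketch gives a rough feel for the serving layer involved. The model name and prompt below are illustrative placeholders; on OpenShift AI, llm-d would run this engine distributed across Kubernetes pods rather than in a local script.

```python
# Minimal single-node sketch of the vLLM engine that llm-d builds on.
# Model name and prompt are illustrative placeholders; a production
# llm-d deployment distributes this workload across Kubernetes pods.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any compatible model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize what disaggregated serving means."], params)
for output in outputs:
    print(output.outputs[0].text)
```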
To maximize hardware acceleration, the technology leverages open-source components, including the Kubernetes Gateway API Inference Extension, Nvidia Dynamo (NIXL) KV Transfer Library, and the DeepEP Mixture of Experts communication library. This enables organizations to reduce costs and improve response times through smart model scheduling and disaggregated serving.
The platform also offers operational simplicity with prescribed “Well-lit Paths” that streamline the rollout of models at scale. Cross-platform support provides flexibility in deploying LLM inference on different hardware accelerators, including Nvidia and AMD.
A platform for collaboration
Red Hat AI 3 delivers new capabilities for teams working on generative AI solutions. Model-as-a-Service (MaaS) functionality builds on distributed inference and lets IT teams act as their own MaaS providers, serving common models to the organization from a central point.
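In practice, consuming such a centrally hosted model could look like the following sketch, which uses the standard OpenAI Python client against an internal endpoint. The URL, token, and model name are hypothetical placeholders, not Red Hat APIs.

```python
# Sketch: calling a centrally hosted model over an OpenAI-compatible API,
# as an internal MaaS consumer might. Endpoint URL, token, and model name
# are hypothetical placeholders, not actual Red Hat AI 3 values.
from openai import OpenAI

client = OpenAI(
    base_url="https://models.internal.example.com/v1",  # hypothetical MaaS endpoint
    api_key="YOUR_INTERNAL_TOKEN",                      # placeholder credential
)

response = client.chat.completions.create(
    model="granite-3-8b-instruct",  # illustrative model name
    messages=[{"role": "user", "content": "Draft a release note for our API."}],
)
print(response.choices[0].message.content)
```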
The new AI hub enables platform engineers to discover, deploy, and manage foundational AI assets. It provides a curated catalog of models, including validated and optimized gen AI models.
For AI engineers, there will be a Gen AI studio, a hands-on environment for interacting with models and quickly prototyping new gen AI applications. Its built-in playground offers an interactive, stateless environment for experimenting with models.
Prepared for AI agents
Red Hat is positioning itself for the rise of AI agents. These autonomous workflows will place heavy demands on inference capabilities. The OpenShift AI 3.0 platform lays the foundation for scalable agentic AI systems.
The company is introducing a Unified API layer based on Llama Stack, which aligns the platform with industry standards such as OpenAI-compatible LLM interface protocols. Red Hat also embraces the Model Context Protocol (MCP), which streamlines how AI models interact with external tools.
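As a rough sketch of what MCP tool integration looks like, the following uses the open-source MCP Python SDK to expose a single tool that an MCP-capable model client can call. The tool itself is a made-up example, and Red Hat's own MCP integration may differ.

```python
# Sketch of an MCP server exposing one tool, using the open-source MCP
# Python SDK (pip install mcp). The tool is a made-up example; it is not
# part of Red Hat AI 3 itself.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-tools")

@mcp.tool()
def lookup_stock(sku: str) -> str:
    """Return the stock level for a SKU (stubbed for illustration)."""
    return f"SKU {sku}: 42 units in stock"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for an MCP-capable client
```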
In addition, there will be a modular, extensible toolkit for model customization, built on existing InstructLab functionality. It provides specialized Python libraries that give developers greater flexibility and control.
Red Hat AI 3 is designed to help organizations move AI initiatives out of the experimental phase.