Red Hat is repositioning its platform strategy to meet the shifting demands of enterprise AI. The company’s recent acquisition of Neural Magic exemplifies this transformation, underscoring the urgency for robust and adaptable AI infrastructure. Red Hat opted for speed and deep domain expertise, acquiring a team that contributes to vLLM, the widely adopted open source inference and serving project, to support its newest positioning: “any model, any accelerator, any cloud.” We spoke with CTO Chris Wright about the vision, the strategy, and how it is progressing.
The move is part of a broader evolution in Red Hat’s platform strategy. What once focused on “any workload, any app, anywhere” has been reoriented toward “any model, any accelerator, any cloud.” This is more than a semantic shift: it reflects the growing realization that AI workloads have fundamentally different infrastructure needs than traditional enterprise applications.
“We’re not just talking about applications anymore, we’re talking about AI”, Wright explains. “And we draw analogies of AI as a workload, just like applications are a workload.” The change also brings hardware considerations to the forefront, particularly the role of AI accelerators like GPUs and other specialized silicon that are now essential for AI processing.
Historically, Red Hat has focused on abstracting hardware complexity through software. But AI upends that model. High-performance AI workloads demand low-latency inference, and achieving that means recognizing the critical role of hardware accelerators.
“We always talked about enabling different types of accelerators”, said Wright. “It’s important if you think about our role in the infrastructure or platform software layer.” As enterprises seek flexibility in both cloud and hardware provider choices, Red Hat must ensure its architecture supports a heterogeneous set of accelerators, whether from Nvidia, AMD, Intel, or an emerging vendor.
Navigating a fragmented model landscape
As the model ecosystem has exploded, platform providers face new complexity. Red Hat notes that only a few years ago, few AI models were available under open, user-friendly licenses. Most access ran through major cloud platforms offering GPT-like models. Today, the situation has changed dramatically.
“There’s a pretty good set of models that are either open source or have licenses that make them usable by users”, Wright explains. But supporting such diversity introduces engineering challenges. Different models require different customization and inference optimizations, and platforms must balance performance with flexibility.
In addition to open source tooling across model experimentation and management, Red Hat’s approach includes tooling that automatically optimizes models for its inference engines, making it easier for enterprises to use the models they prefer without compromising on performance or operational efficiency.
Also read: Red Hat lays foundation for AI inferencing: Server and llm-d project
Red Hat has also shifted from a single-model strategy to one centered on expanded third-party partnerships. Initially, the company offered no models at all, choosing instead to let customers bring their own models. Then came Granite, the model family developed in collaboration with parent company IBM. But now, Red Hat supports validated models from external providers as well.
“If you go back in time to last year, we had only Granite in Red Hat AI. A year before that, we had no [validated] models”, explains Wright. Today, the strategy includes a mix: validated third-party models, additional models from Hugging Face or cloud providers, and room for customers to customize their own. This marks a shift in Red Hat’s role when it comes to AI, from a pure platform provider to a facilitator of a rich and varied model ecosystem. It reflects the enterprise need for choice, without locking users into a single vendor or framework.
Neural Magic adds model compression to the mix
The acquisition of Neural Magic brings advanced model compression techniques into Red Hat’s OpenShift AI platform. The core idea is simple but powerful: reduce the size of AI models to make them faster and cheaper to run on both CPUs and GPUs.
Neural Magic’s work is based on two key techniques: sparsification and quantization. Sparsification eliminates low-importance weights from neural networks, reducing computation with little loss of accuracy. Quantization, meanwhile, compresses model weights from high-precision formats (like 32-bit floats) down to 16-bit, 8-bit, or even 4-bit representations.
“If each one of those parameters or weights is a certain fixed size, and you reduce that size, then you reduce the size of the model”, said Wright. Smaller models load faster, run on less powerful hardware, and cost less to operate. This is ideal for hybrid and edge environments.
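To make that arithmetic concrete, here is a minimal, hypothetical Python sketch. It is not Neural Magic’s or Red Hat’s actual tooling, just an illustration of the two ideas: magnitude-based pruning for sparsification, and the back-of-the-envelope storage savings from lower-precision weights.

```python
import numpy as np

# Hypothetical illustration only, not Neural Magic's or Red Hat's code.

def prune_small_weights(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Sparsification: zero out the lowest-magnitude fraction of weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def weights_size_gb(num_parameters: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes at a given precision."""
    return num_parameters * bits_per_weight / 8 / 1e9

# Prune half of a toy weight vector.
w = np.random.randn(1000).astype(np.float32)
sparse_w = prune_small_weights(w, sparsity=0.5)
print(f"non-zero weights after 50% pruning: {np.count_nonzero(sparse_w)}")

# Storage needed for a 7-billion-parameter model at different precisions.
params = 7e9
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weights_size_gb(params, bits):.1f} GB")
```

The quantization math alone shows the appeal: a 7-billion-parameter model that needs roughly 28 GB of weight storage in 32-bit floats fits in about 3.5 GB at 4-bit precision, which is why compressed models can run on smaller accelerators or even CPUs.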
Originally, Neural Magic focused on predictive models and achieving GPU-like performance on CPUs. But when the generative AI boom accelerated in late 2022, they quickly shifted gears. “All of the focus was on GPUs, and they shifted their focus to inference optimization for generative AI,” Wright looks back.
That pivot aligned perfectly with Red Hat’s evolving needs. While CPU optimization remains important for edge and cost-sensitive use cases, generative AI’s demands make GPU optimization equally critical. Neural Magic’s optimizations now help improve inference performance across hardware types, rather than positioning CPUs as GPU alternatives.
Expanding the Red Hat AI portfolio
The new inference capabilities, delivered with the launch of Red Hat AI Inference Server, enhance Red Hat’s broader AI vision. This spans multiple offerings under the Red Hat AI umbrella: Red Hat OpenShift AI, Red Hat Enterprise Linux AI, and the aforementioned Red Hat AI Inference Server. Alongside these are embedded AI capabilities across Red Hat’s hybrid cloud offerings with Red Hat Lightspeed. These are not single products but a portfolio that Red Hat can evolve based on customer and market demands.
This modular approach allows enterprises to build, deploy, and maintain models based on their unique use cases across their infrastructure, from edge deployments to centralized cloud inference, while maintaining consistency in management and operations.
Heterogeneous compute becomes the norm
Red Hat envisions a future where heterogeneous compute is not an exception but the default. Generative AI involves multiple stages (tokenization, context handling, prediction), each with different performance characteristics. Some tasks are math-heavy and best run on GPUs, while others are memory-bound and can be handled efficiently by CPUs.
A distributed AI system, intelligently orchestrated, could route tasks to the optimal hardware for each step. Red Hat sees a future where its infrastructure enables these mixed-mode inference clusters, boosting efficiency and lowering operational costs.
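As a rough illustration of that orchestration idea, the sketch below assigns inference stages to hardware pools based on whether they are compute- or memory-bound. The stage names and routing table are assumptions for the sake of the example, not Red Hat’s implementation.

```python
# Hypothetical sketch of mixed-mode routing; stage names and mappings are
# illustrative assumptions, not Red Hat's actual orchestration logic.

STAGE_PROFILE = {
    "tokenization": "cpu",      # light, memory-bound preprocessing
    "prefill": "gpu",           # math-heavy context processing
    "decode": "gpu",            # token-by-token prediction
    "postprocessing": "cpu",    # detokenization and formatting
}

def route(stage: str) -> str:
    """Pick a hardware pool for an inference stage; default to CPU."""
    return STAGE_PROFILE.get(stage, "cpu")

for stage in ("tokenization", "prefill", "decode", "postprocessing"):
    print(f"{stage:>15} -> {route(stage)}")
```

In a real cluster the routing decision would also weigh current load, accelerator availability, and cost, but the principle is the same: send each step to the hardware that handles it most efficiently.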
Open source meets enterprise AI
Open source remains at the core of Red Hat’s strategy. The company builds from upstream projects and contributes to communities such as vLLM and Kubeflow, ensuring that innovation is transparent and accessible. “We know that the world is much more complicated, it’s not all open source”, Wright acknowledges. But Red Hat doesn’t see open source and proprietary models as mutually exclusive. Instead, it offers a framework where both can coexist, providing customers with choice, control, and performance.
Red Hat’s inferencing solutions are not an isolated move. They represent the company’s broader commitment to AI infrastructure that is open, flexible, and optimized for the enterprise. As Chris Wright and his team push forward, the guiding principle remains clear: support any model, any accelerator, any cloud.