
HPE offers AI at every scale for Nvidia’s Vera Rubin portfolio

Although AI hardware is in high demand, integrating it with existing infrastructure is difficult, and many projects never make it into production. HPE is aware of this and is positioning its Nvidia-powered AI solutions as largely plug-and-play, regardless of scale.

There are no constraints, HPE representatives said ahead of Nvidia’s annual GTC conference. The company identifies three groups that currently use, or want to use, AI. First, there are the AI model builders, who have the most demanding requirements in terms of scale and per-chip performance. Second, there are AI service providers, who aim to offer an integrated solution built on, among other things, HPE ProLiant servers and Aruba networking. The final group consists of what HPE calls the “sovereigns”: governments and organizations in highly regulated industries. For each group, there are new, scalable solutions.

Blades and neoclouds

HPE is introducing the HPE Cray Supercomputing GX240 blade today. It is aimed at existing Cray users and customers with similar profiles, such as large laboratories and academic institutions, for whom the upgrade delivers improvements in both efficiency and performance. A single blade can support up to 8 nodes, each with two new Nvidia Vera CPUs, resulting in up to 1,408 ARM-based CPU cores per blade. There is also no shortage of system memory, with up to 24.5 TB of LPDDR5 RAM. A single GX5000 rack can support 40 blades, for a total of 640 CPUs and thus 56,320 ARM cores per rack.
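The per-blade and per-rack figures above follow from straightforward multiplication. A quick sanity check (note that the 88-cores-per-Vera-CPU figure is inferred here from the stated totals, not something the announcement spells out):

```python
# Sanity check of the GX240 figures quoted above.
nodes_per_blade = 8
cpus_per_node = 2
cores_per_blade = 1408  # stated total

cpus_per_blade = nodes_per_blade * cpus_per_node   # 16 CPUs per blade
cores_per_cpu = cores_per_blade // cpus_per_blade  # 88 (inferred, not stated)

blades_per_rack = 40
cpus_per_rack = blades_per_rack * cpus_per_blade   # 640 CPUs per rack
cores_per_rack = cpus_per_rack * cores_per_cpu     # 56,320 cores per rack

print(cpus_per_rack, cores_per_rack)  # 640 56320
```

The numbers are internally consistent: 640 CPUs and 56,320 ARM cores per rack, exactly as quoted.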

Neoclouds (such as CoreWeave and Nebius) are also “first AI adopters” in HPE’s terminology. They are best placed to leverage Vera Rubin’s unified architecture, because their business revolves around scaling up and down. Since HPE’s integrated Nvidia Vera Rubin NVL72 effectively operates as a single system, as many bottlenecks as possible have been eliminated. On-site support and networking help keep the liquid-cooled system (cooled down to the chip “die,” the piece of silicon itself) manageable. HPE describes the co-design with Nvidia as “extreme” because of the deep integration achieved by the two companies’ engineering teams. A trillion-parameter LLM can run on a single NVL72 system, with up to ten times lower inference token costs and up to four times fewer GPUs required to train Mixture-of-Experts models, compared to previous systems based on Nvidia Blackwell.

Smaller is also possible

As AI becomes more prevalent within companies, more organizations will take the plunge into their own AI hardware. That, at least, is the thinking behind the more traditional HPE Compute XD700 servers, based on Nvidia’s reference design, the HGX Rubin NVL8. Up to 128 Rubin GPUs can fit into a rack using these servers, double the capacity of the previous generation.

However, these systems are slightly less Nvidia-centric, as Intel supplies the CPUs. According to HPE, Xeon 6 processors are sufficiently scalable to support future models. Configurations scale from two racks to thousands with an OCP-“inspired” design, so presumably with few deviations from the industry standard. HPE Services is available for these systems to provide predictable daily operations and support.

More Nvidia collaboration

Returning to the “extreme” co-engineering that HPE claims to have carried out with Nvidia: this manifests itself in hardware, software, and services alike. For instance, HPE now also supports Run.ai functionality out of the box, which helps maximize the utilization of all available GPUs. Run.ai, which Nvidia acquired, has clearly become a priority, so that end users can truly squeeze as much as possible out of their valuable hardware.

The reason for pursuing such integrations is not simply to offer a more expensive solution. HPE states that a company needs four to seven experts just to design an AI system from start to finish, not to mention the procurement of servers and cabling. With this new offering, HPE aims to provide a “cloud-like” experience where its own expertise, resources, and third-party tooling help eliminate the need for those experts.

A private cloud for all kinds of purposes

It’s also interesting that HPE reveals exactly what customers are doing with its turnkey solution, HPE Private Cloud AI. The largest group of users opts to provide an AI platform via Inferencing-as-a-Service (36 percent). This is followed by 20 percent using Retrieval-Augmented Generation (RAG), 16 percent prioritizing OCR, 8 percent using it for research and healthcare, another 8 percent for IT Ops, and finally computer vision, at 4 percent of the installed base.
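Tallying the reported shares is instructive: they sum to 92 percent, so roughly 8 percent of the installed base is apparently not broken out in the figures HPE shared (an inference from the arithmetic, not something stated in the source):

```python
# Reported HPE Private Cloud AI workload shares (percent of installed base).
workloads = {
    "Inferencing-as-a-Service": 36,
    "Retrieval-Augmented Generation": 20,
    "OCR": 16,
    "Research & healthcare": 8,
    "IT Ops": 8,
    "Computer vision": 4,
}

total = sum(workloads.values())
print(total)  # 92 -- the remaining 8 percent is not itemized in the source
```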

To address sovereignty concerns without requiring excessive scale, HPE Private Cloud AI is now more scalable. A base rack supports up to 16 GPUs with an optional network expansion rack. Air-gapped solutions support up to 128 GPUs via additional expansion racks. For example, anyone who wants to simulate a factory line as a digital twin using Nvidia AI-Q and Omniverse Blueprints can do so via this air-gapped solution.

Read also: HPE offers VMware alternative with enterprise-grade KVM in HPE Private Cloud