Google Cloud has introduced a preview of its A4X virtual machines. They are powered by the Nvidia GB200 NVL72, a rack-scale configuration of 72 Blackwell GPUs, which should make the instances suitable for training and deploying next-generation AI models.
With this launch, Google Cloud takes a substantial step toward making more powerful AI infrastructure available. The new A4X VMs deliver more than 1 exaflop of compute per GB200 NVL72 system, which Google says translates into a four-fold improvement in LLM training speed over the A3 VMs built on Nvidia H100 GPUs.
The GB200 NVL72 configuration should also deliver very low latency for multimodal requests, which AI systems normally take longer to process. In addition, the A4X VMs include Nvidia Grace CPUs: custom Arm processors connected to the Blackwell GPUs via NVLink chip-to-chip links. This optimizes offloading and better matches compute capacity to the demands of AI training.
Architecture for AI workloads
Google says the 72 Blackwell GPUs function as a single unified compute unit with shared memory, enabling more efficient training and deployment of complex AI models. For networking, Google Cloud relies on RDMA over Converged Ethernet (RoCE), which links multiple NVL72 racks into single clusters of tens of thousands of GPUs. This should make it possible to scale complex models efficiently.
The A4X virtual machines are fully integrated with Google's existing AI services, including Cloud Storage FUSE, Google Kubernetes Engine (GKE) and the Vertex AI platform. They use Google's third-generation liquid cooling, which is essential for sustaining maximum compute performance.