
Google Cloud is expanding its TPU and GPU offerings for AI workloads. TPU v5e is available in preview immediately, while A3 GPU VMs will become generally available starting next month.

Google Cloud has announced two additions to its AI infrastructure offerings. AI workloads vary widely, and the cloud provider is keen to maintain a portfolio that serves each workload according to its specific needs. The same train of thought recently inspired the cloud provider to release new cloud storage services.

Also read: Store your AI workloads in Google Cloud’s customized cloud storage services

TPU v5e

The first addition should make managing tensor processing units (TPUs) easier. TPUs speed up the training of AI models by making better use of the available hardware.

Google Cloud promises that the offering will halve the required training time compared to the previous Cloud TPU v4 offering. According to the cloud provider, this also yields cost savings without sacrificing performance or flexibility. “We balance performance, flexibility and efficiency with TPU v5e pods, allowing up to 256 chips to be interconnected with a total bandwidth of more than 400 Tb/s and 100 petaOps of INT8 performance.”

TPU v5e integrates with Google Kubernetes Engine (GKE). This brings automatic scaling of the platform as demand requires, along with workload orchestration, as the sketch below illustrates.
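As a rough illustration of what that GKE integration could look like in practice, the following sketch submits a training job that requests TPU chips via the official Kubernetes Python client. The resource name, node selector labels, chip count, and container image are assumptions for illustration, not details from Google Cloud's announcement.

```python
from kubernetes import client, config

# A minimal sketch, assuming kubeconfig credentials and a GKE cluster
# with a TPU v5e node pool. The "google.com/tpu" resource name and the
# node selector values below are assumptions, not taken from the article.
config.load_kube_config()

container = client.V1Container(
    name="trainer",
    image="gcr.io/my-project/trainer:latest",  # hypothetical training image
    resources=client.V1ResourceRequirements(
        requests={"google.com/tpu": "4"},  # assumed per-pod TPU chip count
        limits={"google.com/tpu": "4"},
    ),
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="tpu-train"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                containers=[container],
                restart_policy="Never",
                # Assumed label steering the pod onto TPU v5e nodes.
                node_selector={
                    "cloud.google.com/gke-tpu-accelerator": "tpu-v5-lite-podslice",
                },
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

GKE's cluster autoscaler can then add or remove TPU nodes as jobs like this come and go, which is the automatic scaling the integration refers to.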

Furthermore, integration is possible with Vertex AI and several widely used frameworks such as PyTorch, JAX and TensorFlow.
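To make the framework support concrete, here is a minimal JAX sketch of the kind of code that would run unchanged on a TPU v5e VM; the matrix sizes are arbitrary, and the device listing behavior described in the comments is an assumption about the TPU runtime rather than something the announcement spells out.

```python
import jax
import jax.numpy as jnp

# On a TPU VM this is expected to list TpuDevice entries; on a machine
# without accelerators it falls back to CPU devices.
print(jax.devices())

@jax.jit  # compile for whatever backend JAX detects (TPU, GPU, or CPU)
def matmul(a, b):
    return a @ b

x = jnp.ones((1024, 1024))
y = jnp.ones((1024, 1024))
print(matmul(x, y).shape)  # (1024, 1024)
```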

A3 GPU VMs

A3 virtual machines (VMs) are powered by eight Nvidia H100 Tensor Core GPUs and two fourth-generation Intel Xeon Scalable processors, backed by 2TB of host memory. The VMs specifically target compute-hungry workloads such as generative AI and large language models (LLMs).

Compared to the previous generation of VMs, Google Cloud promises three times faster training and ten times more network bandwidth.