Google now supports training generative AI models on up to 65,000 Kubernetes nodes, which the company says is ten times the capacity of competing services.
OpenAI’s GPT-4 reportedly contains 1.8 trillion parameters, while the largest model whose size is publicly documented is Llama 3.1, at 405 billion parameters. Training such large language models (LLMs) takes enormous amounts of time, computing power, and money, and is often simply not feasible on public cloud instances. Google Cloud, however, appears to have made significant headway in this area.
Previously, Google Kubernetes Engine (GKE) supported clusters of up to 15,000 nodes, which was sufficient for today’s LLMs. In anticipation of tomorrow’s models, Google Cloud now supports 65,000 interconnected nodes. These nodes do not use the GPUs typical of most generative AI workloads, but Google’s own TPU v5e chips, four per node. A single cluster can therefore contain more than 250,000 accelerators (65,000 nodes × 4 chips = 260,000).
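To make that count concrete, here is a minimal sketch of how a JAX job sees its chips, assuming it runs on TPU v5e hosts where each node exposes four local chips; the reported counts depend on the actual environment, and the final line simply reproduces the arithmetic above.

```python
# Minimal sketch: how a JAX job sees TPU chips, assuming TPU v5e hosts
# (each node/host exposes 4 chips). On other backends the counts differ.
import jax

local = jax.local_device_count()  # chips attached to this node (4 on v5e)
total = jax.device_count()        # chips across the entire job

print(f"{local} chips on this node, {total} chips in the job")

# GKE's new ceiling, per the figures above:
print(65_000 * 4)  # = 260,000 accelerators in a single cluster
```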
Multislice
How is it possible to coordinate so many TPUs effectively? Linking hardware at this scale usually demands complex networking. However, the TPU v5e, introduced in 2023, supports “Multislice” technology, which allows near-linear scaling and makes efficient use of 65,000 nodes possible. Achieving this took more than the new TPU hardware alone: Google also overhauled the entire GKE infrastructure and replaced the open-source etcd with its own distributed database, Spanner, as the cluster’s state store.
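Multislice is exposed to frameworks such as JAX. The sketch below is a hypothetical two-slice configuration, not Google’s production setup: it uses jax.experimental.mesh_utils.create_hybrid_device_mesh to build a device mesh that keeps heavy communication on the fast in-slice interconnect (ICI) and routes only lighter traffic over the data-center network (DCN) between slices.

```python
# A minimal sketch of sharding work across TPU slices with JAX Multislice.
# Assumes a multislice environment of two TPU v5e slices with 4 chips each;
# the mesh shapes are illustrative only.
import jax
import numpy as np
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Within a slice, chips talk over ICI; across slices, over DCN.
# create_hybrid_device_mesh arranges devices so the "data" axis spans
# both, while heavy per-step communication stays on ICI.
ici_mesh = (4, 1)  # chips per slice along (data, model) axes -- illustrative
dcn_mesh = (2, 1)  # number of slices along the same axes -- illustrative
devices = mesh_utils.create_hybrid_device_mesh(ici_mesh, dcn_mesh)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard a batch along "data" so each slice processes its own chunk.
batch = np.arange(16.0).reshape(8, 2)
sharded = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

# The jit-compiled computation then runs in parallel across all slices.
print(jax.jit(lambda x: (x ** 2).sum())(sharded))
```

Splitting the mesh this way matters because DCN bandwidth is far lower than ICI bandwidth, which is what makes the near-linear scaling claim plausible: the cross-slice dimension carries only gradient-style reductions rather than per-layer activations.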
The practical implications of this advancement remain to be seen. While models continue to grow, the capability gains that LLMs have so far shown from adding parameters may eventually hit a ceiling. For example, GPT-3, with 175 billion parameters, was dramatically more capable than GPT-2, which had only 1.5 billion.
Importantly, Google’s new cluster is not solely for training giant models. The company believes researchers also need this level of cloud infrastructure. “Centralizing computing power within the smallest number of clusters gives customers the flexibility to quickly adapt to changes in demand from inference, research, and training workloads,” write Drew Bradstock and Maciek Różacki of Google Cloud’s GKE team.
Also read: Google launches GKE Enterprise for easier Kubernetes management