Following an announcement in April, Google is launching its seventh-generation Tensor Processing Unit (TPU). The chip, codenamed Ironwood, scales to 9,216 chips in a single pod. Together with new Arm-based Axion instances, the company promises major performance improvements. Anthropic will be among the first customers, with access to up to one million TPUs.
Anthropic’s commitment is perhaps the most significant endorsement of the Ironwood launch. Most details about the chip, such as its fourfold performance improvement over the sixth-generation TPU, were already shared at Google Cloud Next in April. Anthropic, creator of Claude, will have more than a gigawatt of capacity at its disposal in 2026.
James Bradbury, Head of Compute at Anthropic, emphasizes that the company needs both strong inference performance and scalable AI training. The infrastructure must also remain reliable: Google reports 99.999 percent uptime for its liquid-cooled systems since 2020, and Optical Circuit Switching reroutes network traffic around problems so that workloads continue without visible interruption.
Ironwood as the AI inferencing powerhouse
Ironwood is built specifically for inferencing: actually running trained AI models in production. The focus is on speed and reliability, which are crucial for chatbots or coding assistants that need to respond quickly. Ironwood is therefore not primarily intended to compete with the most powerful Nvidia Blackwell GPUs, which are currently the preferred hardware for AI training.
A single Ironwood pod connects 9,216 chips over an Inter-Chip Interconnect carrying 9.6 terabits per second. This network gives the pod access to a combined 1.77 petabytes of High Bandwidth Memory. According to Google, this delivers 118 times more FP8 ExaFLOPS than its closest competitor. Compared to its predecessor Trillium, Ironwood offers four times better performance for both training and inferencing, independent of any software improvements end users may have made in the meantime.
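To put the pod-level figures in perspective: 1.77 petabytes of HBM across 9,216 chips works out to roughly 192 gigabytes per chip, and frameworks such as JAX expose all the chips in a slice as devices in a single mesh. The sketch below is a minimal, generic illustration of that programming model; the mesh axis name, tensor shapes, and computation are invented for the example and say nothing about Google's or Anthropic's actual configuration.

```python
# Minimal JAX sketch: every chip in the slice shows up as a device, and a
# named sharding spreads one logical array across the combined HBM.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = jax.devices()  # all chips the job was scheduled on (1 on CPU)
mesh = Mesh(mesh_utils.create_device_mesh((len(devices),)), ("data",))

# Shard the rows of a large activation tensor across the "data" axis.
sharding = NamedSharding(mesh, PartitionSpec("data", None))
x = jax.device_put(jnp.ones((len(devices) * 1024, 4096)), sharding)

# jit compiles one program; XLA inserts the inter-chip collectives needed
# to keep the matmul sharded across every device in the mesh.
y = jax.jit(lambda a: a @ (a.T @ a))(x)
print(y.shape, y.sharding)
```

On a single machine this simply runs on however many devices are visible; on a TPU slice the same code spans the whole ICI-connected mesh without changes.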
The hardware comes with new software. Google Kubernetes Engine gets Cluster Director for maintenance and topology awareness. The open-source MaxText framework now supports advanced training techniques. The Inference Gateway optimizes load balancing, reducing time-to-first-token latency by 96 percent and lowering serving costs by up to 30 percent.
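For context on the latency claim: time-to-first-token measures how long a caller waits before the first streamed token arrives, separate from the time needed to finish the full response. The small Python sketch below shows how that metric is commonly measured; the fake_stream generator is a hypothetical stand-in for any streaming model client and is not part of Google's Inference Gateway.

```python
import time
from typing import Iterable, Tuple

def measure_ttft(stream: Iterable[str]) -> Tuple[float, float]:
    """Return (time_to_first_token, total_time) in seconds for a token stream."""
    start = time.perf_counter()
    ttft = None
    for _token in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
    total = time.perf_counter() - start
    return (ttft if ttft is not None else total), total

# Hypothetical stand-in for a streaming client: prefill delay, then tokens.
def fake_stream():
    time.sleep(0.05)       # queueing + prefill before the first token
    yield "Hello"
    for _ in range(10):
        time.sleep(0.01)   # per-token decode steps
        yield " token"

ttft, total = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, total: {total * 1000:.1f} ms")
```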
Axion expands with new instances
In addition to Ironwood, Google is introducing two new Axion-based instance types. The Arm-based CPU targets general-purpose workloads such as data analysis and web serving. N4A instances offer up to 64 virtual CPUs with 512 gigabytes of DDR5 memory. C4A metal provides dedicated physical servers with 96 vCPUs and 768 gigabytes of memory.
Video platform Vimeo saw 30 percent better performance for transcoding workloads compared to x86 VMs. ZoomInfo measured a 60 percent improvement in price-performance for its Java services. In other words, Axion is not only faster, but also appears to be significantly more cost-effective than comparable x86-based instances.
All of this fits Google’s idea of an AI Hypercomputer, essentially the combination of hardware, software, and a consumption model designed to make AI deployment as simple and scalable as possible. Google cites IDC research that measured a 28 percent reduction in IT costs and 55 percent more efficient IT teams. To achieve this, Google is also transforming its infrastructure behind the scenes: the company now uses 400-volt DC power delivery capable of one megawatt per rack, ten times more than conventional server racks.