Nvidia’s Groq 3 LPU targets agentic AI inference at GTC 2026

Nvidia kicked off GTC 2026 in San Jose with a wave of chip and server rack announcements. Most eyes gravitated toward the Rubin GPU, Nvidia’s next-generation powerhouse. But it’s the Groq 3 LPU, the first tangible result of Nvidia’s $20 billion deal with Groq, that shows where the company is heading to meet changing AI demands.

Just three months after Nvidia licensed Groq’s technology and hired founder Jonathan Ross along with President Sunny Madra, the first chip to be released under the new partnership is ready. The speed of that turnaround is notable, although the bulk of the development work presumably took place well before Groq was effectively acquired.

Groq built its Language Processing Units specifically for AI inference: running AI models rather than training them. As we previously explored, the LPU’s architecture functions as a software-defined assembly line for AI workloads, moving data directly between on-chip memory modules without the overhead of Nvidia’s general-purpose GPU design. That makes it very fast indeed, sidestepping the memory bandwidth bottlenecks inherent to GPUs with separate memory modules.

The Groq 3 carries that philosophy further. Its memory, while smaller than what Nvidia’s GPUs offer, delivers 40 petabytes per second of bandwidth, enabling inference speeds that outpace anything a GPU can manage. The chip ships in dedicated Groq 3 LPX server racks, each holding 256 LPUs with 128 gigabytes of static random-access memory (SRAM). The closest equivalent, relatively speaking, is Cerebras, which signed a key deal with AWS this week.
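To get a feel for what those figures imply, here is a back-of-envelope sketch using only the numbers quoted above. Note that the article does not fully specify whether the 40 PB/s bandwidth and the 128 GB of memory apply per chip or per rack, so this is an order-of-magnitude illustration, not a spec.

```python
# Order-of-magnitude sketch: how long a single pass over the quoted
# memory pool takes at the quoted bandwidth. Figures are from the
# article; their exact chip-vs-rack scope is an assumption here.

BANDWIDTH_BYTES_PER_SEC = 40e15   # 40 petabytes per second
MEMORY_BYTES = 128e9              # 128 gigabytes

sweep_seconds = MEMORY_BYTES / BANDWIDTH_BYTES_PER_SEC
print(f"Full memory sweep: {sweep_seconds * 1e6:.1f} microseconds")
# → 3.2 microseconds per pass over the entire memory pool
```

At that rate, the whole memory pool can be swept in a few microseconds, which is the kind of headroom that makes the article's claimed GPU-beating inference speeds plausible despite the smaller capacity.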

Also read: Cerebras partnership breathes new life into AWS Trainium

Faster tokens for a faster agentic world

Ian Buck, Nvidia’s vice president of hyperscale and high-performance computing, framed the Nvidia-Groq partnership in clear terms. Groq 3 acts as a coprocessor to the Rubin GPUs, boosting performance at “every layer of the AI model on every token,” he said. The target throughput for agentic communications is up to 1,500 tokens per second. That figure is a direct response to a genuine shift in requirements: while 100 tokens per second feels fast enough for a human reader, it would be glacial for AI agents that continuously communicate with one another.
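A quick calculation shows why the gap matters for agent-to-agent traffic. The 100,000-token exchange size below is an illustrative assumption (a long multi-turn agent conversation), not a figure from the article.

```python
# Back-of-envelope: wall-clock time to stream a long agent-to-agent
# exchange at human-reading pace vs. Nvidia's stated Groq 3 target.
# The exchange size is a hypothetical assumption for illustration.

EXCHANGE_TOKENS = 100_000     # hypothetical multi-turn agent exchange
HUMAN_PACE_TPS = 100          # adequate for a human reader
GROQ3_TARGET_TPS = 1_500      # Nvidia's stated target throughput

human_pace_seconds = EXCHANGE_TOKENS / HUMAN_PACE_TPS
groq3_seconds = EXCHANGE_TOKENS / GROQ3_TARGET_TPS

print(f"At   100 tok/s: {human_pace_seconds:.0f} s (~{human_pace_seconds / 60:.1f} min)")
print(f"At 1,500 tok/s: {groq3_seconds:.0f} s")
# → 1000 s (~16.7 min) vs. ~67 s: a 15x cut in wall-clock latency
```

For a human reading one response, 1,000 seconds never accumulates in one sitting; for a pipeline of agents exchanging that volume continuously, the 15x difference compounds at every hop.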

The Groq 3 LPX rack is designed to pair with Nvidia’s new Vera Rubin NVL72, which combines Rubin GPUs with the company’s new Vera CPUs. Together, the two systems are optimized for trillion-parameter models and million-token context windows. Nvidia says the combination delivers 35 times higher throughput per megawatt of power and ten times greater revenue opportunity for data center operators.

Also read: Dell gives AI Factory an Nvidia Vera Rubin upgrade

Five new racks, one clear direction

Groq 3 LPX and Vera Rubin NVL72 are two of five new server rack systems Nvidia announced at GTC. The others are a dedicated Vera CPU rack, the Bluefield-4 STX storage rack, and the Spectrum-6 SPX networking rack. The Vera Rubin platform encompasses seven chips and five rack-scale systems in total.

Tip: HPE offers AI at every scale for Nvidia’s Vera Rubin portfolio