Nvidia’s Groq 3 LPU targets agentic AI inference at GTC 2026

Nvidia kicked off GTC 2026 in San Jose with a wave of chip and server rack announcements. Most eyes gravitated toward the Rubin GPU, Nvidia’s next-generation powerhouse. But it’s the Groq 3 LPU, the first tangible result of Nvidia’s $20 billion deal with Groq, that shows where the company is heading to meet changing AI demands.

Just three months after Nvidia licensed Groq’s technology and hired founder Jonathan Ross along with President Sunny Madra, the first chip to be released under the new partnership is ready. The speed of that turnaround is notable, although the bulk of the development work presumably took place well before Groq was effectively acquired.

Groq built its Language Processing Units specifically for AI inference: running AI models rather than training them. As we previously explored, the LPU’s architecture functions as a software-defined assembly line for AI workloads, moving data directly between on-chip memory modules without the overhead of Nvidia’s general-purpose GPU design. That makes it very fast indeed, sidestepping the memory bandwidth bottlenecks inherent to GPUs with separate memory modules.

The Groq 3 carries that philosophy further. Its memory, while smaller than what Nvidia’s GPUs offer, delivers 40 petabytes per second of bandwidth, enabling inference speeds that outpace anything a GPU can manage. The chip ships in dedicated Groq 3 LPX server racks, each holding 256 LPUs with 128 gigabytes of static random-access memory (SRAM). The closest equivalent, relatively speaking, is Cerebras, which signed a key deal with AWS this week.
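To get a feel for what those figures imply, here is a back-of-envelope sketch using only the numbers quoted above. Note that the article does not fully specify whether the 40 PB/s bandwidth and the 128 GB of memory apply per chip or per rack, so this is an order-of-magnitude illustration, not a spec.

```python
# Order-of-magnitude sketch: how long a single pass over the quoted
# memory pool takes at the quoted bandwidth. Figures are from the
# article; their exact chip-vs-rack scope is an assumption here.

BANDWIDTH_BYTES_PER_SEC = 40e15   # 40 petabytes per second
MEMORY_BYTES = 128e9              # 128 gigabytes

sweep_seconds = MEMORY_BYTES / BANDWIDTH_BYTES_PER_SEC
print(f"Full memory sweep: {sweep_seconds * 1e6:.1f} microseconds")
# → 3.2 microseconds per pass over the entire memory pool
```

At that rate, the whole memory pool can be swept in a few microseconds, which is the kind of headroom that makes the article's claimed GPU-beating inference speeds plausible despite the smaller capacity.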

Also read: Cerebras partnership breathes new life into AWS Trainium

Faster tokens for a faster agentic world

Ian Buck, Nvidia’s vice president of hyperscale and high-performance computing, framed the Nvidia-Groq partnership in clear terms. Groq 3 acts as a coprocessor to the Rubin GPUs, boosting performance at “every layer of the AI model on every token,” he said. The target throughput for agentic communications is up to 1,500 tokens per second. That figure is a direct response to a genuine shift in requirements: while 100 tokens per second feels fast enough for a human reader, it would be glacial for AI agents that continuously communicate with one another.
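A quick calculation shows why the gap matters for agent-to-agent traffic. The 100,000-token exchange size below is an illustrative assumption (a long multi-turn agent conversation), not a figure from the article.

```python
# Back-of-envelope: wall-clock time to stream a long agent-to-agent
# exchange at human-reading pace vs. Nvidia's stated Groq 3 target.
# The exchange size is a hypothetical assumption for illustration.

EXCHANGE_TOKENS = 100_000     # hypothetical multi-turn agent exchange
HUMAN_PACE_TPS = 100          # adequate for a human reader
GROQ3_TARGET_TPS = 1_500      # Nvidia's stated target throughput

human_pace_seconds = EXCHANGE_TOKENS / HUMAN_PACE_TPS
groq3_seconds = EXCHANGE_TOKENS / GROQ3_TARGET_TPS

print(f"At   100 tok/s: {human_pace_seconds:.0f} s (~{human_pace_seconds / 60:.1f} min)")
print(f"At 1,500 tok/s: {groq3_seconds:.0f} s")
# → 1000 s (~16.7 min) vs. ~67 s: a 15x cut in wall-clock latency
```

For a human reading one response, 1,000 seconds never accumulates in one sitting; for a pipeline of agents exchanging that volume continuously, the 15x difference compounds at every hop.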

The Groq 3 LPX rack is designed to pair with Nvidia’s new Vera Rubin NVL72, which combines Rubin GPUs with the company’s new Vera CPUs. Together, the two systems are optimized for trillion-parameter models and million-token context windows. Nvidia says the combination delivers 35 times higher throughput per megawatt of power and ten times greater revenue opportunity for data center operators.

Also read: Dell gives AI Factory an Nvidia Vera Rubin upgrade

Five new racks, one clear direction

Groq 3 LPX and Vera Rubin NVL72 are two of five new server rack systems Nvidia announced at GTC. The others are a dedicated Vera CPU rack, the Bluefield-4 STX storage rack, and the Spectrum-6 SPX networking rack. The Vera Rubin platform encompasses seven chips and five rack-scale systems in total.

Tip: HPE offers AI at every scale for Nvidia’s Vera Rubin portfolio