llm-d has been officially accepted as a CNCF Sandbox project. The move places the project under neutral governance within the Linux Foundation and advances its goal of an open standard for AI inference across any accelerator and any cloud environment.
The Cloud Native Computing Foundation (CNCF) has accepted llm-d as an official Sandbox project. This places the distributed inference framework under the neutral governance of the Linux Foundation, giving organizations the assurance that they are building on a vendor-neutral, open foundation. llm-d was launched in May 2025 as a joint initiative of Red Hat, Google Cloud, IBM Research, CoreWeave, and Nvidia, with one clear vision: any model, any accelerator, any cloud.
Since then, AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI have joined as partners. The University of California, Berkeley, and the University of Chicago, well-known names in the vLLM and LMCache communities, also support the project. With its CNCF admission, llm-d now gains the governance structure and open leadership that companies need in order to build on it seriously.
Kubernetes-native inference as a first-class workload
The project addresses a specific bottleneck: AI serving is stateful and latency-sensitive, while traditional service routing and autoscaling are blind to these factors. This leads to inefficient placement, cache fragmentation, and unpredictable latency. llm-d tackles this by serving as the primary implementation of the Kubernetes Gateway API Inference Extension (GAIE) and providing inference-aware traffic management via the Endpoint Picker (EPP).
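To make "inference-aware routing" concrete: unlike a round-robin load balancer, an endpoint picker can weigh how much of a prompt's KV cache a pod already holds against how busy that pod is. The sketch below is a toy illustration of that idea, not llm-d's actual EPP API; the class, field names, and scoring weights are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    queue_depth: int       # requests currently waiting on this pod (hypothetical signal)
    cache_hit_prefix: int  # prompt tokens already in this pod's KV cache (hypothetical signal)

def pick_endpoint(endpoints: list[Endpoint], prompt_tokens: int) -> Endpoint:
    """Score each pod by KV-cache reuse minus a load penalty; return the best.

    A plain load balancer would only look at queue_depth; an inference-aware
    picker also rewards pods that can skip recomputing part of the prefill.
    """
    def score(ep: Endpoint) -> float:
        reuse = ep.cache_hit_prefix / max(prompt_tokens, 1)  # 0..1 cache affinity
        load_penalty = 0.1 * ep.queue_depth                  # illustrative weight
        return reuse - load_penalty
    return max(endpoints, key=score)

endpoints = [
    Endpoint("pod-a", queue_depth=4, cache_hit_prefix=0),
    Endpoint("pod-b", queue_depth=1, cache_hit_prefix=512),
]
# pod-b wins: half the 1024-token prompt is already cached there.
best = pick_endpoint(endpoints, prompt_tokens=1024)
```

With a cache-blind policy, pod-b's slight queue would be its only signal; here the cached 512-token prefix dominates, avoiding redundant prefill work.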
Additionally, the framework offers prefill/decode disaggregation: prompt processing (prefill) and token generation (decode) are split into separately scalable pods. Hierarchical KV cache offloading distributes memory load across GPU, CPU, and storage. The latest v0.5 release demonstrates near-zero latency in a multi-tenant SaaS scenario while scaling to roughly 120,000 tokens per second.
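The idea behind hierarchical KV cache offloading can be sketched as a tiered cache: a small, fast "GPU" tier evicts least-recently-used blocks to a larger "CPU" tier instead of discarding them, so a later request with the same prefix can promote them back rather than recompute the prefill. This is a minimal stand-in, not llm-d's or LMCache's implementation; real systems manage device memory, paging, and a storage tier below the CPU.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache (hypothetical structure for illustration).

    The 'gpu' tier holds at most gpu_capacity blocks; LRU victims are
    demoted to the 'cpu' tier and promoted back on reuse.
    """
    def __init__(self, gpu_capacity: int):
        self.gpu = OrderedDict()  # block_id -> KV data (stand-in for tensors)
        self.cpu = {}             # overflow tier
        self.gpu_capacity = gpu_capacity

    def put(self, block_id, kv):
        self.gpu[block_id] = kv
        self.gpu.move_to_end(block_id)            # mark most recently used
        while len(self.gpu) > self.gpu_capacity:
            victim, data = self.gpu.popitem(last=False)  # evict LRU block
            self.cpu[victim] = data                      # offload, don't discard

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        if block_id in self.cpu:                  # hit in the slower tier:
            kv = self.cpu.pop(block_id)
            self.put(block_id, kv)                # promote back to GPU
            return kv
        return None  # true miss: the prefill must be recomputed
```

The payoff is that a "miss" in GPU memory costs a CPU-to-GPU copy instead of a full prefill recomputation, which is the trade hierarchical offloading exploits.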
Avoiding vendor lock-in is a core principle. Through model- and state-aware routing policies, llm-d directs requests to the most suitable hardware from Nvidia, AMD, or Google, improving metrics such as Time to First Token (TTFT) and token throughput. The project also aims to become the standard for open, reproducible inference benchmarks.
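For reproducible benchmarks, the two headline metrics mentioned above have simple definitions: TTFT is the delay from request start until the first token arrives, and throughput is tokens delivered per unit time. A minimal sketch (hypothetical helper, not part of llm-d's benchmark tooling) computing both from a stream of token arrival timestamps:

```python
def stream_metrics(token_times: list[float], start: float) -> dict:
    """Compute Time to First Token and token throughput.

    token_times: absolute timestamps (seconds) at which each token arrived,
    in order; start: the timestamp at which the request was sent.
    """
    ttft = token_times[0] - start               # Time to First Token
    duration = token_times[-1] - start          # total streaming time
    throughput = len(token_times) / duration if duration > 0 else 0.0
    return {"ttft_s": ttft, "tokens_per_s": throughput}
```

Routing policies like llm-d's aim to shrink `ttft_s` (by avoiding redundant prefill) and raise `tokens_per_s` (by balancing decode load) at the same time.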