During KubeCon, Microsoft announced support for Retrieval-Augmented Generation (RAG) in KAITO on Azure Kubernetes Service (AKS) clusters. In addition, vLLM is now the standard serving engine in the AI toolchain operator add-on.
Adding RAG support to KAITO is an important step for developers who want to implement advanced search capabilities on their AKS clusters. With this feature, users can deploy the RAG engine within minutes, together with a supported embedding model, to index and search large datasets. The engine connects to a model through a KAITO inference service URL.
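As a rough sketch, deploying the RAG engine comes down to applying a `RAGEngine` custom resource that points at an embedding model and an existing KAITO inference service. The field names below follow the KAITO `v1alpha1` API as documented at the time of writing and may change; the instance type, model ID, and URL are placeholder assumptions, not values from the announcement.

```yaml
# Hypothetical RAGEngine manifest (KAITO v1alpha1 API; verify fields against
# the KAITO documentation before use).
apiVersion: kaito.sh/v1alpha1
kind: RAGEngine
metadata:
  name: ragengine-example
spec:
  compute:
    instanceType: "Standard_D8s_v3"        # CPU node for the RAG service (assumption)
    labelSelector:
      matchLabels:
        apps: ragengine-example
  embedding:
    local:
      modelID: "BAAI/bge-small-en-v1.5"    # example supported embedding model (assumption)
  inferenceService:
    url: "http://workspace-inference-svc/v1/completions"  # KAITO inference service URL (placeholder)
```

Once the resource is reconciled, documents can be indexed and queried against the service; the exact indexing API is described in the KAITO RAG documentation.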
Higher processing speed with vLLM
Another improvement is that the AI toolchain operator add-on now runs model inference workloads on the vLLM serving engine by default. According to Microsoft, this engine processes incoming requests significantly faster. It also lets developers use OpenAI-compatible APIs, DeepSeek R1 models, and various pre-trained Hugging Face models.
For developers who prefer Hugging Face Transformers over vLLM, Microsoft offers the option to switch between these engines at any time for KAITO inference deployments.
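In KAITO, the runtime switch is expressed on the workspace itself. The sketch below assumes the `kaito.sh/runtime` annotation documented by the KAITO project; the API version, preset name, and VM size are illustrative placeholders.

```yaml
# Hypothetical Workspace manifest pinning the Transformers runtime
# (check the KAITO docs for the current API version and annotation).
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
  annotations:
    kaito.sh/runtime: "transformers"   # omit, or set "vllm", for the default engine
resource:
  instanceType: "Standard_NC12s_v3"    # GPU VM size (assumption)
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: falcon-7b-instruct           # example KAITO model preset (assumption)
```

Because the choice is a per-workspace annotation rather than a cluster-wide setting, teams can run vLLM and Transformers workspaces side by side and switch an individual deployment at any time.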
Customized GPU driver installation
The third update concerns skipping automatic GPU driver installation, a capability that is now generally available. By default, AKS installs NVIDIA GPU drivers when a node pool is created with a VM size that supports NVIDIA GPUs. With this new option, users can choose to install custom GPU drivers themselves or to use the GPU Operator, on both Linux and Windows node pools.
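In practice, this is an option on node pool creation. The sketch below assumes the `--gpu-driver none` flag on `az aks nodepool add`; flag names evolve between CLI releases, so verify with `az aks nodepool add --help`. Resource group, cluster, and VM size values are placeholders.

```
# Create a GPU node pool without the automatic NVIDIA driver install
# (flag name assumed; confirm against current Azure CLI docs).
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name gpunp \
  --node-count 1 \
  --node-vm-size Standard_NC6s_v3 \
  --gpu-driver none
```

After the pool is up, drivers can then be supplied manually or by deploying NVIDIA's GPU Operator, which manages drivers and the container toolkit from inside the cluster.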
Tip: Microsoft significantly expands Azure Kubernetes Service