How Microsoft scales Azure Kubernetes Service for OpenAI

Microsoft’s Azure Kubernetes Service powers some of the world’s most demanding AI workloads, including OpenAI and Anthropic, with clusters scaling to tens of thousands of nodes. Jorge Palma, Principal PM Lead for AKS, reveals how Microsoft pushes Kubernetes boundaries while maintaining unwavering commitment to open source principles.

AKS Automatic: Microsoft’s opinionated Kubernetes platform

Azure Kubernetes Service offers two distinct approaches to Kubernetes management. AKS Standard provides vanilla Kubernetes where users can choose their preferred CNIs for networking, ingresses, and various SKUs. This unopinionated approach gives developers maximum flexibility to build their infrastructure exactly as they envision it.

In contrast, AKS Automatic represents Microsoft’s fully opinionated application platform. Released to general availability in September 2025, Automatic includes built-in monitoring, security, production readiness, and auto-scaling capabilities. All best practices are implemented by default, allowing developers to focus exclusively on their applications rather than infrastructure decisions.

Palma explains that the choice between Standard and Automatic depends on user preferences. Teams wanting to select specific vendor solutions available at KubeCon can use Standard. Organizations preferring Microsoft’s recommendations and wanting to minimize operational overhead should choose Automatic.

Scaling Kubernetes for AI workloads at OpenAI scale

Microsoft serves as one of the primary customers of Azure itself, with AKS pushing the boundaries of what’s possible with Kubernetes. OpenAI runs on AKS with clusters that have grown from thousands to 50,000 and even 75,000 nodes. This extreme scaling is not only about reaching high node counts but also about maintaining performance, keeping the control plane responsive, and providing the features needed for AI training and inference.
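To give a rough sense of why control plane responsiveness becomes a concern at these node counts, consider node heartbeats alone. The sketch below is a back-of-the-envelope estimate, not a description of how AKS is configured: it assumes every kubelet renews its Lease object every 10 seconds, which is the upstream Kubernetes default, and the function name is purely illustrative.

```python
# Rough estimate of node heartbeat (Lease) writes reaching the API server.
# Assumption: each kubelet renews its Lease every 10 seconds, the upstream
# default. A managed service may tune this value differently.

LEASE_RENEW_INTERVAL_SECONDS = 10  # assumed upstream default


def lease_writes_per_second(node_count: int,
                            interval_s: float = LEASE_RENEW_INTERVAL_SECONDS) -> float:
    """Average number of Lease updates per second across the whole cluster."""
    return node_count / interval_s


for nodes in (100, 1_000, 50_000, 75_000):
    rate = lease_writes_per_second(nodes)
    print(f"{nodes:>6} nodes -> {rate:>6.0f} lease writes/s")
```

Under that assumption, a 75,000-node cluster produces thousands of heartbeat writes per second before any workload traffic is counted, which is why API server and etcd tuning matters at this scale.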

Palma categorizes AI users into three groups. First are the builders of large language models like OpenAI and Mistral, who need powerful infrastructure, GPUs, and extreme scale capabilities. Second are developers creating AI solutions through retrieval augmented generation, fine-tuning, and small language models. Third are the majority of users who leverage existing models for serving and empowering their applications and processes.

Each category has different requirements. Model builders need raw infrastructure power and performance. Model users need opinionated platforms that remove friction and provide fully supported, easy access to models integrated with their applications.

Breaking through Kubernetes scaling limits

Every platform has limits, and Kubernetes historically recommended no more than 100 pods per node and 100 nodes per cluster. These were tested limits at the time, but customer demands have pushed boundaries dramatically. Today, clusters can scale to thousands of nodes with hundreds to thousands of pods per node.

Microsoft’s approach involves a dual investment. The company pushes boundaries within AKS through API server tuning, etcd optimization, and controller adjustments, while contributing those improvements back to the Kubernetes community. This keeps portability intact and lets the advances benefit the entire ecosystem.

Key technical advancements include etcd compaction improvements, chunked lists implementation, and enhanced API server capabilities. Previously, 100 nodes with kubelet leases and objects in etcd would strain the API server significantly. Today, vanilla Kubernetes installations can handle 10x that load, with further tuning enabling much higher scale.
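Chunked lists are worth a short illustration. The Kubernetes API lets a client pass a `limit` and then follow a `continue` token to fetch a large collection in pieces rather than in one huge response. The sketch below simulates that pattern in memory; `FAKE_PODS`, `list_chunk`, and `list_all` are hypothetical names, and the token here is a plain offset, whereas a real API server returns an opaque token.

```python
from typing import Iterator, Optional

# Hypothetical in-memory stand-in for the objects an API server would return.
FAKE_PODS = [f"pod-{i:04d}" for i in range(1, 1201)]


def list_chunk(limit: int,
               continue_token: Optional[str] = None) -> tuple[list[str], Optional[str]]:
    """Return up to `limit` items plus a token for the next chunk (None when done)."""
    start = int(continue_token) if continue_token else 0
    end = min(start + limit, len(FAKE_PODS))
    next_token = str(end) if end < len(FAKE_PODS) else None
    return FAKE_PODS[start:end], next_token


def list_all(limit: int = 500) -> Iterator[list[str]]:
    """Yield one chunk at a time so no single response has to hold everything."""
    token = None
    while True:
        items, token = list_chunk(limit, token)
        yield items
        if token is None:
            break


chunk_sizes = [len(chunk) for chunk in list_all(limit=500)]
print(chunk_sizes)  # [500, 500, 200]
```

The point of the pattern is that neither the server nor the client has to build or hold the full result set at once, which keeps memory use and response latency bounded as the number of objects grows.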

Open source philosophy: no secret sauce

Microsoft maintains a strong position on open source contributions, Palma says. He emphasizes that AKS runs 100% upstream Kubernetes components without any proprietary modifications or secret sauce. When Microsoft pushes scaling boundaries with customers like OpenAI, there is a period of manual work and investigation to understand the performance issues that surface.

Palma says findings are contributed back to the open source community. Microsoft’s philosophy, he notes, focuses on making customers happy through support and integrations, not through proprietary scaling advantages. This approach aims to ensure portability, a core value proposition of Kubernetes, remains intact.

Palma states that while hyperscalers like Microsoft, AWS, and GCP deal with extreme scaling scenarios, the goal is always to bring innovations back to the community. There’s no benefit to creating AKS-specific capabilities that don’t work in vanilla Kubernetes installations.

AI revolutionizes Kubernetes complexity

The perception that Kubernetes is complex has become outdated thanks to AI advancements. Palma challenges the narrative of complexity by asking “compared to what?” A system enabling millions of tasks will naturally have a larger API surface than one supporting only three tasks.

Operating systems simplified kernel complexity through wrappers and user-friendly tools while maintaining kernel access for those who need it. Microsoft takes the same approach with Kubernetes, creating tools like AKS Desktop, Headlamp (an open source Kubernetes UI), Draft, and Containerization Assist.

The real revolution comes from large language models. Users can now ask any LLM to create production-ready Kubernetes manifests and Dockerfiles optimized for cost-effectiveness and performance, all without deep Kubernetes knowledge. Microsoft created MCP servers to encode best practices, ensuring AI-generated configurations follow production standards.
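What "encoding best practices" might look like can be sketched in a few lines. The example below is an illustration only and does not reflect how Microsoft's MCP servers work: `check_deployment` and its rule set are hypothetical, and the rules (pinned image tag, resource requests and limits, readiness probe) are simply common Kubernetes recommendations applied to a Deployment-shaped dictionary.

```python
# Hypothetical best-practice checks for a generated Deployment manifest.
# The rule set is illustrative and is not taken from any Microsoft tool.

def check_deployment(manifest: dict) -> list[str]:
    """Return human-readable findings for a Deployment-shaped dict."""
    findings: list[str] = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        # Look for the tag only in the last path segment, so a registry
        # port such as "registry:5000/app" is not mistaken for a tag.
        last_segment = container.get("image", "").rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            findings.append(f"{name}: image is not pinned to a specific tag")
        resources = container.get("resources", {})
        if "requests" not in resources:
            findings.append(f"{name}: no resource requests")
        if "limits" not in resources:
            findings.append(f"{name}: no resource limits")
        if "readinessProbe" not in container:
            findings.append(f"{name}: no readiness probe")
    return findings


# Example manifest, as an LLM might produce it (names are placeholders).
example = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "registry.example.com/shop/web:latest",
                        "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
                    }
                ]
            }
        }
    },
}

findings = check_deployment(example)
for finding in findings:
    print(finding)
```

For this example the check reports the unpinned image tag, the missing resource limits, and the missing readiness probe. A check of this kind could run after generation so that the user receives a corrected manifest without having to know the rules.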

This allows developers to focus on applications while AI bridges the knowledge gap. Users no longer need to know every Kubernetes API and sub-API but can still leverage its full power through AI assistance.

Bridging the gap for business decision makers

Kubernetes remains fundamentally an engineering tool, creating a language gap between engineers and business decision makers. Microsoft addresses this through multiple approaches, including low-code and no-code platforms built on top of Kubernetes.

The company’s Agentic Operations capabilities, integrated with Microsoft Copilot, enable business users to introspect environments and ask questions like “Are we using our clusters to fullest capacity?” or “Are we cost effective?” Business users can even deploy web APIs for marketing campaigns through AI assistance that automatically containerizes and creates manifests.

Palma identifies two key investment areas: using AI as the conduit to bridge the gap between business and Kubernetes, and building super reliable, solid platforms that require zero operations. When business users deploy to Kubernetes, their expectation is hands-off management with no operational overhead.

The future of enterprise Kubernetes

Microsoft’s vision for AKS centers on maintaining portability while pushing performance boundaries. The company continues investing in making Kubernetes more accessible through AI tools, better abstractions, and opinionated platforms like AKS Automatic that implement best practices automatically.

As AI workloads continue growing in scale and complexity, Microsoft’s work with customers like OpenAI informs improvements that benefit the entire Kubernetes community. The focus remains on delivering reliability, simplicity, and performance without compromising the open source principles that make Kubernetes valuable.

Palma’s message is clear: Kubernetes complexity is being solved not through hiding functionality but through better tooling, AI assistance, and opinionated platforms that make best practices accessible to everyone from expert engineers to business decision makers.
