Based on data from tens of thousands of clusters, GPU usage averages just 5 percent, CPU usage stands at 8 percent, and memory utilization comes in at 20 percent. The gap between paid and used capacity is growing, while cloud prices are rising.
Research from Cast AI demonstrates this, revealing a persistent pattern: the gap between what organizations pay for and what they actually use widens as Kubernetes adoption grows. This is striking because Kubernetes was specifically designed to deliver efficiency at scale.
Cast AI notes that Kubernetes is becoming the standard platform for AI and ML workloads, but the data tells the same story as with CPU and memory: an average utilization rate of 5 percent. Meanwhile, an idle GPU costs dollars per hour, whereas an unused CPU costs cents per hour.
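A back-of-the-envelope calculation makes the asymmetry concrete. The utilization figures (5 percent GPU, 8 percent CPU) come from the report; the hourly prices below are hypothetical illustrations, not figures from Cast AI:

```python
# Rough waste estimate: cost of the unused share of one device per month.
# Utilization figures are from the report; prices are hypothetical examples.
GPU_PRICE_PER_HOUR = 2.50   # hypothetical: a cloud GPU, "dollars per hour"
CPU_PRICE_PER_HOUR = 0.05   # hypothetical: one vCPU, "cents per hour"

HOURS_PER_MONTH = 730

def idle_cost_per_month(price_per_hour: float, utilization: float) -> float:
    """Cost of the capacity that sits idle over one month."""
    return price_per_hour * (1 - utilization) * HOURS_PER_MONTH

gpu_waste = idle_cost_per_month(GPU_PRICE_PER_HOUR, 0.05)  # 5% utilization
cpu_waste = idle_cost_per_month(CPU_PRICE_PER_HOUR, 0.08)  # 8% utilization

print(f"Idle GPU cost per month:  ${gpu_waste:,.2f}")
print(f"Idle vCPU cost per month: ${cpu_waste:,.2f}")
```

Even with these placeholder prices, the point survives any reasonable substitution: at comparable utilization rates, an idle GPU wastes orders of magnitude more money than an idle vCPU.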
One-time configuration falls short
A key insight from the report concerns the approach to configuration. Rightsizing, in which IT resources are aligned with the needs of the workloads, is typically performed only once, at deployment, and a one-time exercise is not true rightsizing. Workloads change and traffic patterns shift: what was true six months ago no longer applies today. The same applies to Spot Instance selection, autoscaler configuration, and node lifecycle management.
Cast AI advocates for autonomous, continuous optimization as a sustainable response to infrastructure economics moving in the wrong direction.