CAST AI introduces cost-cutting solution AI Optimizer for Large Language Models

Kubernetes automation platform CAST AI introduces AI Optimizer, a service to reduce the expenses of deploying Large Language Models (LLMs). This tool integrates with any OpenAI-compatible API endpoint, automatically identifying the most efficient LLM among commercial vendors and open-source options.

It deploys the selected LLM on CAST AI optimized Kubernetes clusters, promising significant savings in generative AI. AI Optimizer offers insights into model usage, fine-tuning costs, and optimization decisions, enhancing transparency in model selection. The service was announced during Google Cloud Next ‘24 in Las Vegas.

Pressure to keep up with advances in AI

According to CAST AI, the growth in LLM adoption puts pressure on companies to keep up with AI advancements. Issues include computing availability, costs of running the models and the sheer diversity of available models. CAST AI’s solution promises to lower the cost barrier while avoiding the need to completely overhaul complex systems.

“What makes AI Optimizer so compelling is that it significantly reduces costs without requiring organizations to swap their existing technology stacks or even change a line of application code, which will help democratize generative AI”, said CAST AI Co-Founder and CTO Leon Kuperman.

Integration with OpenAI’s API

AI Optimizer integrates with OpenAI’s API and utilizes metrics like user-specific costs, overall usage patterns, token balance, and potential cost savings from model fine-tuning. It then selects the most efficient LLM with the lowest inference costs. It also leverages available GPUs, including Spot instances (allowing users to bid for unused computing capacity at a lower price) and provides budgeting and alerting features.

When paired with an efficient autoscaler, significant cost reductions are anticipated on AWS, Azure, and GCP. According to Kuperman, combining their LLM orchestration framework and deployment on optimized Kubernetes clusters promises unparalleled efficiency and scalability.

While the ability to identify optimal LLMs is already available, automated deployment on CAST AI’s optimized clusters will be available later this quarter.

Also read: Can any organization train generative AI models? CAST AI brings it a step closer

Top story

Inside TCS’ digital race behind Formula E

The world of Formula E combines technology and speed with sustainability. It's a blend that Tata Consultancy ...

Erik van Klinken June 27, 2025

Tech calendar

Stay tuned, subscribe!

HPE’s strategy: AI, smart switches, GreenLake and beyond

Memory-safe malware: Rust challenges security researchers

Microsoft set to withdraw direct kernel access from security software

Children with autism treated months earlier thanks to process automation

EU launches action plan for cybersecurity in healthcare

ChatGPT is a bad doctor, but that shouldn’t surprise anyone

Orange Cyberdefense turns security into a business enabler

The AI reality tour

GITEX DIGI_HEALTH 5.0 - Thailand

IT Arena

Innovation Week 2025

Luxembourg Venture Days

Appdevcon

Experience Synology’s latest enterprise backup solution

How to choose the right Enterprise Linux platform?

Enhance your data protection strategy for 2025

Strengthen your cybersecurity with DNS best practices