Kubernetes automation platform CAST AI introduces AI Optimizer, a service to reduce the expenses of deploying Large Language Models (LLMs). This tool integrates with any OpenAI-compatible API endpoint, automatically identifying the most efficient LLM among commercial vendors and open-source options.
It deploys the selected LLM on CAST AI-optimized Kubernetes clusters, promising significant savings on generative AI workloads. AI Optimizer offers insights into model usage, fine-tuning costs, and optimization decisions, enhancing transparency in model selection. The service was announced during Google Cloud Next ’24 in Las Vegas.
Pressure to keep up with advances in AI
According to CAST AI, the growth in LLM adoption puts pressure on companies to keep up with AI advancements. Issues include computing availability, costs of running the models and the sheer diversity of available models. CAST AI’s solution promises to lower the cost barrier while avoiding the need to completely overhaul complex systems.
“What makes AI Optimizer so compelling is that it significantly reduces costs without requiring organizations to swap their existing technology stacks or even change a line of application code, which will help democratize generative AI,” said CAST AI Co-Founder and CTO Leon Kuperman.
Integration with OpenAI’s API
AI Optimizer integrates with OpenAI’s API and uses metrics such as user-specific costs, overall usage patterns, token balance, and potential cost savings from model fine-tuning. It then selects the most efficient LLM, meaning the one with the lowest inference costs. It also leverages available GPUs, including Spot instances (unused cloud computing capacity offered at a lower price), and provides budgeting and alerting features.
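To make the selection step concrete, here is a minimal sketch of how choosing the model with the lowest estimated inference cost could look, together with a simple budget alert. It is not CAST AI's actual algorithm, which the announcement does not describe. The model names, per-token prices, Spot discount, and the functions `estimated_cost`, `cheapest_model`, and `should_alert` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """One candidate LLM. All names and prices here are made up."""
    name: str
    input_price_per_1k: float    # USD per 1,000 prompt tokens
    output_price_per_1k: float   # USD per 1,000 completion tokens
    spot_discount: float = 0.0   # assumed fraction saved on Spot GPUs (0.0 to 1.0)


def estimated_cost(model: ModelOption, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a workload on one model, after any Spot discount."""
    base = (prompt_tokens / 1000) * model.input_price_per_1k \
        + (completion_tokens / 1000) * model.output_price_per_1k
    return base * (1.0 - model.spot_discount)


def cheapest_model(models: list[ModelOption], prompt_tokens: int, completion_tokens: int) -> ModelOption:
    """Return the candidate with the lowest estimated cost for the given usage."""
    return min(models, key=lambda m: estimated_cost(m, prompt_tokens, completion_tokens))


def should_alert(spent: float, budget: float, threshold: float = 0.8) -> bool:
    """Signal an alert once spending reaches a fraction of the budget."""
    return spent >= budget * threshold


# Hypothetical candidates: two commercial models and one self-hosted open-source model.
CANDIDATES = [
    ModelOption("vendor-large", 0.01, 0.03),
    ModelOption("vendor-small", 0.0005, 0.0015),
    ModelOption("self-hosted-open", 0.0004, 0.0004, spot_discount=0.5),
]

if __name__ == "__main__":
    # Observed usage pattern: 1M prompt tokens and 200k completion tokens.
    best = cheapest_model(CANDIDATES, 1_000_000, 200_000)
    print(best.name, round(estimated_cost(best, 1_000_000, 200_000), 2))
    print("alert:", should_alert(spent=85.0, budget=100.0))
```

A real selector would also have to weigh answer quality, latency, and GPU availability. This sketch ranks purely on price, which is the only criterion the announcement names.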
When the service is paired with an efficient autoscaler, significant cost reductions are anticipated on AWS, Azure, and GCP. According to Kuperman, combining CAST AI’s LLM orchestration framework with deployment on optimized Kubernetes clusters promises unparalleled efficiency and scalability.
While the ability to identify optimal LLMs is already available, automated deployment on CAST AI’s optimized clusters will be available later this quarter.