OpenAI announced new features for enterprise customers, focusing on improved security and an expanded Assistants API, the component that lets customers customize and deploy models for their own use. The company is also betting heavily on cost management features.
The Assistants API already allowed companies to ground LLMs in their own data through Retrieval-Augmented Generation (RAG), supplementing the models with proprietary information and internal datasets. The API now supports more advanced file retrieval, with a new ‘file_search’ feature that can process 10,000 files per Assistant. That is five hundred times the old limit: the previous API allowed only twenty files per Assistant.
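As a concrete illustration, the snippet below sketches how ‘file_search’ is wired up with the official openai Python SDK; the store name, file path, and instructions are illustrative, and the exact setup may differ per account:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create a vector store and upload internal documents to it
# (the file path is illustrative).
vector_store = client.beta.vector_stores.create(name="internal-docs")
with open("handbook.pdf", "rb") as handbook:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=[handbook]
    )

# Create an Assistant that searches those files when answering.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    instructions="Answer using the attached company documents.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```

Uploading via a vector store means the platform chunks and embeds the documents once, after which every question put to the Assistant can search them.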
The API also includes new functionality for parallel queries, re-ranking, and query rewriting. That means running multiple queries simultaneously, reordering the results, and rewriting queries using synonyms, related terms, or spelling corrections. All this should increase the relevance of the results for users and save time.
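For intuition, the toy sketch below mimics those three steps in plain Python. It is not OpenAI's implementation, which runs server-side inside ‘file_search’; it only shows how query rewriting, parallel lookups, and re-ranking fit together:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus and synonym table, purely for illustration.
CORPUS = {
    "doc1": "Employees accrue paid leave monthly",
    "doc2": "Holiday requests go through the HR portal",
}
SYNONYMS = {"vacation": ["leave", "holiday"]}

def rewrite(query: str) -> list[str]:
    """Expand a query with synonyms (real systems also fix spelling)."""
    variants = [query]
    for word, alternatives in SYNONYMS.items():
        if word in query:
            variants += [query.replace(word, alt) for alt in alternatives]
    return variants

def search(query: str) -> list[tuple[str, float]]:
    """Naive keyword-overlap score as a stand-in for vector search."""
    terms = set(query.lower().split())
    return [(doc, len(terms & set(text.lower().split())) / len(terms))
            for doc, text in CORPUS.items()]

def file_search(query: str, top_k: int = 2) -> list[tuple[str, float]]:
    # Run all query variants in parallel, merge the hits, then re-rank
    # by score so the most relevant documents come first.
    with ThreadPoolExecutor() as pool:
        hits = [hit for batch in pool.map(search, rewrite(query)) for hit in batch]
    best: dict[str, float] = {}
    for doc, score in hits:
        best[doc] = max(best.get(doc, 0.0), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

print(file_search("how much vacation do I get"))
```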
Real-time response streaming
Another new feature of the Assistants API is the much-requested addition of real-time response streaming, which allows GPT-4 Turbo and GPT-3.5 Turbo to return output at the same rate tokens are generated. The AI model no longer has to wait for a full response to be generated before it starts replying, which should lead to faster results. Furthermore, the API gains a ‘vector_store’ object that helps with file management and allows more granular control over token usage to manage costs.
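From the client side, streaming looks roughly like the sketch below, which uses the openai Python SDK's event-handler interface; the thread and assistant IDs are placeholders for objects created earlier:

```python
from openai import OpenAI, AssistantEventHandler

client = OpenAI()

class PrintHandler(AssistantEventHandler):
    # Invoked for each text fragment as tokens are generated.
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)

# Placeholder IDs; a real thread would already contain a user message.
with client.beta.threads.runs.stream(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    event_handler=PrintHandler(),
) as stream:
    stream.until_done()
```

The cost angle of the ‘vector_store’ object shows up in options such as expiry policies, which clean up stores that are no longer used (a sketch with illustrative values):

```python
vector_store = client.beta.vector_stores.create(
    name="ephemeral-docs",
    # Delete the store automatically 7 days after it was last used.
    expires_after={"anchor": "last_active_at", "days": 7},
)
```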
On the security front, the company will offer Private Link, which allows direct communication between Azure and OpenAI with minimal exposure to the Internet. Also added is native Multi-Factor Authentication (MFA) to meet stricter access control requirements.
Managing individual projects
Under the motto ‘trust is good, but control is better,’ system administrators can tighten their grip on individual projects through the Projects feature. For example, they can specify roles and API keys per project, set model restrictions, and limit users. Here too, the goal is to prevent unexpected costs as well as to improve security.
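In practice, this is visible in how requests are authenticated: an API key issued inside a project only grants what that project's administrator has allowed. A minimal illustration, with a hypothetical key placeholder:

```python
from openai import OpenAI

# Hypothetical project-scoped key; calls made with it are bound to that
# project's model restrictions, user limits, and spending controls.
client = OpenAI(api_key="sk-proj-...")
```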
More cost-saving options for customers include asynchronous workflows for non-urgent requests, which run via the Batch API. Heavy users can also expect discounts. OpenAI has shared these and other new features on its website. The updates are partly a response to the growing popularity of open LLMs such as Meta's Llama 3 and the models of Mistral AI, which pose increasing competition.
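As a sketch of such an asynchronous workflow with the openai Python SDK: requests are collected in a JSONL file, submitted as a batch, and picked up later within a 24-hour completion window (the file name is illustrative):

```python
from openai import OpenAI

client = OpenAI()

# requests.jsonl holds one JSON request per line, each with a custom_id,
# method, url, and body (illustrative file name).
batch_input = client.files.create(
    file=open("requests.jsonl", "rb"), purpose="batch"
)

# The batch runs asynchronously; results are ready within 24 hours.
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```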