During the first-ever LlamaCon, Meta made several announcements and presented new tools aimed at making Llama models more accessible to developers. The headline announcement was the introduction of the Llama API, now available to developers as a limited free preview.
With the Llama API, developers can try out various Llama models, including the recently launched Llama 4 Scout and Llama 4 Maverick. The API offers easy creation of API keys and lightweight TypeScript and Python SDKs. To make the transition from OpenAI-based applications easier, the Llama API is compatible with the OpenAI SDK.
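Because the Llama API exposes an OpenAI-compatible interface, existing OpenAI SDK code should need little more than a new API key and base URL. The sketch below is illustrative only: the base URL and model identifier are assumptions, not confirmed values, and the real ones come from the Llama API documentation.

```python
# Minimal sketch: calling the Llama API through the OpenAI Python SDK.
# The base_url and model name are placeholders (assumptions), not confirmed values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_LLAMA_API_KEY",                  # key created in the Llama API preview
    base_url="https://api.llama.com/compat/v1/",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="Llama-4-Scout-17B-16E-Instruct",        # assumed model identifier
    messages=[{"role": "user", "content": "Summarize the LlamaCon announcements."}],
)
print(response.choices[0].message.content)
```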
Significant acceleration thanks to partnerships
Meta is collaborating with Cerebras and Groq to achieve higher inference speeds for the Llama API. Cerebras claims that its Llama 4 inference within the API can generate tokens up to eighteen times faster than traditional GPU-based solutions from NVIDIA and others. According to benchmarks from the Artificial Analysis website, Cerebras' solution achieved over 2,600 tokens per second for Llama 4 Scout, while ChatGPT hovered around 130 tokens per second and DeepSeek reached around 25 tokens per second.
Andrew Feldman, CEO and co-founder of Cerebras, said that Cerebras is proud to make the Llama API the fastest inference API in the world. According to him, developers building real-time and agent-based applications need speed above all else, and with Cerebras they can build AI systems that remain out of reach for traditional GPU-based solutions.
Developers interested in this extremely fast Llama 4 inference can select Cerebras as a model option within the Llama API. Llama 4 Scout is also available through Groq, where it runs at around 460 tokens per second: roughly six times slower than Cerebras, yet still about four times faster than other GPU-based solutions.
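Meta has not detailed how this selection works in the preview. One plausible shape, purely as an assumption, is that the accelerated variants are exposed as distinct model identifiers, so only the model string in the earlier example would change:

```python
# Hypothetical: request the Cerebras-accelerated variant by model name,
# reusing the client from the previous sketch. The identifier below is
# an assumption, not a confirmed value from the Llama API.
response = client.chat.completions.create(
    model="Cerebras-Llama-4-Scout-17B-16E-Instruct",  # assumed provider-tagged name
    messages=[{"role": "user", "content": "Give me a one-line status update."}],
)
```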
Llama Defenders Program
During LlamaCon, Meta also presented new Llama Protection Tools and announced the Llama Defenders Program, which gives selected partners access to AI-driven tools that enable them to evaluate the security of their systems and protect themselves against potential threats.