The AI arms race thunders on. With Claude 3 having launched in March, Claude 3.5 has already arrived before June is out. 'Sonnet' is the first LLM to become available from this new family of models. In what ways does it surpass the models of competitors OpenAI and Google?
Anthropic set the bar high in announcing the new model. Claude 3.5 Sonnet is said to combine the performance of the very best AI models with the speed and low cost of the “mid-range” Claude 3 Sonnet LLM. In other words, higher AI performance for less money, a promise as unequivocal as it is attractive. Its wide availability through apps, the Anthropic API, Amazon Bedrock and Google Vertex AI means that any organization can already start using Claude 3.5 Sonnet.
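To make that availability concrete, the snippet below sketches how an organization might call Claude 3.5 Sonnet through Anthropic's Python SDK. The model id string and the example prompt are assumptions based on the June 2024 release and may change; treat this as a minimal sketch, not official sample code.

```python
# Minimal sketch of a Claude 3.5 Sonnet request via Anthropic's Python SDK.
# The model id is an assumption for the June 2024 release.
import os

MODEL_ID = "claude-3-5-sonnet-20240620"  # assumed id; check Anthropic's docs

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build the keyword arguments for client.messages.create()."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# The network call only runs when an API key is configured.
if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(**build_request("Summarize this quarter's sales data."))
    print(response.content[0].text)
```

The same model id can be used through Amazon Bedrock and Google Vertex AI, though each platform wraps the request in its own client library.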
Fast and smart
By now, a clear set of industry-standard LLM benchmarks has crystallized. Anthropic's latest model performs remarkably well in these tests, approaching perfect scores in spots. It reasons and programs better than GPT-4o and scores higher on grade-school maths than Gemini 1.5 Pro and Llama-400b, the largest variant of Llama 3, which is available only as an early snapshot.
GPT-4o earns wins in two tests (undergraduate-level knowledge and complex math equations), but that's not especially relevant to what 3.5 Sonnet is about. Anthropic points to two major benefits: significantly better performance than Claude 3 Sonnet, at twice the speed of Claude 3 Opus, the largest LLM in the earlier series. Mission accomplished, it seems.
Vision
Models today must perform multimodally. Text, sound, images: a state-of-the-art LLM must understand any kind of information source. In a demonstration video, Anthropic shows the possibilities this creates. For example, the new Sonnet model generates a JSON transcription of graphs about the cost of DNA testing and then even produces a presentation about it.
Tip: Anthropic provides a tool to build AI agents with Claude 3
Despite its impressive reasoning, computation and asset creation skills, Claude 3.5 Sonnet is nothing to fear. In terms of AI Safety Level (ASL), this LLM sits at ASL-2, which denotes limited risk: the model does not, for example, help build a bioweapon, as Anthropic highlights (or rather: it doesn't help you build such a weapon any better than Google Search already does). For reference, ASL-3 applies when a model can be catastrophically misdeployed or shows signs of autonomy. We don't yet live in a reality where such an AI model exists. ASL-4 and beyond have yet to be defined, as it happens.
Closer to reality: reducing costs
Claude 3.5 Sonnet is just one of several 3.5 models that’ll show up over the next few months. However, this model already reflects what Anthropic has been working hardest on: multimodality and efficient performance. For end users with high output requirements, the cost burden just got a little less heavy. This makes it possible to deploy more GenAI for the same price or to keep on using the same amount for less, whatever works best.
Specifically, the model costs $3 per 1 million input tokens and $15 per 1 million output tokens. The context window is 200K tokens, larger than the "standard" versions of Google Gemini 1.5 Pro (128K) and OpenAI's GPT-4o (also 128K), but significantly less than what Gemini 1.5 Pro is capable of in select cases. For a limited group of users, Google's offering scales up to as many as 1 million tokens, allowing this special variant to take in far more information when generating output and thus handle such large inputs better than Sonnet. The open question is whether Claude 3.5 Opus will kick things up a notch in that area and make a larger context window available to more users.
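To put those rates in perspective, a short back-of-the-envelope calculation shows what a single large request would cost at the published prices ($3 per million input tokens, $15 per million output tokens); the token counts in the example are illustrative assumptions.

```python
# Cost estimate at Claude 3.5 Sonnet's published rates.
INPUT_PRICE_PER_M = 3.00    # USD per 1 million input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1 million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at these rates."""
    return (
        (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    )

# Example: filling the full 200K context window and getting a
# 2K-token answer back costs well under a dollar.
print(f"${estimate_cost(200_000, 2_000):.2f}")  # → $0.63
```

At these prices, output tokens dominate for chat-style workloads, while input tokens dominate for document analysis that fills the context window.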
Also read: Google Gemini made available in AI coding assistant from JetBrains