Anthropic is throwing a new version of Claude into the fray against OpenAI’s GPT-4. Claude 3 outperforms GPT-4, according to testing by the company. That makes Claude 3 the second LLM to outperform GPT-4. Google already beat the model with Gemini Ultra, but Anthropic says Claude 3 also outperforms this LLM.

Claude 3 is Anthropic’s latest LLM series. A total of three models, which differ in performance and price, are packed under the name. “All models can handle a wide range of visual formats, including photos, graphs, charts and technical diagrams.” Claude 3 Opus is the strongest variant and thus delivers the best results on tests comparing LLMs’ capabilities.

In its comparison, Anthropic also looks briefly at the capabilities of its previous model, Claude 2.1. All three new models are said to outperform this predecessor. The less powerful models are named Claude 3 Sonnet and Claude 3 Haiku. The improvements over the previous generation are said to lie mainly in more accurate answers to non-English prompts and in faster responses. In addition, adjustments to the model's boundaries are said to make it better at blocking prompts that, for example, try to steal data from the model.

‘Better than GPT-4 and Gemini Ultra’

At the same time, Anthropic also ventures a comparison with competitor OpenAI's GPT-4. For this, it only uses the capabilities of Claude 3 Opus. This model is said to answer complex queries twice as accurately as Claude 2.1 and also to outperform GPT-4 and Gemini Ultra on popular AI benchmarks. One of the tests used is MMLU (Massive Multitask Language Understanding). Google previously used this test to benchmark Gemini Ultra's performance against GPT-4, and the test is among its standard evaluation methods.

Anthropic makes the results available in the image below.

Image: a table with the percentages of a company's revenue.

Opus and Sonnet are immediately available in claude.ai and the Claude API. The cheapest model, Claude 3 Haiku, will be available “soon.”

Lead investor Amazon makes the model available

Claude 3 will also soon appear in Amazon's offerings. The company is, along with Google, a major investor in Anthropic. That investment took place when Claude 2 had just been released. At the time, it was agreed that all future generations of the LLM would become available on AWS. The AWS Generative AI Innovation Center, where customers can seek help from an AWS expert in building an AI application, will also be able to answer questions about Anthropic models.

Two billion-dollar investments were made to provide enough funding to build a strong offering. Amazon, as the lead investor, also expects its money to be used to build tools to compete with OpenAI. This competing AI company gets financial backing from major cloud competitor Microsoft.

Google: spreading the risks

For Google, the story is slightly more nuanced. The company developed its own LLM family, Gemini. Under that name sits a collection of models designed from the beginning to handle multiple data types. That means users can send multimodal prompts consisting of text, images or video. Gemini can also generate these different formats as output.

However, Claude 3 is said to already exceed Gemini Ultra's capabilities. So is Claude 3 the first sign that Google can shut its AI production down? There are several reasons why that is not the case. For one, Google lets users write larger prompts. Gemini Ultra supports one million tokens. With Claude 3, the maximum capacity of the context window is 200,000 tokens by default. The capacity of the context window matters for getting a good result from the LLM: a larger window means a written question can contain more details and nuances, while a video input can consist of a longer fragment.

Anthropic does say that it is theoretically possible for Claude 3 to process prompts of one million tokens. However, it only makes this capability available to companies that need it for “specific use cases.” The price for one million input tokens is $15; for one million output tokens, the company charges $75.
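To make those per-million-token prices concrete, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above ($15 input, $75 output, a 200,000-token default context window). The function names are hypothetical, and treating the context window as a limit on input tokens alone is a simplifying assumption, not a description of how Anthropic bills or counts tokens.

```python
from decimal import Decimal

# Figures quoted in the article, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = Decimal("15")
OUTPUT_USD_PER_MILLION = Decimal("75")
# Default context window in tokens, as quoted in the article.
DEFAULT_CONTEXT_WINDOW = 200_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / Decimal(1_000_000)


def fits_default_window(input_tokens: int) -> bool:
    """Check the prompt against the default window (assumption: input only)."""
    return input_tokens <= DEFAULT_CONTEXT_WINDOW


if __name__ == "__main__":
    # A prompt that fills the default window, plus a 4,000-token answer.
    print(estimate_cost_usd(200_000, 4_000))
    print(fits_default_window(200_000))
```

At these rates, a prompt that fills the default 200,000-token window costs $3.00 in input alone, so the output price only starts to dominate for long generated answers.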

No video or audio

It is also notable that Anthropic chose not to make Claude 3 a fully multimodal system. Claude 3 can process different types of visual input but can only generate text or code as output. Moreover, users cannot include video or audio in a prompt for Claude 3.

Google has already shown that it can beat OpenAI on performance tests. Moreover, it has more services in which to incorporate its LLM and offer it to the general public. These include its browser and the Android mobile operating system. Gemini's strong performance, coupled with the great potential to distribute the model, gives Google a good chance of succeeding with Gemini. Even so, Google immediately announced that the Claude 3 models will become available in Vertex AI.

Perhaps the company will then keep Anthropic in reserve if future AI development within Google falters. In all likelihood, however, Google will hope to keep Anthropic only as a plan B because, in terms of competitiveness, it is more interesting to have its own product than to have to share an LLM with Amazon. In addition, Anthropic is interesting for expanding and diversifying Google’s Vertex AI offering, which is aimed at companies.

AI competitors lurking

Moreover, Google DeepMind will release Gemini 1.5 soon. That model will have a new architecture consisting of small specialized models merged into a single model. For a given query, it is possible to put a submodel that needs less computational power to work. As a result, this model achieves better and, more importantly, more efficient performance. However, it is not yet clear how the capabilities of this model will compare with Claude 3. Gemini 1.5 is currently only available in preview and was not available for comparison by Anthropic.

For Anthropic, the launch of Claude 3 is a confirmation to its investors that its developers can deliver models at the level of OpenAI. For OpenAI, it adds another challenger to GPT-4. Recently, the number of competitors for GPT-4 has increased sharply. So the AI company would do well to show some strength to justify the fact that most of the public's attention still goes to OpenAI. The voice feature in ChatGPT's smartphone app is only a tentative attempt to salvage something. At least for OpenAI's lead investor, Microsoft, blind faith in OpenAI no longer seems to be a given. Last week, Microsoft decided to take the European Mistral AI under its wing as a second partner.
