Claude Opus 4.5 is the best model for coding tasks and agentic AI. At least, that’s what Anthropic claims. The new model is “a step forward in what AI systems can do” and could even give us a glimpse into the future of work. How so?
Claude Opus 4.5 is the big brother of the previously launched Sonnet 4.5, which was already the best model in several respects when it appeared at the end of September. Google has since released Gemini 3 Pro and OpenAI has refined GPT-5 into GPT-5.1, so the competition remains in flux. Even though Gemini 3 Pro and GPT-5.1-Codex-Max only came close to Sonnet 4.5’s coding performance, Anthropic still deemed it necessary to make Opus 4.5 a significantly better software engineer.
In a somewhat misleading bar chart (the bars start at 70 percent and the axis tops out at 82 percent), Opus 4.5 is clearly a step ahead of the competition. On the widely used SWE-bench Verified benchmark, Opus 4.5 scored 80.9 percent, significantly better than Sonnet 4.5 (77.2 percent), GPT-5.1-Codex-Max (77.9 percent), and Gemini 3 Pro (76.2 percent).
Record score on coding benchmarks
The new model is available immediately in all Claude apps, via the API, and on all three major cloud platforms: Azure, GCP, and AWS. At the same time, Anthropic is lowering prices for the Claude API: Opus 4.5 costs $5 per million input tokens and $25 per million output tokens, a third of Opus 4.1’s rates. This makes the Opus tier a more realistic option than before, as Anthropic’s pricing was often on the expensive side.
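For a feel of what those rates mean per request, here is a quick back-of-the-envelope calculation. The token counts are made up purely for illustration; only the per-token prices come from Anthropic’s announcement.

```python
# Cost of a single Opus 4.5 API call at the listed rates.
INPUT_PRICE = 5.00 / 1_000_000    # $5 per million input tokens
OUTPUT_PRICE = 25.00 / 1_000_000  # $25 per million output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Example: a 20,000-token prompt that yields a 3,000-token answer.
print(f"${call_cost(20_000, 3_000):.3f}")  # -> $0.175
```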
More efficient than its predecessors
In addition to better performance, Anthropic has also made the model more efficient. Claude Opus 4.5 uses significantly fewer tokens than its predecessors, including Opus 4.1, to achieve the same or better results: it backtracks less, explores fewer dead ends, and reasons less verbosely.
For example, at Medium reasoning effort, Opus 4.5 beats Sonnet 4.5’s aforementioned SWE-bench Verified score while using 76 percent fewer output tokens. At High reasoning effort, Opus 4.5 scores 4.3 percentage points higher than Sonnet 4.5 while using 48 percent fewer tokens.
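To make those percentages concrete, a small illustration: the 10,000-token baseline for Sonnet 4.5 is a made-up figure, and only the reduction percentages and the output price come from Anthropic’s figures.

```python
# What "76% / 48% fewer output tokens" means in dollars, for a
# hypothetical task on which Sonnet 4.5 would emit 10,000 output tokens.
sonnet_tokens = 10_000
opus_medium = sonnet_tokens * (1 - 0.76)  # 2,400 tokens
opus_high = sonnet_tokens * (1 - 0.48)    # 5,200 tokens

OUTPUT_PRICE = 25 / 1_000_000  # $25 per million output tokens

for label, tokens in [("medium", opus_medium), ("high", opus_high)]:
    print(f"{label} effort: {tokens:,.0f} tokens -> ${tokens * OUTPUT_PRICE:.3f}")
# medium effort: 2,400 tokens -> $0.060
# high effort:   5,200 tokens -> $0.130
```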
More control for developers
Anthropic is following OpenAI’s example by adding a reasoning effort parameter to the Claude API, letting developers decide for themselves where the balance between speed and thinking ability should lie. Such knobs seem to be what draws the attention of AI specialists these days, whereas announcements of new models used to dwell on training data, training methods, and how information is distributed among the “experts” inside an LLM.
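In practice, that could look something like the sketch below, using the official Anthropic Python SDK. Note that the model ID and the name, placement, and accepted values of the effort setting are assumptions based on the announcement, not confirmed API syntax; the current API reference is the authority here.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# "effort" and its values are assumptions from the announcement; extra_body
# passes fields the SDK does not model explicitly.
response = client.messages.create(
    model="claude-opus-4-5",          # assumed model ID
    max_tokens=2048,
    extra_body={"effort": "medium"},  # trade thinking depth for speed and tokens
    messages=[
        {"role": "user", "content": "Refactor this function for readability: ..."}
    ],
)

print(response.content[0].text)
print(response.usage.output_tokens)  # check the token savings yourself
```

The idea is that a higher effort setting buys more deliberate reasoning at the cost of latency and output tokens, while a lower one does the reverse.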
With Opus 4.5, Claude Code can now make more accurate plans and execute them more thoroughly. It can also ask clarifying questions in advance and then create an editable plan.md file before getting started. This approach should lead to better results for complex coding tasks.
The battle reignites
Whether Opus 4.5 is indeed the best coding model in the world remains to be seen in practice. Benchmarks provide an indication, but real user experiences with complex projects are ultimately decisive. Consider Meta’s Llama models, which often come close to the best AI models on the market on raw scores, but are generally not a favorite among actual users. In that respect, the Claude series has a reputation to uphold: it is regularly embraced for AI tasks more readily than benchmarks alone would suggest. Then again, those benchmarks are excellently suited to measuring specific tasks.
Whether this really offers a glimpse of the future of work remains to be seen. Anthropic claims that Opus 4.5 handles ambiguity well and often produces strong results without guidance. According to early users of the new model, tasks that were previously virtually impossible are now achievable. “In general terms, our testers said: Opus 4.5 just gets it.”
Read also: Claude Sonnet 4.5 can code autonomously for 30 hours