Claude Opus 4.5 is the best model for coding tasks and agentic AI. At least, that’s what Anthropic claims. The new model is “a step forward in what AI systems can do” and could even give us a glimpse into the future of work. How so?
Claude Opus 4.5 is the big brother of the previously launched Sonnet 4.5, which was already the best model in several respects when it appeared at the end of September. Google has since released Gemini 3 Pro and OpenAI has refined GPT-5 into GPT-5.1, so the competition remains in flux. Even though Gemini 3 Pro and GPT-5.1-Codex-Max only came close to Sonnet 4.5’s coding performance, Anthropic still deemed it necessary to make Opus 4.5 a significantly better software engineer.
In a somewhat misleading bar chart (the bars start at 70 percent and the axis tops out at 82 percent), Opus 4.5 is clearly a step ahead of the competition. On the widely used SWE-bench Verified benchmark, Opus 4.5 scored 80.9 percent, significantly better than Sonnet 4.5 (77.2 percent), GPT-5.1-Codex-Max (77.9 percent), and Gemini 3 Pro (76.2 percent).
Record score on coding benchmarks
The new model is available immediately in all Claude apps, via the API, and on all three major cloud platforms: Azure, GCP, and AWS. At the same time, Anthropic is lowering prices for the Claude API: Opus 4.5 costs $5 per million input tokens and $25 per million output tokens, a third of Opus 4.1’s rates. This makes the Opus tier a more realistic option than before, as Anthropic’s pricing was often on the expensive side.
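For a feel of what those rates mean per request, here is a quick back-of-the-envelope calculation. The token counts are made up purely for illustration; only the per-token prices come from Anthropic’s announcement.

```python
# Cost of a single Opus 4.5 API call at the listed rates.
INPUT_PRICE = 5.00 / 1_000_000    # $5 per million input tokens
OUTPUT_PRICE = 25.00 / 1_000_000  # $25 per million output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Example: a 20,000-token prompt that yields a 3,000-token answer.
print(f"${call_cost(20_000, 3_000):.3f}")  # -> $0.175
```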
More efficient than its predecessors
In addition to better performance, Anthropic has also made the model more efficient. Claude Opus 4.5 uses significantly fewer tokens than its predecessors, including Opus 4.1, to achieve the same or better results: it backtracks less, explores fewer dead ends, and reasons less verbosely.
For example, at Medium reasoning effort, Opus 4.5 beats Sonnet 4.5’s aforementioned SWE-bench Verified score while using 76 percent fewer output tokens. At High reasoning effort, Opus 4.5 scores 4.3 percentage points higher than Sonnet 4.5 while using 48 percent fewer tokens.
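To make those percentages concrete, a small illustration: the 10,000-token baseline for Sonnet 4.5 is a made-up figure, and only the reduction percentages and the output price come from Anthropic’s figures.

```python
# What "76% / 48% fewer output tokens" means in dollars, for a
# hypothetical task on which Sonnet 4.5 would emit 10,000 output tokens.
sonnet_tokens = 10_000
opus_medium = sonnet_tokens * (1 - 0.76)  # 2,400 tokens
opus_high = sonnet_tokens * (1 - 0.48)    # 5,200 tokens

OUTPUT_PRICE = 25 / 1_000_000  # $25 per million output tokens

for label, tokens in [("medium", opus_medium), ("high", opus_high)]:
    print(f"{label} effort: {tokens:,.0f} tokens -> ${tokens * OUTPUT_PRICE:.3f}")
# medium effort: 2,400 tokens -> $0.060
# high effort:   5,200 tokens -> $0.130
```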
More control for developers
Anthropic is following OpenAI’s example by adding a reasoning effort parameter to the Claude API, letting developers decide for themselves where the balance between speed and thinking ability should lie. Such knobs seem to be what draws the attention of AI specialists these days, whereas announcements of new models used to dwell on training data, training methods, and how information is distributed among the “experts” inside an LLM.
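In practice, that could look something like the sketch below, using the official Anthropic Python SDK. Note that the model ID and the name, placement, and accepted values of the effort setting are assumptions based on the announcement, not confirmed API syntax; the current API reference is the authority here.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# "effort" and its values are assumptions from the announcement; extra_body
# passes fields the SDK does not model explicitly.
response = client.messages.create(
    model="claude-opus-4-5",          # assumed model ID
    max_tokens=2048,
    extra_body={"effort": "medium"},  # trade thinking depth for speed and tokens
    messages=[
        {"role": "user", "content": "Refactor this function for readability: ..."}
    ],
)

print(response.content[0].text)
print(response.usage.output_tokens)  # check the token savings yourself
```

The idea is that a higher effort setting buys more deliberate reasoning at the cost of latency and output tokens, while a lower one does the reverse.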
With Opus 4.5, Claude Code can now make more accurate plans and execute them more thoroughly. It can also ask clarifying questions in advance and then create an editable plan.md file before getting started. This approach should lead to better results for complex coding tasks.
The battle reignites
Whether Opus 4.5 is indeed the best coding model in the world remains to be seen in practice. Benchmarks provide an indication, but real user experiences with complex projects are ultimately decisive. Consider Meta’s Llama models, which often come close to the best AI models on the market on raw scores, but are generally not a favorite among actual users. In that respect, the Claude series has a reputation to uphold: it is regularly embraced for AI tasks more readily than benchmarks alone would suggest. Then again, those benchmarks are excellently suited to measuring specific tasks.
Whether this really offers a glimpse of the future of work remains to be seen. Anthropic claims that Opus 4.5 handles ambiguity well and often produces strong results without guidance. According to early users of the new model, tasks that were previously virtually impossible are now achievable. “In general terms, our testers said: Opus 4.5 just gets it.”
Read also: Claude Sonnet 4.5 can code autonomously for 30 hours