With DeepSeek, OpenAI, and Google already making new LLMs available in 2025, Anthropic seems like a latecomer. However, the company’s philosophy demands greater caution and therefore a little patience. Its latest Claude 3.7 Sonnet can reason at will, while Claude Code is a command-line portal to “agentic coding.”
Anthropic is swimming against the tide with Claude 3.7 Sonnet, and not just through its later launch cadence versus the competition. The name alone suggests an evolutionary step up from 3.5 Sonnet, and one might expect a larger, more capable Claude 3.7 Opus and a more compact Claude 3.7 Haiku to follow, although an Opus variant was already missing from the 3.5 generation. This restraint differs from the OpenAI and Google offerings, which span a variety of models with varying levels of ability. Anthropic instead adopts a philosophy in which a single LLM can switch between quick answers and elaborate reasoning.
Paying users can now choose how long Claude is allowed to think. Extended thinking mode leads to the chatbot self-reflecting within the output, similar to the behavior we see with OpenAI o1, DeepSeek R1 and Google Gemini Thinking. The upside: there is a fine-grained choice of exactly how long Claude may think, from almost no tokens at all up to the full output limit of 128,000 tokens.
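In practice, this budget is set per request. As a minimal sketch, the helper below assembles a Messages API request body with extended thinking enabled; the model ID, budget figures, and the `build_request` helper itself are illustrative assumptions, not Anthropic's reference code.

```python
# Sketch: assembling a request body for Anthropic's Messages API with
# extended thinking enabled. The "budget_tokens" field caps how many
# tokens Claude may spend on reasoning before it answers.
# Model name and token figures are illustrative assumptions.

def build_request(prompt: str, thinking_budget: int, max_tokens: int = 128_000) -> dict:
    """Assemble a JSON body for POST /v1/messages with a thinking budget."""
    # The reasoning budget must fit inside the overall output limit.
    assert thinking_budget < max_tokens, "thinking budget must fit inside the output limit"
    return {
        "model": "claude-3-7-sonnet-20250219",  # assumed model ID
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "messages": [{"role": "user", "content": prompt}],
    }

# A modest budget for a short reasoning task; raise it for harder problems.
req = build_request("Prove that the sum of two odd numbers is even.", thinking_budget=16_000)
```

Dialing `budget_tokens` up or down is how "almost no time at all" versus "the entire output token limit" would be expressed in a single model, with no separate reasoning variant to pick.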
For actual use
On benchmarks, 3.7 Sonnet is a near-match for OpenAI o1, still the most expensive and powerful AI model on the market. Depending on the test listed, OpenAI o3-mini, DeepSeek R1 and Grok 3 Beta come close to or surpass the new Anthropic LLM. Nevertheless, the numbers don’t tell the whole story, a fact Anthropic is well aware of.
Anthropic has also looked closely at Claude’s actual usage and adjusted for it in the new model’s training. Through an Economic Index, it has estimated which professions actually benefit from Claude’s capabilities. Based on this data, which we discussed in detail earlier, Claude 3.7 Sonnet has been refined to excel where it matters.
It has long been the case that the subjective experience of using Claude surpasses its placement in benchmarks versus other models. In other words, the quality of Anthropic’s LLMs is simply hard to capture with a standardized test.
Claude Code
One area in which Claude continuously outperforms its benchmark scores is coding. Accordingly, Anthropic points out that Sonnet is the preferred LLM among developers worldwide. Now Claude Code has been made available as a limited research preview, an even more capable programming tool.
Claude Code is Anthropic’s first iteration of an agentic coding tool. It can look up and check programming code, modify files, write and run tests, and push code to GitHub. Importantly, it is transparent about its inner workings, so users can apply the brakes to Claude Code where necessary. Early tests show that Claude Code completes tasks in a single pass that would otherwise take 45 minutes of manual work.
Registration is required for this test version of Claude Code.
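For those admitted to the preview, getting started looks roughly like the session below. The npm package name and `claude` command match Anthropic’s documentation at launch, but as a research preview these details may change; the project directory is a placeholder.

```shell
# Install the Claude Code CLI globally (research preview; requires Node.js).
npm install -g @anthropic-ai/claude-code

# Run it from inside the repository you want it to work on.
cd my-project   # placeholder path
claude
```

From the resulting interactive session, Claude Code proposes edits, test runs, and git operations, asking for confirmation before it acts.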
One model to rule them all
The release of Claude 3.7 Sonnet sounds like Anthropic’s late entry into the battle of reasoning models. OpenAI’s o1-preview was available all the way back in September 2024, for example, while every major AI model builder except Meta and Mistral now enables some form of reasoning in a recent release, albeit some only through experimental versions.
Anthropic CEO Dario Amodei, however, has dismissed this black-and-white contrast between models that reason and ones that do not. He argues that reasoning emerges gradually within an LLM and cannot be conjured up as if by an on/off switch. Claude 3.5 Sonnet, for example, already showed signs of the same reasoning steps that dedicated “reasoning” models perform. The new Claude release demonstrates this more clearly than ever by letting users tune the reasoning effort per output. Thus, we should not expect a separate “Claude Thinking.”
As it happens, this is also what OpenAI is reportedly considering with GPT-5. Although the company may be switching between different models depending on user input under the hood, CEO Sam Altman wants to get rid of the oft-maligned model picker. OpenAI instead wants to detect whether a query actually requires these pricey reasoning tokens in the first place, which may lighten the load on OpenAI’s compute.
Along the same lines, it makes sense for Anthropic to put its reasoning behind a paywall. AI reasoning steps are simply expensive to compute, and subscription revenue is a natural way to offset them. However, since Anthropic argues that this reasoning process should be adjustable at will and on the fly, the free tier does at least appear somewhat hobbled. Ultimately, we can still point to benchmarks to see an improvement, meaning 3.5 Sonnet has found its successor at last.
Read also: Anthropic raises bar for GPT-5 with ‘Artifacts’ feature