
Debate over Claude’s performance heats up among developers

The debate surrounding the performance of Claude, Anthropic’s AI model, is intensifying. Developers and experienced users report that Claude Opus 4.6 and the Claude Code programming environment, in particular, are functioning less consistently than before.

According to VentureBeat, these are not minor deviations but changes that are noticeable in daily use. Reports are mounting on online platforms that Claude fails more often during complex tasks, jumps to conclusions without thorough analysis, and handles longer workflows less reliably. Users also note that the model consumes more tokens without producing better output. For those who rely on the system for software development, this amounts to a clear deterioration.

A key signal came from Stella Laurenzo, a senior engineering director at AMD. She analyzed thousands of sessions and concluded that the model has been reasoning less deeply since the start of this year. In her view, its behavior is shifting toward faster, more superficial solutions, whereas thorough analysis is essential for engineering tasks. She states that this trend is visible across large amounts of usage data.

The criticism gained additional weight because this analysis was widely shared and supplemented with other experiences. At the same time, benchmark results indicated a decline in performance. Some users saw this as confirmation that something fundamental has changed.

Doubts about the comparability of tests

Not everyone interprets those figures that way. Researcher Paul Calcraft emphasizes on X that the benchmarks differ in substance. According to him, the results are based on diverse test sets and are therefore difficult to compare. In cases where the tests do overlap, he sees only limited deviations, which tempers the picture of a sharp decline.

Within Anthropic, the criticism is interpreted differently. According to Boris Cherny, who is responsible for Claude Code, recent changes are primarily the result of product choices. He argues that interface adjustments, such as making thought processes less visible, do not affect the model’s operation. He also points to modified default settings that determine how much computational power Claude allocates per task, intended to strike a better balance between speed, cost, and performance.

Thariq Shihipar, a member of the Claude Code team, also responded to the criticism. He states that the company does not weaken models to handle peak loads. However, he acknowledges that changes in presentation affect how users perceive performance.

Recent policy changes also provide context. Anthropic previously indicated that usage limits are enforced more strictly during peak times. Although this is unrelated to model quality, it feeds the suspicion that more is changing behind the scenes.

There is also discussion about changes to caching within Claude Code. Some users have noticed that saved context expires more quickly, leading to higher costs and faster quota consumption. Anthropic confirms these adjustments but says they are part of broader optimizations.

At its core, the discussion is also about trust. For developers who rely on Claude daily, even small changes can make results less predictable. Anthropic, for its part, maintains that the model’s foundation has not been altered in any way that reduces quality.