
Claude Sonnet 4.5 can code autonomously for 30 hours


Anthropic claims that Claude Sonnet 4.5 is the best coding model in the world. It also brings significant improvements in reasoning and mathematical skills.

On OSWorld, a benchmark for AI models that perform real-world computing tasks, Sonnet 4.5 leads with 61.4 percent. Four months ago, Sonnet 4 scored 42.2 percent on this test.

Along with the model, Anthropic is also introducing the Claude Agent SDK. This infrastructure, which also underpins Claude Code, is now being made available to developers. The company has spent six months addressing challenges related to memory, access rights, and coordination among subagents.

Coding performance

Claude Sonnet 4.5 scores highest on SWE-bench Verified, an evaluation that measures real-world software development skills. It achieves 77.2 percent, versus 74.5 percent for both Opus 4.1 and GPT-5 Codex.

According to Anthropic, the model can remain focused on complex, multi-step tasks for more than 30 hours, a significant improvement over previous versions. This sustained focus is what allows Claude Sonnet 4.5 to code autonomously for such long stretches.

“It’s the strongest model for building complex agents. It’s the best model at using computers,” Anthropic said in the announcement. The company emphasizes that code is ubiquitous in modern applications, spreadsheets, and software tools.

Availability and pricing

Claude Sonnet 4.5 is available today via the Claude API under the name claude-sonnet-4-5. Pricing remains the same as Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens.
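At those rates, the cost of a request is simple arithmetic. A minimal sketch in Python (the function and variable names are illustrative, not part of any Anthropic SDK):

```python
# Published rates for Claude Sonnet 4.5 (same as Sonnet 4):
# $3 per million input tokens, $15 per million output tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in dollars for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# Example: 10,000 input tokens and 2,000 output tokens
print(f"${estimate_cost(10_000, 2_000):.2f}")  # $0.06
```

Output token counts include any extended-thinking tokens the model emits, so real bills depend on how the model is configured, not just on prompt and response length.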

In addition to the standard features, Anthropic has also added checkpoints to Claude Code, one of the most requested features. Users can now save their progress and instantly return to a previous state. A native VS Code extension is also available.

Safety and alignment

Claude Sonnet 4.5 is presented as the most aligned frontier model Anthropic has ever released. It shows significant improvements in reducing problematic behaviors such as sycophancy, deception, and power-seeking.

The model falls under Anthropic’s AI Safety Level 3 (ASL-3) protections. These include filters that detect potentially dangerous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear (CBRN) weapons. Anthropic has reduced the number of false positives by a factor of ten since the original implementation.

Anthropic also offers a temporary research preview called “Imagine with Claude.” In this experiment, Claude generates real-time software without pre-written code. The feature is available to Max users for five days.
