
Claude Sonnet 4.5 can code autonomously for 30 hours


Anthropic claims that Claude Sonnet 4.5 is the best coding model in the world. It also brings significant improvements in reasoning and mathematical skills.

On OSWorld, a benchmark for AI models that perform real-world computing tasks, Sonnet 4.5 leads with 61.4 percent. Four months ago, Sonnet 4 scored 42.2 percent on this test.

Along with the model, Anthropic is also introducing the Claude Agent SDK. This infrastructure, which also underpins Claude Code, is now being made available to developers. The company has spent six months addressing challenges related to memory, access rights, and coordination among subagents.

Coding performance

Claude Sonnet 4.5 scores highest on SWE-bench Verified, an evaluation that measures real-world software development skills. It achieves 77.2 percent, versus 74.5 percent for both Opus 4.1 and GPT-5 Codex.

According to Anthropic, the model can remain focused on complex, multi-step tasks for more than 30 hours, a significant improvement over previous versions. This sustained focus is what allows Claude Sonnet 4.5 to code autonomously for such long stretches.

“It’s the strongest model for building complex agents. It’s the best model at using computers,” Anthropic said in the announcement. The company emphasizes that code is ubiquitous in modern applications, spreadsheets, and software tools.

Availability and pricing

Claude Sonnet 4.5 is available today via the Claude API under the name claude-sonnet-4-5. Pricing remains the same as Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens.
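At those rates, the cost of a request is simple arithmetic. A minimal sketch in Python (the function and variable names are illustrative, not part of any Anthropic SDK):

```python
# Published rates for Claude Sonnet 4.5 (same as Sonnet 4):
# $3 per million input tokens, $15 per million output tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in dollars for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# Example: 10,000 input tokens and 2,000 output tokens
print(f"${estimate_cost(10_000, 2_000):.2f}")  # $0.06
```

Output token counts include any extended-thinking tokens the model emits, so real bills depend on how the model is configured, not just on prompt and response length.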

In addition to the standard features, Anthropic has also added checkpoints to Claude Code, one of the most requested features. Users can now save their progress and instantly return to a previous state. A native VS Code extension is also available.

Safety and alignment

Claude Sonnet 4.5 is presented as the most aligned frontier model Anthropic has ever released. It shows significant improvements in reducing problematic behaviors such as sycophancy, deception, and power-seeking.

The model falls under Anthropic’s AI Safety Level 3 (ASL-3) protections. These include filters that detect potentially dangerous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear (CBRN) weapons. Anthropic has reduced the number of false positives by a factor of ten since the original implementation.

Anthropic also offers a temporary research preview called “Imagine with Claude.” In this experiment, Claude generates real-time software without pre-written code. The feature is available to Max users for five days.
