AI coding tool Lovable sees major improvements in the code produced by Claude. Claude Opus 4 and Sonnet 4 make fewer mistakes, work faster, and are therefore more useful programming assistants.
Claude creator Anthropic recently rolled out the two new models. Claude 4 is the long-awaited successor to the successful Claude 3.5 and 3.7 Sonnet. Those models were already favorites among programmers, although, like all other LLMs, they had plenty of shortcomings.
Impressive benchmarks
In a blog post, Anthropic stated that Claude Opus 4 achieved a score of 72.5 percent on SWE-bench (Software Engineering Benchmark). This benchmark is designed to test the software engineering capabilities of AI models. The task is multifaceted: an LLM must first understand a real GitHub issue before it can write code that resolves it.
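As a rough illustration of the shape of such an evaluation, here is a minimal, hypothetical sketch in the spirit of SWE-bench: the "model" receives an issue description, emits patched code, and the harness judges success by whether a previously failing test now passes. All names and the toy bug are invented for illustration; the real benchmark operates on full Git repositories and model-generated diffs.

```python
# Hypothetical sketch of a SWE-bench-style check (illustrative only; the
# real harness applies model-generated diffs to full Git repositories).

def solve(issue_text, buggy_source):
    """Stand-in for the LLM: returns patched source for a toy bug."""
    # A real model would read the issue and emit a unified diff; here we
    # hard-code the fix the issue describes (floor vs. float division).
    if "float division" in issue_text:
        return buggy_source.replace("a // b", "a / b")
    return buggy_source

def fail_to_pass(source):
    """Run the test that failed before the patch against the candidate code."""
    namespace = {}
    exec(source, namespace)                   # load the candidate module
    return namespace["divide"](1, 2) == 0.5   # passes only once the bug is fixed

issue = "Bug: divide() uses floor division; it should use float division."
buggy = "def divide(a, b):\n    return a // b\n"

patched = solve(issue, buggy)
resolved = fail_to_pass(patched)  # True: the patch makes the failing test pass
```

The "resolved" flag mirrors how SWE-bench scores an instance: not by reading the code, but by whether the repository's own tests flip from failing to passing.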
The tests show that Opus 4 performs excellently on long-running tasks that require sustained focus across thousands of steps. According to Anthropic, the latest model was even able to work on code for seven hours straight without any loss of quality. That’s a big claim: LLMs notoriously anchor on the initial input and drift as a session stretches on, with output quality steadily degrading.
The company is therefore positioning the new generation of models as a breakthrough in coding, advanced reasoning, and autonomous AI systems. According to Anthropic, this comes with a higher risk level than ever before: for the first time, it is activating AI Safety Level 3 protections. This should prevent Claude 4 from participating in malicious tasks that it could theoretically perform.
Practical improvements at Lovable
Lovable, a developer of “AI-driven prompt-based web and app builders” (read: vibe coding), has observed similar improvements after switching to Claude 4. The company uses Claude for its own solution.
In a post on X, Lovable reports that after implementing Claude 4, it has seen a 25 percent reduction in errors and an overall speed improvement of 40 percent. These improvements apply to both creating new projects and editing existing ones.
In a separate post, Lovable founder Anton Osika confirmed that “Claude 4 has eliminated most of Lovable’s errors,” referring primarily to syntax errors in coding.
Impact for developers
The improvements in Claude 4 are significant for developers who rely on AI coding tools. In theory, that is a large group: anyone who wants to build software without programming expertise depends on getting virtually error-free answers from an LLM. Syntax errors are among the most common problems in automatic code generation, and a 25 percent reduction can significantly increase productivity.
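To make concrete what a "syntax error" means here, consider an invented example of the kind of slip code generators commonly make, a bracket that is never closed, and how a parser catches it immediately. The snippets and helper below are illustrative, not taken from Lovable or Claude.

```python
import ast

# Illustrative only: a typical generated-code syntax slip (an unclosed
# bracket) next to the corrected version.
broken = "items = [1, 2, 3\nprint(sum(items))"   # missing ']'
fixed = "items = [1, 2, 3]\nprint(sum(items))"

def parses(source):
    """Check whether source is syntactically valid Python, as a linter would."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# parses(broken) -> False; parses(fixed) -> True
```

Because such errors are caught mechanically before the code ever runs, every one a model avoids is one less interruption for the user.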
The 40 percent speed improvement also means that developers (no matter how experienced) spend less time waiting for code to be generated or edited, leading to a more efficient development process.
Enough?
With these improvements, Anthropic demonstrates that the latest generation of LLMs not only performs better in controlled benchmark tests, but also offers concrete benefits in practical software development. The question is whether this is “enough,” and what “enough” would even mean. Minimizing errors is a prerequisite for scoring well on benchmarks, but the reality is that reliable, consistent code generation is still a long way off.
The ultimate goal of AI model builders is, as has been repeated ad nauseam, AGI. What this ‘Artificial General Intelligence’ should actually be able to do is unclear. Should it be as good as a human? As good as the best professionals? And at which tasks? All of this seems easier to define by tying the AGI requirement to a concrete task. Coding is a natural candidate: every compile or test run immediately shows whether the generated code makes sense. That, however, is not the practical limitation at the moment. Rather, Claude 4, like its predecessors, depends on high-quality prompts and a cooperative human to fix any remaining bugs.