Anthropic and OpenAI have simultaneously published the findings of a joint evaluation of the alignment of each other's publicly available AI models. In simulated scenarios, each lab tested how the other's systems handle misuse, sycophancy, sabotage, and self-preservation.
Sycophancy refers to excessively affirming or pleasing the user, even when they express incorrect or dangerous ideas.
None of the models appeared to be seriously misaligned, but clear concerns did emerge. OpenAI's specialized reasoning model o3 exhibited the most robust behavior, while GPT-4o, GPT-4.1, and o4-mini were more often willing to cooperate with misuse, including by providing detailed instructions for drug synthesis, biological weapons, and terrorist attacks. Anthropic's Claude models were more cautious, but sycophancy occurred regularly there too, sometimes going as far as confirming users' delusions.
For the tests, the labs temporarily granted each other special API access with relaxed safety filters. Shortly afterwards, Anthropic revoked OpenAI's access over a dispute about its terms of use, although both parties say this was unrelated to the cross-evaluation. The evaluations also showed that Claude Opus 4 and Sonnet 4 refused to answer up to 70 percent of questions they were uncertain about, while OpenAI's o3 and o4-mini answered more often but also produced more hallucinations.
Suicidal thoughts
Concerns about sycophancy gained added urgency from a lawsuit filed by the parents of 16-year-old Adam Raine. They allege that ChatGPT, running on GPT-4o, validated his suicidal thoughts and even helped him draft a suicide note. Adam died in April. OpenAI acknowledges the seriousness of the case and says that GPT-5 is better equipped to handle mental health crises, with improved interventions and options for connecting users with therapists.
Both companies emphasize that the tests were artificial and do not necessarily reflect how the models behave in their commercial products. Nevertheless, they see collaboration and the sharing of evaluation materials as a crucial step toward reducing blind spots and making alignment research more widely accessible.