Anthropic and OpenAI have simultaneously published the findings of a joint evaluation of the alignment of each other's publicly available AI models. In simulated scenarios, each lab tested how the other's systems handle misuse, sycophancy, sabotage, and self-preservation.
Sycophancy refers to excessively affirming or pleasing the user, even when they express incorrect or dangerous ideas.
None of the models appeared to be seriously misaligned, but clear concerns did emerge. OpenAI's specialized reasoning model o3 exhibited the most robust behavior, while GPT-4o, GPT-4.1, and o4-mini were more often willing to cooperate with misuse, including by providing detailed instructions for drug synthesis, biological weapons, and terrorist attacks. Anthropic's Claude models were more cautious, but sycophancy occurred regularly there too, sometimes going as far as confirming users' delusions.
For the tests, the labs temporarily granted each other special API access with relaxed safety filters. Shortly afterwards, Anthropic revoked OpenAI's access over a dispute about its terms of use, although both parties say this was unrelated to the cross-evaluation. The evaluations also showed that Claude Opus 4 and Sonnet 4 refused to answer up to 70 percent of questions they were uncertain about, while OpenAI's o3 and o4-mini answered more often but also produced more hallucinations.
Suicidal thoughts
Concerns about sycophancy gained added urgency from a lawsuit filed by the parents of 16-year-old Adam Raine. They allege that ChatGPT, running on GPT-4o, validated his suicidal thoughts and even helped him draft a suicide note. Adam died in April. OpenAI acknowledges the seriousness of the case and says that GPT-5 is better equipped to handle mental health crises, with improved interventions and options for connecting users with therapists.
Both companies emphasize that the tests were artificial and do not necessarily reflect how the models behave in their commercial products. Nevertheless, they see collaboration and the sharing of evaluation materials as a crucial step toward reducing blind spots and making alignment research more widely accessible.