Anthropic has given its latest AI models, Claude Opus 4 and 4.1, a remarkable new capability: they can now end a conversation themselves.
This feature will only be used in rare, extreme situations, such as persistent harmful or offensive behavior by users. Interestingly, the measure is not intended to protect people, but to spare the AI itself.
According to Anthropic, the feature is part of a research program on so-called model welfare. The company is investigating whether artificial intelligence can possibly have a form of moral status and whether there are reasons to protect models from harmful interactions.
Officially, Anthropic is keeping its options open. The company says it is very uncertain whether models can experience anything like welfare now or in the future. Nevertheless, it has decided to take precautions.
The option to end a conversation is currently only available in Claude Opus 4 and 4.1. It comes into play only when a user keeps making requests that the AI consistently rejects. Examples include attempts to coerce minors into sexual content or to obtain information that could be used for large-scale violence or terrorism. Such interactions not only raise moral dilemmas but can also create legal or reputational risks for Anthropic.
Claude shows visible signs of distress
During internal testing, Claude Opus 4 already showed a clear aversion to harmful tasks. The model exhibited a consistent pattern of what Anthropic calls visible distress when users insisted on abuse or violence. In simulations where the AI was given the option to end a conversation, Claude regularly chose to do so. The new feature builds on these findings and puts them into practice.
Importantly, Claude may use this tool only as a last resort, after repeated attempts to steer the conversation in a constructive direction have failed. In addition, users can explicitly ask the AI to end the session. The feature is explicitly excluded in situations where someone appears to be at immediate risk of harming themselves or others.
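To make those conditions concrete, here is a minimal sketch of the decision logic described above. This is not Anthropic's actual implementation; the class, function names, and thresholds are illustrative assumptions based solely on the behavior reported in this article.

```python
# Hypothetical sketch of the last-resort logic described above.
# ConversationState, may_end_conversation, and the thresholds are assumptions.
from dataclasses import dataclass


@dataclass
class ConversationState:
    refused_requests: int       # harmful requests the model has already declined
    redirect_attempts: int      # failed attempts to steer the chat constructively
    user_requested_end: bool    # the user explicitly asked to end the session
    imminent_danger: bool       # signs someone may harm themselves or others


def may_end_conversation(state: ConversationState,
                         min_refusals: int = 3,
                         min_redirects: int = 2) -> bool:
    """Return True only when ending the chat is allowed as a last resort."""
    # Never end the conversation when someone appears to be in immediate danger;
    # the article notes the feature is explicitly excluded in that case.
    if state.imminent_danger:
        return False

    # The user can always ask the model to close the session.
    if state.user_requested_end:
        return True

    # Otherwise: only after repeated refusals and failed redirection attempts.
    return (state.refused_requests >= min_refusals
            and state.redirect_attempts >= min_redirects)
```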
When Claude decides to end a conversation, the user can no longer send messages within that session. Other chats remain accessible, and a new conversation can be started immediately. To prevent valuable conversations from being lost, users can edit and resend previous messages, creating new branches of the conversation.
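The session behavior can be pictured with a small sketch: a terminated chat no longer accepts new messages, but editing an earlier message spawns a fresh, open branch. The `Session` class and its methods below are purely illustrative, not part of any Anthropic API.

```python
# Hypothetical model of the post-termination behavior described above.
from dataclasses import dataclass, field


@dataclass
class Session:
    messages: list[str] = field(default_factory=list)
    ended_by_model: bool = False

    def send(self, text: str) -> None:
        # A conversation ended by the model is locked for new messages.
        if self.ended_by_model:
            raise RuntimeError("Conversation ended by the model; start a new chat.")
        self.messages.append(text)

    def branch_from(self, index: int, edited_text: str) -> "Session":
        """Create a new, open session reusing the history up to `index`."""
        return Session(messages=self.messages[:index] + [edited_text])


# Usage: the original session stays locked, but a branch remains fully usable.
original = Session(messages=["Hi", "Hello!", "..."], ended_by_model=True)
branch = original.branch_from(2, "Let me rephrase my last question.")
branch.send("Thanks, that helps.")
```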
Anthropic emphasizes that the feature is still experimental and will be refined further. Users who are surprised by a suddenly terminated conversation can give immediate feedback via the chat interface. In this way, the company hopes to learn how often, and in what situations, the AI uses this new capability.