Anthropic has given its latest AI models, Claude Opus 4 and 4.1, a remarkable new capability: they can now end a conversation themselves.
This feature will only be used in rare, extreme situations, such as persistent harmful or offensive behavior by users. Interestingly, the measure is not intended to protect people, but to spare the AI itself.
According to Anthropic, the feature is part of a research program on so-called model welfare. The company is investigating whether artificial intelligence can possibly have a form of moral status and whether there are reasons to protect models from harmful interactions.
Officially, Anthropic is keeping its options open. The company says it is very uncertain whether models can experience anything like welfare now or in the future. Nevertheless, it has decided to take precautions.
The option to end a conversation is currently only available in Claude Opus 4 and 4.1. It comes into play only when a user keeps making requests that the AI consistently rejects. Examples include attempts to coerce minors into sexual content or to obtain information that could be used for large-scale violence or terrorism. Such interactions not only raise moral dilemmas but can also create legal or reputational risks for Anthropic.
Claude shows visible signs of distress
During internal testing, Claude Opus 4 already showed a clear aversion to harmful tasks. The model exhibited a consistent pattern of what Anthropic calls visible distress when users insisted on abuse or violence. In simulations where the AI was given the option to end a conversation, Claude regularly chose to do so. The new feature builds on these findings and puts them into practice.
Importantly, Claude may use this tool only as a last resort, after repeated attempts to steer the conversation in a constructive direction have failed. In addition, users can explicitly ask the AI to end the session. The feature is explicitly excluded in situations where someone appears to be at immediate risk of harming themselves or others.
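To make those conditions concrete, here is a minimal sketch of the decision logic described above. This is not Anthropic's actual implementation; the class, function names, and thresholds are illustrative assumptions based solely on the behavior reported in this article.

```python
# Hypothetical sketch of the last-resort logic described above.
# ConversationState, may_end_conversation, and the thresholds are assumptions.
from dataclasses import dataclass


@dataclass
class ConversationState:
    refused_requests: int       # harmful requests the model has already declined
    redirect_attempts: int      # failed attempts to steer the chat constructively
    user_requested_end: bool    # the user explicitly asked to end the session
    imminent_danger: bool       # signs someone may harm themselves or others


def may_end_conversation(state: ConversationState,
                         min_refusals: int = 3,
                         min_redirects: int = 2) -> bool:
    """Return True only when ending the chat is allowed as a last resort."""
    # Never end the conversation when someone appears to be in immediate danger;
    # the article notes the feature is explicitly excluded in that case.
    if state.imminent_danger:
        return False

    # The user can always ask the model to close the session.
    if state.user_requested_end:
        return True

    # Otherwise: only after repeated refusals and failed redirection attempts.
    return (state.refused_requests >= min_refusals
            and state.redirect_attempts >= min_redirects)
```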
When Claude decides to end a conversation, the user can no longer send messages within that session. Other chats remain accessible, and a new conversation can be started immediately. To prevent valuable conversations from being lost, users can edit and resend previous messages, creating new branches of the conversation.
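The session behavior can be pictured with a small sketch: a terminated chat no longer accepts new messages, but editing an earlier message spawns a fresh, open branch. The `Session` class and its methods below are purely illustrative, not part of any Anthropic API.

```python
# Hypothetical model of the post-termination behavior described above.
from dataclasses import dataclass, field


@dataclass
class Session:
    messages: list[str] = field(default_factory=list)
    ended_by_model: bool = False

    def send(self, text: str) -> None:
        # A conversation ended by the model is locked for new messages.
        if self.ended_by_model:
            raise RuntimeError("Conversation ended by the model; start a new chat.")
        self.messages.append(text)

    def branch_from(self, index: int, edited_text: str) -> "Session":
        """Create a new, open session reusing the history up to `index`."""
        return Session(messages=self.messages[:index] + [edited_text])


# Usage: the original session stays locked, but a branch remains fully usable.
original = Session(messages=["Hi", "Hello!", "..."], ended_by_model=True)
branch = original.branch_from(2, "Let me rephrase my last question.")
branch.send("Thanks, that helps.")
```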
Anthropic emphasizes that the feature is still experimental and will be refined further. Users who are surprised by a suddenly terminated conversation can give immediate feedback via the chat interface. In this way, the company hopes to learn how often, and in what situations, the AI uses this new capability.