Anthropic, a prominent artificial intelligence firm, has introduced a new feature in select Claude models that allows the AI to end a conversation on its own, though only under narrow and clearly defined circumstances.
The function is available only in Claude Opus 4 and 4.1, and it is intended as a last resort, used when repeated attempts to steer the discussion in a more productive direction have failed, or when the user explicitly asks for the conversation to end.
In a public statement issued on August 15, Anthropic clarified that the primary purpose of this feature is not user protection, but rather safeguarding the AI model itself.
Anthropic acknowledges the “high degree of uncertainty” surrounding the potential moral standing of Claude and similar large language models (LLMs), both now and in the future. Nonetheless, the company has launched a “model welfare” initiative, experimenting with minimal interventions to address potential ethical concerns as they arise.
The AI firm emphasizes that only the most extreme scenarios will trigger the new conversation-ending capability, such as user requests seeking information that could be used to plan acts of mass violence or terrorism.
Anthropic revealed that during internal testing, Claude Opus 4 demonstrated resistance to answering prompts of this nature. Furthermore, the company observed what it termed a “pattern of apparent distress” when the AI was compelled to respond.
Anthropic states that the process should always begin with attempts to redirect the conversation; only if those attempts fail should the model end the chat. The company also emphasizes that Claude should not end a conversation when a user appears to be at imminent risk of harming themselves or others.
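To make that escalation order concrete, here is a minimal illustrative sketch of the decision flow as reported above. It is not Anthropic's implementation; the function name, the `MAX_REDIRECTS` threshold, and the input flags are hypothetical placeholders that simply mirror the stated policy.

```python
# Illustrative sketch only: NOT Anthropic's implementation.
# All names and thresholds are hypothetical placeholders that
# mirror the policy described in the article.

MAX_REDIRECTS = 3  # hypothetical cap on redirection attempts


def should_end_conversation(
    redirect_attempts: int,
    user_requested_end: bool,
    imminent_harm_risk: bool,
) -> bool:
    """Return True only when ending the chat matches the stated policy."""
    # Never end the chat when the user may be at imminent risk of
    # harming themselves or others.
    if imminent_harm_risk:
        return False

    # Honor an explicit user request to end the conversation.
    if user_requested_end:
        return True

    # Otherwise, end only as a last resort, after repeated attempts
    # to redirect the conversation have failed.
    return redirect_attempts >= MAX_REDIRECTS
```

Note how the imminent-harm check comes first: per the reported policy, it overrides both the user's request and the last-resort path.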
