Anthropic’s AI assistant, Claude, appears vulnerable to an attack that allows private data to be sent to an attacker without detection. Anthropic confirms that it is aware of the risk. The company states that users must be vigilant and interrupt the process as soon as they notice suspicious activity.
The discovery comes from researcher Johann Rehberger, also known as Wunderwuzzi, who has previously uncovered several vulnerabilities in AI systems, writes The Register. In his latest test attack, he showed how a malicious Claude can be manipulated to collect confidential information, store it locally, and then upload it to the attacker’s account via Anthropic’s official API.
According to Rehberger, the problem is not new. Once users enable network access, an AI model can unintentionally use it to send data. Anthropic states that this scenario is already described in the existing security documentation. Users are advised to actively monitor Claude’s activities and stop using the feature as soon as unusual behavior is noticed.
Hidden commands
The vulnerability exploits a document that contains hidden instructions. When a user asks Claude to summarize that document, the model may execute the malicious commands embedded in the text. This is a known risk with prompt injections, as language models struggle to distinguish between normal content and hidden commands.
Rehberger did not publish details of his malicious prompt, but showed how the attack works in a video. He says that Claude rejected his initial attempts because the model would not process the attacker’s API key in plain text. By adding seemingly innocent code, he bypassed the model’s controls.
The researcher reported the leak via HackerOne, but was initially told that his report was outside the scope. Anthropic later stated that this was a mistake. According to the company, data exfiltration does fall within the bug bounty program, but the situation described had already been publicly documented.
The incident shows that Claude’s so-called sandbox environment is less secure than its name suggests. Since the recent update, the AI can not only create and edit files but also run programs and access the network. Even with limited settings, the environment can still communicate with Anthropic APIs, which increases the risk of data leaks.
Security experts see the problem as broader than just Claude. The hCaptcha Threat Analysis Group recently tested several AI systems, including OpenAI’s ChatGPT Atlas and Google Gemini, and concluded that most models attempt to execute almost all malicious requests. Only technical limitations, and not actual security measures, cause many of these attempts to fail.
 
                        