Vulnerability in Claude enables data leak via prompt

Anthropic’s AI assistant, Claude, appears vulnerable to an attack that allows private data to be sent to an attacker without detection. Anthropic confirms that it is aware of the risk. The company states that users must be vigilant and interrupt the process as soon as they notice suspicious activity.

The discovery comes from researcher Johann Rehberger, also known as Wunderwuzzi, who has previously uncovered several vulnerabilities in AI systems, writes The Register. In his latest test attack, he showed how a malicious Claude can be manipulated to collect confidential information, store it locally, and then upload it to the attacker’s account via Anthropic’s official API.

According to Rehberger, the problem is not new. Once users enable network access, an AI model can unintentionally use it to send data. Anthropic states that this scenario is already described in the existing security documentation. Users are advised to actively monitor Claude’s activities and stop using the feature as soon as unusual behavior is noticed.

Hidden commands

The vulnerability exploits a document that contains hidden instructions. When a user asks Claude to summarize that document, the model may execute the malicious commands embedded in the text. This is a known risk with prompt injections, as language models struggle to distinguish between normal content and hidden commands.

Rehberger did not publish details of his malicious prompt, but showed how the attack works in a video. He says that Claude rejected his initial attempts because the model would not process the attacker’s API key in plain text. By adding seemingly innocent code, he bypassed the model’s controls.

The researcher reported the leak via HackerOne, but was initially told that his report was outside the scope. Anthropic later stated that this was a mistake. According to the company, data exfiltration does fall within the bug bounty program, but the situation described had already been publicly documented.

The incident shows that Claude’s so-called sandbox environment is less secure than its name suggests. Since the recent update, the AI can not only create and edit files but also run programs and access the network. Even with limited settings, the environment can still communicate with Anthropic APIs, which increases the risk of data leaks.

Security experts see the problem as broader than just Claude. The hCaptcha Threat Analysis Group recently tested several AI systems, including OpenAI’s ChatGPT Atlas and Google Gemini, and concluded that most models attempt to execute almost all malicious requests. Only technical limitations, and not actual security measures, cause many of these attempts to fail.

OpenAI aims for largest IPO ever, but it’s holding off for now

US AI player OpenAI is reportedly preparing for an IPO that could value the company at as much as $1 trillion...

Mels Dees 1 day ago

ServiceNow CEO: AI will not replace enterprise software

ServiceNow CEO Bill McDermott does not see artificial intelligence as a threat to business software, but rath...

Mels Dees 1 day ago

Top story

Atlassian CTO on realistic AI: Rovo, data privacy and adoption

Organizations don't benefit from AI yet: how can they change that?

Sander Almekinders 2 days ago

Expert Talks

Tech calendar

Vulnerability in Claude enables data leak via prompt

Hidden commands

Stay tuned, subscribe!

Nexperia intervention after theft of trade secrets by CEO

The true cost of a cloud outage

Why your SOC needs a ROC

Nvidia acquires significant stake in Nokia to accelerate 5G and 6G

Oracle Database @ AWS: best of both worlds?

SAP Business Network: $6.5 trillion B2B collaboration platform

ServiceNow goes after the mid-market with its AI-based Core Business Suite

Slack is evolving into a work operating system

Three Ways Secure Modern Networks Unlock the True Power of AI

How to Safeguard and Prepare Exchange Server against Natural Disasters?

Minimizing liability is not the same as security: Lessons learned from Collin’s Aerospace cyberattack

Discover Why Northern Europe Chooses Redgate Monitor

Dell Technologies Forum

BrickCon The Databricks Community Conference

Appdevcon

Webdevcon

Dutch PHP Conference

Experience Synology’s latest enterprise backup solution

How to choose the right Enterprise Linux platform?

Enhance your data protection strategy for 2025

Strengthen your cybersecurity with DNS best practices