Anthropic has stopped hackers who attempted to abuse the Claude AI system. The attackers aimed to craft phishing emails, develop malicious code, and circumvent security filters.
The findings, published in a report, underscore growing concerns that AI tools are increasingly being exploited for cybercrime. Anthropic says its internal systems detected and blocked the attacks, and that it is sharing the case studies to help others understand the risks. Its researchers found attempts to use Claude to draft targeted phishing emails, write or modify malicious code, and bypass safety filters through repeated, rephrased prompts.
The report also describes attempts to run influence campaigns by generating persuasive messages at scale, and to obtain step-by-step instructions for low-skilled hackers.
The company has not published technical indicators such as IP addresses or specific prompts, but says it banned the accounts involved and tightened its filters after detecting the activity.
Industry under pressure
Anthropic is not alone. Microsoft, SoftBank-backed OpenAI, and Google have faced similar scrutiny over concerns that their AI models could be exploited for hacking or fraud, prompting calls for stronger safeguards.
Governments are also stepping in to regulate the technology. The European Union is moving forward with its Artificial Intelligence Act, while the United States is pressing major developers for voluntary safety commitments.
Anthropic says it follows strict security practices, including regular testing and external reviews, and plans to continue publishing reports when it discovers major threats.