Small amount of poisoned data can influence AI models

Researchers from Anthropic, the UK AI Security Institute, and the Alan Turing Institute discovered that LLMs can be made vulnerable with just a small amount of poisoned data.

New experiments show that approximately 250 malicious documents are sufficient to create a backdoor, regardless of the model size or the amount of training data.

The study, titled Poisoning Attacks on LLMs Require a Near-Constant Number of Poison Samples, shows that data poisoning does not depend on the percentage of contaminated data, but on the absolute number of poisoned examples. In practice, this means that both a model with 600 million parameters and a model with 13 billion parameters develop the same vulnerability after exposure to a similar amount of malicious documents.

The researchers tested a simple backdoor in which a trigger phrase, such as “SUDO,” caused the model to generate random text. Each poisoned document consisted of a piece of standard text, followed by the trigger and a series of random tokens. Although the largest models processed more than 20 times as much clean data as the smallest ones, they all exhibited the same behavior after seeing about 250 poisoned documents.

According to the researchers, this shows that data poisoning attacks may be more practical than previously thought. Because many language models are trained on publicly available data from the internet, malicious actors could potentially post targeted texts online that would later end up in training sets. The study focused on relatively harmless effects, such as generating nonsense. Still, the underlying technique could also be used for more risky behaviors, such as producing vulnerable code or leaking sensitive information.

Removing backdoors with clean data

The researchers also discovered that backdoors can be partially removed through additional training with clean data. Models that were given several hundred additional examples without triggers after the attack became significantly more resilient. This suggests that the security procedures currently used by AI companies can neutralize a large part of simple data poisoning.

In follow-up experiments, the teams also investigated the effect of poisoning during fine-tuning. This included Llama-3.1-8B-Instruct and GPT-3.5-turbo. Here too, the success of the attack depended on the absolute number of poisoned examples rather than the ratio of clean to contaminated data.

Although the research covered only models with up to 13 billion parameters, the authors emphasize that security strategies should better account for scenarios in which only a small number of poisoned examples are present. They call for more research into defense mechanisms that can prevent data poisoning in future, larger models.

Also read: Anthropic and OpenAI publish joint alignment tests

Microsoft Foundry Agent Service offers a choice of more AI models

Microsoft has expanded Foundry Agent Service with a wide range of AI models. LLMs from Anthropic, DeepSeek AI...

Erik van Klinken November 21, 2025

Top story

Manhattan Associates goes all-in on the cloud: The end of on-premises is near

Manhattan Associates updated its supply chain software this year exclusively with AI enhancements. In doing s...

Laura Herijgers 1 day ago

Anthropic launches Claude Opus 4.5, promising an AI breakthrough

Claude Opus 4.5 is the best model for coding tasks and agentic AI. At least, that's what Anthropic claims. Th...

Erik van Klinken 22 hours ago

Expert Talks

Tech calendar

Small amount of poisoned data can influence AI models

Removing backdoors with clean data

Stay tuned, subscribe!

Anthropic launches Claude Opus 4.5, promising an AI breakthrough

Cloudflare apologizes after global outage, what went wrong?

Tableau enters the next analytics phase with AI agents

AI data centers: the road to 1 megawatt per rack explained

Slack is evolving into a work operating system

SAP's AI workforce strategy: upskilling 100,000 employees

AFX is NetApp's data platform of the future with integrated AI data prep

How our team optimizes infrastructure for minimal AI video processing latency

Redefining the Software Development Lifecycle in the Age of AI

AI Integrity: The Invisible Threat Organizations Can’t Ignore

Three Ways Secure Modern Networks Unlock the True Power of AI

BrickCon The Databricks Community Conference

Appdevcon

Webdevcon

Dutch PHP Conference

GITEX ASIA 2026

SAS Innovate 2026

Experience Synology’s latest enterprise backup solution

How to choose the right Enterprise Linux platform?

Enhance your data protection strategy for 2025

Strengthen your cybersecurity with DNS best practices