2 min Security

Cloudflare: Attackers are deceiving AI models with prompt injection

Cloudflare: Attackers are deceiving AI models with prompt injection

Researchers at Cloudflare have found that attackers are increasingly using prompt injection to manipulate AI models. In an analysis of seven models, the Cloudforce One team examined how these systems reason and identified their vulnerabilities. 

The results show that cybercriminals can influence AI decision-making with relatively simple means, particularly in security contexts. A key part of the findings is the use of so-called lures: small text fragments designed to convince models that certain code is safe.

According to the study, these fragments can subtly steer an AI system’s judgment. When such comments account for less than one percent of a file, detection effectiveness nearly halves. The models do not appear to recognize these signals as suspicious, but they are influenced by them.

The researchers also describe a pattern in which deception effectiveness does not increase linearly. A limited amount of manipulative text often proves successful, but as soon as the amount increases significantly, models sound the alarm.

With large numbers of repetitive comments, systems recognize the pattern as anomalous and flag the code more frequently as malicious. This suggests that AI models are susceptible to both subtle and exaggerated forms of influence, but process them in different ways.

Large codebases complicate detection

Another point of concern is the role of context. It is not the language itself, but the way information is presented that proves decisive. By hiding malicious instructions within large software bundles, such as commonly used libraries, researchers were able to drastically lower the detection rate. In some cases, the detection rate for malicious code dropped to just twelve percent, because the model could not effectively distribute its attention across the entire context.

It is also striking that the models studied exhibit certain biases. For example, comments in some languages were flagged as suspicious more quickly than others, regardless of the actual content. This suggests that AI systems develop implicit assumptions based on their training data, leading to both false positives and missed threats.

According to the researchers, the report emphasizes that the security of AI systems is not only about expanding functionality or improving detection capabilities, but also about understanding and defining how models arrive at decisions. They note that even advanced models remain vulnerable to manipulation of their reasoning processes, especially when that manipulation is subtle and context-dependent.