Cloudflare: Attackers are deceiving AI models with prompt injection

Researchers at Cloudflare have found that attackers are increasingly using prompt injection to manipulate AI models. In an analysis of seven models, the Cloudforce One team examined how these systems reason and identified their vulnerabilities.

The results show that cybercriminals can influence AI decision-making with relatively simple means, particularly in security contexts. A key part of the findings is the use of so-called lures: small text fragments designed to convince models that certain code is safe.

According to the study, these fragments can subtly steer an AI system’s judgment. When such comments account for less than one percent of a file, detection effectiveness nearly halves. The models do not appear to recognize these signals as suspicious, but they are influenced by them.

The researchers also describe a pattern in which deception effectiveness does not increase linearly. A limited amount of manipulative text often proves successful, but as soon as the amount increases significantly, models sound the alarm.

With large numbers of repetitive comments, systems recognize the pattern as anomalous and flag the code more frequently as malicious. This suggests that AI models are susceptible to both subtle and exaggerated forms of influence, but process them in different ways.

Large codebases complicate detection

Another point of concern is the role of context. It is not the language itself, but the way information is presented that proves decisive. By hiding malicious instructions within large software bundles, such as commonly used libraries, researchers were able to drastically lower the detection rate. In some cases, the detection rate for malicious code dropped to just twelve percent, because the model could not effectively distribute its attention across the entire context.

It is also striking that the models studied exhibit certain biases. For example, comments in some languages were flagged as suspicious more quickly than others, regardless of the actual content. This suggests that AI systems develop implicit assumptions based on their training data, leading to both false positives and missed threats.

According to the researchers, the report emphasizes that the security of AI systems is not only about expanding functionality or improving detection capabilities, but also about understanding and defining how models arrive at decisions. They note that even advanced models remain vulnerable to manipulation of their reasoning processes, especially when that manipulation is subtle and context-dependent.

Cloudflare: Attackers are deceiving AI models with prompt injection

Large codebases complicate detection

Stay tuned, subscribe!

Google Gemini Enterprise to become the AI platform for everyone

“MCP is just an API,” and that is precisely the problem with Gemini Enterprise

The SaaSpocalypse is a myth, and Salesforce proves it

Containerized data centers help avoid many pitfalls in AI deployments

How Falco catches threats that static analysis misses

Cisco doubled down on compute for the AI and edge era

How to migrate from Redis to Valkey with zero downtime

groundcover uses EBPF and AI agents to modernize observability

The only thing constant in technology is change, except for unrealistic hopefulness

mnemonic opens Dutch Security Operations Centre (SOC) and relocates to new office in Utrecht

AI governance: the invisible prerequisite for success

Agentic AI is reshaping the network – and it’s time to upgrade

Team '26

Knowledge 26

GISEC GLOBAL 2026

Red Hat Summit

DevOpsCon London

Digitale soevereiniteit in de boardroom

Experience Synology’s latest enterprise backup solution

How to choose the right Enterprise Linux platform?

Enhance your data protection strategy for 2025

Strengthen your cybersecurity with DNS best practices