3 min Security

OpenAI: the risk of prompt injection may never disappear

OpenAI: the risk of prompt injection may never disappear

OpenAI states that prompt injection will probably never disappear completely, but that a proactive and rapid response can significantly reduce the risk.

The company reported this in an explanation of its security approach to AI agents such as ChatGPT Atlas. According to OpenAI, this is a structural challenge within AI security, comparable to online fraud and social engineering, in which attackers continue to adapt to new defensive measures. The company therefore expects to continue working actively on this for years to come.

This assessment is not unique. According to TechCrunch, the UK’s National Cyber Security Centre also recently warned that prompt injection attacks on generative AI may never be completely preventable. The British cyber authority advises organizations to focus on limiting risk and impact, rather than expecting the problem to be completely solved. This positions prompt injection as a fundamental challenge for AI systems operating on the open web.

Prompt injection is an attack technique in which malicious instructions are hidden in content processed by an AI agent, such as emails or web pages. The agent may consider these instructions to be legitimate and follow them, causing its behavior to be redirected. As a result, the agent acts in the interests of the attacker rather than the user.

Additional layer of threat

For browser-based agents such as ChatGPT Atlas, this means an additional layer of threat on top of existing web security risks. The agent can independently open web pages, read emails, and perform actions. A malicious email with hidden instructions can therefore become part of a workflow without being noticed, for example when a user asks to process or summarize emails. This can lead to data leaks or other undesirable actions.

According to OpenAI, this is just one example of a broader problem. The versatility of agents also increases the attack surface. During their work, they may encounter untrustworthy input via emails, attachments, shared documents, social media, and websites. Because agents can perform many of the same actions as users themselves, the impact of a successful attack can be significant.

Other parties also recognize that prompt injection is not a temporary problem. TechCrunch points out that Brave previously stated that indirect prompt injection poses a systematic challenge for AI browsers, including solutions from competitors such as Perplexity. In addition, companies such as Anthropic and Google emphasize that defense is only possible with layered security and continuous stress testing. OpenAI shares this view, but with its automated attacker, it has opted for intensive use of reinforcement learning.

OpenAI is strongly committed to automated attack detection. The company uses a specially trained AI attacker that actively searches for new prompt injection attacks on agents in production environments using reinforcement learning. Through repeated simulations, this attacker learns to identify vulnerabilities before they are exploited in practice.