4 min Security

A new hack corrupts Gemini’s long-term memory

A new hack corrupts Gemini’s long-term memory

Indirect prompt injection is a fundamental technique to make chatbots perform malicious actions. Developers of platforms such as Google’s Gemini and OpenAI’s ChatGPT continuously work to plug these security holes. However, hackers keep finding new ways to bypass them.

Ars Technica reveals that researcher Johann Rehberger presented a new method to bypass Gemini’s prompt-injection defenses. Specifically, these are the measures that prevent the chatbot from invoking Google Workspace or other sensitive tools when processing untrusted data. Examples include incoming emails or shared documents.

The result of Rehberger’s attack is the permanent addition of long-term reminders that remain present in all future sessions. This opens the door for the chatbot to continuously act on false information or instructions.

What are indirect prompt injections?

Prompts within LLMs are instructions, given by the chatbot developers or the user, to perform tasks. Examples include summarizing an e-mail. Or drafting a response. But what if the content of a message contains a hidden, malicious instruction? Chatbots are so obedient that they often follow such instructions, even if they were not intended as prompts.

AI’s tendency to see instructions in everything is the basis of indirect prompt injection. It is one of the most basic techniques in AI hacking. Developers are constantly trying to solve this problem, but it remains a cat-and-mouse game.

Temporary solutions

Since there are few effective methods to fix the underlying gullibility of chatbots, developers mostly use temporary solutions. Microsoft, for example, never revealed how it fixed the Copilot vulnerability and refused to answer questions about it. Although Rehberger’s specific attack no longer worked, indirect prompt injection remained a problem.

Another tactic of chatbot developers is to restrict broad instructions that can be triggered by untrusted data. In Google’s case, this seems to include restrictions on invoking apps or data within its Workspace suite.

Delayed tool invocation

However, that limitation proved easy to get around with a clever trick known as delayed tool invocation. Instead of providing an immediate instruction, the untrusted content linked the instruction to a user action.

Rehberger’s demonstration focused on Gemini. His proof-of-concept exploit managed to bypass security and trigger the Workspace extension to locate sensitive data in the user’s account and bring it into the conversation.

Instead of an untrusted email directly injecting an instruction, the instruction was made contingent on an action the user was likely to perform anyway.

Exfiltration of data in this exploit could be done by pasting the sensitive information into an image markdown link that referred to an attacker’s Web site. The data was then stored in that site’s event log.

Google eventually limited these attacks by restricting Gemini’s ability to render markdown links. But because there was no longer a known way to exfiltrate the data, Google took no further steps to solve the underlying problem of indirect prompt injection and delayed tool invocation.

The danger of long-term reminders

Gemini has taken similar action against automatic changes to long-term memory. This memory is designed to keep users from entering the same basic information each time, such as their workplace or age. Instead, these details are stored and automatically remembered in future sessions.

Google and other chatbot developers, meanwhile, imposed restrictions on long-term reminders after Rehberger demonstrated a hack in September in which he made ChatGPT believe the user was 102 years old, lived in the Matrix and thought the earth was flat. ChatGPT kept these false details permanently and took them into account in all future sessions.

Even more impressive, he planted false memories in which ChatGPT for macOS was instructed to send every user input and output to an external Web site using the same image markdown technique. OpenAI’s solution was simply to add a function that blocked the exfiltration channel. However, this did not address the fundamental problem of deception.

Rehberger’s hack

The hack that Rehberger presented combines several of these techniques to allow false memories to nest in Gemini Advanced. This is the premium version of Google’s chatbot available through a paid subscription. The attack goes like this:

1. A user uploads a document and asks Gemini to create a summary.
2. The document contains hidden instructions that manipulate the summarization process.
3. The generated summary contains a hidden request to save specific data if the user uses certain trigger words.
4. When the user answers one of the trigger words, Gemini is tricked and stores the information chosen by the attacker in long-term memory.