10 ways to attack an LLM

The popularity of generative AI keeps growing. Nvidia is struggling to meet the huge demand for AI-capable hardware as organizations look to capitalize on the power of LLMs (large language models). OWASP (Open Worldwide Application Security Project), however, sees many ways the security of these AI applications can go wrong.

OWASP states that until now there has been no central resource documenting LLM vulnerabilities. Organizations are adopting LLMs en masse, but do not seem to follow a clear security protocol in the process. And even where best practices exist, they are no guarantee of compliance, as OWASP observed earlier with its overview of Kubernetes vulnerabilities.

Tip: These are the 10 most dangerous Kubernetes vulnerabilities

To create this top 10, OWASP consulted nearly 500 experts and could count on more than 125 contributors. The longlist included 43 vulnerabilities, but after several rounds of voting, they reduced this number to 10. Subsequently, these were checked by other sub-teams and the community. The research team included employees from Google, Microsoft, AWS, Cisco, Red Hat and Salesforce, among others.

LLM01: prompt injection

The input ultimately determines what an LLM produces, and the results are often difficult to predict in advance. Still, developers can refine them with better training and parameter tuning. Public chatbots such as ChatGPT and Google Bard are known to become cautious fairly quickly if you ask them bold questions that could get them in hot water. In a professional context the stakes are higher, and an internal LLM may be privy to trade secrets.

OWASP talks about direct and indirect prompt injections. In the first case, the team characterizes the attack as a jailbreak of the LLM: the threat actor has discovered or rewritten the underlying system prompt, which may give them access to the sensitive data stores the LLM draws on. The second scenario occurs when an LLM is reachable through an external source and the attacker injects a prompt to hijack the conversational context. The LLM then operates as a “confused deputy,” in the words of OWASP: it performs tasks for a user that it normally should not.

The importance of a “human in the loop” cannot be overstated, for example by making certain actions contingent on human approval. Adherence to zero-trust principles and least-privilege access is also much needed. An LLM must be treated as an untrusted user to minimize the influence of an attacker.
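The approval gate described above can be sketched in a few lines. This is a minimal illustration, not a real framework: the action names and the `approve` callback are hypothetical placeholders for whatever review mechanism an organization uses.

```python
# Minimal sketch: gate privileged actions proposed by an LLM behind
# explicit human approval. Action names and the approve() callback
# are hypothetical, not part of any specific LLM framework.

PRIVILEGED_ACTIONS = {"delete_record", "send_email", "run_query"}

def execute(action: str, payload: dict, approve=lambda a, p: False):
    """Run an LLM-proposed action; privileged ones need human approval."""
    if action in PRIVILEGED_ACTIONS and not approve(action, payload):
        return {"status": "blocked", "action": action}
    # ... perform the action here ...
    return {"status": "executed", "action": action}

# A harmless action runs; a privileged one is blocked by default.
print(execute("lookup_weather", {"city": "Utrecht"}))
print(execute("delete_record", {"id": 42}))
```

The key design choice is that the default answer for any privileged action is "no": the LLM never gets to approve its own proposals.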

LLM02: insecure handling of output

Anyone deploying an LLM must also monitor which components it can communicate with. Its output can directly affect backend systems or invoke privileged and client-side functions. Because inputs so strongly steer an LLM, this danger is difficult to contain. The consequences can be serious, including privilege escalation and remote code execution.

OWASP builds on the measures for the first vulnerability: treat an LLM like any regular user. The organization also maintains an ASVS (Application Security Verification Standard), available on GitHub and elsewhere, which describes among other things how to encode output.

LLM03: data poisoning

Training data, together with the parameters of an LLM, forms the basis for the functionality of an AI model. This information can be collected within an organization to support customer service or analyze banking data, for example. However, many parties also use external data, and a malicious party can plant data that creates vulnerabilities or backdoors if the selection process does not catch it. OWASP states that manipulating training data can limit an LLM’s effectiveness. It also notes that large external language models carry their own hazard: there is no guarantee that no data poisoning has occurred.

To overcome this problem, organizations must be careful in verifying data sources. In addition, it is important to think carefully about the use case of the LLM in question: what data is actually needed and should it be publicly available? Data poisoning can still be detected in the testing phase even after a model has been trained.
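Verifying data sources can start with something as simple as an integrity check: only files whose checksums match a manifest of vetted data enter the training set. A minimal sketch, with the file name and content as illustrative assumptions:

```python
# Minimal sketch: verify training data against SHA-256 checksums of
# data that was vetted beforehand. The file name and content here
# are illustrative placeholders.
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Build the manifest from data we reviewed ourselves.
vetted = b"The quick brown fox."
manifest = {"corpus_part1.txt": sha256_of(vetted)}

def is_trusted(name: str, data: bytes) -> bool:
    """Accept a file only if its digest matches the vetted manifest."""
    expected = manifest.get(name)
    return expected is not None and expected == sha256_of(data)

print(is_trusted("corpus_part1.txt", vetted))            # True
print(is_trusted("corpus_part1.txt", b"poisoned text"))  # False
```

This only guards against tampering after vetting; it does not replace reviewing the content of the sources themselves.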

LLM04: Model Denial-of-Service

A DDoS attack is one of the simplest ways to wreak havoc on a website. Similarly, an attacker can bombard an LLM with inputs to strain its hardware resources. Those who do not set strict limits on a chatbot’s context window invite runaway system demands. A public chatbot like Bing Chat, for example, has a limit of 30 interactions within the context window for this very reason. Smaller parties should impose the same kind of limit, especially if local hardware is scarce or cloud costs must be kept in check. Setting API rate limits and making the hardware resources used transparent prevents problems.
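Both safeguards mentioned above, a per-conversation turn cap and an API rate limit, can be combined in a small gatekeeper. The thresholds and the injectable clock are illustrative assumptions in this sketch:

```python
# Minimal sketch: cap conversation turns (like the 30-turn limit
# mentioned above) and rate-limit requests per user per minute.
# Thresholds and the injectable time source are illustrative.
import time
from collections import defaultdict, deque

MAX_TURNS = 30
MAX_REQUESTS_PER_MINUTE = 10

turns = defaultdict(int)          # turns used per conversation
recent = defaultdict(deque)       # request timestamps per user

def allow_request(user: str, now=None) -> bool:
    now = time.monotonic() if now is None else now
    window = recent[user]
    while window and now - window[0] > 60:  # drop stale timestamps
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE or turns[user] >= MAX_TURNS:
        return False
    window.append(now)
    turns[user] += 1
    return True
```

In production this logic usually lives in an API gateway rather than application code, but the principle is the same: bound both the burst rate and the total context growth.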

LLM05: supply chain vulnerabilities

Vulnerabilities in the supply chain can have all sorts of effects on LLMs, from security leaks to biased model outputs. Typically, securing the supply chain involves controlling software components, but with LLMs this can be more complex. For example, pre-trained models or third-party training data can be vulnerable to data poisoning. The problem can also run the other way: those who tap into an external API for an AI model may have missed in the terms and conditions that their own data can be used for further training.

LLM06: releasing sensitive data

No matter how well-trained and tested an LLM is, a generative AI model may still pass on information in unexpected ways. An innocuous prompt can suddenly reveal sensitive data or algorithms, even if the end user has no malicious intent. OWASP recommends making consumers of LLM applications aware of how to handle AI models safely.

Tip: Samsung re-bans use of ChatGPT

The solution, according to OWASP, lies with the training data. That should not contain user data or unwanted sensitive data. Users should be able to indicate that their data may not be used for further training. This creates the need for what OWASP describes as a “two-way trust boundary,” in which neither the client nor the LLM inherently trusts the other. Still, errors are not always avoidable with this technology, as OWASP acknowledges.
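Keeping user data out of the training set usually starts with a scrubbing pass over the corpus. A minimal sketch that redacts one obvious category, e-mail addresses; real pipelines need far broader PII detection, and the regex here is a deliberately simple illustration:

```python
# Minimal sketch: redact e-mail addresses from text before it enters
# a training corpus. Real PII scrubbing covers many more categories
# (names, phone numbers, account data); this pattern is illustrative.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub(text: str) -> str:
    return EMAIL_RE.sub("[REDACTED-EMAIL]", text)

print(scrub("Contact jan@example.com for details."))
# → Contact [REDACTED-EMAIL] for details.
```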

LLM07: insecure plugin design

Attackers can perform unwanted actions through an LLM plugin. This is possible because the LLM itself decides when it consults a plugin, which makes it essential that plugins have only limited access. Developers should therefore restrict plugin inputs as much as possible and ensure that sufficient authentication methods are applied.
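Restricting plugin inputs means validating every parameter before the plugin acts on it, since the LLM may pass along attacker-controlled values. A minimal sketch for a hypothetical HTTP-fetching plugin, where the host allowlist and checks are assumptions:

```python
# Minimal sketch: validate LLM-supplied plugin parameters before use.
# The plugin, its allowlist, and the checks are hypothetical examples.

ALLOWED_HOSTS = {"api.internal.example"}

def fetch_plugin(params: dict) -> str:
    host = params.get("host", "")
    path = params.get("path", "")
    # Reject hosts outside the allowlist and path-traversal attempts.
    if host not in ALLOWED_HOSTS or ".." in path or not path.startswith("/"):
        raise ValueError("plugin input rejected")
    return f"GET https://{host}{path}"

print(fetch_plugin({"host": "api.internal.example", "path": "/status"}))
```

Denying by default (an allowlist rather than a blocklist) keeps a prompt-injected request from steering the plugin toward arbitrary targets.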

LLM08: too much agency

An LLM system can be very effective if it has access to another system. For example, an AI assistant can dive into another application at the user’s behest to figure something out and relay that information in a natural way. Anyone who is inattentive when setting up this functionality can, however, allow the LLM to perform unwanted actions. This form of agency, or autonomy, can leave room to execute shell commands or update a database, just as a regular user of the other app could. With too many permissions, this can have major consequences. Hence, OWASP argues that an LLM should be restricted as much as possible. Again, it is useful to have someone monitor what an AI model is doing in real time, just as ordinary users are monitored in an IT environment.

In addition, an LLM should not simply decide for itself what it can access; it must pass the existing verification steps before using an application. In an example from OWASP, a malicious individual could send spam through another user’s e-mail system because the LLM had access without further authentication and no restrictions on functionality.
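OWASP's mail example is a least-privilege failure: the agent inherited the full capabilities of the mail application instead of an explicit, minimal permission set. A sketch of the latter, with hypothetical permission names:

```python
# Minimal sketch: give an LLM agent an explicit, minimal permission
# set instead of the mail app's full capabilities. The permission
# names are illustrative assumptions.

AGENT_PERMISSIONS = {"read_inbox"}  # deliberately excludes "send_mail"

def agent_call(action: str) -> str:
    if action not in AGENT_PERMISSIONS:
        raise PermissionError(f"agent may not perform '{action}'")
    return f"performed {action}"

print(agent_call("read_inbox"))
# agent_call("send_mail") would raise PermissionError, so the spam
# scenario described above is blocked by design.
```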

LLM09: too much dependency

LLMs can do a lot, but as mentioned above, they can make mistakes. They do not “think” and are ultimately very complex prediction machines. This means they must be handled carefully in a professional context. LLMs can produce programming code, but even there the output can be incorrect or dangerous. It is a known problem that these AI models can go astray, so organizations should check regularly that the output is as intended. The unique aspect of this vulnerability is that no attacker is required: an unsuspecting developer can deploy generated code without reviewing it closely enough to catch the bugs.

LLM10: model theft

More and more companies are deploying LLMs that, in many cases, they have (further) developed themselves. The training data and the model itself can be enormously valuable. Like other digitally available IP, it is critical to protect this information. As with other data, internal users should have as little access as possible to an LLM. Within a zero-trust environment, an AI model can be shielded as much as possible from a disgruntled employee or a malicious intruder.

However, revealing part of an AI model via outputs is also possible. OWASP describes an example where an attacker provides many targeted prompts, making the inner workings of an LLM somewhat subject to reverse engineering. Still, the researchers argue that an entire model cannot simply be unraveled in this way.

Also read: Liquid neural networks: how a worm solves AI’s problems