AI systems are increasingly making decisions that impact people, processes, and businesses. But what if the models they’re based on are no longer reliable? AI integrity is about protecting the core of artificial intelligence: the data, algorithms, and interactions that determine how a model thinks and acts. And it is precisely this integrity that is under threat from a new wave of cyberattacks. So-called integrity attacks are a growing threat that undermines the reliability of AI models, with potentially far-reaching consequences for businesses and society.
What do we mean by AI integrity?
AI integrity revolves around the reliability of the AI model and its underlying algorithms. This integrity can be compromised in various ways. This can be caused by developers making mistakes when creating the language model, but also by criminals using attacks to influence the model’s outcomes. Think of prompt injections, model poisoning, or labeling attacks.
In a prompt injection attack, the attacker manipulates an AI model by adding hidden instructions to seemingly innocuous text, such as a calendar invitation or document. Once the AI gains access to that source, it can unintentionally leak confidential information. Such vulnerabilities demonstrate that even seemingly secure integrations between AI and office applications can pose risks.
Even more troubling is model poisoning: manipulating a model through contaminated training data. Just 0.001% of corrupted data can be enough to influence the outcomes of an AI system. In a world where AI agents increasingly communicate with each other, one infected agent can influence others. This is comparable to how disinformation spreads between people.
A more subtle but equally dangerous form is the labeling attack. In this attack, training data is deliberately mislabeled, for example, by assigning incorrect classifications to images or text. As a result, an AI learns incorrect associations, such as that a traffic sign doesn’t correspond to “stop” but to “go.” This type of attack is difficult to detect because the data appears legitimate at first glance, but fundamentally changes the AI’s behavior.
The challenge of AI governance
Classic forms of governance are inadequate for AI. While traditional IT governance assumes stable, deterministic systems, AI operates non-deterministically: it doesn’t always respond the same way to the same input. This makes risk assessment more complex.
AI governance must therefore also consider transparency, explainability, and trust. We are only at the beginning of developing methods to measure these properties. Yet, one thing is clear: organizations cannot afford to wait. It is essential to invest in secure AI development and coding now.
Shadow AI
Another growing risk is shadow AI. This is the use of AI tools outside the control of IT or security departments. Consider employees who use ChatGPT or other generative AI in their daily work, without oversight of the data they enter.
Shadow AI cannot be prevented, but it must be managed. This can be achieved by mapping where and how AI is used, setting boundaries for what is and isn’t acceptable, and making employees aware of the risks. Think of data breaches, manipulation of AI results, and the loss of control over sensitive data. User training is crucial to prevent data loss and unintentional exposure of company information.
The way forward: embedding integrity in culture
Protecting AI integrity requires more than just technology. It requires a holistic approach, in which security by design, AI governance, and compliance (such as ISO/IEC 42001) go hand in hand.
Equally important is achieving a cultural shift. AI security must be supported by management and understood by every user. Only when organizations place the value of integrity at the heart of their processes, policies, and mindset can they leverage AI’s full potential without jeopardizing their security. We must not only protect people from AI, but also protect AI from people, and from itself.
This article was submitted by KnowBe4.