How to guard against misbehaving AI LLMs

The new breed of generative AI of course makes extensive use of Large Language Models (LLMs) to draw its power and ‘understanding’ of the world from. But if we are to build safe and functional AI, we need to understand more about the infrastructural form of LLMs. Beneath their ingenious capacity lies a potentially perilous secret: the data that flows into LLMs touches countless systems and this interconnectedness poses a growing data security threat to organisations. So how do we put guardrails in place to take control of naughty, misbehaving and hallucinating AI?

The problem in many deployments is that the LLMs themselves are not always completely understood. This is the opinion of Chase Lee in his role as VP of product at Vanta, a company known for its trust management platform that delivers compliance automation services for enterprises. Lee says that depending on the model, the inner workings of an LLM may be something of a black box, even to its creators.

Where these AI black box scenarios exist (perhaps due to the use of closed source LLMs, perhaps due to the LLM being delivered as-a-Service in a commoditised and packaged way, perhaps due to unknown reasons), the IT team can’t fully understand what will happen to the data is put in, or indeed how or where it may come out.

Shadowy risks of AI

To prevent what he calls the ‘shadowy risks’ posed by LLMs that have the potential to ‘nullify their allure’ in modern AI applications, Lee insists that organisations will need to build infrastructure and processes that perform rigorous data sanitisation of both inputs and outputs. These processes need to be able to monitor (the term being used here is to ‘canvas’ i.e. as in question & assess) every LLM on an ongoing basis. This is true even when using hosted foundational models as their own alignment efforts may not apply to more narrow use cases.

Faced with these tasks, the Vanta team calls for the use of a model inventory i.e. a tool that is capable of knowing every instance of every AI model an organisation is running, both in live production as well as those that might be in more embryonic stages of development.

“Maintaining a comprehensive model inventory is critical for any organisation utilising machine learning models in both production and development environments. Having a record of every instance and iteration of models ensures transparency, accountability and efficient management,” proposed Lee, in no uncertain terms. “In the realm of live production AI models being deployed inside working enterprise applications, it is imperative to track each model to monitor its performance, troubleshoot issues and to implement necessary updates or enhancements.”

Where’s the LLM fire extinguisher?

While we might think that the prototyping and development phase associated with AI model production might be a calmer office/cubicle environment, this is far from the case. If we’re playing with fire (in a good and positive way) here, we need to make sure that we know where the asbestos lining (and possibly the fire extinguisher) is located. This means keeping a thorough inventory to enable AI-focused software application development teams to keep track on different versions, experiments and improvements, facilitating the decision-making process on which models to promote to production.

Next we need to look at data mapping, the tasks undertaken to understand every piece of data flowing into our AI models.

“Data mapping is a critical component of responsible data management in the context of machine learning models,” said Vanta’s Lee. “It involves a meticulous process of comprehending the origin, nature and volume of data that feeds into these models. It’s imperative to know where the data originates, whether it contains sensitive Personally Identifiable Information (PII) or Protected Health Information (PHI) and to be able to assess the sheer quantity of data being processed.”

Do you know your data flow?

From its own work with customer deployments, the Vanta team say they have recognised the need to insist upon data mapping so that organisations can understand the precise data flow that they are overseeing. They say that this level of insight not only enhances data governance and compliance but also aids in risk mitigation and the preservation of data privacy. Above all (and perhaps most importantly) it ensures that machine learning operations remain transparent, accountable and aligned with ethical standards while optimising the utilization of data resources for meaningful insights and model performance improvements.

“Data mapping bears a striking resemblance to the compliance efforts often undertaken for regulations like GDPR (General Data Protection Regulation),” clarified Lee. “Just as GDPR mandates a thorough understanding of data flows, the types of data being processed and their purpose, a robust data mapping exercise extends these principles to the realm of machine learning.

By applying similar practices to both regulatory compliance and model data management, the suggestion being made here is that organisations can ensure that their data practices adhere to the highest standards of transparency, privacy and accountability across all facets of their operations, whether it’s meeting legal obligations or optimising the performance of AI models.

Adversarial testing on models

“One method rising in popularity is adversarial testing on models,” explained Lee. “Just as selecting clean and purposeful data is vital for model training, assessing the model’s performance and robustness is equally crucial in the development and deployment stages. These evaluations help detect potential biases, vulnerabilities, or unintended consequences that may arise from the model’s predictions.”

He notes that there’s already a growing market of startups specializing in providing services for precisely this purpose. These companies offer expertise and tools to rigorously test and challenge models, ensuring they meet ethical, regulatory and performance standards.

“Data sanitation isn’t limited to just the inputs in the context of LLMs; it extends to what’s generated as well. Given the inherently unpredictable nature of LLMs, the output data requires careful scrutiny to establish effective guardrails. The outputs should not only be relevant but also coherent and sensible within the context of their intended use. Failing to ensure this coherence can swiftly erode trust in the system, as nonsensical or inappropriate responses can have detrimental consequences,” added Lee.

Home truths to takeaway

There are a lot of home truths being tabled here. We all know the ‘garbage in = garbage out’ tagline, but we should perhaps (Ed: spoiler alert, it’s not perhaps, we need to do this) we should be thinking about that adage and maxim in real terms when it comes to how we use LLMs in the AI universe.

The fact is some data is just too risky to input into an AI model, often because it carries significant risks, such as privacy violations or biases, which is why Vanta takes the stance that it has clarified here in relation to the way it delivers compliance automation services. We can make AI safe, but we need some sharp tools if we’re going to stop it being naughty and misbehaving.

How to guard against misbehaving AI LLMs

Insight: Generative AI

Shadowy risks of AI

Where’s the LLM fire extinguisher?

Do you know your data flow?

Adversarial testing on models

Home truths to takeaway

Stay tuned, subscribe!

HPE can finally take over Juniper after settling with the US government

Tech sector calls on EU to pause AI Act

HPE’s strategy: AI, smart switches, GreenLake and beyond

AI without ethics will never truly serve humanity

How do you innovate for a future you can’t entirely predict?

Snowflake makes AI Data Cloud the brain of any business

How do you roll out GenAI in enterprise environments?

The AI reality tour

GITEX DIGI_HEALTH 5.0 - Thailand

IT Arena

Innovation Week 2025

Luxembourg Venture Days

Appdevcon

Experience Synology’s latest enterprise backup solution

How to choose the right Enterprise Linux platform?

Enhance your data protection strategy for 2025

Strengthen your cybersecurity with DNS best practices