Patrick Walsh

Securing AI: The Stack You Need to Protect Your GenAI Systems

Enterprise-grade GenAI systems need these 7 things

Introduction

GenAI and particularly Large Language Models (LLMs) have shot from research to adoption faster than any technology in history. When ChatGPT was first released, it hit 100 million users in two months. Two months! To put this in perspective, here’s how long it took other tech to get to 100 million:

Tech Time to 100m users Visualized
Smart Phones 16 years ❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙
Internet 7 years ❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙
Instagram 2.5 years ❙❙❙❙❙❙❙❙❙❙❙❙❙❙❙
TikTok 9 months ❙❙❙❙❙
ChatGPT 2 months

And now, 18 months later, over a billion people are using LLMs. 93% of companies are now using AI. The adoption has been incredible.

GenAI attacks

But there’s a dark side to these GenAI systems. They are susceptible to whole new classes of attacks. Many of these attacks are brand new, and more are developing and getting refined every day. These new systems, while incredible, are also stunningly vulnerable in ways we haven’t seen before. And we have little pre-existing protection.

AI has been around for decades. To paraphrase a line from Whit Diffie, it’s like the Porsche 911; the model name is the same, but the car keeps on changing. So it is with AI. The first implementation of a neural network was made in the 1960s. You’d think that our understanding of how to attack this AI would have evolved apace gradually over decades as neural networks and other machine learning has. But this is not what’s happened.

We went from few AI-specific attacks to tens of thousands of research papers mostly kicking off around 2020. One of the most talked about attacks is called prompt injection, where a user tricks the LLM in various ways via otherwise normal interactions. The first deep academic paper I could find on this topic was published in 2022. Another attack is called a model inversion attack where the attacker extracts original training data back out of a model, usually by just sending it specific prompts. The earliest academic paper I can find for this sort of attack is from 2015. And in the related embedding inversion attack, a text embedding can be transformed back to its original source text. The first paper I can find on text embedding inversions is from 2020.

The pace of new research in this area is staggering with new academic papers weekly with attack enhancements, new combinations of different novel attacks, and entirely new attack approaches. Modern GenAI systems have a target on their backs and researchers are getting creative in targeting them.

This is a case where we have a new (well, old, but newly turbocharged) technology that has bypassed the usual gatekeepers and adoption curves and gone straight into the hands of the masses. And the security of these systems and the adoption of security for these systems has not kept pace.

Defending the new battlefield

Security organizations have been defending a pretty well understood surface area for years. The biggest sea-change in recent history, excluding GenAI, has been the shift to the cloud. But that’s really a shift to other people’s servers and to a more distributed micro-services approach to software systems. We’ve improved tactically in these environments, but the defenses are all conceptually similar to what we had before. Not a big change at all.

With GenAI, we’re suddenly faced with an entirely new environment with new things to defend and new weapons being used. Security professionals now have to worry about:

  1. Their existing, non-AI systems
  2. Their new GenAI systems, which are integrating with their previously secure classic systems
  3. How GenAI can be used to attack all of the above

And to make matters worse, the level of skill required for many of these attacks is low. It’s like some dystopian sci-fi future where absolutely anyone is a hacker if they can convince a machine to do the wrong thing simply by talking to it and leading it into an unexpected or undesired state.

Solutions for GenAI security

For security and privacy professionals, all is not lost. In fact, the thriving innovation in the startup world is in full force right now, with different companies diving in to solve different aspects of these new problems. Unfortunately for large companies that have security vendor fatigue, there’s no King of the Hill with a comprehensive approach yet. The big security vendors like SentinelOne and Cisco and Palo Alto are lagging in bringing products to market that can meaningfully help. Consolidation and roll-ups will come, but companies can’t afford to wait for the security to shake out because the adoption of these inherently vulnerable technologies is already underway.

We’re going to ignore how AI is being used to make classic security tools better and where AI is being used to detect AI-generated attacks. That’s interesting, but we’re focused here on securing the AI systems themselves. And if you’re a security professional adopting new AI-powered security solutions, you should ask yourself (and your vendor): how are those systems being protected? What data do they see and how could that cause the data to leak? And for security systems that themselves leverage AI: does the benefit of the AI outweigh the new vulnerabilities being introduced in the security systems themselves? What happens if an attacker actually targets the LLMs used by the security systems to analyze the attacker’s activities, for example?

When it comes to ways to think about securing these systems, we have a number of excellent competing frameworks including ones from MITRE, NIST, Google, Microsoft, and OWASP. These are important starting points that use familiar approaches to security with processes and checklists. But for our purposes here, we’re going to ignore these. Why? Because we’re surveying what GenAI protection solutions are on the market and using that to inform what we can actually protect against today.

We’ve identified seven classes of solutions to GenAI system security problems:

  • Confidential Computing – systems that run models or perform other tasks in encrypted computing environments
  • Encryption and Data Protection – approaches that encrypt GenAI data in various parts of a GenAI workflow
  • Governance – solutions that focus on applying frameworks, discovering and tracking projects, creating reports for regulatory purposes, etc.
  • Model Testing – solutions that test models for vulnerabilities, backdoors, supply chain issues, and other weaknesses
  • Prompt Firewall and Redaction – solutions that intercept prompts and responses and watch for attacks or data leaks, and which can redact information
  • QA – solutions that help to more formally monitor and test GenAI systems
  • Training Data Protection – solutions focused on fixing privacy issues in training data (for those building their own models)

We’re compiling solutions that fall into these categories in an open source Awesome Awesome Security Solutions for AI Systems list. Additions, suggestions, and corrections are welcome. We’ve also added an infographic to the awesome list to make it more digestible:

Infographic showing vendors by category

Some vendors have products in multiple categories. Within the categories, solutions can vary a great deal. For example, some prompt firewalls focus more on detecting and blocking or else redacting PII and that’s it. Others focus more on protection from jailbreak and prompt injection attacks. So while all of these call themselves a prompt firewall, you’ll need to look at each one to understand what they do or don’t do.

Everyone using shared models should be looking to AT THE VERY LEAST utilize encryption to protect private data, track AI threats with a governance solution, and monitor interactions using a prompt firewall. That means employing solutions from at least those three categories. And those who are running their own models will likely want to add solutions from the model testing and QA categories. For those creating models in-house, solutions from the training data protection category should be utilized as well.

That’s a lot of solutions to pull together, but not doing so risks leaking of private data, intellectual property theft, and worse mischief. The hacks are getting more clever like the recent GenAI worm that exploited “smart” email features.

Example AI security stack

If I were building an AI feature today, either a new app or adding LLM capabilities to an existing one, but not otherwise creating or running my own models, then here’s what I’d choose to pull together:

  1. IronCore’s Cloaked AI to encrypt and protect embeddings going into vector databases (or legacy databases with new vector capabilities). Full disclosure: I’m the CEO of IronCore Labs. But this is the only solution on the market to protect the data, and there’s no downside trade-off to it. You can still search over the encrypted data and results remain strong, while the ability to invert the embeddings is eliminated. It’s open source and inexpensive for commercial apps, and it can be used without any sort of cloud service.
  2. OneTrust’s AI Governance to track projects, understand risks, and prepare to meet various compliance needs in the EU and elsewhere.
  3. HiddenLayer’s AI Detection and Response to monitor for threats flowing through the LLMs I’d use (but presumably not run myself).
  4. Freeplay to track and manage prompts and understand as well as I could how well my product was performing and if tweaks and changes improve or screw up that performance.

Depending on the level of sensitivity of the data, I might also feel it necessary to host my own models, probably using open source ones. I’d love to be able to run those models in a confidential compute environment or have someone else do that for me, but for the moment, it’s more cost effective and possibly “good enough” to just run them myself. I’d probably get a model from HuggingFace – maybe something like Llama3, though it would depend on the specific use case. I wouldn’t particularly care if someone stole the weights since they aren’t my intellectual property, but I’d want to make sure the model was safe for me to run. To guard against threats when running open source models, I’d add these solutions:

  1. HiddenLayer Model Scanner to watch for supply chain issues in the model(s) I’m fetching and because it’s nice to reduce the number of vendors when feasible.
  2. Lakera Red because a model tester is critical and the Lakera team has a deep body of examples of attacks on LLM systems to work with and throw at the system I’d be creating – ideally uncovering any issues before it goes live.

Final thoughts

The above envisions either four or six new pieces of software to buy, test, incorporate, and run. That doesn’t even include whatever I need for the GenAI side of things from vendors like Qdrant and Cohere and maybe from my cloud infrastructure provider, too. It’s a little bit daunting, frankly. And the GenAI landscape – the attack landscape included – is moving very quickly, which means I probably can’t just pick my solutions, build my software, and then assume those decisions will hold for five years. New GenAI models and tools will emerge and new attacks will continue to develop, which all means I’ll likely need to review my stack and my choices at least annually. Sooner if I hear of important new developments.

But sitting on the sidelines and waiting for things to settle, consolidate, and mature isn’t a good option here. The potential value of this “new” AI is too high. The competitive disadvantages if competitors use the tech is just too strong. If you’re a buyer of cloud services or software and your vendor is adding AI, you need to push for formal stacks, and ensure quality and security are being tackled by asking pointed questions about what they’re doing in each of these categories. And if you’re a tech company building software with GenAI over private data, you can’t responsibly deploy without first addressing the relevant categories noted above.

Many companies are starting with toy projects that add chat to their documentation or operate on other public data. The risks involved in any abuse there are low because all the data is already public. But the real value of LLMs and vision models and generative AI comes when it’s applied to real data across domains from HR to Finance to Health to Sales to Customer Support and on and on. Mining data, summarizing it, interrogating it, surfacing what’s important – this is potent stuff. But as soon as these AI systems intersect with sensitive information, the danger goes deep into the red. The needle on your security alert gauge should be pinned at “Extreme”, and the red lights should be flashing.

Today’s professional AI stack means grabbing solutions from several categories and probably several vendors and then living with your choice for some unknown duration, until a point in the future when stability and maturity arrive.