AI Shadow Data White Paper Download

AI systems accrue copies of sensitive data in three major areas of untracked and unprotected shadow data. Learn what these areas are and how to manage them.

White Paper Sample

Get the white paper now, for free


Data

Understand the risks for data in AI systems

There are three major pipelines for shadow data where AI systems clone sensitive data as a byproduct of using LLMs. Any system using modern generative-AI models, even if the models are built or hosted in-house, is likely to suffer from these problems.

Using these models on public data, such as public-facing documentation on a website, carries little risk because theft of the underlying data is not a concern. But using them on private data that would be problematic if leaked or stolen generates shadow data that security teams should be tracking. The risks around this data can be mitigated, but first they must be understood.

Attraction

AI is a magnet for data from across the organization

Whether it's intended or not, most of an organization's data (and individuals' data, too, for that matter) will end up flowing through AI systems. This happens either because a software vendor decides to add AI features to its product or because efficiency initiatives within the organization work best with more data.

Security

Understand how to minimize and protect the shadow data

This white paper also prescribes approaches for dealing with the shadow data. Step one is to identify where it is and what it contains. Step two is to minimize it, if your use case allows. Step three is to protect the shadow data so it isn't a giant treasure chest containing all of your organization's sensitive data. For those not building their own AI features, we recommend you start with these questions for your software vendor to find out whether they're aware that they're creating shadow copies of your data and whether they're taking measures to protect them.

Download the document for a more in-depth description of the problems and solutions. The back of the white paper contains background information for those who need to get up to speed on concepts like RAG.
