AI Shadow Data White Paper Download
There are three major areas of untracked and unprotected shadow data in AI systems where copies of sensitive data accrue. Learn about the areas of AI shadow data and how to manage them.
Get the white paper now, for free
Data
Understand the risks for data in AI systems
There are three major pipelines for shadow data where AI systems clone sensitive data as a byproduct of using LLMs. Any system using modern generative-AI models, even if the models are built or hosted in-house, is likely to suffer from these problems.
Using these models on public data, such as public-facing documentation on a website, carries little risk because theft of the underlying data is not a concern. But using them on private data that would be problematic if leaked or stolen generates shadow data that security teams should be tracking. The risks around this data can be mitigated, but first they must be understood.

Attraction
AI is a magnet for data from across the organization
Whether intended or not, most of an organization's data (and the data of individuals, for that matter) will end up flowing through AI systems. This happens either because a software vendor decides to add AI features to its product or because the organization's own efficiency initiatives work best with more data.

Security
Understand how to minimize and protect the shadow data
This white paper also prescribes approaches for dealing with the shadow data. Step one is to identify where it is and what it contains. Step two is to minimize it, if your use case allows. Step three is to protect the shadow data so it isn't a giant treasure chest containing all of your organization's sensitive data. For those not building their own AI features, we recommend starting with these questions for your software vendor, to find out whether they are aware that they are creating shadow copies of your data and whether they are taking measures to protect it.
Download the document for a more in-depth description of the problems and solutions. The back of the white paper contains background information for those who need to get up to speed on concepts like RAG.