Training AI Without Leaking Data
How Encrypted Embeddings Protect Privacy
Machine learning techniques have long been super cool for making predictions, but in recent years, their ability to generate text and images has been mind-blowing. At the same time, the ability to learn how to classify things without hand-tuning training sets has been transformative for those using machine learning to build better software and systems.
The Risks of Private Data in AI
But there’s a fly in the ointment. We want our AI systems to be good, but we don’t want our private data leaking out into the world. It’s really no one else’s business what I wrote in my diary or who I met with last week or what my doctor told me my prognosis is. If we want to take more of a business view of AI, then companies sure don’t want internal discussions of potential liability or predicted future business results leaking outside of a tightly controlled group of people.
This creates a bit of a tension: what if I want an AI model to raise red flags when we are opening ourselves up to liability? Or what if I want a model to summarize my previous week or quarter or to suggest when I should meet with someone again based on prior patterns or meeting transcripts?
Undoubtedly, AI could be more useful to me and to my business if it has access to more data. And if I’m a software provider, then I can provide my customers with more value if I can leverage AI on top of their data either to make per-customer models or to aggregate data for more broadly useful ones.
For neural networks, in particular, more training data can lead to better results and more powerful AI systems.
At the same time, anything that an AI is trained on has a risk of showing up in outputs. Even though all the training data is mixed together inside the neural network to produce outputs that generalize from the inputs, specific training inputs are often reproduced verbatim.
If we let OpenAI train on our private data, there’s a high risk that others using the model will see elements of our private data emerge in outputs from the OpenAI models.
While that chance should go down as more inputs are used, even with billions and billions of inputs, large models still produce copies of training data. We’ll soon be giving conference talks (first up at RMISC 2025) that demonstrate these attacks and go in depth on the risks of private data in AI systems. Stay tuned for more of that info from us later, but for now, take my word for it: if your data is used to train a generative model, then your data is at risk of extraction.
Note: the risk is lessened when the model is not generative. By “generative”, we mean creating free-form text or images or whatever. For example, a model that’s trained to detect tone or sentiment in text, or to pick out high-level subject matter (e.g., topics like finance or health), is probably not going to produce an output that is problematic. In that case, the data risk is more around the storage and handling of the training data itself and who can access that data.
What Data is Innocuous?
Suppose you want to build your own AI model and train it to predict what’s next in a sequence such as what video a person might watch next given their video viewing history. Before you can build the model, you need training data. In this scenario, that means you need access to the viewing history of a large number of people, which means using information that some people might feel hesitant to share.
Note: I’ve intentionally chosen a fairly innocuous-seeming scenario here. This data isn’t personally identifiable information or bank account info or medical records. But at the same time, the information that Netflix and YouTube gather on people’s viewing histories is likely to tell them a lot about political leanings, sexual orientation, medical diagnoses (people searching on stage 4 melanoma are probably diagnosed with it themselves or have a close relative or friend who is, for example), and so on. Personally, I disabled video history on YouTube after I read the book “The Chaos Machine” and learned how incredibly damaging Google’s recommendation engine can be.
So is that information a privacy problem? Even if your exact watch history informs a recommendation to someone else, the system isn’t saying, “Patrick liked similar things to you, so try this.” The data is aggregated and stripped down and anonymized. In this situation, is the data a problem? Probably not… unless more data is mixed in.
What if the model builders start to take into account things like location and gender and age? After all, more data produces better results in AI, and it stands to reason that there are regional and generational preferences at play here. Yet adding more information makes individuals more identifiable. For example, I’m interested in lock picking. I suspect there aren’t many others in my neighborhood who share that interest, and I bet someone who knows me and has access to video histories (filtered by region and gender and age bracket) could pick out my history with a high degree of success.
Which leads me to the question: who is building the models? What training data can they see, and how much of an issue is it that they can comb through those data points? In my view, it’s always a slippery slope, and the default stance should be that private data stays private to the individual who produced it. Exposure of even innocuous-seeming data can be very invasive.
A More Clearly Problematic Example
Let’s switch to an example that carries more obvious risk: suppose we’re building a model based on people’s interactions with customer support. We take the thread of interactions between a customer and a support agent and associate it to the rating that the customer later gives as to their experience with the support process or the agent. The idea is to be able to look at future interactions and predict the final ratings so that supervisors can be alerted to problematic interactions as early as possible. For now, we’ll assume that the model being built is specific to a particular company, though a customer support SaaS company could easily choose to train across all customers and offer a new “red flag” feature to everyone.
The risk with this data is that customers may be sharing lots of personal information inside their messages. They might send passwords, details about their private lives, or credit card numbers; they might complain that they’re in the middle of chemotherapy and that hassles with customer support are the last thing they need. Really, just about anything could show up in these conversations.
We don’t want data scientists browsing through these conversations. And we don’t want to copy potentially sensitive data into more locations. As an added risk, this training data could end up being used for generative models that recommend responses to a customer, which could then output private training data.
Let’s head all of that off at the pass by locking up that data while still allowing it to be used to build certain classes of models.
Vector Embeddings as Training Data
Any input to an AI system first has to be reduced to a mathematical representation, and that representation is often fairly direct. For example, an image might be reduced to a grid (matrix) of pixel values. A sentence might be reduced to a list of numbers where each number is an index into a dictionary. But there’s another option for representing an input as a list of numbers, and that’s a vector embedding.
Vector embeddings are produced by embedding models, which are trained to extract the concepts, meaning, or intent of inputs and encode that information into a vector (a list of numbers). We can measure the similarity between two inputs by finding the distance between the two vectors. If we generate embeddings from images, two that are similar might be two photos of an empty beach on a sunny day. A very different image might be a portrait of a person with a birthday cake at a restaurant. Vectors also capture intent and meaning in sentences. The sentences “I’m happy today” and “this morning I feel great” are similar despite using different words and sentence structures. The sentence “I need to fix the sprinklers” would be mathematically distant from the other two.
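To make that concrete, here’s a minimal sketch that compares the example sentences above. It assumes the open source sentence-transformers package and its all-MiniLM-L6-v2 model as a stand-in for whatever embedding model you actually use:

```python
# Minimal sketch: measure how similar sentences are via their embeddings.
# Assumes the sentence-transformers package and the all-MiniLM-L6-v2 model;
# any embedding model would work.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "I'm happy today",
    "this morning I feel great",
    "I need to fix the sprinklers",
]
vectors = model.encode(sentences)  # one vector (list of numbers) per sentence

def cosine_similarity(a, b):
    # Closer to 1.0 means closer in meaning.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors[0], vectors[1]))  # high: both express feeling good
print(cosine_similarity(vectors[0], vectors[2]))  # low: unrelated topics
```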
These vectors are then used for things like search (finding relevant data based on an input image or a sentence that may use different words) and recommendation engines (finding something similar based on various features or history).
You can build an embedding model for just about anything, but there are a lot of pre-trained models for handling text, image, video, audio, and other inputs. And you can use the vectors these models produce as training inputs for new models, a technique that can lead to faster inference, smaller and more efficient models, and shorter training times.
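As a rough sketch of that technique, training a small downstream model on embedding vectors looks like ordinary supervised learning. This assumes scikit-learn, with random arrays standing in for real embeddings and labels:

```python
# Minimal sketch: use embedding vectors as training inputs for a new, smaller model.
# Assumes scikit-learn; the random arrays stand in for real embeddings and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=42)
embeddings = rng.normal(size=(1000, 384))  # e.g., 384-dimensional sentence embeddings
labels = rng.integers(0, 2, size=1000)     # e.g., 0 = neutral tone, 1 = negative tone

classifier = LogisticRegression(max_iter=1000)
classifier.fit(embeddings, labels)         # the embeddings are just numeric features
print(classifier.predict(embeddings[:3]))
```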
Security of Embeddings
Although many people think converting inputs into vector embeddings is enough to obscure the data, that’s not the case. “Inverting” embeddings back to near approximations of their original forms (images, faces, text, etc.) is both an entire field of study and a demonstrated attack vector (no pun intended). While a human looking at a series of numbers can’t directly make much sense of them, there are techniques to make those numbers very meaningful. We’ve covered some of these attacks in previous blogs, such as this one on recreating faces from facial recognition embeddings and this one on recreating text from sentence embeddings.
Training on Encrypted Vector Embeddings
What we need is a way to render the training data safe from a human who has access. The human should not be able to understand the meaning of any piece of training data, but should still be able to create useful models from the data.
It turns out that encrypting vector embeddings using approximate-distance-comparison-preserving encryption is a way to do just that: protect the training data while allowing data scientists to create powerful models.
Approximate-distance-comparison-preserving encryption is a way to take a vector embedding and protect the underlying meaning of the vector by preventing inversion. It still allows distances between vectors to be compared, so questions like whether two faces belong to the same person can still be answered. It’s a powerful tool that’s most often used in vector search use cases, but it can also be applied to training data.
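To build some intuition (and only intuition), here’s a toy stand-in for this kind of encryption: a secret, key-derived rotation of each vector plus a bit of random noise. This is not the actual scheme and is not secure on its own; it just illustrates how vectors can be transformed so that distance comparisons roughly survive while the original values don’t:

```python
# Toy illustration ONLY -- this is NOT the real approximate-distance-comparison-
# preserving scheme and is not secure. It just shows the idea: a secret,
# key-derived transform plus random noise hides the original values while
# roughly preserving which vectors are close to which.
import numpy as np

DIM = 384  # dimension of the embedding model's output

def derive_key(seed: int) -> np.ndarray:
    # The "key" here is a secret rotation matrix derived from a seed (illustrative only).
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))
    return q

def encrypt_embedding(key: np.ndarray, vec: np.ndarray, noise_scale: float = 0.05) -> np.ndarray:
    # The per-vector randomness is generated fresh and never stored, so the
    # operation can't be reversed exactly (the analogue of discarding an IV).
    noise = np.random.default_rng().normal(scale=noise_scale, size=vec.shape)
    return key @ vec + noise

rng = np.random.default_rng(0)
key = derive_key(seed=1234)
a = rng.normal(size=DIM)
b = a + 0.1 * rng.normal(size=DIM)  # b is a slight variation of a (similar meaning)
c = rng.normal(size=DIM)            # c is unrelated

ea, eb, ec = (encrypt_embedding(key, v) for v in (a, b, c))

# Distance *comparisons* roughly survive encryption...
print(np.linalg.norm(a - b) < np.linalg.norm(a - c))      # True
print(np.linalg.norm(ea - eb) < np.linalg.norm(ea - ec))  # still True
# ...but the encrypted values themselves look nothing like the originals.
```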
The overall flow works like this: the production data being used for training first gets turned into vector embeddings using an embedding model. This happens in the production environment that has access to the source data. Then, in that same environment, a single encryption key is used to encrypt all of those embeddings. If you were building different models on different segments of data, you’d use a different key for each model.
Once you have the encrypted vector embeddings, they can be exported into a less secure environment where data scientists can “see” and use the data.
Note that in this case, we don’t ever want to be able to decrypt the vectors. To make this a one-way operation, the per-vector random number that is created during encryption (which we refer to as an IV) should be discarded.
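Putting the production-side steps together, a sketch might look like this. It reuses the toy derive_key and encrypt_embedding helpers from above purely for illustration (a real deployment would use an actual vector-encryption library), and the transcripts and ratings are made up:

```python
# Production-side sketch: embed the raw data, encrypt every embedding with the
# single model key, and export only the encrypted vectors plus their labels.
# derive_key and encrypt_embedding are the toy helpers sketched above
# (illustration only), and the transcripts and ratings below are made up.
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
model_key = derive_key(seed=1234)  # in reality: a key held in a locked-down KMS

transcripts = [
    "My card was charged twice and support never responded.",
    "Thanks so much, the agent fixed my login issue in minutes!",
]
ratings = [1, 5]  # labels stay unencrypted; only the vectors are protected

encrypted_vectors = [
    encrypt_embedding(model_key, vec)  # per-vector randomness is discarded
    for vec in embedding_model.encode(transcripts)
]
# encrypted_vectors + ratings are what get exported to the data science environment.
```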
From here, a model can be trained as usual, as if nothing special had been done to the inputs.
Note: you can’t use encrypted vectors to fine-tune an LLM that was previously trained on unencrypted data. Encrypted vectors are best used to train models that are used for classification or prediction rather than text or image generation.
Using a Model Trained on Encrypted Vector Embeddings
To use the newly built model in production, the data used to create an inference first needs to go through the exact same process: it must be reduced to an embedding that is then encrypted using the same key that was used on the training data for the model. Without access to the correct key, the model is useless.
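In code, the inference path mirrors the preparation of the training data. A sketch, where embedding_model, model_key, encrypt_embedding, and rating_model are the illustrative names from the other sketches in this post, not a real API:

```python
# Inference-time sketch: embed, encrypt with the SAME key, then call the model.
# embedding_model, model_key, encrypt_embedding, and rating_model are the
# illustrative names from the other sketches in this post, not a real API.
new_message = "I've been waiting a week for a refund and nobody answers my emails."

vec = embedding_model.encode([new_message])[0]     # same embedding model as training
encrypted_vec = encrypt_embedding(model_key, vec)  # same key as the training data

predicted_rating = rating_model.predict([encrypted_vec])[0]
print(predicted_rating)  # a low predicted rating could alert a supervisor early
```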
Reduced Risk of Model Theft
In most infrastructures – particularly production ones – encryption keys are the most secure part of the infrastructure. They should live inside key management systems that are tightly controlled, locked down, and backed by hardware security modules. An intruder cannot use the model unless they also compromise the key, which means the risk of a valuable model being stolen and abused in a breach goes way down.
Protection Against Insiders
One difficulty in this pattern is that the data scientists need to be able to test the model they create. A simple and secure way to do this is to split the data into a training set and a test set used to measure the accuracy of the learned model. Note: additional information associated with each piece of training data would not be encrypted, so in our example, the vectors would be encrypted, but the ratings for the customer support interactions the vectors describe would be unencrypted. Done this way, the machine learning team can build and test models without ever seeing any real data, and the key for the model can be available solely to the production environment.
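Here’s a sketch of what that looks like on the data science side, assuming scikit-learn, with random arrays standing in for the exported encrypted vectors and their unencrypted ratings:

```python
# Data-science-side sketch: the team only ever sees encrypted vectors plus their
# unencrypted labels. Assumes scikit-learn; the random arrays are placeholders
# for the exported encrypted embeddings and customer ratings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
encrypted_vectors = rng.normal(size=(5000, 384))  # placeholder encrypted embeddings
ratings = rng.integers(1, 6, size=5000)           # placeholder 1-5 ratings (not encrypted)

X_train, X_test, y_train, y_test = train_test_split(
    encrypted_vectors, ratings, test_size=0.2, random_state=7
)

rating_model = LogisticRegression(max_iter=2000)
rating_model.fit(X_train, y_train)  # training never touches raw transcripts
print(accuracy_score(y_test, rating_model.predict(X_test)))
```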
If the engineers need to be able to test more organically by running the model in a test environment with ad hoc inputs, then we run into an issue. The model is useless without the key, which means the key needs to be available to the test environment for more qualitative testing.
Security Considerations
The best attack against approximate-distance-comparison-preserving encryption is what’s known as a chosen plaintext attack. This is where the attacker is able to generate a list of associations between original input (text, in our current example) and encrypted output for a particular key (meaning the attacker can make use of the key even if they can’t view it). With a large enough sample of these associations, an attacker can build a model that roughly inverts vectors that were encrypted under the same key.
The attack isn’t perfect: the quality of the results is impacted by randomness that’s injected into the encryption process, and that randomness is bounded by an “approximation factor.” This means that specific details, like names, are difficult to accurately recover. The higher the approximation factor, the less effective the inversion model will be. But this is a topic we cover in depth elsewhere.
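To make the shape of the attack concrete, here’s a rough sketch using the toy transform from earlier (which is far weaker than a real scheme, so the inversion works much better here than it would in practice):

```python
# Rough illustration of a chosen plaintext attack: given enough
# (plaintext embedding, encrypted embedding) pairs made under one key, fit a
# model that approximately inverts the encryption. Uses the toy transform from
# the earlier sketch, which is much weaker than a real scheme; concept only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
plaintexts = rng.normal(size=(2000, 384))  # vectors the attacker chose to encrypt
ciphertexts = np.array([encrypt_embedding(model_key, v) for v in plaintexts])

inverter = LinearRegression()
inverter.fit(ciphertexts, plaintexts)  # learn an encrypted -> plaintext mapping

unknown = rng.normal(size=384)
recovered = inverter.predict(encrypt_embedding(model_key, unknown).reshape(1, -1))[0]
print(np.corrcoef(unknown, recovered)[0, 1])  # high correlation = approximate inversion
```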
The key takeaway here is that even if it must be shared with a test environment, the key used to encrypt the training data should be tightly controlled, and engineers should not be given direct access to it. Instead, it should only be usable by code running in that environment, and there should be review processes in place where multiple team members know what code is being deployed into the environment. This greatly minimizes the risk that someone could create such an attack model.
As with most real-world security, what we’re proposing here is a mixture of technical, procedural, and policy measures that together provide a robustly secure and privacy-preserving system. For more sensitive use cases, an organization may choose not to allow ad hoc testing outside of production, where the strictest access controls and policies are enforced.
Accuracy Considerations
Encrypting vectors this way introduces some randomness into the vectors, which can have a negative impact on the accuracy of the model. For classification models with distinct categories, this generally isn’t a problem, but results will depend on specific use cases as well as the chosen value for the previously mentioned approximation factor. Testing for specific use cases is recommended.
Other Approaches
There are other ways to encrypt data in models that also require a specific key to use the model, most notably the family of fully homomorphic encryption (FHE) techniques. This is cool technology and another viable approach for many use cases. The trade-off versus the approach outlined above is one of performance (meaning here the amount of time it takes to train a model or generate an inference). FHE requires quite a lot more processing power, introduces latency, and is not feasible for large models. It also requires special ways to run and use the model.
The advantages of the approach described in this article, which is a partially homomorphic encryption technique, are these:
- The training data contents (and by implication, the data embedded into the model) are encrypted, not just the model itself
- Models can be run anywhere they would be without encryption
- The performance is strong, with the overhead being mostly around generating embeddings. The encryption takes negligible time in comparison
- It works on arbitrarily large models
Conclusion
Organizations that want to leverage sensitive data with AI don’t have to compromise the security of that data. One-way encrypting vector embeddings and using them as training data is a viable approach to preserve privacy and increase security.
For those interested in this sort of encryption, take a look at IronCore’s Cloaked AI, which is open source under the AGPL license or available under a paid commercial license.