There and Back Again: An Embedding Attack Journey

Text embeddings contain sensitive data that’s easily extractable but also easily protected

Text embeddings are used to power retrieval augmented generation (RAG) workflows, which feed relevant data to an AI model as context so it can produce more accurate, citable, grounded answers and can optionally operate on private data.

It’s increasingly common to extract information from strategy documents, communication logs like emails, medical diagnoses, etc. and use an AI model to capture the meaning as vector embeddings inside a vector database. Companies do this to make search more intelligent and less dependent on keywords, allowing for more natural language interactions.

While there are misconceptions about embeddings being hashed or unreadable and secure against hackers, we showed in our last post about facial recognition embeddings that isn’t the case. In this post I will show one very easy way people can reverse text embeddings, which often contain sensitive information, back into sentences similar to the original input. This demonstrates that even though they might not seem sensitive, vectors encode a ton of private and sensitive information, and they should be handled with care.

What are embeddings?

Embeddings are vectors of numbers created by AI models that capture meaning from text (or from other types of data like photos and videos). They are usually stored in specialized vector databases, which search for the stored embeddings that are most similar to one another or to a query.

For example, the phrases “cities with the largest population sizes” and “the world’s biggest metro areas” carry nearly identical meanings but use entirely different words. An AI model trained to extract meaning from text will produce very similar embeddings for these two phrases.
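To make "very similar" concrete, here is a minimal sketch of cosine similarity, a common way to compare embeddings. The three 4-dimensional vectors are made up for illustration and did not come from a real model, which would typically produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
largest_cities = [0.81, 0.12, 0.55, 0.07]   # "cities with the largest population sizes"
biggest_metros = [0.78, 0.15, 0.58, 0.10]   # "the world's biggest metro areas"
banana_bread = [0.05, 0.92, 0.11, 0.64]     # an unrelated sentence

similar = cosine_similarity(largest_cities, biggest_metros)
unrelated = cosine_similarity(largest_cities, banana_bread)
print(f"same meaning: {similar:.3f}, unrelated: {unrelated:.3f}")
```

A score near 1 means the vectors point in almost the same direction, which is how a vector database decides that two differently worded phrases mean the same thing.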

Because embeddings are stored as a series of numbers indecipherable to humans, you might think that if the embeddings leak, it’s not an issue. This couldn’t be further from the truth. Embeddings are numerical representations of data, and if the original data was sensitive, the embeddings are, too.

Attacking text embeddings

There are many different attacks on embeddings and many approaches to these attacks. A Google Scholar search for embedding inversion attacks, for example, returns 37,000 results. But there are also membership inference attacks, attribute inference attacks, and more.

Today I’ll walk through one specific embedding inversion attack, which is shown in the paper “Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence” by Li, Xu, and Song. The code from the paper is open source and is available on GitHub.

The attack works by creating a model that learns how to invert another model. It uses AI to attack AI, and it might sound complicated, but since the code is publicly visible on GitHub, almost anyone can do it. At some point, we expect people will share their embedding inversion models, making these attacks even faster and simpler.

Let’s look at some examples of embeddings and the sentences that produced them.

[Figure: text-to-embedding example using “Our earnings are down”]

To the human eye, this embedding is pure nonsense. How could anyone figure out that this stream of tiny numbers means “Our earnings are down”? However, by using embedding inversion, we can get back at least the gist of the original sentence. Some researchers have even had success getting back most of the exact words including full names.

Attacking this particular embedding with this approach gets us back to the very similar text, “Our income is down”, which is more than sufficient to recover the meanings in sensitive documents, emails, and records.

[Figure: embedding-back-to-text example, which becomes “Our income is down”]

Preventing this attack

The best way to prevent this and many other types of attacks is to use application-layer encryption to encrypt the embedding, which can be done using Cloaked AI, an open-source library for encrypting embeddings and their associated metadata.

Encrypting a vector embedding produces a vector of the same dimensionality that looks much like any other embedding but is indecipherable. It turns something that looks like [0.231, 0.899, ...] into something that looks like [-13681085.0, 42081104.0, ...]. And if we encrypt the same embedding again with the same key, we get a different result, something like [19847844.0, 60127316.0, ...].

If you’re questioning why the encrypted embedding is any better than the original, I applaud you. However, we can see the value if we try to run the inversion attack on it. The result is pure nonsense. Most of the time it doesn’t even produce words, but occasionally it’ll produce something like “hub hub hub hub hub”.

This is because the encryption renders the vector into a form that has no meaning anymore to the inversion model.

Inverting encrypted vectors produces garbage

The algorithm being used is a symmetric, property-preserving algorithm called Scale and Perturb originally created by Fuchsbauer, Ghosal, Hauke and O’Neill. And if we look at some of the details of how the algorithm works, three factors serve to make the embedding useless to anyone without the encryption key:

The encrypted embedding has a scaling factor, which is part of the key and maps the vector onto a much larger vector space. This means that each different key produces embeddings in a different range.
The encryption algorithm uses a perturbation technique, which means that each value in the embedding is “tweaked” by a seemingly random amount while approximately preserving how near the vectors are to one another. The amount by which each value is adjusted is determined by a parameter called the approximation factor and also by a per-vector random number, which is similar to an initialization vector (IV). This IV is why the same vector encrypted twice with the same key results in different values.
The order of the elements in the encrypted embedding is shuffled based on the key, so element one in the source vector may become element 1000 in the encrypted vector, which further protects its meaning. (Note: this is not part of the original paper.)
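The three steps above can be sketched in a few lines of Python. This is a toy illustration only: it is not the Cloaked AI implementation or its API, and it is not secure. The function name `toy_encrypt`, its parameters, and the rule used to size the noise are all assumptions made up for this example.

```python
import hashlib
import math
import random

def toy_encrypt(vector, shuffle_key, scale, approximation_factor):
    """Toy scale + perturb + shuffle, for illustration only (not secure)."""
    n = len(vector)
    # 1. Scale: multiply every value by the secret scaling factor.
    scaled = [x * scale for x in vector]
    # 2. Perturb: add fresh random noise on every call (the role the IV plays).
    #    This bound is a made-up rule that grows with the approximation factor.
    bound = scale * approximation_factor / (4 * math.sqrt(n))
    fresh = random.SystemRandom()
    perturbed = [x + fresh.uniform(-bound, bound) for x in scaled]
    # 3. Shuffle: derive a fixed permutation from the key, so every vector
    #    encrypted under this key is reordered the same way.
    seed = int.from_bytes(hashlib.sha256(shuffle_key).digest(), "big")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [perturbed[i] for i in order]

embedding = [0.231, 0.899, -0.412, 0.057]
first = toy_encrypt(embedding, b"demo key", scale=1_000_000.0, approximation_factor=0.1)
second = toy_encrypt(embedding, b"demo key", scale=1_000_000.0, approximation_factor=0.1)
print(first)
print(second)  # same key and input, different output because of the fresh noise
```

Both outputs have the same length as the input and the same element order as each other, but their values differ because the noise is drawn fresh on each call.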

Attacking the encrypted embedding directly with models pre-trained against the original embedding model produces nothing of use, which is to say, it produces random sentences with no bearing on the original.

Training the attack model on encrypted data

What if, instead of using an attack model trained only against the embedding model, we train a new attacker on known text inputs and their associated encrypted vector outputs? This model would be specific to a given embedding model and encryption key, but if the attacker can somehow pair plaintext with its ciphertext (something that should be guarded against), can they then attack the encrypted embeddings in the same way?

Yes, to some extent. For a given key, the scaling and shuffle steps are always the same, so such a per-key inversion model gets the attacker a bit closer, but the meaning is still largely lost because of the randomization in the perturbation step of the encryption. For example, something like “I live in Hawaii” comes out as “I am in Florida”.

This means that even if an attacker can generate encrypted embeddings from known text, they still can’t train an effective attack model. However, the effectiveness of this type of attack depends on the approximation factor. A lower approximation factor makes the chosen-plaintext attack leak more data, while a higher approximation factor leaks much less meaning.

Isn’t the encrypted embedding useless?

It’s entirely reasonable to assume that because the values inside the embedding are completely changed, it’s no longer useful. This couldn’t be further from the truth! Cloaked AI uses Approximate Distance-Preserving Encryption, which allows vector databases such as Pinecone or Weaviate to find things that are “closest” to the encrypted embedding, provided your query vector has also been encrypted with the same key. Without the secret key, there’s no valid starting point for a query. With the secret key, the vectors can be used as if there were no encryption.
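A short derivation shows why nearest-neighbor search can survive this kind of encryption. It assumes a simplified model, not the exact Cloaked AI construction: encryption multiplies the vector by a secret scale s, reorders its elements with a key-derived permutation matrix P, and adds per-vector noise n_v whose length is at most ε. All of these symbols are assumptions introduced for this sketch.

```latex
\begin{aligned}
E_k(v) &= s\,P v + n_v, \qquad \lVert n_v \rVert \le \varepsilon \\
\lVert E_k(a) - E_k(b) \rVert &= \lVert s\,P(a - b) + (n_a - n_b) \rVert \\
\bigl|\, \lVert E_k(a) - E_k(b) \rVert - s\,\lVert a - b \rVert \,\bigr| &\le \lVert n_a - n_b \rVert \le 2\varepsilon
\end{aligned}
```

The last line follows from the triangle inequality and from the fact that reordering elements does not change a vector's length. Under this model, distances between encrypted vectors are the original distances multiplied by s, off by at most 2ε, so the ranking of neighbors holds whenever the gaps between candidates are larger than the noise. An unencrypted query, by contrast, sits at a different scale and in a different element order, so its distances to the encrypted vectors are meaningless.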

Conclusion

Not only are vectors incredibly sensitive, they’re also very easy to protect. Whether storing them locally or in a hosted database, it’s a no-brainer and a best practice to protect them. The risk of these vectors and their inherent meanings leaking to the outside world goes way down with encryption, but their utility remains high. For more information, check out the Cloaked AI product page, the open source (dual-license AGPL) GitHub repo, and the docs.