Overview of IronCore Cloaked AI
Cloaked AI is a product for encrypting and decrypting the vectors that are used in AI workflows and stored in indices or vector databases. The encrypted vectors can still be used for vector search and related operations because our property-preserving encryption algorithm maintains the relative distances between vectors when they are encrypted.
(See product descriptions, use cases, pricing, and other information here).
What are embeddings?
Embeddings are a type of inference produced by AI models that are meant to be stored and used later. They’re sometimes called the “memory” of AI. At a low level, an embedding is merely a vector, or one-dimensional array, of real numbers with values that look something like this:
[0.123, -0.345, 0.567, -0.008, ..., -1.284]
And although we may not see meaning in those numbers, they represent important semantic information extracted from the model input. They are a summarization of some important aspect or aspects of the input. They could represent a snapshot of a face, notable information about an image, the meaning of a sentence, a snippet of code, or many other possibilities. They’re useful for facial recognition, clustering, recommendations, classifications, and various types of search. They’re also increasingly being used as part of a process that “grounds” large language models to keep them from making stuff up (hallucinating) in chats.
Why encrypt vector embeddings?
While these embeddings are not exact representations of their inputs, they are much more than just unintelligible arrays of numbers. They are impressions of data that aim to capture the key facets of their inputs. Because of this, they can hold just as much sensitive information as the original data from which they were created, and they need to be treated as such. See our information on embedding attacks for some concrete reasons why you need to consider embeddings when you are considering data privacy and security. By encrypting these vectors with Cloaked AI, you can retain their usability while protecting the information they hold.
How can you encrypt vector embeddings?
Because vector embeddings are simply arrays of numbers, you may be tempted to encrypt them using techniques you’re already familiar with. The issue is that encrypting in this way would result in a massive loss of functionality. Embeddings are often used for similarity searches (for example, does this embedding represent a face similar to one of the faces in our database?). These comparisons are made by computing distance measurements between a pair of vectors (commonly Euclidean, cosine, or dot product distance). If you encrypted the embedding with traditional methods, these distance measurements would no longer function, and similarity search would be impossible.
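To make the distance measurements above concrete, here is a minimal sketch in plain Python of the three common measures and a brute-force nearest-neighbor lookup. The vectors, labels, and function names are made up for illustration and are not part of any IronCore API.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; 1.0 means same direction."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )

def nearest(query, database):
    """Return the label of the stored vector closest to the query."""
    return min(database, key=lambda label: euclidean_distance(query, database[label]))

# Hypothetical three-dimensional embeddings; real ones have hundreds of dimensions.
database = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.7, 0.3, 0.1],
}
query = [0.9, 0.1, 0.0]

print(nearest(query, database))
```

A similarity search is this comparison repeated (or approximated) across every stored vector, which is why a ciphertext that scrambles the numbers arbitrarily cannot be searched.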
Cloaked AI encrypts embeddings using a distance-preserving algorithm. This means that the encryption process maintains relative distance between embeddings, and you can perform similarity searches without decrypting the data.
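The idea of preserving relative distance can be illustrated with a toy transform. The sketch below is not the Cloaked AI algorithm and provides no real security; it is an assumption-laden stand-in that shuffles coordinates and scales them by a secret factor, which is enough to show that a nearest-neighbor ranking can survive a keyed transformation when the query is transformed with the same key. All names here are hypothetical.

```python
import math
import random

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def make_toy_key(dimensions, seed):
    """Toy key: a secret coordinate permutation plus a secret positive scale."""
    rng = random.Random(seed)
    permutation = list(range(dimensions))
    rng.shuffle(permutation)
    scale = rng.uniform(2.0, 10.0)
    return permutation, scale

def toy_transform(vector, key):
    """Reorder and scale the coordinates. Illustration only, NOT secure."""
    permutation, scale = key
    return [scale * vector[i] for i in permutation]

def rank_by_distance(query, vectors):
    """Indices of the stored vectors, nearest first."""
    return sorted(range(len(vectors)), key=lambda i: euclidean_distance(query, vectors[i]))

key = make_toy_key(dimensions=4, seed=42)
stored = [
    [0.1, 0.9, 0.3, 0.0],
    [0.8, 0.1, 0.0, 0.2],
    [0.2, 0.7, 0.4, 0.1],
]
query = [0.15, 0.85, 0.3, 0.05]

plain_ranking = rank_by_distance(query, stored)
hidden_ranking = rank_by_distance(
    toy_transform(query, key),
    [toy_transform(v, key) for v in stored],
)
print(plain_ranking, hidden_ranking)
```

In this toy, every distance is multiplied by the same secret scale, so the ordering of neighbors is unchanged even though the stored numbers no longer match the originals.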
Don’t forget about the metadata!
If you are storing the embeddings in a vector database, there’s a decent chance that you are also storing some metadata along with each vector. This metadata might also include sensitive information. Cloaked AI provides facilities for encrypting the metadata, including standard encryption for fields that are just stored and retrieved, and deterministic encryption for fields that might be used for filtering searches. Deterministic encryption is another form of property-preserving encryption: the same plaintext always produces the same ciphertext under a given key, so it preserves the property of equality.
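The equality-preserving property can be sketched with Python's standard library. The example below does not use deterministic encryption itself; as a stand-in it uses a keyed HMAC-SHA256 token, which is deterministic in the same way but cannot be decrypted. The key, field names, and records are hypothetical, and none of this reflects the Alloy SDK's API.

```python
import hashlib
import hmac

def deterministic_token(key: bytes, value: str) -> str:
    """Same key and value always give the same token (stand-in for deterministic encryption)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical secret; a real key would come from a key management system.
secret_key = b"example-key-for-illustration-only"

# Store only the token of the filterable field alongside each vector ID.
records = [
    {"id": 1, "department": deterministic_token(secret_key, "finance")},
    {"id": 2, "department": deterministic_token(secret_key, "legal")},
    {"id": 3, "department": deterministic_token(secret_key, "finance")},
]

def filter_by_department(records, key, department):
    """Match records by comparing tokens, never plaintext."""
    wanted = deterministic_token(key, department)
    return [r["id"] for r in records if hmac.compare_digest(r["department"], wanted)]

print(filter_by_department(records, secret_key, "finance"))
```

Because equal inputs map to equal outputs, the database can filter on the protected field without seeing its value. The trade-off is that anyone with access to the stored data can tell which records share a value.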
How do you use Cloaked AI?
The Cloaked AI functionality is included in the IronCore Labs Alloy Software Development Kit (SDK), ironcore-alloy. You add the SDK to your applications to get the encryption and decryption functions you need for vector embeddings and any associated data that you need to protect.
The Alloy SDK includes methods to encrypt vectors and any associated metadata, methods to prepare a query vector so you can do nearest neighbor searches of the encrypted vectors, and methods to prepare a query string that matches encrypted metadata for filtering. It is available for use in applications written in multiple programming languages.
The Alloy SDK documentation has details.