Vector database security: Qdrant

A look at the Qdrant vector database from a privacy, security, and risk management perspective

Qdrant's product

About Qdrant

Qdrant is an open source vector database that can be run either via their cloud service or as a Docker container in your own infrastructure. They have a good reputation and are one of the more interesting open source projects to follow in this space.

The Qdrant database is written in Rust, which is a wise choice for both memory safety (eliminating a major class of attacks) and performance. Side note: we write almost everything at IronCore in Rust, so we may be biased about how much this choice matters, but Rust also helps catch bugs before they ship.

The database supports filtering on associated data and can work with both sparse and dense vectors using built-in hybrid search options. They have public benchmarks showing that they’re much faster than the handful of folks they chose to compare against. Pinecone is notably absent, which makes sense: you can’t run Pinecone on your own hardware, so a reasonable comparison is hard to make.

Screenshot of Qdrant's home page on 2024-03-21

Qdrant's product page on 2024-03-21

Qdrant (in)security

An assessment of Qdrant's security

All vector embeddings and vector databases have inherent issues.

In March 2024, we gave Qdrant a security maturity score of “weak” based on our standardized maturity scoring (the spreadsheet is still available). We also pointed out that Qdrant made misleading claims about having PCI certification and lacked most of the markers of a company that has prioritized security.

Since that time, they’ve made many strides. For example, they added a WAF, got SOC2 certified, added a bug bounty program, and documented much more. They also removed their misleading claims. Unfortunately, at last review, their self-hosted option still had insecure defaults.

Rather than attempt to stay up on their changes and do continual evaluations, we’re removing our earlier security evaluation and recommend instead that potential customers conduct their own. Note: all vector embeddings and vector databases have inherent issues because of the data they hold (see next section for more).

Vector security risks

Vector embeddings are dangerous

Independent of whatever Qdrant is or isn’t doing for security, it stores vectors, and vectors are dangerous if not handled properly.

Vector embeddings are long arrays of numbers that represent an AI embedding model’s understanding of its input. They can represent photos, faces, fingerprints, and text. They’re used for natural-language search, for facial recognition, and much more. And they’re stored in vector databases.
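To make "arrays of numbers used for search" concrete, here is a minimal sketch of how a similarity lookup over embeddings can work. It is a brute-force scan using cosine similarity and only the Rust standard library; real vector databases such as Qdrant use approximate indexes instead, and the function names here are our own illustration, not any database's API.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns None if the lengths differ or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Index of the stored vector most similar to the query (brute-force scan).
/// Stored vectors that cannot be compared with the query are skipped.
fn nearest(query: &[f32], stored: &[Vec<f32>]) -> Option<usize> {
    stored
        .iter()
        .enumerate()
        .filter_map(|(i, v)| cosine_similarity(query, v).map(|s| (i, s)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Calling `nearest` with a query embedding and a list of stored embeddings returns the position of the closest match. The point for security purposes is that the stored numbers carry enough of the original meaning to be matched, and that same information is what inversion attacks exploit.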

Vector databases have soared in popularity. As GenAI usage has skyrocketed, vector DB usage has too, since they’re used to power a number of AI workflows, such as the popular Retrieval-Augmented Generation (RAG) (see our explanation of RAG and its security risks) used to inject private data into AI chat bots.

While it’s commonly thought that these vector embeddings are only useful for search and in AI systems, and that they don’t pose a security risk, this is a busted myth. In fact, vector embeddings can be reversed with high accuracy, as we’ve demonstrated while running open-source attacks from academia against text embeddings and face embeddings.

AI is a powerful magnet for sensitive data. The more data these AI systems have, the more useful they are. And the more risk they pose.

Vector databases can’t and won’t ever enforce the same permissions on the data as the native data stores. For example, CRM data will have custom domain-specific logic inside the CRM, but not in the vector database. This makes the vectors a gold mine of data for anyone who can gain access. They contain copies of the most sensitive data in healthcare, finance, personal activities and productivity, and business. A recent academic paper (with open source code) showed how the use of AI embeddings in email systems has enabled the first ever AI worm.

For most organizations, these vector databases (even when they’re added to existing databases) will present the biggest risk to the sensitive data in their infrastructure.

Fixing Qdrant

Using application-layer encryption to mitigate the risks from Qdrant

Today, if you’re looking for ways to comply with privacy and security laws for the data in vector databases, you will find two types of solutions on the market:

1. Redaction/tokenization of PII: a poor approach

The redaction approach is troublesome because it reduces the value of the AI systems. You can’t ask to summarize the report written by so-and-so last week if names are removed. Ditto for dates and such. It’s also troublesome because names and birthdates aren’t the only sensitive data in these systems. They’re chock full of sensitive HR data, financial data, email content, document content, photos, biometrics, and so much more. Attempts to simply redact or tokenize are half-measures, at best.

2. Property-preserving encryption: security and data privacy for everything

The best way to mitigate the risks of vector embeddings is to encrypt them at the application layer. It protects the data from any direct-to-database peeking – whether by hackers or insiders at your company or your service provider. It fixes issues with data sovereignty by allowing keys to be sovereign and making subpoenas to service providers useless. And it prevents wholesale data leaks. Perhaps most importantly, if someone does gain unauthorized access to your vector database, that isn’t a breach that must be disclosed, nor an international transfer of data unless they also compromise the key, which is likely elsewhere and better protected.

If you simply applied classic encryption to the vectors, you couldn’t search them. You’d break natural-language queries, searches over images, facial recognition matching, clustering, anomaly detection, and basically everything a vector database is good for. But encryption doesn’t have to break search.

Property-preserving vector encryption lets developers query a vector database if and only if they have the right key. Without the key, they can’t meaningfully query it. But with the key, everything works as if there were no encryption at all. And better yet, it works with every single vector database, not just Qdrant.
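The idea that search can survive a keyed transformation is easier to see with a toy example. The sketch below is a hypothetical illustration, not the Cloaked AI algorithm and not secure: the "key" is just a secret coordinate permutation plus a scaling factor. Because every vector is reordered and scaled the same way, all distances are multiplied by the same constant, so nearest-neighbor results are unchanged for anyone who transforms their query with the same key. All names (`ToyKey`, `toy_transform`, and so on) are made up for this example.

```rust
/// Toy secret key: a coordinate permutation plus a positive scaling factor.
/// Hypothetical illustration only: NOT secure and NOT the Cloaked AI scheme.
struct ToyKey {
    /// permutation[i] is the input index copied into output slot i.
    permutation: Vec<usize>,
    scale: f32,
}

/// Apply the keyed transform: reorder coordinates, then scale them.
/// Panics if the permutation refers to an index outside the vector.
fn toy_transform(key: &ToyKey, v: &[f32]) -> Vec<f32> {
    key.permutation.iter().map(|&i| v[i] * key.scale).collect()
}

/// Squared Euclidean distance between two vectors of equal length.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the stored vector closest to the query (brute-force scan).
fn nearest_by_distance(query: &[f32], stored: &[Vec<f32>]) -> Option<usize> {
    stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, squared_distance(query, v)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

If you transform both the stored vectors and the query with the same `ToyKey`, `nearest_by_distance` returns the same index it would on the plaintext vectors. A query that was not transformed with the key is compared against reordered, rescaled coordinates, so its results generally won't line up with the real neighbors. A production scheme needs far more than this (a permutation and a scale factor are easy to recover), which is why this should be read only as intuition for the property being preserved.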

Next steps:

Doing nothing is risky. Protecting your AI data is easy. And inexpensive to boot.

Copy. Paste. Done.

Integration examples show how easy it is

The Cloaked AI docs have integration examples for a number of popular vector databases. We take each database’s own examples and modify them to show how easy it is to add encryption, usually in just a few lines of code.

Take a look at our Qdrant examples on GitHub

Rust

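// Map each (source text, embedding) pair to a future that encrypts it into a point.
// `encrypt_to_point` is not shown here; see the Qdrant examples on GitHub.
let point_futures = data.into_iter().map(|(sourcetext, embedding)| {
    encrypt_to_point(ironcore.clone(), &metadata, sourcetext, embedding)
});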

Other vector database security assessments:

Pinecone
Qdrant - this page
