New product alert: VectorLens is now available!

VectorLens

AI embeddings create shadow copies of your data. Use VectorLens to scan vectors and discover or classify sensitive information hidden in AI pipelines.

Vector Attacks

RAG workflows use AI vectors, which hold hidden sensitive data

Vector embeddings, models, and other numerical representations of data in AI look meaningless, but they can be reversed through various types of inversion attacks. Embeddings are often used in Retrieval-Augmented Generation (RAG) workflows, among others. If you use RAG, you likely have sensitive data duplicated into AI vectors.

Inversion attacks can extract personally identifiable information (PII), health diagnoses, dollar amounts, dates, and other confidential material such as forward-looking financial statements, strategy, and HR information.

VectorLens identifies vectors that are under-monitored and under-secured so you can take the appropriate actions, from permissions to encryption. Unlike other offerings in this area, VectorLens does not need the source data to determine whether a vector represents anything sensitive.

Governance blind spot

Understand what's in your AI data

Privacy and AI regulations don’t care that sensitive data has been turned into numbers (and neither do hackers). If a vector can be inverted back to a name, a diagnosis, or a card number, then it’s personal data under GDPR, CCPA/CPRA, HIPAA, and the EU AI Act, which means you’re obligated to track it.

AI makes copies of data and hides them in places your existing PII scanners can’t look. It’s a massive new data and governance blind spot that VectorLens can help you uncover. Here are some of the real gaps:

  • Data mapping & records of processing (GDPR Art. 30): your RoPA and data maps are incomplete if they ignore the copies of personal data living in embeddings.

  • Access & deletion requests (DSARs / right to erasure): you can’t honor a deletion or access request for data you don’t know you’re holding in a vector store.

  • Breach scope & notification: unencrypted PII-bearing vectors expand the blast radius of a breach and the population you may be required to notify.

  • AI governance & audit: regulators and customers increasingly expect you to show what data feeds your AI systems and how it’s protected.

VectorLens gives privacy, security, and GRC teams the evidence they need: a concrete, category-by-category accounting of the sensitive data hiding in their vectors.

Use Cases

Put VectorLens to work

Make the case to decision makers

Generate a shareable PDF that shows non-technical stakeholders exactly how much real PII is sitting unprotected in your vectors. Turn an abstract risk into a number they can act on.

Catch regressions in protected stores

Monitor encrypted or PII-free datastores to detect when a team introduces new, unencrypted vector PII before it becomes an incident.

Audit unmanaged vector databases

Discover what developers have quietly pushed into vector databases and vector-enabled databases across your infrastructure, and flag the indices that need scrutiny or protection.

Decide whether to encrypt

Quantify your exposure and determine whether Cloaked AI vector encryption makes sense for your data.

Local CLI Tool

Free* command-line tool to scan vectors and generate reports

VectorLens is a cross-platform (Linux and macOS) command-line tool you can use to scan vectors and generate text or PDF reports on findings. Head over to our docs site to see which models are supported, or to our feedback page to request support for specific embedding models.

The tool uses trained classifiers to find PII in vectors. We also have attack models for inverting embeddings, but have chosen to withhold that functionality at launch to keep this a tool that is only useful to defenders.

VectorLens works with any vector store as long as you can export the vectors for local scanning. It can be used by security teams who want to understand what data is being quietly replicated into vector-enabled databases within their organization. The tool is scriptable and can be used to apply labels back to data or to identify indices that need additional scrutiny or protection.
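
As a rough illustration of the export step, the sketch below writes (id, vector) pairs to a JSONL file using only the Python standard library. The `id` and `embedding` field names and the `write_jsonl` helper are assumptions made for this example, not the documented VectorLens input schema, so check the documentation for the exact format before relying on it.

```python
import json
from typing import Iterable, Sequence, Tuple


def write_jsonl(rows: Iterable[Tuple[str, Sequence[float]]], path: str) -> int:
    """Write (id, vector) pairs as one JSON object per line; return the row count.

    ASSUMPTION: the "id" and "embedding" field names are illustrative only and
    may not match the schema the scanner expects.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for record_id, vector in rows:
            # Coerce every component to float so the output is plain JSON numbers.
            record = {"id": record_id, "embedding": [float(x) for x in vector]}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

In practice `rows` would come from whatever client library your vector store provides; streaming rows through a generator keeps memory use flat for large indices.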

Run it on your own machines, keeping your data in your infrastructure at all times. IronCore Labs never sees your vectors or anything private.


* Free to try and use; fully self-serve; requires a license key, which you can obtain by filling out a form with your email address.

```bash
$ ironcore-vector-lens scan -m all-minilm-l6-v2 jsonl-file --path minilm_foo_all_embeddings.jsonl --report-path foo.pdf
11:14:20 No cached lease found, fetching one from the license server
11:14:21 Found supported model 'all-minilm-l6-v2', scanning for PII with it.
11:14:28 Detected 1000 PII embeddings, sampling a few:
11:14:28 Detected ai4p-6609-0 as containing address, email, name, numeric_identifier, name, phone_number, address PII.
11:14:28 Detected ai4p-8291-0 as containing address, email, name, phone_number, address PII.
11:14:28 Detected ai4p-4903-0 as containing address, email, name, phone_number, address PII.
11:14:28 Detected ai4p-9445-0 as containing address, email, name, phone_number, address PII.
11:14:28 Detected 1000 PII embeddings, sampling a few:
11:14:28 Detected ai4p-1491-0 as containing address, email, name, numeric_identifier, name, phone_number, address PII.
11:14:28 Detected ai4p-3046-0 as containing address, email, name, phone_number, address PII.
11:14:28 Detected ai4p-6610-0 as containing address, email, name, address PII.
11:14:28 Detected ai4p-1489-0 as containing email, name, phone_number, address PII.
...
11:14:29 Scan report written to ./ironcore_pii_audit_minilm_foo_all_embeddings.json.
11:14:29
╭────────────────────────┬───────╮
│ total_embeddings       │ 35033 │
├────────────────────────┼───────┤
│ total_pii_embeddings   │ 19065 │
│ address                │ 161   │
│ credit_card_number     │ 12    │
│ date_of_birth          │ 0     │
│ email                  │ 5802  │
│ name                   │ 10812 │
│ numeric_identifier     │ 204   │
│ password               │ 0     │
│ phone_number           │ 850   │
│ social_security_number │ 9     │
│ unspecified            │ 6909  │
│ cancelled              │ false │
╰────────────────────────┴───────╯
11:14:29 54.42% of the scanned embeddings contained PII
```
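
To show how the headline figure follows from the summary counts, here is a small Python check that uses the numbers from the sample output above. The `summary` dictionary and both helper functions are written for this illustration only. The per-category counts add up to more than `total_pii_embeddings`, which suggests a single embedding can be counted under several categories; that reading is an inference from the sample log lines, not documented behavior.

```python
# Counts copied from the summary table in the sample scan output.
summary = {
    "total_embeddings": 35033,
    "total_pii_embeddings": 19065,
    "address": 161,
    "credit_card_number": 12,
    "date_of_birth": 0,
    "email": 5802,
    "name": 10812,
    "numeric_identifier": 204,
    "password": 0,
    "phone_number": 850,
    "social_security_number": 9,
    "unspecified": 6909,
}

# Keys that are totals rather than per-category counts.
TOTAL_KEYS = {"total_embeddings", "total_pii_embeddings"}


def pii_percentage(counts: dict) -> float:
    """Share of scanned embeddings flagged as PII, as a percentage."""
    return round(100 * counts["total_pii_embeddings"] / counts["total_embeddings"], 2)


def category_total(counts: dict) -> int:
    """Sum of the per-category counts, excluding the two totals."""
    return sum(value for key, value in counts.items() if key not in TOTAL_KEYS)


print(pii_percentage(summary))   # 54.42
print(category_total(summary))   # 24759
```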

Competitors Miss the Mark

IronCore VectorLens scans the actual data, not proxies of it

Other DSPM tools are text-first and text-only. They only work if your developers build scanning and labeling into the process that creates the vectors. That's fine, but it doesn't tell you what data has crept into the database without oversight. It's all hope and no verify.

| | Competing DSPM tools | IronCore VectorLens |
|---|---|---|
| What it inspects | Source/original data retained near the index | The vectors themselves |
| Needs access to source data | Yes | No |
| Detects PII in orphaned/imported embeddings | No | Yes |
| Proves inversion/reconstruction risk | Inferred | Demonstrated |

How it works

A three-step process for taking action

```mermaid
flowchart TD
    A[1. Export] --> B[2. Scan]
    B --> C[3. Report]
    C --> D{Act}
    classDef red fill:#fb0100,stroke:#fff,stroke-width:2px,color:#fff;
    class D red;
```

  1. Export: export the vectors you want to test from your vector database. See our documentation for examples covering major vector data stores.
  2. Scan: on a Linux or macOS machine, run the command-line tool and point it at the exported vectors.
  3. Report: use the JSON report or the text output, or produce a PDF that can be emailed around.

The report gives you the information you need to make decisions going forward, such as whether you need to encrypt your vectors.
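
Because the tool is scriptable, one way to act on a report is a small policy gate that fails a pipeline when blocked categories show up. The sketch below is hypothetical. It assumes the JSON report exposes per-category counts as top-level keys named as in the summary table, which may not match the real report layout. `BLOCKED_CATEGORIES`, `load_summary`, `violations`, and `main` are names invented for this example.

```python
import json
import sys
from typing import Dict, List, Sequence

# Hypothetical policy: categories that should never appear in this store.
BLOCKED_CATEGORIES = ("credit_card_number", "social_security_number", "password")


def load_summary(report_path: str) -> Dict[str, object]:
    """Load the JSON report. ASSUMPTION: counts are top-level keys."""
    with open(report_path, encoding="utf-8") as handle:
        return json.load(handle)


def violations(summary: Dict[str, object],
               blocked: Sequence[str] = BLOCKED_CATEGORIES) -> Dict[str, int]:
    """Return the blocked categories that have a non-zero count."""
    found = {}
    for category in blocked:
        count = int(summary.get(category, 0))
        if count > 0:
            found[category] = count
    return found


def main(argv: List[str]) -> int:
    """Return 0 when clean, 1 when a blocked category is present, 2 on bad usage."""
    if len(argv) != 2:
        print("usage: gate.py REPORT_JSON", file=sys.stderr)
        return 2
    found = violations(load_summary(argv[1]))
    for category, count in found.items():
        print(f"{category}: {count} embeddings flagged")
    return 1 if found else 0
```

Wiring this into a scheduled job would mean calling `sys.exit(main(sys.argv))` after each scan, so a non-zero exit code marks the run for review.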

Vector Protection

Encrypt your AI vectors wherever they live

Check out IronCore's Cloaked AI: first-in-market and best-in-class encryption-in-use for AI vector embeddings. You can use it anywhere you store your vectors, it's simple, and it makes your vectors useless to attackers.

Learn More About Cloaked AI →

VectorLens FAQ

Can IronCore see my data?
No. VectorLens is a command-line tool that runs entirely on your own infrastructure. Your vectors and any private data never leave your machine and are never sent to IronCore Labs. The only network call is a lightweight check against our license server to validate your key and send very high-level usage metrics.
Do I need the original source text to scan my vectors?
No, and this is the key difference between VectorLens and text-first DSPM tools. VectorLens reads the embeddings themselves, so it can flag sensitive vectors even when the source data is gone, was created elsewhere, or was never under your governance in the first place.
Which embedding models are supported?
At launch, VectorLens supports all-minilm-l6-v2, bge-m3, gtr-t5-base, text-embedding-ada-002, and text-embedding-3-large. We add models regularly; check the documentation for the current list, or request support for a specific model.
Which vector databases does it work with?
Any of them. VectorLens scans exported vectors in JSONL or Parquet format, so it works with any store you can export from, including Pinecone, Weaviate, Milvus, Qdrant, and Postgres/pgvector. Our documentation has copy-paste export snippets for the major databases.
What kinds of PII does it detect?
VectorLens classifies vectors for categories including names, email addresses, phone numbers, physical addresses, dates of birth, credit card numbers, Social Security numbers and other numeric identifiers, and passwords, plus an unspecified bucket for other sensitive content.
Does VectorLens perform inversion attacks?
No. We have built attack models that invert embeddings back to approximations of their source text, but we have deliberately withheld that functionality at launch so VectorLens stays a tool that is only useful to defenders. Let us know if you'd like to see inversions or anything else.
What does it cost?
VectorLens is free to try and use, fully self-serve. It requires a license key, which you can get by filling out a short form with your email address. No credit card required. For production and ongoing monitoring use, contact us about a production license.
What do I need to run it?
A Linux or macOS machine. On Apple Silicon it uses the built-in Metal GPU; on Linux there are CPU-only builds plus optional CUDA builds for NVIDIA GPUs to scan large datasets faster. It ships as a single self-contained binary. Download it, drop it on your PATH, and run.