VectorLens

VectorLens is a command-line tool that gives developers and security professionals a straightforward way to find insecure embedding vectors in their datastores and to monitor for new, potentially vulnerable data. It builds on many of the concepts from our discussion of embedding attacks and will be updated as new techniques are introduced.

VectorLens currently supports these embedding models: all-minilm-l6-v2, bge-m3, gtr-t5-base, jina-v5-nano-retrieval, text-embedding-3-large, text-embedding-ada-002, voyage-4-large. If you need support for another model, you can request it.

Use Cases

Making the case to non-technical decision makers that the threat posed by unencrypted vector embeddings is real.
Monitoring encrypted vector datastores to detect when a team introduces new unencrypted PII.
Monitoring unencrypted vector datastores that are intended to be PII free for mistakes.
Discovering if Cloaked AI makes sense for your data.

Installation

VectorLens ships as a single self-contained binary. Pick the download that matches your machine, drop it on your PATH, and you’re ready to go.

macOS (Apple Silicon)

For Macs with an M-series chip (M1, M2, M3, M4). VectorLens uses Apple’s built-in Metal GPU on macOS — there is no CPU-only build, and Intel Macs are not supported.

Acceleration	Download
Metal (Apple GPU)	ironcore-vectorlens

Linux — ARM 64-bit

For arm64 / aarch64 Linux machines, such as AWS Graviton or Ampere Altra instances.

Acceleration	Download	Compatibility
CPU only	ironcore-vectorlens	Runs on any 64-bit ARM Linux
CUDA 12 (NVIDIA GPU)	ironcore-vectorlens	manifest.json
CUDA 13 (NVIDIA GPU)	ironcore-vectorlens	manifest.json

Linux — Intel / AMD 64-bit

For standard x86_64 Linux machines — the most common server and desktop architecture.

Acceleration	Download	Compatibility
CPU only	ironcore-vectorlens	Runs on any 64-bit Intel/AMD Linux
CUDA 12 (NVIDIA GPU)	ironcore-vectorlens	manifest.json
CUDA 13 (NVIDIA GPU)	ironcore-vectorlens	manifest.json
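
As a rough aid for picking a download section, the mapping described above can be sketched in Python. The function name and the returned labels (`macos-apple-silicon`, `linux-arm64`, `linux-x86_64`) are hypothetical and chosen only for this illustration; they are not VectorLens identifiers.

```python
from typing import Optional


def pick_download_section(system: str, machine: str) -> Optional[str]:
    """Map an OS name and CPU architecture to one of the download sections.

    Returns None when this page lists no build for the combination.
    The returned labels are illustrative, not official names.
    """
    system = system.lower()
    machine = machine.lower()
    if system == "darwin":
        # Only Apple Silicon is supported; Intel Macs have no build.
        return "macos-apple-silicon" if machine == "arm64" else None
    if system == "linux":
        if machine in ("aarch64", "arm64"):
            return "linux-arm64"
        if machine in ("x86_64", "amd64"):
            return "linux-x86_64"
    return None
```

On a given machine this could be called as `pick_download_section(platform.system(), platform.machine())` after `import platform`.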

Choosing a CUDA build

If you have an NVIDIA GPU, a CUDA build will run faster than the CPU build. Each CUDA download has a manifest.json next to it that lists which NVIDIA driver versions, CUDA toolkit versions, and GPU compute capabilities that binary supports.

To choose the right one:

Run nvidia-smi on your machine to see your installed driver version and GPU model.
Open the manifest.json for the build you’re considering.
Confirm your driver version falls in the supported range and your GPU’s compute capability is listed.

If your GPU isn’t supported by either CUDA build, use the CPU only download — it has no driver requirements and runs on any 64-bit Linux machine.
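
The check above can be partly scripted. The sketch below reads the driver version and compute capability through `nvidia-smi --query-gpu`, then compares the driver against bounds that you copy by hand from manifest.json. It does not parse manifest.json itself, because the file's schema is not described here. The helper names are hypothetical, and the `compute_cap` query field is assumed to be available, which may not hold for older drivers.

```python
import subprocess
from typing import Dict, List, Optional, Tuple


def parse_gpu_line(line: str) -> Dict[str, str]:
    """Parse one CSV line of the form: name, driver_version, compute_cap."""
    # Split from the right so a comma inside the GPU name does not break parsing.
    name, driver, cap = [part.strip() for part in line.rsplit(",", 2)]
    return {"name": name, "driver_version": driver, "compute_capability": cap}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '535.104.05' into (535, 104, 5) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def driver_in_range(driver: str, minimum: str, maximum: Optional[str] = None) -> bool:
    """Check a driver version against bounds copied from manifest.json."""
    current = version_tuple(driver)
    if current < version_tuple(minimum):
        return False
    return maximum is None or current <= version_tuple(maximum)


def query_gpus() -> List[Dict[str, str]]:
    """Ask nvidia-smi for every GPU's name, driver version, and compute capability."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version,compute_cap",
         "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

The compute capability still needs to be compared by eye against the list in manifest.json.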

Install

Download the binary, mark it executable, and move it onto your PATH. For example, installing the x86_64 Linux CUDA 13 build:

bash

curl -L -o ironcore-vectorlens \
  https://storage.googleapis.com/vectorlens/releases/0.2.0/x86_64-unknown-linux-gnu/cuda13/ironcore-vectorlens
chmod +x ironcore-vectorlens
sudo mv ironcore-vectorlens /usr/local/bin/

ironcore-vectorlens --version

Usage

ironcore-vectorlens scan [OPTIONS] <COMMAND>

Commands

Command	Description
`jsonl-file`	Scan a JSONL file. Expects `id` and `embedding` fields.
`parquet-file`	Scan a Parquet file. Expects `id` and `embedding` columns.
`help`	Print help for the given subcommand.
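
Before scanning, it can help to confirm that an export has the `id` and `embedding` fields the `jsonl-file` command expects. The following is a minimal sketch of such a check; the function name `check_jsonl` is hypothetical, and the rule that `embedding` must be a flat list of numbers is an assumption about the expected format rather than documented VectorLens behavior.

```python
import json
from collections import Counter
from typing import List, Tuple


def check_jsonl(path: str) -> Tuple[Counter, List[str]]:
    """Return (embedding-dimension counts, list of problems) for a JSONL file."""
    dims: Counter = Counter()
    problems: List[str] = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(row, dict) or "id" not in row or "embedding" not in row:
                problems.append(f"line {lineno}: missing 'id' or 'embedding'")
                continue
            emb = row["embedding"]
            is_flat_numbers = isinstance(emb, list) and all(
                isinstance(x, (int, float)) and not isinstance(x, bool) for x in emb
            )
            if not is_flat_numbers:
                problems.append(f"line {lineno}: 'embedding' is not a flat list of numbers")
                continue
            dims[len(emb)] += 1
    return dims, problems
```

More than one key in the returned counter means the file mixes embeddings of different dimensions, which usually indicates that vectors from more than one model were exported together.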

Options

`-m, --model <MODEL>`

Model to scan with. Weights are fetched automatically unless overridden with --svm-weights and --forest-weights.

Currently supported models: all-minilm-l6-v2, bge-m3, gtr-t5-base, jina-v5-nano-retrieval, text-embedding-3-large, text-embedding-ada-002, voyage-4-large.

`-s, --svm-weights <PATH>`

Path to custom SVM weights, overriding the default for the given model.

`-f, --forest-weights <PATH>`

Path to custom forest weights, overriding the default for the given model.

`-r, --report-path <PATH>`

If present, a shareable PDF report will be generated at the given path.

`--license-key <KEY>`

Signed license key granting access to this product. Env: LICENSE_KEY.

`-v, --verbosity <LEVEL>`

Logging level. Default: default.

silent — no human output; only the exit code and written files.
summary — print a summary at the end; no in-progress output.
default — progress while running, plus a summary and inline PII info.

`-d, --debug`

Write a JSON debug log to ./ironcore_vector_lens_debug.log.

`-h, --help`

Print help (use -h for a one-line summary).
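
For scheduled or scripted scans, the options above can be assembled programmatically. This is a sketch only: the function names are hypothetical, the argument order mirrors the scan example on this page (including its `--path` argument), and the license key is assumed to be supplied through the `LICENSE_KEY` environment variable so it does not appear in the process list. The meaning of specific exit codes is not documented here, so the sketch returns the code without interpreting it.

```python
import subprocess
from typing import List, Optional


def build_scan_args(
    input_path: str,
    model: str,
    report_path: Optional[str] = None,
    verbosity: Optional[str] = None,
    file_command: str = "jsonl-file",
) -> List[str]:
    """Build the argument list for an ironcore-vectorlens scan."""
    args = ["ironcore-vectorlens", "scan", file_command, "--model", model]
    if report_path:
        args += ["--report-path", report_path]
    if verbosity:
        args += ["--verbosity", verbosity]
    args += ["--path", input_path]
    return args


def run_scan(args: List[str]) -> int:
    """Run the scan and return its exit code.

    LICENSE_KEY is inherited from the parent environment.
    """
    return subprocess.run(args, check=False).returncode
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.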

Examples

Export data for VectorLens

Postgres with pgvector:

psql "$DATABASE_URL" -At -c \
    "SELECT row_to_json(t) FROM (SELECT id, embedding::float4[] AS embedding FROM your_table) t" \
    > embeddings.jsonl

Weaviate:

Python

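import json
import os

import weaviate
from weaviate.classes.init import Auth

# Assumed environment variable names; adjust to match your setup.
URL = os.environ["WEAVIATE_URL"]
API_KEY = os.environ["WEAVIATE_API_KEY"]

# Local: weaviate.connect_to_local()
# Cloud: as below
with weaviate.connect_to_weaviate_cloud(
    cluster_url=URL,
    auth_credentials=Auth.api_key(API_KEY),
) as client:
    coll = client.collections.get("MyCollection")
    with open("embeddings.jsonl", "w") as f:
        for obj in coll.iterator(include_vector=True):
            f.write(json.dumps({
                "id": str(obj.uuid),                 # UUID objects are not JSON serializable
                "embedding": obj.vector["default"],  # "default" holds the unnamed vector
            }) + "\n")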

Milvus:

Python

import json
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530", token="root:Milvus")
client.load_collection("my_collection")

it = client.query_iterator(
    collection_name="my_collection",
    batch_size=1000,
    limit=-1,                           # no cap on the total rows returned
    filter="",                          # empty filter matches every row
    output_fields=["id", "embedding"],  # adjust to your primary key and vector field names
)
with open("embeddings.jsonl", "w") as f:
    while True:
        batch = it.next()               # returns an empty list once exhausted
        if not batch:
            it.close()
            break
        for row in batch:
            f.write(json.dumps({"id": row["id"], "embedding": row["embedding"]}) + "\n")

Qdrant:

Python

import json
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
offset = None
with open("embeddings.jsonl", "w") as f:
    while True:
        points, offset = client.scroll(
            collection_name="my_coll",
            with_vectors=True,
            with_payload=False,
            limit=1000,
            offset=offset,  # None on the first call, then the offset returned by the previous page
        )
        for p in points:
            # p.vector is list[float] for unnamed, dict[str, list[float]] for named.
            # SparseVector values need .model_dump() before json.dumps.
            f.write(json.dumps({"id": p.id, "embedding": p.vector}) + "\n")
        if offset is None:
            break
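
As the comments in the Qdrant export note, `p.vector` is a plain list for an unnamed vector and a dict for named vectors. Since the scanner expects a single `embedding` per row, a small helper can normalize both shapes before writing. The helper name `pick_vector` is hypothetical, and rejecting anything that is not a plain list (such as sparse vectors) is a design choice of this sketch.

```python
from typing import Any, List, Optional


def pick_vector(vector: Any, name: Optional[str] = None) -> List[float]:
    """Return one dense vector from an unnamed (list) or named (dict) value."""
    if isinstance(vector, dict):
        if name is None:
            if len(vector) != 1:
                raise ValueError(
                    f"multiple named vectors {sorted(vector)}; pass name= to choose one"
                )
            name = next(iter(vector))  # the only named vector present
        vector = vector[name]
    if not isinstance(vector, list):
        # Sparse vectors and other non-list values are rejected by this sketch.
        raise TypeError(f"expected a dense list of floats, got {type(vector).__name__}")
    return vector
```

In the export loop, the write would then use `"embedding": pick_vector(p.vector)`, or `pick_vector(p.vector, name="...")` when the collection defines several named vectors.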

Pinecone:

Python

import json, os, time
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index(host=os.environ["PINECONE_INDEX_HOST"])
NAMESPACE = "my-ns"
FETCH_BATCH = 1000

with open("embeddings.jsonl", "w") as f:
    buf = []
    for id_page in index.list(namespace=NAMESPACE):
        buf.extend(id_page)
        while len(buf) >= FETCH_BATCH:
            chunk, buf = buf[:FETCH_BATCH], buf[FETCH_BATCH:]
            resp = index.fetch(ids=chunk, namespace=NAMESPACE)
            for vid, v in resp.vectors.items():
                f.write(json.dumps({"id": vid, "embedding": list(v.values)}) + "\n")
            # stay under 100 req/s/index
            time.sleep(0.05)                                
    if buf:
        resp = index.fetch(ids=buf, namespace=NAMESPACE)
        for vid, v in resp.vectors.items():
            f.write(json.dumps({"id": vid, "embedding": list(v.values)}) + "\n")
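
The buffering in the Pinecone export, which regroups pages of IDs into fixed-size fetch batches, can also be factored into a small generator so the fetch-and-write code is not duplicated. This is a sketch with a hypothetical helper name, `rebatch`; it contains no Pinecone calls itself.

```python
from typing import Iterable, Iterator, List


def rebatch(pages: Iterable[Iterable[str]], size: int) -> Iterator[List[str]]:
    """Regroup pages of IDs into lists of exactly `size` items.

    The final list may be shorter than `size`.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    buf: List[str] = []
    for page in pages:
        buf.extend(page)
        while len(buf) >= size:
            yield buf[:size]
            buf = buf[size:]
    if buf:
        yield buf  # leftover IDs that did not fill a whole batch
```

With this helper, the export loop reduces to `for chunk in rebatch(index.list(namespace=NAMESPACE), FETCH_BATCH):` followed by a single fetch, write, and sleep.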

Scan

ironcore-vectorlens scan jsonl-file --license-key XXXX --model text-embedding-ada-002 --report-path vectorlens-report.pdf --path embeddings.jsonl

This scans the text-embedding-ada-002 embeddings in embeddings.jsonl for PII and generates a PDF report showing which PII categories were detected and in which embeddings.

Licensing

VectorLens is available for trial or full production use. Submit a request for a trial license key or contact us for a production license.