Configuration of Cloaked Search

Cloaked Search is configured primarily through a config file that is read by the Docker container.

Docker configuration

Cloaked Search’s configuration file must be mounted into the Docker container at /app/deploy.yml. The file has the following form:
search_service_url: "https://my-search-service:9200"
standalone_keys:
    - id: <ARBITRARY_KEY_ID>
      keypath: <PATH_TO_HIGH_ENTROPY_STRING_FILE>
      primary: true
indices:
    - name: <INDEX_1>
      tenant_id_field: <FIELD_WITH_TENANT_ID>
      fields:
          - name: <FIELD_1>
            analyzer: <ANALYZER_CONFIG>
          - name: <FIELD_2>
    - name: <INDEX_2>
      tenant_id_field: <FIELD_WITH_TENANT_ID>
      fields:
          - name: <ANOTHER_FIELD_1>
          - name: <ANOTHER_FIELD_2>

Search Service URL

search_service_url should be set to the URL for the Elasticsearch or OpenSearch service that you want to proxy. It should be fully specified as a URL that includes the port number.
search_service_url: "https://my-search-service:9200"
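As a minimal sketch of a pre-deployment check, the following verifies that a URL is fully specified with an explicit port. The function name check_search_service_url is hypothetical, and restricting the scheme to http or https is an assumption of this sketch, not a documented Cloaked Search rule.

```python
from urllib.parse import urlsplit


def check_search_service_url(url: str) -> tuple:
    """Return (hostname, port) if the URL is fully specified, else raise ValueError."""
    parts = urlsplit(url)
    # Assumption: the search service is reached over HTTP or HTTPS.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got: {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in: {url!r}")
    # The port must be written out explicitly, e.g. ":9200".
    if parts.port is None:
        raise ValueError(f"missing port number in: {url!r}")
    return parts.hostname, parts.port


if __name__ == "__main__":
    print(check_search_service_url("https://my-search-service:9200"))
```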

Keys

There are two ways for Cloaked Search to generate the keys used to derive search hashes and KEKs (Key Encryption Keys) for encrypting the per-document keys:
  1. Standalone Keys
  2. Integrating with the Tenant Security Proxy (TSP)
Choose the method by including either standalone_keys or tsp_config at the top level of the config file.
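The choice between the two methods can be sketched as a check on an already-parsed config (represented here as a plain dictionary). The helper key_method is hypothetical, and treating "both present" or "neither present" as an error is an assumption of this sketch.

```python
def key_method(config: dict) -> str:
    """Return which key source a parsed config selects."""
    has_standalone = "standalone_keys" in config
    has_tsp = "tsp_config" in config
    # Assumption: exactly one of the two top-level keys should be present.
    if has_standalone == has_tsp:
        raise ValueError("set exactly one of standalone_keys or tsp_config")
    return "standalone_keys" if has_standalone else "tsp_config"


if __name__ == "__main__":
    config = {
        "search_service_url": "https://my-search-service:9200",
        "tsp_config": {"url": "https://tsp-service:7777"},
    }
    print(key_method(config))
```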

Standalone Keys

standalone_keys contains the master keys for the Cloaked Search installation. Cloaked Search uses a key generation algorithm that produces a different encryption key for each tenant as well as a different search hash for each tenant/index/field combination. As such, the master key is not used directly as a cryptographic key, but it should still have high entropy.
Support for rotating this key is planned but is not yet available. Currently, standalone_keys should contain a single stanza whose keypath points to the key file, and primary must be set to true.
The actual key value should be between 32 and 64 characters, and it should be in a file that is separate from the configuration. (You will likely create a secret containing the key value in your container environment and mount that secret as the file in the container.) For example,
$ cat /location/of/key/file
31e1cd1baebe933b4b7947bd8bd37fc25ff2fa050477ae27cb13488ef7a87da8
standalone_keys:
    - id: the-one-key
      keypath: /location/of/key/file
      primary: true
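One way to produce a key file like the one above is sketched below. It writes 64 hexadecimal characters (32 random bytes) from the operating system's secure random source. The function name write_key_file and the choice to omit a trailing newline are assumptions of this sketch; in practice you would more likely store the value as a secret in your container environment.

```python
import os
import secrets


def write_key_file(path: str) -> str:
    """Create a new key file readable only by its owner and return the key."""
    key = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    # O_EXCL refuses to overwrite an existing key file by accident.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as key_file:
        key_file.write(key)
    return key
```

Calling write_key_file("/location/of/key/file") would create the file that keypath refers to in the example configuration.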

Tenant Security Proxy

Cloaked Search can use the Tenant Security Proxy (the TSP) to generate keys. The TSP uses each tenant’s configured cloud KMS instance to wrap the generated keys for that tenant. Cloaked Search will require the url of the TSP as well as an api_keypath that points to the file containing the TSP’s API key. If api_keypath is not present, it will default to /secrets/cloaked-search/tsp-api-key. You will likely create a secret containing the API key and mount that secret as this file in the Cloaked Search container.
$ cat /location/of/api-key/file
dGhpc2lzQmFzZTY0
tsp_config:
    url: "https://tsp-service:7777"
    api_keypath: /location/of/api-key/file
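The defaulting rule for api_keypath can be sketched as follows, operating on an already-parsed tsp_config dictionary. The helper resolve_tsp_config is hypothetical and only illustrates the documented default; it is not Cloaked Search's own code.

```python
DEFAULT_TSP_API_KEYPATH = "/secrets/cloaked-search/tsp-api-key"


def resolve_tsp_config(tsp_config: dict) -> dict:
    """Return the TSP settings with the documented api_keypath default applied."""
    if "url" not in tsp_config:
        raise ValueError("tsp_config requires a url")
    return {
        "url": tsp_config["url"],
        # Fall back to the default location when api_keypath is omitted.
        "api_keypath": tsp_config.get("api_keypath", DEFAULT_TSP_API_KEYPATH),
    }


if __name__ == "__main__":
    print(resolve_tsp_config({"url": "https://tsp-service:7777"}))
```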

Index configuration

indices lists the indices that will be supported by Cloaked Search. The configuration for each index includes the fields which will be protected.
indices:
    - name: index1
      tenant_id_field: my_special_tenant_id_field_name
      fields:
          - name: title
          - name: summary
            analyzer:
                tokenizer: default
                filters:
                    - lowercase
                    - terms
                    - substring
    - name: index2
      tenant_id_field: my_special_tenant_id_field_name
      fields:
          - name: name
          - name: ssn

Tenant Identifier

tenant_id_field specifies the name of the field that each document added to the index must include to identify the document’s associated tenant. This field is used as a way to separate documents per tenant. It must be included in every document that is indexed and specified on every document search that includes a protected field.
The value defaults to tenant_id if not specified in the configuration.
tenant_id_field: my_special_tenant_id_field_name
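A client-side check that a document carries the tenant identifier before it is sent for indexing might look like the sketch below. The helper names are hypothetical, and the index configuration is represented as a plain dictionary.

```python
DEFAULT_TENANT_ID_FIELD = "tenant_id"


def tenant_field_for(index_config: dict) -> str:
    """Return the configured tenant ID field name, or the documented default."""
    return index_config.get("tenant_id_field", DEFAULT_TENANT_ID_FIELD)


def has_tenant_id(document: dict, index_config: dict) -> bool:
    """True if the document includes the tenant ID field for this index."""
    return tenant_field_for(index_config) in document


if __name__ == "__main__":
    index1 = {"name": "index1", "tenant_id_field": "my_special_tenant_id_field_name"}
    doc = {"title": "First article", "my_special_tenant_id_field_name": "tenant-1"}
    print(has_tenant_id(doc, index1))
```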

Filters

By default, each field is indexed the same way the search service indexes text by default: the Uax29 tokenizer splits the text on word boundaries, and the lowercase filter is then applied to each of the resulting terms.
Each field supports the following filters in its analyzer configuration:
  • lowercase - Lowercase each of the parsed terms.
  • stopWords - Strip out stop words from each indexed document and from queries. Currently, only a preconfigured list of English stop words is recognized, and this list is not configurable.
  • phonetic - Replace each term by its phonetic equivalent when indexing a document. Do the same on search queries, replacing unquoted terms with their phonetic equivalent.
  • terms - Index each of the terms. This enables searches for quoted single words and for unquoted terms if phonetic is not enabled.
  • substring - Index all of the substrings (of three or more characters) of each of the terms. This enables prefix and suffix queries.
  • phrases - Index all word pairs. This enables quoted string searches of more than one word.
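To give a feel for what the substring and phrases filters add to the index, the sketch below enumerates the plaintext tokens each would consider for a short input. It only illustrates the token sets: Cloaked Search stores protected tokens rather than plaintext, the function names are hypothetical, and reading "word pairs" as adjacent pairs is an assumption.

```python
def substrings(term: str, min_len: int = 3) -> set:
    """All substrings of the term that are at least min_len characters long."""
    return {
        term[start:end]
        for start in range(len(term))
        for end in range(start + min_len, len(term) + 1)
    }


def word_pairs(terms: list) -> list:
    """Adjacent word pairs, in order of appearance."""
    return list(zip(terms, terms[1:]))


if __name__ == "__main__":
    terms = "great story about".split()
    print(sorted(substrings("story")))
    print(word_pairs(terms))
```

Because the number of substrings grows quickly with term length, enabling substring on a field can be expected to increase the size of the stored tokens for that field.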

Caution

Once this configuration is set for an index and any documents are added to the index, changing the configuration will require re-indexing all documents in the index, as the protected field tokens are written when the document is put into the index.
If you'd like to experiment with Cloaked Search, see our Cloaked Search in 5 minutes, which shows you exactly what’s happening under the hood.

Search Service Configuration

You will probably not need to change your search service configuration. We do, however, recommend that you change the settings for any protected indices so that the field _encrypted_source is not enabled. This is an internal field created by the proxy that contains the encrypted bytes of the source document, and it isn’t searchable. Disabling the field ensures that the search service doesn’t waste space and time indexing it.
Example:
PUT my-index-000001
{
  "mappings": {
    "properties": {
        "_encrypted_source": { "enabled": false }
    }
  }
}
Also, when Cloaked Search indexes a document containing a field that you have marked as protected, it replaces that field with a field of the same name prefixed by protected_. Given our earlier example of configuring index1 so that title and summary are protected, if you submit a document like
{
    "title": "First article",
    "summary": "Great story about things",
    "body": "It's a story about all the things that are great"
}
to Cloaked Search to be indexed, it will submit a document like this to your search service:
{
    "protected_title": "2332309A ABACA921 334451A6 ...",
    "protected_summary": "7B76C95A 616544A2 B41FA81E 85933317 E30236D5 ...",
    "body": "It's a story about all the things that are great",
    "_encrypted_source": "A123BB08218446Fa99..."
}
If your search index is configured with strict constraints on the fields in the documents, you will need to adjust your configuration accordingly. For instance, if dynamic mapping for an index is set to false or strict, the search service will not index the new fields (false) or will reject the modified document (strict) unless you add the protected field names to the index mapping.
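The field names that would need to be added to such a mapping can be derived from the index configuration, as in the sketch below. The helper protected_field_names is hypothetical, and the sketch deliberately lists names only, since the mapping type to use for those fields is not specified here.

```python
def protected_field_names(index_config: dict) -> list:
    """Names the search service will see for this index's protected fields."""
    return ["protected_" + field["name"] for field in index_config.get("fields", [])]


if __name__ == "__main__":
    index1 = {
        "name": "index1",
        "fields": [{"name": "title"}, {"name": "summary"}],
    }
    # The internal _encrypted_source field is added to every protected document too.
    print(protected_field_names(index1) + ["_encrypted_source"])
```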

Running Cloaked Search

Cloaked Search is available on our public Docker repository and can be pulled using:
docker pull gcr.io/ironcore-images/cloaked-search:latest
To start it, the config file must be mounted. Assuming that you have cloaked-search-conf.yml in your current directory, the following will get the Cloaked Search proxy running:
docker run --init --mount type=bind,src="$(pwd)"/cloaked-search-conf.yml,target=/app/deploy.yml gcr.io/ironcore-images/cloaked-search:latest