Configuration of Cloaked Search

Cloaked Search is configured through a set of files: one main configuration file for Cloaked Search itself, plus a set of files that configure indices and index groups.

Docker configuration

Cloaked Search’s main configuration must be mounted into the Docker container at /app/deploy.json. This configuration has the following form:

JSON

{
  "search_service_url": "http://search-service:9200",
  "standalone_keys": [
	{ "id": "key", "keypath": "/location/of/file.key", "primary": true }
  ],
  "global_settings": {
	"analysis": {
	  "analyzer": {
		"lowercase_shingle_analyzer": {
		  "type": "custom",
		  "tokenizer": "standard",
		  "filter": ["lowercase", "longer_shingle"]
		}
	  },
	  "filter": {
		"longer_shingle": {
		  "type": "shingle",
		  "min_shingle_size": 2,
		  "max_shingle_size": 3,
		},
		"my_substring": {
		  "type": "ngram",
		  "min_gram": "3",
		  "max_gram": "3"
		}
	  }
	}
  }
}

Search Service URL

search_service_url must be set to the URL for the Elasticsearch or OpenSearch service that you want to proxy. It must be fully specified as a URL that includes the port number.

JSON

"search_service_url": "http://search-service:9200",
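
As a quick pre-deployment check, the rule above (a full URL with an explicit port) can be tested with a few lines of Python. This is a minimal sketch and not part of Cloaked Search: the function name check_search_service_url is hypothetical, and restricting the scheme to http or https is an assumption.

```python
from urllib.parse import urlsplit


def check_search_service_url(url):
    """Return a list of problems found in a search_service_url value."""
    problems = []
    parts = urlsplit(url)
    # Assumption: only http and https are meaningful schemes here.
    if parts.scheme not in ("http", "https"):
        problems.append("URL should start with http:// or https://")
    if not parts.hostname:
        problems.append("URL is missing a host name")
    try:
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        port = None
    if port is None:
        problems.append("URL must include an explicit port number")
    return problems


if __name__ == "__main__":
    print(check_search_service_url("http://search-service:9200"))
    print(check_search_service_url("http://search-service"))
```

An empty list means the value satisfies the checks in this sketch; it does not prove that the service is reachable.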

Keys

There are two ways for Cloaked Search to generate the keys used to derive search hashes and KEKs (Key Encryption Keys) for encrypting the per-document keys:

Standalone Keys
Integrating with the Tenant Security Proxy (TSP)

The desired method is chosen by having either standalone_keys or tsp_config configurations at the top level of the config file.

Standalone Keys

standalone_keys defines the master keys for the Cloaked Search installation. Cloaked Search uses a key generation algorithm that produces a different encryption key for each tenant as well as a different search hash for each tenant/index/field combination. As such, the master key is not used directly as a cryptographic key, but it should still have high entropy.

Support for rotating this key is planned but is not available yet. Currently, standalone_keys should be an array with a single entry. The object in the array should have id, keypath and primary fields, and primary must be set to true.

The actual key value should be between 32 and 64 characters, encoded as a hexadecimal string, and it should be in a file that is separate from the configuration. You will likely create a secret containing the key value in your container environment and mount that secret as the file in the container. For example,

Console

$ cat /location/of/file.key
31e1cd1baebe933b4b7947bd8bd37fc25ff2fa050477ae27cb13488ef7a87da8

JSON

"standalone_keys": [
  { "id": "key", "keypath": "/location/of/file.key", "primary": true }
]
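
One way to produce a suitable high-entropy key value is to draw random bytes from the operating system and hex-encode them. The sketch below does that with Python's standard secrets module and also checks the shape of a standalone_keys array as described above. The helper names are hypothetical, and using 32 random bytes (64 hexadecimal characters, the same length as the example key) is an assumption about what fits your deployment.

```python
import secrets


def generate_master_key_hex(num_bytes=32):
    """Return num_bytes of cryptographically secure random data as hex."""
    return secrets.token_hex(num_bytes)


def check_standalone_keys(standalone_keys):
    """Return a list of problems with a standalone_keys array."""
    problems = []
    # Key rotation is not available yet, so a single entry is expected.
    if len(standalone_keys) != 1:
        problems.append("standalone_keys should contain exactly one entry")
    for entry in standalone_keys:
        for field in ("id", "keypath", "primary"):
            if field not in entry:
                problems.append("missing field: " + field)
        if entry.get("primary") is not True:
            problems.append("primary must be set to true")
    return problems


if __name__ == "__main__":
    # Print a fresh key; store it in your secret manager, not in the config.
    print(generate_master_key_hex())
```

The generated value would then be stored as a secret and mounted at the path given in keypath.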

Tenant Security Proxy

Cloaked Search can use the Tenant Security Proxy (the TSP) to generate keys. The TSP uses each tenant’s configured cloud KMS instance to wrap the generated keys for that tenant. Cloaked Search requires the url of the TSP as well as an api_keypath that points to the file containing the TSP’s API key. If api_keypath is not present, it will default to /secrets/cloaked-search/tsp-api-key. You will likely create a secret containing the API key and mount that secret as this file in the Cloaked Search container.

Console

$ cat /location/of/api-key/file
dGhpc2lzQmFzZTY0

JSON

"tsp_config": {
	"url": "http://tsp-service:7777",
	"api_keypath": "/location/of/api-key/file"
}
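
The choice between the two key methods, and the api_keypath default, can be summarized in a short Python sketch. It is illustrative only: describe_key_source is a hypothetical helper, and treating a configuration that has both or neither of standalone_keys and tsp_config as an error is an assumption, since this page does not say how Cloaked Search handles those cases.

```python
DEFAULT_TSP_API_KEYPATH = "/secrets/cloaked-search/tsp-api-key"


def describe_key_source(config):
    """Report which key method a parsed main configuration selects."""
    has_standalone = "standalone_keys" in config
    has_tsp = "tsp_config" in config
    # Assumption: exactly one of the two sections should be present.
    if has_standalone == has_tsp:
        raise ValueError("configure exactly one of standalone_keys or tsp_config")
    if has_standalone:
        keypaths = [entry["keypath"] for entry in config["standalone_keys"]]
        return {"method": "standalone", "keypaths": keypaths}
    tsp = config["tsp_config"]
    return {
        "method": "tsp",
        "url": tsp["url"],
        # Falls back to the documented default when api_keypath is absent.
        "api_keypath": tsp.get("api_keypath", DEFAULT_TSP_API_KEYPATH),
    }


if __name__ == "__main__":
    print(describe_key_source({"tsp_config": {"url": "http://tsp-service:7777"}}))
```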

Global settings

The global_settings section has a single entry, analysis, which defines analyzers and filters that any index configuration file can reference by name.

JSON

"global_settings": {
  "analysis": {
	"analyzer": {
	  "lowercase_shingle_analyzer": {
		"type": "custom",
		"tokenizer": "standard",
		"filter": ["lowercase", "longer_shingle"]
	  }
	},
	"filter": {
	  "longer_shingle": {
		"type": "shingle",
		"min_shingle_size": 2,
		"max_shingle_size": 3
	  }
	}
  }
}
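
To make the lowercase_shingle_analyzer above more concrete, the following Python sketch approximates the tokens such an analyzer would emit. It is an illustration only: the real analysis runs inside Cloaked Search, the regular expression is a rough stand-in for the standard tokenizer, and emitting single words alongside the shingles assumes that the Elasticsearch shingle filter default (output_unigrams enabled) carries over.

```python
import re


def lowercase_shingles(text, min_size=2, max_size=3, output_unigrams=True):
    """Approximate a standard tokenizer + lowercase + shingle filter chain."""
    # Rough stand-in for the standard tokenizer: runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        # Emit every shingle of min_size..max_size words starting here.
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


if __name__ == "__main__":
    # Prints: ['quick', 'quick brown', 'quick brown fox',
    #          'brown', 'brown fox', 'fox']
    print(lowercase_shingles("Quick Brown Fox"))
```

Shingles let phrase-like queries match on word sequences, at the cost of storing more tokens per field.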

Index configuration

A folder mounted at /app/indices contains files to configure the indices/index groups that will be protected by Cloaked Search.

JSON

{
  "use_compact_search_key_id": true,
  "tenant_id_index_field": "tenant_id",
  "tenant_id_search_field": "tenant_id.keyword",
  "indices": ["customers", "users", "organizations-*"],
  "mappings": {
    "_encrypted_source": { "enabled": true },
	"properties": {
	  "user_name": { "type": "text" },
	  "organization_name": { "type": "text" },
	  "name": {
		"type": "text",
		"index_prefixes": {},
		"fields": {
		  "shingle": {
			"type": "text",
			"analyzer": "lowercase_shingle_analyzer"
		  }
		}
	  },
	  "manager": {
		"type": "object",
		"properties": {
		  "name": { "type": "text" },
		  "id": { "type": "keyword" }
		}
	  }
	}
  }
}

The configuration of an index vs. an index group is determined by the presence of the indices parameter in the file. If it is present, the configuration will be applied to each of the indices listed. This feature enables a single search to go across any number of those indices. You can also use suffix wildcards in the indices list. For example, user_* would apply the configuration to all indices that start with the prefix user_.
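
The suffix-wildcard rule can be pictured as a simple prefix match. The sketch below is an assumption about the matching semantics based only on the description above (a trailing * matches any index name that starts with the given prefix); config_applies_to is a hypothetical name, not a Cloaked Search function.

```python
def config_applies_to(index_name, indices):
    """Return True if index_name is covered by an indices list."""
    for pattern in indices:
        if pattern.endswith("*"):
            # Suffix wildcard: compare against the prefix before the *.
            if index_name.startswith(pattern[:-1]):
                return True
        elif index_name == pattern:
            return True
    return False


if __name__ == "__main__":
    indices = ["customers", "users", "organizations-*"]
    for name in ("users", "organizations-eu", "users-archive"):
        print(name, config_applies_to(name, indices))
```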

Index Identifier

id is an optional identifier for the index/index group. This defaults to the filename which has the index configuration in it. It can be overridden, but the IDs must be globally unique.

Caution: If you change the ID of an index group, you will have to re-index the modified indices.

use_compact_search_key_id

This option makes the search key identifier stored with each document more compact than the default. We recommend turning it on for all new indices; it will become the default in the future. Note that it requires TSP 4.9.0 or later.

JSON

"use_compact_search_key_id": true

Tenant Identifier

tenant_id_index_field specifies the name of the field that each document added to the index must include to identify the document’s associated tenant. This field is used as a way to separate documents per tenant. It must be included in every document that is indexed.

tenant_id_search_field specifies the name of the field that will be used in each _search to find the associated tenant.

If your tenant IDs are strings, then you must set tenant_id_search_field to a keyword field. With the default dynamic field mapping, this can be done by adding .keyword to the end of your field name to search against the derived keyword sub-field. Otherwise, you can manually set your tenant ID field to type: keyword in the search service and simply configure both tenant_id_index_field and tenant_id_search_field to be your field name.

If your tenant IDs are numeric, then simply configure both tenant_id_index_field and tenant_id_search_field to be your field name.
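
The rules above for choosing the two tenant field settings can be condensed into a small Python sketch. It only restates the guidance in this section; tenant_id_fields and its parameters are hypothetical names used for illustration.

```python
def tenant_id_fields(field_name, ids_are_strings, mapped_as_keyword=False):
    """Pick tenant_id_index_field and tenant_id_search_field values.

    mapped_as_keyword: True if the field was manually mapped as
    type keyword in the search service.
    """
    search_field = field_name
    # String IDs under the default dynamic mapping need the derived
    # .keyword sub-field; numeric IDs and explicit keyword fields do not.
    if ids_are_strings and not mapped_as_keyword:
        search_field = field_name + ".keyword"
    return {
        "tenant_id_index_field": field_name,
        "tenant_id_search_field": search_field,
    }


if __name__ == "__main__":
    print(tenant_id_fields("tenant_id", ids_are_strings=True))
    print(tenant_id_fields("tenant_id", ids_are_strings=False))
```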

If your index holds only one tenant’s data, you can instead use the fixed_tenant_id config. Cloaked Search then applies this ID automatically when indexing documents and when searching, but you cannot put another tenant’s data in that index.

Indices

This optional field lists the index names to which this configuration applies. If it is omitted, the configuration file name is used as the index name.

Mappings

Mappings define which fields Cloaked Search should protect in documents added to this index/index group, and how it should index and search those fields. They generally follow the structure of Elasticsearch field mapping definitions. Supported field types are text, keyword, and object. If you have a boolean or number field you’d like to protect, include it as type text and we’ll interpret it as text.

JSON

"mappings": {
  "_encrypted_source": { "enabled": true },
  "properties": {
	"user_name": { "type": "text" },
	"organization_name": { "type": "text" }
  }
}

This mapping protects the fields user_name and organization_name, using the default text analyzer on those fields.

If a more complex analyzer is desired in combination with the text type it can be defined using the analyzer property. This allows the specification of custom analyzers. These analyzers can be defined inline or can reference globally defined analyzers.

JSON

"mappings": {
  "properties": {
	"name": {
	  "type": "text",
	  "analyzer": "lowercase_shingle_analyzer"
	}
  }
}

This specifies that name should be protected and that it should be indexed using the analyzer lowercase_shingle_analyzer (which was defined in the example global Cloaked Search configuration) instead of the default text analyzer.

Starting in v2.6.0, mappings can also contain an optional _encrypted_source object. When it is set to { "enabled": true } (the default), Cloaked Search adds an _icl_encrypted_source field to each document at index time. This field contains the encrypted JSON source of all the protected fields and is decrypted at query time in order to recreate the source document.

When _encrypted_source is set to { "enabled": false }, the source document will not get encrypted at index time. This means that protected fields will not be present in the hits object at query time. This option can yield significant storage and speed improvements, but Cloaked Search should only be configured this way when the protected data is being stored elsewhere and queries are only used to return document IDs or non-protected fields. This can often be combined with disabling the search service’s _source field (Elasticsearch example) for an even greater storage size reduction, but only after carefully considering the potential consequences of doing so.

Fields

Multi-fields define alternate analyzers for a single document field. The name of each field must be alphanumeric and will be appended to the parent field’s name for use in searches. Each field definition is independent and doesn’t inherit any settings from its parent. The raw parent field in the document will be the indexed content and parent_field.field will be used to search against this field.

For example, this mapping configures a body field with the default analyzer and a body.shingles field with the custom lowercase_shingle_analyzer analyzer:

JSON

"mappings": {
  "properties": {
	"body": {
	  "type": "text",
	  "fields": {
		"shingles": {
		  "type": "text",
		  "analyzer": "lowercase_shingle_analyzer"
		}
	  }
	}
  }
}
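
To see which names a mapping like this makes searchable, the following Python sketch walks the properties object and lists each parent field plus its parent_field.field multi-field names. It is a hypothetical helper that handles only top-level text and keyword fields, and its use of str.isalnum() is an approximation of the alphanumeric naming rule.

```python
def searchable_field_names(properties):
    """List field names usable in searches for a mappings.properties dict."""
    names = []
    for parent, spec in properties.items():
        names.append(parent)
        for sub_name in spec.get("fields", {}):
            # Multi-field names must be alphanumeric.
            if not sub_name.isalnum():
                raise ValueError("multi-field name is not alphanumeric: " + sub_name)
            # The multi-field name is appended to the parent field's name.
            names.append(parent + "." + sub_name)
    return names


if __name__ == "__main__":
    properties = {
        "body": {
            "type": "text",
            "fields": {
                "shingles": {
                    "type": "text",
                    "analyzer": "lowercase_shingle_analyzer",
                }
            },
        }
    }
    print(searchable_field_names(properties))  # ['body', 'body.shingles']
```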

Caution: Once this configuration is set for an index and documents are added to the index, changing the configuration will require re-indexing of all documents in the index, as the protected field tokens are generated when the document is added.

If you’d like to experiment with Cloaked Search, see our Cloaked Search in 5 minutes, which shows exactly what’s happening under the hood.

Search Service Configuration

While the search service configuration of your unprotected fields can remain the same, you must make some changes to the mappings to accommodate protected fields and encrypted source.

The changes you need to make are as follows:

Enable text mapping for all encrypted fields. Note that these field names are changed by Cloaked Search, so we recommend a wildcard rule (see example below).
Enable keyword mapping for _icl_search_key_id. This field allows for key rotation and must be a keyword indexed field.

The changes you should make are as follows:

Disable indexing of _icl_encrypted_source. This field is AES encrypted data and indexing it will not be useful.

Example:

JSON

PUT my-index-000001
{
  "mappings": {
	"dynamic_templates": [
	  {
		"protected_fields": {
		  "match_mapping_type": "string",
		  "path_match": "_icl_p_*",
		  "mapping": {
			"type": "text"
		  }
		}
	  }
	],
	"properties": {
	  "_icl_encrypted_source": {
		"enabled": false
	  },
	  "_icl_search_key_id": {
		"type": "keyword"
	  }
	}
  }
}

Also, when you mark a field as protected and Cloaked Search indexes a document containing that field, it replaces the field with one of the same name prefixed by _icl_p_. For example, if an index is configured so that title and summary are protected and you submit a document like

JSON

{
  "title": "First article",
  "summary": "Great story about things",
  "body": "It's a story about all the things that are great"
}

to Cloaked Search to be indexed, it will submit a document like this to your search service:

JSON

{
  "_icl_p_title": "2332309a abaca921 334451a6 ...",
  "_icl_p_summary": "7b76c95a 616544a2 b41fa81e 85933317 e30236d5 ...",
  "body": "It's a story about all the things that are great",
  "_icl_encrypted_source": "a123bb08218446fa99...",
  "_icl_search_key_id": 1
}

If your search index is configured with strict constraints on the fields in the documents, you will need to adjust your configuration accordingly. For instance, if the dynamic mapping parameter for an index is set to false, the search service will not index the renamed fields, and if it is set to strict, it will reject the modified document, unless you add the protected field names to the mapping.
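
For an index with strict field constraints, the extra mapping properties can be derived from the list of protected fields. The Python sketch below builds them from the rules stated in this section (the _icl_p_ prefix with a text mapping, a keyword mapping for _icl_search_key_id, and a disabled _icl_encrypted_source). The function name is hypothetical, and the sketch covers top-level protected fields only; this page does not describe how nested or multi-field names are stored.

```python
import json

PROTECTED_PREFIX = "_icl_p_"


def strict_mapping_properties(protected_fields):
    """Build explicit search service mapping properties for protected fields."""
    # Each protected field is stored under its name prefixed by _icl_p_.
    properties = {
        PROTECTED_PREFIX + name: {"type": "text"} for name in protected_fields
    }
    # The search key ID must be a keyword indexed field.
    properties["_icl_search_key_id"] = {"type": "keyword"}
    # The encrypted source is opaque data, so indexing it is not useful.
    properties["_icl_encrypted_source"] = {"enabled": False}
    return properties


if __name__ == "__main__":
    print(json.dumps(strict_mapping_properties(["title", "summary"]), indent=2))
```

The resulting object would be merged into the properties section of the index mapping in your search service.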

Getting Cloaked Search

Cloaked Search is available on our public Docker repository and can be pulled using:

bash

docker pull gcr.io/ironcore-images/cloaked-search:2

In order to start it, the config files/keys must be mounted. Assuming that your current directory contains cloaked-search-conf.json, test.key, and an indices folder containing the index configurations, the following will get the Cloaked Search proxy running.

bash

docker run --init \
  --mount type=bind,src="$(pwd)"/cloaked-search-conf.json,dst=/app/deploy.json \
  --mount type=bind,src="$(pwd)"/indices/,dst=/app/indices \
  --mount type=bind,src="$(pwd)"/test.key,dst=/test.key \
  gcr.io/ironcore-images/cloaked-search:2