
Configuration of Cloaked Search

Cloaked Search is configured through a set of files: one main configuration file for Cloaked Search itself, plus a set of files that configure indices and index groups.

Docker configuration

Cloaked Search’s main configuration must be mounted into the docker container at /app/deploy.json. This configuration has the following form:

JSON
{
  "search_service_url": "http://search-service:9200",
  "standalone_keys": [
    { "id": "key", "keypath": "/location/of/file.key", "primary": true }
  ],
  "global_settings": {
    "analysis": {
      "analyzer": {
        "lowercase_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "longer_shingle"]
        }
      },
      "filter": {
        "longer_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        },
        "my_substring": { "type": "ngram", "min_gram": "3", "max_gram": "3" }
      }
    }
  }
}

Search Service URL

search_service_url must be set to the URL for the Elasticsearch or OpenSearch service that you want to proxy. It must be fully specified as a URL that includes the port number.

JSON
"search_service_url": "http://search-service:9200",

Keys

There are two ways for Cloaked Search to generate the keys used to derive search hashes and KEKs (Key Encryption Keys) for encrypting the per-document keys:

  1. Standalone Keys
  2. Integrating with the Tenant Security Proxy (TSP)

The desired method is chosen by having either standalone_keys or tsp_config configurations at the top level of the config file.
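
As a rough illustration of that choice, the following Python sketch inspects an already-parsed main configuration and reports which key source it selects. The function name key_method is hypothetical, and treating "neither" or "both" as an error is an assumption of this sketch, not documented Cloaked Search behavior.

```python
def key_method(config: dict) -> str:
    """Return the top-level key that selects the key source in a parsed config."""
    has_standalone = "standalone_keys" in config
    has_tsp = "tsp_config" in config
    if has_standalone == has_tsp:
        # Assumption: having neither or both sections is treated as a mistake here.
        raise ValueError("expected exactly one of standalone_keys or tsp_config")
    return "standalone_keys" if has_standalone else "tsp_config"
```

A check like this could run in a deployment pipeline before the configuration file is mounted into the container.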

Standalone Keys

standalone_keys defines the master keys for the Cloaked Search installation. Cloaked Search uses a key generation algorithm that produces a different encryption key for each tenant as well as a different search hash for each tenant/index/field combination. As such, the master key is not used directly as a cryptographic key, but it should still have high entropy.

Support for rotating this key is planned but is not available yet. Currently, standalone_keys should be an array with a single entry. The object in the array should have id, keypath and primary fields, and primary must be set to true.

The actual key value should be between 32 and 64 characters, encoded as a hexadecimal string, and it should be in a file that is separate from the configuration. You will likely create a secret containing the key value in your container environment and mount that secret as the file in the container. For example,

Console
$ cat /location/of/file.key
31e1cd1baebe933b4b7947bd8bd37fc25ff2fa050477ae27cb13488ef7a87da8
JSON
"standalone_keys": [ { "id": "key", "keypath": "/location/of/file.key", "primary": true } ]

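One way to produce a high-entropy key file of the shape shown above is sketched below in Python. It writes 32 random bytes as 64 hexadecimal characters, the same length as the example key. The function name write_master_key is hypothetical, and writing the file without a trailing newline and with owner-only permissions are choices of this sketch, not requirements stated by Cloaked Search.

```python
import os
import secrets


def write_master_key(path: str) -> str:
    """Write a fresh random key, hex encoded, to path and return it."""
    key_hex = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    # Create the file readable and writable by the owner only (assumption: POSIX host).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as key_file:
        key_file.write(key_hex)  # no trailing newline (a choice of this sketch)
    return key_hex
```

In a container platform you would more likely store the generated value in a secret and mount it at the configured keypath than write it to disk directly.
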
Tenant Security Proxy

Cloaked Search can use the Tenant Security Proxy (the TSP) to generate keys. The TSP uses each tenant’s configured cloud KMS instance to wrap the generated keys for that tenant. Cloaked Search requires the url of the TSP as well as an api_keypath that points to the file containing the TSP’s API key. If api_keypath is not present, it will default to /secrets/cloaked-search/tsp-api-key. You will likely create a secret containing the API key and mount that secret as this file in the Cloaked Search container.

Console
$ cat /location/of/api-key/file
dGhpc2lzQmFzZTY0
JSON
"tsp_config": { "url": "http://tsp-service:7777", "api_keypath": "/location/of/api-key/file" }

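The default described above can be pictured with a small Python sketch that resolves which file the API key would be read from, given a parsed main configuration. The helper name tsp_api_keypath and the constant name are hypothetical; only the default path itself comes from this page.

```python
DEFAULT_TSP_API_KEYPATH = "/secrets/cloaked-search/tsp-api-key"


def tsp_api_keypath(deploy_config: dict) -> str:
    """Return the configured api_keypath, or the documented default if absent."""
    tsp_config = deploy_config["tsp_config"]
    return tsp_config.get("api_keypath", DEFAULT_TSP_API_KEYPATH)
```

For example, a tsp_config containing only a url resolves to the default path, so the secret must be mounted there.
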
Global settings

global_settings is a section with a single entry, analysis. This entry defines analyzers and filters that any index configuration file can then reference by name. See the analyzer and filter sections for the syntax of these items.

JSON
"global_settings": {
  "analysis": {
    "analyzer": {
      "lowercase_shingle_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "longer_shingle"]
      }
    },
    "filter": {
      "longer_shingle": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 3
      }
    }
  }
}

Index configuration

A folder mounted at /app/indices contains files to configure the indices/index groups that will be protected by Cloaked Search.

JSON
{
  "tenant_id_index_field": "tenant_id",
  "tenant_id_search_field": "tenant_id.keyword",
  "indices": ["customers", "users", "organizations-*"],
  "mappings": {
    "properties": {
      "user_name": { "type": "text" },
      "organization_name": { "type": "text" },
      "name": {
        "type": "text",
        "index_prefixes": {},
        "fields": {
          "shingle": { "type": "text", "analyzer": "lowercase_shingle_analyzer" }
        }
      },
      "manager": {
        "type": "object",
        "properties": {
          "name": { "type": "text" },
          "id": { "type": "keyword" }
        }
      }
    }
  }
}

Whether a file configures a single index or an index group is determined by the presence of the indices parameter. If it is present, the configuration is applied to each of the indices listed, which avoids duplicating the same configuration file for each index. Entries in the indices list may use suffix wildcards, such as organizations-*. search_salt is another optional parameter that can be used to allow searching across those indices. All indices in a single index group share a search salt; if not overridden, it defaults to the filename of the index group. If indices share a search salt, then field mapping properties with the same key must have the same configuration, or searches across those fields will be broken.

Caution: If you change the search_salt, you will have to re-index the modified indices.
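
To make the suffix wildcard behavior concrete, here is a minimal Python sketch of how an index name could be tested against an indices list. The function name index_matches is hypothetical, and the matching rule (a trailing * matches any name that starts with the text before it) is this sketch's reading of "suffix wildcards", not a statement of Cloaked Search's exact matching logic.

```python
def index_matches(patterns: list[str], index_name: str) -> bool:
    """Return True if index_name is covered by any entry in an indices list."""
    for pattern in patterns:
        if pattern.endswith("*"):
            # Suffix wildcard: compare only the part before the trailing "*".
            if index_name.startswith(pattern[:-1]):
                return True
        elif pattern == index_name:
            return True
    return False
```

With the list ["customers", "users", "organizations-*"], an index named organizations-2024 is covered, while orders is not.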

Tenant Identifier

tenant_id_index_field specifies the name of the field that each document added to the index must include to identify the document’s associated tenant. This field is used as a way to separate documents per tenant. It must be included in every document that is indexed.

tenant_id_search_field specifies the name of the field that will be used in each _search to find the associated tenant.

If your tenant IDs are strings, then you must set tenant_id_search_field to a keyword field. With the default dynamic field mapping, this can be done by adding .keyword to the end of your field name to search against the derived keyword sub-field. Otherwise, you can manually set your tenant ID field to type: keyword in the search service and simply configure both tenant_id_index_field and tenant_id_search_field to be your field name.

If your tenant IDs are numeric, then simply configure both tenant_id_index_field and tenant_id_search_field to be your field name.

If your index holds only one tenant’s data, you can instead use the fixed_tenant_id config. This ID will be inferred on document index and searches, but you cannot put another tenant’s data in that index.
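
The rules above can be summarized in a short Python sketch that works out which tenant a document belongs to before it is sent for indexing. The function name tenant_of is hypothetical, and letting fixed_tenant_id take precedence when both settings are present is an assumption of this sketch.

```python
def tenant_of(document: dict, index_config: dict):
    """Return the tenant ID for a document under a parsed index configuration."""
    if "fixed_tenant_id" in index_config:
        # Single-tenant index: the ID comes from the configuration.
        return index_config["fixed_tenant_id"]
    field = index_config["tenant_id_index_field"]
    if field not in document:
        # Every indexed document must carry the tenant field.
        raise KeyError(f"document is missing tenant field {field!r}")
    return document[field]
```

A client-side check like this can catch documents without a tenant field before they reach the proxy.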

Indices

This optional field is a list of index names to which this configuration applies. If it is omitted, the configuration file name is used as the index name.

Mappings

Mappings define which fields Cloaked Search should protect in documents added to this index/index group, and how it should index and search those fields. It generally follows the structure of Elasticsearch field mapping definitions. Supported field types are text, keyword, and object. If you have a boolean or number field you’d like to protect, include it as type text and we’ll interpret it as text.

JSON
"mappings": {
  "properties": {
    "user_name": { "type": "text" },
    "organization_name": { "type": "text" }
  }
}

This mapping protects the fields user_name and organization_name, using the default text analyzer on those fields.

If a more complex analyzer is desired in combination with the text type it can be defined using the analyzer property. This allows the specification of custom analyzers. These analyzers can be defined inline or can reference globally defined analyzers.

JSON
"mappings": {
  "properties": {
    "name": { "type": "text", "analyzer": "lowercase_shingle_analyzer" }
  }
}

This specifies that name should be protected and that it should be indexed using the analyzer lowercase_shingle_analyzer (which was defined in the example global Cloaked Search configuration) instead of the default text analyzer.
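
To give a feel for what an analyzer like lowercase_shingle_analyzer produces, the Python sketch below approximates it: lowercase the text, split it into words, then emit the single words plus every run of 2 to 3 adjacent words. This is only an approximation for illustration. The regular expression is a stand-in for the standard tokenizer, emitting single words alongside shingles follows Elasticsearch's default for the shingle filter, and the token order differs from what a search service would emit. It is not Cloaked Search's implementation.

```python
import re


def lowercase_shingles(text: str, min_size: int = 2, max_size: int = 3) -> list[str]:
    """Approximate a lowercase + shingle analyzer on a piece of text."""
    # Rough stand-in for the standard tokenizer: runs of word characters.
    words = re.findall(r"\w+", text.lower())
    tokens = list(words)  # single words are kept alongside the shingles
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append(" ".join(words[start:start + size]))
    return tokens


print(lowercase_shingles("Quick Brown Fox"))
# ['quick', 'brown', 'fox', 'quick brown', 'brown fox', 'quick brown fox']
```

Shingles let phrase-like searches match on multi-word tokens, at the cost of more tokens per field.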

Fields

Multi-fields define alternate analyzers for a single document field. The name of each field must be alphanumeric and will be appended to the parent field’s name for use in searches. Each field definition is independent and doesn’t inherit any settings from its parent. The raw parent field in the document will be the indexed content and parent_field.field will be used to search against this field.

For example, this mapping configures a body field with the default analyzer and a body.shingles field with the custom lowercase_shingle_analyzer analyzer:

JSON
"mappings": {
  "properties": {
    "body": {
      "type": "text",
      "fields": {
        "shingles": { "type": "text", "analyzer": "lowercase_shingle_analyzer" }
      }
    }
  }
}

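The naming rule for multi-fields can be sketched in Python by listing the names a search could target for a given mappings object. The function name searchable_fields is hypothetical, and the sketch only looks at top-level properties and their fields entries; properties nested under object fields are left out.

```python
def searchable_fields(mappings: dict) -> list[str]:
    """List parent field names and their parent_field.field multi-field names."""
    names = []
    for parent, definition in mappings["properties"].items():
        names.append(parent)
        for sub_field in definition.get("fields", {}):
            # Multi-field names are the parent name, a dot, then the field name.
            names.append(f"{parent}.{sub_field}")
    return names
```

For the body example above this yields body and body.shingles.
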
Caution: Once this configuration is set for an index and documents are added to the index, changing the configuration will require re-indexing of all documents in the index, as the protected field tokens are generated when the document is added.

If you’d like to experiment with Cloaked Search and see what’s happening under the hood, see Cloaked Search in 5 minutes, which walks through each step.

Search Service Configuration

While you will probably not need to make changes to your search service configuration, we recommend that you change the settings for any protected indices so that the field _icl_encrypted_source is not enabled. This is an internal field created by the proxy that contains the encrypted bytes of the protected fields in the source document, and it isn’t searchable. Disabling the field makes sure that the search service doesn’t waste space and time indexing it. You can also give some hints to maximize the storage efficiency of protected fields using dynamic mappings.

Example:

JSON
PUT my-index-000001
{
  "mappings": {
    "dynamic_templates": [
      {
        "protected_fields": {
          "match_mapping_type": "string",
          "match": "_icl_p_*",
          "mapping": { "type": "text" }
        }
      }
    ],
    "properties": {
      "_icl_encrypted_source": { "enabled": false },
      "_icl_search_key_id": { "type": "keyword" }
    }
  }
}

Also, when you indicate that a field should be protected and Cloaked Search indexes a document containing that field, it replaces the field with one of the same name prefixed by _icl_p_. For example, if an index is configured so that title and summary are protected and you submit a document like

JSON
{
  "title": "First article",
  "summary": "Great story about things",
  "body": "It's a story about all the things that are great"
}

to Cloaked Search to be indexed, it will submit a document like this to your search service:

JSON
{
  "_icl_p_title": "2332309a abaca921 334451a6 ...",
  "_icl_p_summary": "7b76c95a 616544a2 b41fa81e 85933317 e30236d5 ...",
  "body": "It's a story about all the things that are great",
  "_icl_encrypted_source": "a123bb08218446fa99...",
  "_icl_search_key_id": "28793103-c6df-..."
}

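If your search index is configured with strict constraints on the fields in the documents, you will need to adjust your configuration accordingly. For instance, if the dynamic mapping parameter for an index is set to strict, the search service will reject the modified document unless you add the protected field names to the mapping. If it is set to false, the document is accepted but the new fields are not indexed, so searches against them will not work until the protected field names are added.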
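
For an index with such constraints, the explicit properties to add can be derived from the list of protected field names. The Python sketch below builds them using the same types as the dynamic template example earlier in this section. The function name search_service_properties is hypothetical, and the sketch covers only top-level protected fields.

```python
def search_service_properties(protected_fields: list[str]) -> dict:
    """Build mapping properties for the fields Cloaked Search adds to documents."""
    # Each protected field is stored under its name prefixed with _icl_p_.
    properties = {f"_icl_p_{name}": {"type": "text"} for name in protected_fields}
    # Internal fields created by the proxy.
    properties["_icl_encrypted_source"] = {"enabled": False}
    properties["_icl_search_key_id"] = {"type": "keyword"}
    return properties
```

The result would be merged into the properties of the index mapping in your search service, next to the unprotected fields you already declare.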

Cloaked Search is available in our public Docker repository and can be pulled using:

bash
docker pull gcr.io/ironcore-images/cloaked-search:2.0.0

In order to start it, the config files/keys must be mounted. Assuming that your current directory contains cloaked-search-conf.json, test.key, and an indices folder containing the index configurations, the following will get the Cloaked Search proxy running.

bash
docker run --init \
  --mount type=bind,src="$(pwd)"/cloaked-search-conf.json,dst=/app/deploy.json \
  --mount type=bind,src="$(pwd)"/indices/,dst=/app/indices \
  --mount type=bind,src="$(pwd)"/test.key,dst=/test.key \
  gcr.io/ironcore-images/cloaked-search:2.0.0
