Configuration of Cloaked Search
Cloaked Search is configured through files: one main configuration file for the service itself, and a set of files that configure indices and index groups.
Docker configuration
Cloaked Search’s main configuration must be mounted into the Docker container at `/app/deploy.json`. This configuration has the following form:

```json
{
  "search_service_url": "http://search-service:9200",
  "standalone_keys": [
    { "id": "key", "keypath": "/location/of/file.key", "primary": true }
  ],
  "global_settings": {
    "analysis": {
      "analyzer": {
        "lowercase_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "longer_shingle"]
        }
      },
      "filter": {
        "longer_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        },
        "my_substring": {
          "type": "ngram",
          "min_gram": "3",
          "max_gram": "3"
        }
      }
    }
  }
}
```
Search Service URL
`search_service_url` must be set to the URL for the Elasticsearch or OpenSearch service that you want to proxy. It must be fully specified as a URL that includes the port number.

```json
"search_service_url": "http://search-service:9200",
```
Keys
There are two ways for Cloaked Search to generate the keys used to derive search hashes and KEKs (Key Encryption Keys) for encrypting the per-document keys:
- Standalone Keys
- Integrating with the Tenant Security Proxy (TSP)
The desired method is chosen by including either a `standalone_keys` or a `tsp_config` configuration at the top level of the config file.
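The two top-level rules described so far (a full URL with a port, and a key source at the top level) can be checked before the container starts. The following is a minimal sketch, not part of Cloaked Search: the function name `check_deploy_config` is hypothetical, and it assumes that exactly one of the two key sources should be present.

```python
from urllib.parse import urlsplit


def check_deploy_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed deploy.json (empty if none)."""
    problems = []

    # search_service_url must be a full URL that includes the port number.
    url = urlsplit(str(config.get("search_service_url", "")))
    try:
        port = url.port
    except ValueError:  # raised when the port is not a valid number
        port = None
    if not url.scheme or not url.hostname or port is None:
        problems.append("search_service_url must be a full URL with a port")

    # Assumption: exactly one key source is expected at the top level.
    key_sources = [k for k in ("standalone_keys", "tsp_config") if k in config]
    if len(key_sources) != 1:
        problems.append("set exactly one of standalone_keys or tsp_config")

    return problems


example = {
    "search_service_url": "http://search-service:9200",
    "standalone_keys": [
        {"id": "key", "keypath": "/location/of/file.key", "primary": True}
    ],
}
print(check_deploy_config(example))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a deployment script report everything that needs fixing in one pass.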
Standalone Keys
`standalone_keys` defines the master keys for the Cloaked Search installation.
Cloaked Search uses a key generation algorithm that produces a different encryption key for each tenant as well as a different search hash for each tenant/index/field combination.
As such, the master key is not used directly as a cryptographic key, but it should still have high entropy.
Support for rotating this key is planned but is not available yet. Currently, `standalone_keys` should be an array with a single entry. The object in the array should have `id`, `keypath`, and `primary` fields, and `primary` must be set to `true`.
The actual key value should be between 32 and 64 characters, encoded as a hexadecimal string, and it should be in a file that is separate from the configuration. You will likely create a secret containing the key value in your container environment and mount that secret as the file in the container. For example:

```console
$ cat /location/of/file.key
31e1cd1baebe933b4b7947bd8bd37fc25ff2fa050477ae27cb13488ef7a87da8
```

```json
"standalone_keys": [
  { "id": "key", "keypath": "/location/of/file.key", "primary": true }
]
```
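One way to produce a high-entropy key of the shape shown above is Python's standard `secrets` module. This is a sketch for local experimentation only: the function names are hypothetical, it assumes a 64-character hex key with no trailing newline is acceptable, and in production the value would more likely come from your secret manager.

```python
import os
import secrets


def generate_master_key(n_bytes: int = 32) -> str:
    """Return n_bytes of cryptographically strong randomness as lowercase hex."""
    return secrets.token_hex(n_bytes)  # 32 bytes -> 64 hex characters


def write_master_key(path: str, n_bytes: int = 32) -> str:
    """Write a new hex key to `path` (owner read/write only) and return it."""
    key_hex = generate_master_key(n_bytes)
    # O_EXCL makes this fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key_hex)
    return key_hex
```

Refusing to overwrite an existing file is deliberate: key rotation is not supported yet, so replacing the key would leave already-indexed data unsearchable.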
Tenant Security Proxy
Cloaked Search can use the Tenant Security Proxy (the TSP) to generate keys. The TSP uses each tenant’s configured cloud KMS instance to wrap the generated keys for that tenant.
Cloaked Search requires the `url` of the TSP as well as an `api_keypath` that points to the file containing the TSP’s API key. If `api_keypath` is not present, it will default to `/secrets/cloaked-search/tsp-api-key`.
You will likely create a secret containing the API key and mount that secret as this file in the Cloaked Search container.
```console
$ cat /location/of/api-key/file
dGhpc2lzQmFzZTY0
```

```json
"tsp_config": {
  "url": "http://tsp-service:7777",
  "api_keypath": "/location/of/api-key/file"
}
```
Global settings
The global settings section has a single entry, `analysis`. This entry lets you define analyzers and filters that can then be used by name in any index configuration file.

```json
"global_settings": {
  "analysis": {
    "analyzer": {
      "lowercase_shingle_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "longer_shingle"]
      }
    },
    "filter": {
      "longer_shingle": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 3
      }
    }
  }
}
```
Index configuration
A folder mounted at `/app/indices` contains files to configure the indices/index groups that will be protected by Cloaked Search.

```json
{
  "use_compact_search_key_id": true,
  "tenant_id_index_field": "tenant_id",
  "tenant_id_search_field": "tenant_id.keyword",
  "indices": ["customers", "users", "organizations-*"],
  "mappings": {
    "_encrypted_source": { "enabled": true },
    "properties": {
      "user_name": { "type": "text" },
      "organization_name": { "type": "text" },
      "name": {
        "type": "text",
        "index_prefixes": {},
        "fields": {
          "shingle": {
            "type": "text",
            "analyzer": "lowercase_shingle_analyzer"
          }
        }
      },
      "manager": {
        "type": "object",
        "properties": {
          "name": { "type": "text" },
          "id": { "type": "keyword" }
        }
      }
    }
  }
}
```
Whether a file configures a single index or an index group is determined by the presence of the `indices` parameter in the file.
If it is present, the configuration will be applied to each of the indices listed.
This feature enables a single search to go across any number of those indices.
You can also use suffix wildcards in the `indices` list. For example, `user_*` would apply the configuration to all indices that start with the prefix `user_`.
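To make the suffix-wildcard rule concrete, here is a small sketch of the matching semantics as described above. It is an illustration only, not Cloaked Search's implementation, and the function name `config_applies_to` is hypothetical.

```python
def config_applies_to(index_name: str, indices: list[str]) -> bool:
    """Return True if index_name matches any entry in an `indices` list.

    An entry ending in "*" is treated as a prefix match (suffix wildcard);
    any other entry must match the index name exactly.
    """
    for pattern in indices:
        if pattern.endswith("*"):
            if index_name.startswith(pattern[:-1]):
                return True
        elif index_name == pattern:
            return True
    return False


indices = ["customers", "users", "organizations-*"]
print(config_applies_to("organizations-eu", indices))  # prints True
print(config_applies_to("customer", indices))          # prints False
```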
Index Identifier
`id` is an optional identifier for the index/index group. It defaults to the name of the file that contains the index configuration. It can be overridden, but IDs must be globally unique.
Caution: If you change the ID of an index group, you will have to re-index the modified indices.
use_compact_search_key_id
This configuration option makes the search key identifier stored with each document more compact than it is by default. We recommend turning it on for all new indices; it will become the default in the future. Note that this requires TSP 4.9.0+ to work.

```json
"use_compact_search_key_id": true
```
Tenant Identifier
`tenant_id_index_field` specifies the name of the field that identifies the tenant associated with a document.
Cloaked Search uses this field to separate documents per tenant, so it must be included in every document that is indexed.
`tenant_id_search_field` specifies the name of the field that will be used in each `_search` to find the associated tenant.
If your tenant IDs are strings, then you must set `tenant_id_search_field` to a keyword field. With the default dynamic field mapping, this can be done by adding `.keyword` to the end of your field name to search against the derived keyword sub-field. Otherwise, you can manually set your tenant ID field to `type: keyword` in the search service and simply configure both `tenant_id_index_field` and `tenant_id_search_field` to be your field name.
If your tenant IDs are numeric, then simply configure both `tenant_id_index_field` and `tenant_id_search_field` to be your field name.
If your index holds only one tenant’s data, you can instead use the `fixed_tenant_id` config. This ID is then inferred when documents are indexed and searched, but you cannot put another tenant’s data in that index.
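As an illustration of how the two tenant fields are used, the sketch below builds a document and a search body for string tenant IDs. The helper name `tenant_scoped_query` is hypothetical, and the query shape is an assumption: it is a standard Elasticsearch `bool` query with a `term` filter on the search field, and the text above does not state which query shapes Cloaked Search accepts for carrying the tenant ID.

```python
def tenant_scoped_query(tenant_id, field: str, text: str,
                        tenant_id_search_field: str = "tenant_id.keyword") -> dict:
    """Build a search body that matches `text` in `field` for one tenant."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                # Assumption: the tenant is identified by an exact-match term
                # on the field named by tenant_id_search_field.
                "filter": [{"term": {tenant_id_search_field: tenant_id}}],
            }
        }
    }


# A document carries the tenant ID under tenant_id_index_field ...
document = {"tenant_id": "tenant-1", "user_name": "alice"}
# ... while a search refers to tenant_id_search_field.
body = tenant_scoped_query("tenant-1", "user_name", "alice")
```

For numeric tenant IDs, both settings name the same field, so the call would pass `tenant_id_search_field="tenant_id"`.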
Indices
This optional field is a list of the index names to which this configuration applies. If it is omitted, the configuration file name is used as the index name.
Mappings
Mappings define which fields Cloaked Search should protect in documents added to this index/index group, and how it should index and search those fields. They generally follow the structure of Elasticsearch field mapping definitions.
Supported field types are `text`, `keyword`, and `object`. If you have a `boolean` or `number` field you’d like to protect, include it as type `text` and we’ll interpret it as text.

```json
"mappings": {
  "_encrypted_source": { "enabled": true },
  "properties": {
    "user_name": { "type": "text" },
    "organization_name": { "type": "text" }
  }
}
```
This mapping protects the fields `user_name` and `organization_name`, using the default text analyzer on those fields.
If a more complex analyzer is desired in combination with the `text` type, it can be specified using the `analyzer` property. Custom analyzers can be defined inline or can reference globally defined analyzers.

```json
"mappings": {
  "properties": {
    "name": { "type": "text", "analyzer": "lowercase_shingle_analyzer" }
  }
}
```
This specifies that `name` should be protected and that it should be indexed using the analyzer `lowercase_shingle_analyzer` (which was defined in the example global Cloaked Search configuration) instead of the default text analyzer.
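Because index files refer to global analyzers by name, a typo in a name is easy to miss. The sketch below lists analyzer names that an index configuration references but the main configuration does not define. Both function names are hypothetical, and a reported name is not necessarily an error, since it may be an analyzer built into the search service.

```python
def referenced_analyzers(properties: dict) -> set[str]:
    """Collect analyzer names referenced by string in a `properties` mapping."""
    names = set()
    for definition in properties.values():
        analyzer = definition.get("analyzer")
        if isinstance(analyzer, str):
            names.add(analyzer)
        # Recurse into multi-fields ("fields") and object sub-properties.
        names |= referenced_analyzers(definition.get("fields", {}))
        names |= referenced_analyzers(definition.get("properties", {}))
    return names


def not_globally_defined(deploy_config: dict, index_config: dict) -> set[str]:
    """Return referenced analyzer names missing from global_settings."""
    defined = set(
        deploy_config.get("global_settings", {})
        .get("analysis", {})
        .get("analyzer", {})
    )
    props = index_config.get("mappings", {}).get("properties", {})
    return referenced_analyzers(props) - defined
```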
Starting in v2.6.0, `mappings` can also contain an optional `_encrypted_source` object. When set to `{ "enabled": true }` (the default), an `_icl_encrypted_source` field containing the encrypted JSON source of all the protected fields is added to each document at index time.
This encrypted source field is decrypted at query time in order to recreate the source document.
When `_encrypted_source` is set to `{ "enabled": false }`, the source document will not get encrypted at index time. This means that protected fields will not be present in the hits object at query time.
This option can yield significant storage and speed improvements, but Cloaked Search should only be configured this way when the protected data is being stored elsewhere and queries are only used to return document IDs or non-protected fields.
This can often be combined with disabling the search service’s `_source` field (Elasticsearch example) for an even greater storage size reduction, but only after carefully considering the potential consequences of doing so.
Fields
Multi-fields define alternate analyzers for a single document field. The name of each multi-field must be alphanumeric and is appended to the parent field’s name for use in searches. Each multi-field definition is independent and doesn’t inherit any settings from its parent. The content indexed for a multi-field is the raw value of the parent field in the document, and searches against it use the name `parent_field.field`.
For example, this mapping configures a `body` field with the default analyzer and a `body.shingles` field with the custom `lowercase_shingle_analyzer` analyzer:

```json
"mappings": {
  "properties": {
    "body": {
      "type": "text",
      "fields": {
        "shingles": {
          "type": "text",
          "analyzer": "lowercase_shingle_analyzer"
        }
      }
    }
  }
}
```
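The naming rule for multi-fields can be sketched as a function that lists the field names a mapping makes searchable. This is an illustration with a hypothetical function name. The dotted path used for `object` sub-properties follows the usual Elasticsearch convention and is an assumption here, and Python's `isalnum` is only an approximation of "alphanumeric".

```python
def search_paths(properties: dict, prefix: str = "") -> list[str]:
    """List searchable field names for a `properties` mapping."""
    paths = []
    for name, definition in properties.items():
        path = prefix + name
        if definition.get("type") == "object":
            # Assumption: object sub-fields are addressed as parent.child.
            paths += search_paths(definition.get("properties", {}), path + ".")
            continue
        paths.append(path)
        for sub_name in definition.get("fields", {}):
            if not sub_name.isalnum():
                raise ValueError(f"multi-field name must be alphanumeric: {sub_name!r}")
            # Multi-field names are appended to the parent field's name.
            paths.append(f"{path}.{sub_name}")
    return paths


mapping = {
    "body": {
        "type": "text",
        "fields": {
            "shingles": {"type": "text", "analyzer": "lowercase_shingle_analyzer"}
        },
    }
}
print(search_paths(mapping))  # prints ['body', 'body.shingles']
```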
Caution: Once this configuration is set for an index and documents are added to the index, changing the configuration will require re-indexing of all documents in the index, as the protected field tokens are generated when the document is added.
If you’d like to play with Cloaked Search and see exactly what’s happening under the hood, see our Cloaked Search in 5 minutes guide.
Search Service Configuration
While the search service configuration of your unprotected fields can remain the same, you must make some changes to the mappings to accommodate protected fields and encrypted source.
The changes you need to make are as follows:
- Enable text mapping for all encrypted fields. Note that these field names are changed by Cloaked Search, so we recommend a wildcard rule (see example below).
- Enable keyword mapping for `_icl_search_key_id`. This field allows for key rotation and must be a keyword-indexed field.
The changes you should make are as follows:
- Disable indexing of `_icl_encrypted_source`. This field is AES-encrypted data and indexing it will not be useful.
Example:

```json
PUT my-index-000001
{
  "mappings": {
    "dynamic_templates": [
      {
        "protected_fields": {
          "match_mapping_type": "string",
          "path_match": "_icl_p_*",
          "mapping": { "type": "text" }
        }
      }
    ],
    "properties": {
      "_icl_encrypted_source": { "enabled": false },
      "_icl_search_key_id": { "type": "keyword" }
    }
  }
}
```
Also, when you indicate that a field should be protected and Cloaked Search indexes a document containing that field, it replaces the field with one of the same name prefixed by `_icl_p_`. For example, if `index1` is configured so that `title` and `summary` are protected and you submit a document like

```json
{
  "title": "First article",
  "summary": "Great story about things",
  "body": "It's a story about all the things that are great"
}
```

to Cloaked Search to be indexed, it will submit a document like this to your search service:

```json
{
  "_icl_p_title": "2332309a abaca921 334451a6 ...",
  "_icl_p_summary": "7b76c95a 616544a2 b41fa81e 85933317 e30236d5 ...",
  "body": "It's a story about all the things that are great",
  "_icl_encrypted_source": "a123bb08218446fa99...",
  "_icl_search_key_id": 1
}
```
If your search index is configured with strict constraints on the fields in the documents, you will need to adjust your configuration accordingly.
For instance, if the `dynamic` mapping parameter for an index is set to `false` or `strict`, the search service will ignore (`false`) or reject (`strict`) the new fields in the modified document unless you add the protected field names to the mapping.
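When a wildcard rule is not an option, the explicit field names can be derived from the list of protected fields. The sketch below builds the `properties` entries for top-level protected fields plus the two Cloaked Search bookkeeping fields, reusing the types from the mapping example above. The function name is hypothetical, and the sketch does not cover how nested `object` fields are named, since that is not described here.

```python
def protected_field_mappings(protected_fields: list[str]) -> dict:
    """Build explicit search-service `properties` for protected fields."""
    # Each protected field is stored under its name prefixed by _icl_p_.
    properties = {
        f"_icl_p_{name}": {"type": "text"} for name in protected_fields
    }
    # Bookkeeping fields, mapped as in the search service example.
    properties["_icl_encrypted_source"] = {"enabled": False}
    properties["_icl_search_key_id"] = {"type": "keyword"}
    return properties


print(list(protected_field_mappings(["title", "summary"])))
# prints ['_icl_p_title', '_icl_p_summary', '_icl_encrypted_source', '_icl_search_key_id']
```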
Getting Cloaked Search
Cloaked Search is available on our public Docker repository and can be pulled using:

```bash
docker pull gcr.io/ironcore-images/cloaked-search:2
```
In order to start it, the config files/keys must be mounted. Assuming that your current directory contains `cloaked-search-conf.json`, `test.key`, and an `indices` folder containing the index configurations, the following will get the Cloaked Search proxy running.
```bash
docker run --init \
  --mount type=bind,src="$(pwd)"/cloaked-search-conf.json,dst=/app/deploy.json \
  --mount type=bind,src="$(pwd)"/indices/,dst=/app/indices \
  --mount type=bind,src="$(pwd)"/test.key,dst=/test.key \
  gcr.io/ironcore-images/cloaked-search:2
```