Querying Cloaked Search

Cloaked Search supports the Elasticsearch and OpenSearch services.
Cloaked Search supports a strict syntax that is a subset of the search service’s query-string query.
This syntax allows you to search across both protected and non-protected fields using familiar search concepts such as exact phrase matches, substring searches and more.

Tenant ID

Cloaked Search protects each document with a per-tenant key when the document is indexed. In order to search protected fields and retrieve a protected document, the tenant that the document belongs to must be identified in the query.
If the index uses tenant_id_field, which is the default, this looks like:
GET /_search
{
  "query": {
    "query_string": {
      "query": "+tenant_id:someid (<...rest of query...>)"
    }
  }
}
The field that specifies the tenant ID is configurable in the Cloaked Search Config, but for the examples here, we will assume tenant_id is the tenant ID field.
The rules for a valid tenant ID are:
  1. The tenant_id field must be at the top-level of the query (not in a sub-query).
  2. Only one tenant_id field can be present at the top-level.
  3. The tenant_id field must be preceded by a +.
  4. Tenant IDs themselves are not restricted to any specific characters. If non-alphanumeric characters are used, the ID must be quoted.
Valid Tenant ID clauses:
+tenant_id:someid0
+tenant_id:\"an-id-with-dashes\"
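The rules above can be sketched as a small helper that builds the tenant clause. This is only an illustration: `tenant_clause` is a hypothetical name, not part of Cloaked Search, and the backslash escaping of embedded quotes assumes Lucene-style escaping inside quoted values, which this page does not confirm.

```python
import re


def tenant_clause(tenant_id: str, field: str = "tenant_id") -> str:
    """Build a required (+) tenant clause for a query string.

    Hypothetical helper: IDs that are not purely alphanumeric are quoted,
    following rule 4 above.
    """
    if not tenant_id:
        raise ValueError("tenant ID must not be empty")
    if re.fullmatch(r"[A-Za-z0-9]+", tenant_id):
        return f"+{field}:{tenant_id}"
    # Assumption: backslashes and double quotes inside a quoted ID are
    # escaped Lucene-style; verify this against your deployment.
    escaped = tenant_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'+{field}:"{escaped}"'


plain = tenant_clause("someid0")
quoted = tenant_clause("an-id-with-dashes")
```

When a quoted clause is embedded in a JSON request body, a JSON encoder such as `json.dumps` escapes the double quotes as `\"`, which is the form shown in the second clause above.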

Adding tenant_id to Queries

In most contexts, searches will probably yield the best results if the rest of the query is AND'd with the +tenant_id:<id> clause.
title:"War of 1812" AND title:"USS Francis" AND +tenant_id:someid0
A similar idea would be to encapsulate the rest of the query as a subquery with a + modifier.
+(title:"War of 1812" AND title:"USS Francis") +tenant_id:someid0
WARNING: Without one of these options, results are only guaranteed to match the specified tenant and may not match any other part of the query. The search could then return seemingly random documents belonging to that tenant.
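As a rough sketch of the second option, the helper below wraps the rest of the query in a required subquery and places it next to the tenant clause, then builds the `query_string` request body shown earlier. The function name `scoped_query` is hypothetical and chosen only for this example.

```python
import json


def scoped_query(rest_of_query: str, tenant_clause: str) -> dict:
    """Return a query_string body whose non-tenant part is a required subquery."""
    if not tenant_clause.startswith("+"):
        raise ValueError("the tenant clause must be preceded by +")
    # +(...) makes the whole subquery mandatory, so results must match both
    # the tenant and the rest of the query.
    query = f"+({rest_of_query}) {tenant_clause}"
    return {"query": {"query_string": {"query": query}}}


body = scoped_query(
    'title:"War of 1812" AND title:"USS Francis"',
    "+tenant_id:someid0",
)
payload = json.dumps(body, indent=2)  # JSON text to send to GET /_search
```

Keeping the tenant clause outside the parentheses leaves it at the top level of the query, as the tenant ID rules require.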

Fixed Tenant ID

The tenant specification, +tenant_id:<TENANT_ID>, is only required if your configuration doesn’t have a fixed tenant ID for the index being searched. If the ID is fixed, the +tenant_id clauses below can be omitted.

Fields

Data stored in protected fields must be identified by field when querying. You can search over a protected field by typing the field’s name followed by a colon and then the search term. To specify multiple terms, you will need to use a subquery and boolean operators.
Note: standard Elasticsearch and OpenSearch allow queries which don’t specify a field, causing the search to be over the default field(s). Because Cloaked Search applies a different transformation to terms in each protected field, you must specify the field whenever you search over protected fields. While it is still possible to search over non-protected default fields without specifying the field name, we recommend always specifying field names in the queries for clarity.

Sub-queries

Cloaked Search supports using parentheses to group clauses to form subqueries. This is especially useful for controlling the logic when using Boolean Operators.
To find documents relating to tuna fish sandwiches or tuna fish salad:
body:((tuna AND fish) AND (salad OR sandwich)) AND +tenant_id:someid

Boolean Operators

Cloaked Search supports several boolean operators that allow you to combine various terms and fields. By default, all terms in the query are optional. Using these operators can help refine searches by requiring or excluding terms from the result.
Note: Precedence rules are not the same with all of these operators. When using traditional boolean operators (OR, AND and NOT), it is recommended to use subqueries to ensure the operators are applied in the expected order.

+ (Must)

+ is the “must” operator. Putting it before a term will ensure that the term exists in the result. Terms without the + remain optional and will help increase the relevance of the result.
To search for documents about foxes that may be quick or brown, use the query:
(body:quick body:brown +body:fox) AND +tenant_id:someid

- (Must not)

- is the “must not” operator. Putting it before a term will exclude results that contain that term.
To search for documents about the Amazon rainforest, use the query:
(title:Amazon body:rainforest -body:shopping -body:Bezos) AND +tenant_id:someid

OR

The OR operator matches documents where either term exists anywhere in the given field. This is equivalent to a union of sets of documents that contain one of the terms.
The OR operator is the default operator for multiple terms. This means that if there is no operator between two terms, and an alternative default operator has not been set for the search service, the OR operator will be used automatically.
To search for documents about different forms of transportation, use the query:
(title:planes OR title:trains OR title:automobiles) AND +tenant_id:someid

AND

The AND operator matches documents where both terms exist anywhere in the given fields. This is equivalent to the intersection of the sets of documents that contain each of the terms.
To search for documents about monkeys in space, use the query:
(title:space AND title:monkey) AND +tenant_id:someid

Querying and Configuration

The following sections describe query features that must be enabled both at indexing time and query time. See Cloaked Search Configuration for how to configure an analyzer.
Some features can be enabled by more than one filter, but results may sometimes differ slightly.
Enabled by filter(s): terms, substring
Term search is simply searching for single whole words.
Example:
body:rome AND +tenant_id:someid
Enabled by filter(s): phonetic
Preserve exact matches with quotes: terms or substring
Phonetic search allows words that sound the same to both be returned.
The following might return documents containing “Jerry Seinfeld” or “Elbridge Gerry”.
body:Jerry AND +tenant_id:someid
Note that when phonetic is enabled, it is automatically applied to all search terms. If you want to disable phonetic search on a term, you can put quotes around a single word if either terms or substring is enabled.
To do an exact match instead of a phonetic match:
body:"Jerry" AND +tenant_id:someid
Enabled by filter(s): phrases
Phrases are groups of words surrounded by quotation marks. They are evaluated as a group and will be searched in the exact order they’re submitted.
To search for a document about Clifford the Big Red Dog, use the query:
body:"big red dog" AND +tenant_id:someid
Phrases are also necessary for terms that have a different meaning when broken up. In particular, when parsing a word containing hyphens, the word will be split into multiple terms at the hyphens unless it is surrounded by quotation marks.
For example, with a default operator of OR,
body:father-in-law AND +tenant_id:someid
would be equivalent to
(body:father OR body:in OR body:law) AND +tenant_id:someid
Instead, you will likely want to use
body:"father-in-law" AND +tenant_id:someid
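One way to avoid accidental splitting is to quote hyphenated terms when the query string is assembled. The following is a minimal sketch under that assumption; `field_term` is a hypothetical helper, and quoting only helps if the phrases filter is enabled for the field.

```python
def field_term(field: str, term: str) -> str:
    """Return field:term, quoting the term when it contains a hyphen."""
    already_quoted = len(term) >= 2 and term.startswith('"') and term.endswith('"')
    if "-" in term and not already_quoted:
        # Quoting keeps the hyphenated word together as a single phrase
        # instead of letting it split into separate terms at the hyphens.
        term = f'"{term}"'
    return f"{field}:{term}"


hyphenated = field_term("body", "father-in-law")
simple = field_term("body", "rome")
```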
Enabled by filter(s): substring
Cloaked Search supports prefix and suffix queries with at least three characters specified.
The following would match documents with either “italian” or “italics” in the body field.
body:ita* AND +tenant_id:someid
Similarly, the following suffix query matches documents containing either “geography” or “discography”:
body:*graphy AND +tenant_id:someid
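The wildcard constraints can be checked on the client before a query is sent. The sketch below is a hypothetical validator (`check_wildcard` is not a Cloaked Search function) that accepts a single leading or trailing `*` with at least three other characters and rejects mid-word wildcards, which are listed as unsupported. How the service treats a term with both a leading and a trailing `*` is not stated here, so the sketch conservatively rejects it.

```python
def check_wildcard(term: str) -> str:
    """Classify a term as 'term', 'prefix', or 'suffix'.

    Raises ValueError for wildcard forms this sketch does not accept.
    """
    stars = term.count("*")
    if stars == 0:
        return "term"
    if stars > 1:
        # Assumption: more than one * is rejected; the source does not say.
        raise ValueError("only a single leading or trailing * is handled")
    if term.endswith("*"):
        kind, fixed = "prefix", term[:-1]
    elif term.startswith("*"):
        kind, fixed = "suffix", term[1:]
    else:
        raise ValueError("mid-word wildcards are not supported")
    if len(fixed) < 3:
        raise ValueError("at least three characters must be specified")
    return kind


prefix_kind = check_wildcard("ita*")
suffix_kind = check_wildcard("*graphy")
```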

Debugging a Query

You can use the following flowchart to help debug a query that is erroring.
[Flowchart. Steps: Input Query; POST to Cloaked Search; Parse; Process Protected Fields with Configured Analyzer; Send to the search service. Decisions: Query Parses; Query Contains Tenant?; Query Contains Protected Fields? Branch labels: Yes; No; Yes, but malformed; Query Not Supported with Configured Analyzer. Outcomes: Error to Caller; Not Protected Query; Protected Query.]

Unsupported

The following types of queries are not currently supported:
  • Fields that are not strings, numbers, or booleans
  • Range queries
  • Regex matches
  • Mid-word wildcards
  • Aggregations
  • Proximity requirements
  • Subdocuments
  • Synonym expansions
  • Searches across multiple tenants