Text Analysis

Text analysis enables Cloaked Search to perform full-text search, where the search returns all relevant results rather than just exact matches.

If you search for Quick fox, you probably want the document that contains A quick brown fox jumps over the lazy dog.

Analysis is done at index time on the full document and at search time on the query. In Cloaked Search, a search always uses the same analysis that was applied to the field when it was indexed. For example, if a field was indexed with a lowercase filter, query terms against that field are lowercased before matching. Make sure you understand how each field is analyzed, or you may get unexpected results.

Tokenization

Analysis makes full-text search possible through tokenization: breaking text down into tokens, commonly along word boundaries.

The standard tokenizer is the UAX29 tokenizer. There is also a no-op keyword tokenizer that outputs the text as a single term, and a whitespace tokenizer that splits on whitespace characters.
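As an illustration, here is roughly how each tokenizer would break up the text The quick-thinking fox (the exact output depends on each tokenizer's rules; UAX29 treats the hyphen as a word boundary, while the whitespace tokenizer does not):

standard (UAX29):  [The] [quick] [thinking] [fox]
whitespace:        [The] [quick-thinking] [fox]
keyword:           [The quick-thinking fox]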

Filters

Tokenization allows matching on terms, but each token is still matched literally. This means that a search for Quick would not match quick, because the cases are different.

To solve this problem, text analysis can normalize these tokens into a standard format. This allows you to match tokens that are not exactly the same as the search terms, but similar enough to still be relevant. For example, Quick can be lowercased: quick.
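Continuing the example, the lowercase filter normalizes each token in place:

[Quick] [Brown] [Fox]  →  [quick] [brown] [fox]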

The available filters are:

Customize text analysis

Text analysis is performed by an analyzer.

By default, each text field is indexed similarly to the search service’s standard analyzer: the UAX29 tokenizer splits the text on appropriate word boundaries, and the lowercase filter is then applied to each of the resulting terms.

If you want to tailor your search experience, you can build a custom analyzer either in the global settings or inline.

JSON
"mappings": { "properties": { "name": { "type": "text", "analyzer": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase", "longer_shingle"] } } } }

The analyzer’s type must be custom. tokenizer can be any of the available tokenizers. filter is an array containing an ordered list of the filters to run; these can be predefined filters or custom ones you define, as sketched below.
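As a hypothetical sketch of the global-settings approach (assuming the definition takes the same shape as Elasticsearch-style index settings, which this API resembles; the longer_shingle name comes from the example above, while shingled_text and the shingle parameters are illustrative), a custom filter and an analyzer that uses it might be defined like this:

JSON
"settings": {
  "analysis": {
    "filter": {
      "longer_shingle": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 3
      }
    },
    "analyzer": {
      "shingled_text": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "longer_shingle"]
      }
    }
  }
}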

Derived fields

In addition to filters, you can enable prefix and phrase searching using the index_prefixes and index_phrases configuration options.
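As a hypothetical sketch (assuming these options take the same shape as their Elasticsearch counterparts, where index_prefixes accepts min_chars/max_chars bounds and index_phrases is a boolean), a field mapping enabling both might look like:

JSON
"mappings": {
  "properties": {
    "name": {
      "type": "text",
      "index_prefixes": {
        "min_chars": 2,
        "max_chars": 5
      },
      "index_phrases": true
    }
  }
}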
