Shingle token filter

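Converts a token stream into shingles, which are word n-grams formed by concatenating adjacent tokens. For example, a two-token shingle filter turns the lazy dog jumps into the shingles [the lazy, lazy dog, dog jumps]. With the default output_unigrams setting, the original tokens are kept in the output alongside the shingles.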
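As a rough illustration of the idea (not the filter's actual implementation), two-token shingles can be sketched in a few lines of Python. The function name bigram_shingles is hypothetical, and splitting on whitespace is only a stand-in for a real tokenizer.

```python
def bigram_shingles(tokens, separator=" "):
    """Join each token with its right-hand neighbour to form two-token shingles."""
    # zip pairs token i with token i+1; the last token has no neighbour to pair with.
    return [separator.join(pair) for pair in zip(tokens, tokens[1:])]


tokens = "the lazy dog jumps".split()  # whitespace split stands in for a tokenizer
print(bigram_shingles(tokens))
# ['the lazy', 'lazy dog', 'dog jumps']
```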

Add to an analyzer

The shingle filter can be added to any analyzer by listing it in the analyzer's filter array.

JSON
"analyzer": {
  "shingle_analyzer": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": ["shingle"]
  }
}

Configurable parameters

  • min_shingle_size (Optional, integer) Minimum number of tokens to concatenate when creating shingles. Defaults to 2.
  • max_shingle_size (Optional, integer) Maximum number of tokens to concatenate when creating shingles. Defaults to 2.
  • output_unigrams (Optional, Boolean) If true, the output includes the original input tokens. If false, the output only includes shingles; the original input tokens are removed. Defaults to true.
  • output_unigrams_if_no_shingles (Optional, Boolean) If true, the output includes the original input tokens only if no shingles are produced; if shingles are produced, the output only includes shingles. Defaults to false.
  • token_separator (Optional, string) Separator used to concatenate adjacent tokens to form a shingle. Defaults to a space (" ").
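To make the interaction between these parameters easier to see, here is a simplified Python model of the behavior described above. It is a sketch under stated assumptions, not the filter's real implementation: the function name shingle is hypothetical, the output ordering (each original token followed by the shingles that start at it, shortest first) is an assumption, and details such as token positions and offsets are ignored.

```python
def shingle(tokens, min_shingle_size=2, max_shingle_size=2,
            output_unigrams=True, output_unigrams_if_no_shingles=False,
            token_separator=" "):
    """Simplified model of the shingle parameters; defaults mirror the list above."""
    # Assumption of this sketch: the minimum may not exceed the maximum.
    if min_shingle_size > max_shingle_size:
        raise ValueError("min_shingle_size must not exceed max_shingle_size")

    out = []
    produced_shingle = False
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])  # keep the original token
        for size in range(min_shingle_size, max_shingle_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for a shingle of this size
            out.append(token_separator.join(tokens[start:start + size]))
            produced_shingle = True

    # Fall back to the original tokens only when no shingle could be built.
    if not produced_shingle and not output_unigrams and output_unigrams_if_no_shingles:
        return list(tokens)
    return out


tokens = ["the", "lazy", "dog", "jumps"]

print(shingle(tokens))
# ['the', 'the lazy', 'lazy', 'lazy dog', 'dog', 'dog jumps', 'jumps']

print(shingle(tokens, output_unigrams=False))
# ['the lazy', 'lazy dog', 'dog jumps']

print(shingle(tokens, min_shingle_size=2, max_shingle_size=3, output_unigrams=False))
# ['the lazy', 'the lazy dog', 'lazy dog', 'lazy dog jumps', 'dog jumps']

print(shingle(["dog"], output_unigrams=False, output_unigrams_if_no_shingles=True))
# ['dog']
```

In this model, output_unigrams_if_no_shingles only matters when output_unigrams is false and the input is too short to form even one shingle, which matches the description of that parameter above.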

For example, the following settings define a shingle filter that emits two- and three-token shingles and reference it from a custom analyzer:

JSON
"settings": {
  "analysis": {
    "analyzer": {
      "custom_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["custom_shingle"]
      }
    },
    "filter": {
      "custom_shingle": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 3,
        "output_unigrams": false,
        "output_unigrams_if_no_shingles": true,
        "token_separator": " "
      }
    }
  }
}
