Shingle token filter
Converts token text into shingles, which are word n-grams built by concatenating adjacent tokens. For example, you can use the shingle filter to change "the lazy dog jumps" to [the lazy, lazy dog, dog jumps].
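As a rough illustration of the idea (not Cloaked Search's implementation), the two-token case can be sketched in a few lines of Python. The function name `bigram_shingles` and the whitespace tokenization are assumptions made for this example only.

```python
def bigram_shingles(text: str) -> list[str]:
    """Join each pair of adjacent whitespace-separated tokens with a space."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


print(bigram_shingles("the lazy dog jumps"))
# ['the lazy', 'lazy dog', 'dog jumps']
```

An input with fewer than two tokens produces no shingles in this sketch.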
Add to an analyzer
shingle can be added to any analyzer as a filter.
JSON
    "analyzer": {
      "shingle_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["shingle"]
      }
    }
Configurable parameters
min_shingle_size
(Optional, integer) Minimum number of tokens to concatenate when creating shingles. Defaults to 2.
max_shingle_size
(Optional, integer) Maximum number of tokens to concatenate when creating shingles. Defaults to 2.
output_unigrams
(Optional, Boolean) If true, the output includes the original input tokens. If false, the output only includes shingles; the original input tokens are removed. Defaults to true.
output_unigrams_if_no_shingles
(Optional, Boolean) If true, the output includes the original input tokens only if no shingles are produced; if shingles are produced, the output only includes shingles. Defaults to false.
token_separator
(Optional, string) Separator used to concatenate adjacent tokens to form a shingle. Defaults to a space (" ").
For example, a 2-3 shingle filter could be configured and used like this:
JSON
    "settings": {
      "analysis": {
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "filter": ["custom_shingle"]
          }
        },
        "filter": {
          "custom_shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "output_unigrams": false,
            "output_unigrams_if_no_shingles": true,
            "token_separator": " "
          }
        }
      }
    }
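To make the effect of these parameters concrete, here is a minimal Python sketch that mimics the behavior described in the parameter list above. It is an approximation for illustration, not the filter's actual code: the function name `shingle` is hypothetical, the input is assumed to be already tokenized, and the order of the emitted tokens is an assumption.

```python
def shingle(tokens, min_shingle_size=2, max_shingle_size=2,
            output_unigrams=True, output_unigrams_if_no_shingles=False,
            token_separator=" "):
    """Approximate the documented shingle parameters on a list of tokens."""
    out = []
    shingles_found = False
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])  # keep the original token
        for size in range(min_shingle_size, max_shingle_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for this or any larger size
            out.append(token_separator.join(tokens[start:start + size]))
            shingles_found = True
    # Fall back to the original tokens only when nothing could be shingled.
    if not shingles_found and not output_unigrams and output_unigrams_if_no_shingles:
        out = list(tokens)
    return out


# Same values as the custom_shingle filter configured above.
config = dict(min_shingle_size=2, max_shingle_size=3,
              output_unigrams=False, output_unigrams_if_no_shingles=True,
              token_separator=" ")

print(shingle(["the", "lazy", "dog", "jumps"], **config))
# ['the lazy', 'the lazy dog', 'lazy dog', 'lazy dog jumps', 'dog jumps']

print(shingle(["dog"], **config))
# ['dog']
```

The second call shows why output_unigrams_if_no_shingles is useful together with output_unigrams set to false: a single-token input would otherwise produce no tokens at all.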