N-gram indexing is a powerful method for getting fast “search as you type”
functionality, like the search box in iTunes. It is also useful for quick and
effective indexing of languages such as Chinese and Japanese, which don’t
separate words with spaces.

An N-gram is a group of N consecutive characters: bigrams are groups of two
characters, trigrams are groups of three characters, and so on.
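To make the definition concrete, here is a minimal pure-Python sketch that lists every run of N consecutive characters in a string. The helper `char_ngrams` is hypothetical and written only for illustration; it is not part of Whoosh.

```python
from __future__ import annotations


def char_ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("shader", 2))  # ['sh', 'ha', 'ad', 'de', 'er']
print(char_ngrams("shader", 3))  # ['sha', 'had', 'ade', 'der']
```

A string of length L yields L - N + 1 N-grams, and none at all when it is shorter than N.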

Whoosh includes two methods for analyzing N-gram fields: an N-gram tokenizer
(whoosh.analysis.NgramTokenizer) and a filter that breaks tokens into N-grams
(whoosh.analysis.NgramFilter).

whoosh.analysis.NgramTokenizer tokenizes the entire field into N-grams.
This is most useful for Chinese, Japanese, and Korean text, where indexing
bigrams of characters works better than indexing individual characters. With
languages written in the Latin alphabet, this tokenizer produces tokens that
contain spaces, because it treats the whitespace between words like any other
character.
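The sketch below approximates what a whole-field N-gram tokenizer does. It assumes that, for each starting position, the tokenizer emits every gram from `minsize` to `maxsize` characters long. `ngram_tokenize` is a hypothetical helper, not Whoosh's implementation; its `minsize`/`maxsize` names simply echo the parameters of `NgramTokenizer`. It shows both the spaces that end up inside tokens for English text and the character bigrams produced for Chinese text.

```python
from __future__ import annotations


def ngram_tokenize(text: str, minsize: int, maxsize: int) -> list[str]:
    """Hypothetical whole-field n-gram tokenizer (illustration only).

    Slides over the entire string and, at each position, emits grams of
    minsize..maxsize characters. Whitespace is treated like any other
    character, so grams can straddle word boundaries.
    """
    grams = []
    for start in range(len(text)):
        for size in range(minsize, maxsize + 1):
            if start + size > len(text):
                break  # not enough characters left for this size or larger
            grams.append(text[start:start + size])
    return grams


# English text: grams such as "i t" and " th" contain the space between words.
print(ngram_tokenize("hi there", 2, 4))

# Chinese text has no spaces, so character bigrams make reasonable index terms.
print(ngram_tokenize("北京大学", 2, 2))  # ['北京', '京大', '大学']
```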

Whoosh includes two preconfigured field types for N-grams:
whoosh.fields.NGRAM and whoosh.fields.NGRAMWORDS. The only difference between
them is that NGRAM runs all text through the N-gram tokenizer, including
whitespace and punctuation, while NGRAMWORDS first extracts words from the
text using a word tokenizer, then runs each word through the N-gram filter.
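To contrast the two field types, the following sketch imitates their analysis in plain Python and builds a tiny inverted index to show why N-grams make partially typed queries fast to look up. Everything here is an assumption for illustration: the `\w+` regular expression stands in for the word tokenizer, both analyzers lowercase their input, the 2 to 4 character size range is arbitrary, and `analyze_ngram`, `analyze_ngramwords`, and `lookup` are hypothetical helpers rather than Whoosh APIs.

```python
from __future__ import annotations

import re
from collections import defaultdict

MINSIZE, MAXSIZE = 2, 4  # illustrative gram sizes (assumption)


def grams(text: str, minsize: int = MINSIZE, maxsize: int = MAXSIZE) -> list[str]:
    """Every substring of text whose length is between minsize and maxsize."""
    out = []
    for start in range(len(text)):
        for size in range(minsize, maxsize + 1):
            if start + size > len(text):
                break
            out.append(text[start:start + size])
    return out


def analyze_ngram(text: str) -> list[str]:
    """NGRAM-style (hypothetical): lowercase, then n-gram the whole string."""
    return grams(text.lower())


def analyze_ngramwords(text: str) -> list[str]:
    """NGRAMWORDS-style (hypothetical): split into words, then n-gram each word."""
    return [g for word in re.findall(r"\w+", text.lower()) for g in grams(word)]


docs = {1: "Rendering Shaders", 2: "Shadow maps"}

# Inverted index mapping each gram to the ids of documents that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for g in analyze_ngramwords(text):
        index[g].add(doc_id)


def lookup(query: str) -> list[int]:
    """Return ids of candidate documents for a partially typed query."""
    q = query.lower()
    if len(q) <= MAXSIZE:
        # Short queries are themselves indexed grams: one dictionary lookup.
        return sorted(index.get(q, set()))
    # Longer queries: require every MAXSIZE-character window to be indexed.
    windows = [q[i:i + MAXSIZE] for i in range(len(q) - MAXSIZE + 1)]
    return sorted(set.intersection(*(index.get(w, set()) for w in windows)))


print([g for g in analyze_ngram("Rendering Shaders") if " " in g])  # grams spanning the space
print(lookup("shad"))    # [1, 2]
print(lookup("shadow"))  # [2]
```

For a query longer than `MAXSIZE`, intersecting the postings of its overlapping windows only narrows the candidates, since the windows could come from different words in the same document; a real system would still need to confirm each match.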