CollationKeyFilter converts each token into its binary CollationKey using the
provided Collator, and then encodes the CollationKey as a String using
IndexableBinaryStringTools, so that it can be stored as an index term.
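
The point of this conversion is that plain comparison of the stored terms should reproduce the Collator's ordering. The sketch below shows that idea using only the JDK: java.text.Collator produces the binary key, and the key bytes are then encoded as a String. It assumes a hypothetical hex encoding in place of IndexableBinaryStringTools. Hex is less compact than Lucene's encoding, but it also preserves unsigned byte order, which makes it easy to check. The class and method names are illustrative only.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class CollationKeySketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Builds the binary collation key for a term and encodes it as a String.
    // Hex is a HYPOTHETICAL stand-in for IndexableBinaryStringTools: it keeps
    // unsigned byte order, so String.compareTo on the encoded terms agrees
    // with comparing the underlying collation keys.
    static String encodeKey(Collator collator, String term) {
        byte[] key = collator.getCollationKey(term).toByteArray();
        StringBuilder sb = new StringBuilder(key.length * 2);
        for (byte b : key) {
            sb.append(HEX[(b >> 4) & 0xf]).append(HEX[b & 0xf]);
        }
        return sb.toString();
    }

    // Orders terms the way a term index would: by plain String comparison
    // of the encoded terms.
    static List<String> sortByEncodedKey(Collator collator, List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort(Comparator.comparing((String t) -> encodeKey(collator, t)));
        return copy;
    }

    // Reference ordering: ask the Collator directly.
    static List<String> sortByCollator(Collator collator, List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> terms = Arrays.asList("B", "a", "peach", "p\u00e9ch\u00e9", "apple");
        System.out.println(sortByEncodedKey(collator, terms));
        System.out.println(sortByCollator(collator, terms));
    }
}
```

Both printed lists should show the same order. In particular, "a" sorts before "B" even though plain String order would put "B" first, because the comparison now follows the Collator's rules.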

Normalize token text with ICU's Normalizer2
With this filter, you can normalize text in the following ways:
NFKC normalization, case folding, and removal of ignorables (the default)
Using a standard normalization mode (NFC, NFD, NFKC, NFKD)
Using rules from a custom normalization mapping
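
To make the four standard modes concrete, the sketch below applies them with the JDK's java.text.Normalizer. This is deliberately not ICU's Normalizer2, and the default mode (NFKC plus case folding plus removal of ignorables) and custom mappings are not available through this JDK class, so they are not shown. The class name is illustrative only.

```java
import java.text.Normalizer;

final class NormalizationFormsSketch {
    // Applies one of the standard Unicode normalization forms using the JDK's
    // java.text.Normalizer (used here in place of ICU's Normalizer2).
    static String normalize(String text, Normalizer.Form form) {
        return Normalizer.normalize(text, form);
    }

    // Renders a string as space-separated code points, e.g. "U+0065 U+0301".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // e + combining acute, precomposed e-acute, "fi" ligature, circled digit one
        String[] samples = {"e\u0301", "\u00e9", "\uFB01", "\u2460"};
        for (String s : samples) {
            for (Normalizer.Form form : Normalizer.Form.values()) {
                System.out.println(codePoints(s) + " --" + form + "--> "
                        + codePoints(normalize(s, form)));
            }
        }
    }
}
```

The canonical forms (NFC, NFD) only compose or decompose characters such as the accented "e", while the compatibility forms (NFKC, NFKD) also fold variants such as the "fi" ligature and the circled digit into their plain equivalents.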

Breaks text into words according to UAX #29: Unicode Text Segmentation
(http://www.unicode.org/reports/tr29/).
Text is first split at script boundaries, and each run is then segmented
according to the BreakIterator and token typing provided by the ICUTokenizerConfig.
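
The two-stage approach of splitting by script first and then word-breaking can be sketched with the JDK alone. Character.UnicodeScript detects script changes, with COMMON and INHERITED code points such as spaces, digits, and combining marks staying in the current run. The JDK's word BreakIterator then segments each run. This is a hypothetical simplification. ICU's own script iteration and its per-script BreakIterator rules are more elaborate, and the JDK iterator is not guaranteed to match UAX #29 exactly. The class and method names are illustrative.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ScriptRunWordBreaker {
    private static boolean isNeutral(Character.UnicodeScript s) {
        return s == Character.UnicodeScript.COMMON || s == Character.UnicodeScript.INHERITED;
    }

    // Splits text wherever the (non-neutral) script changes.
    static List<String> scriptRuns(String text) {
        List<String> runs = new ArrayList<>();
        Character.UnicodeScript current = null; // null until a non-neutral script is seen
        int start = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (!isNeutral(script)) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i)); // close the previous run
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    // Segments each run with the JDK word BreakIterator and keeps only
    // segments that contain a letter or digit (drops spaces, punctuation).
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        for (String run : scriptRuns(text)) {
            bi.setText(run);
            int start = bi.first();
            for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
                String segment = run.substring(start, end);
                if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    out.add(segment);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! 42"));
        System.out.println(words("abc\u0430\u0431\u0432")); // Latin followed by Cyrillic
    }
}
```

Without the script split, a word BreakIterator would normally keep "abc" and the adjacent Cyrillic letters together as one word, because both are letters. Splitting by script first yields two tokens.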

This filter will be removed in Lucene 4.0. It is unmaintained and might not behave
correctly if used with custom Attributes, i.e. Attributes other than
the ones located in org.apache.lucene.analysis.tokenattributes. It also uses
hardcoded payload encoders, which make it difficult to adapt to other use cases.

This class implements the Word Break rules from the Unicode Text Segmentation
algorithm, as specified in Unicode Standard Annex #29.
URLs and email addresses are also tokenized according to the relevant RFCs.
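
To illustrate how a tokenizer can keep an address whole while applying ordinary word breaking to everything else, the sketch below first matches email-like spans and then runs the JDK word BreakIterator on the text between them. The regular expression is a HYPOTHETICAL, heavily simplified pattern. It is not the RFC grammar this class follows, and the sketch does not recognize URLs at all. The class name is illustrative.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailAwareTokenizerSketch {
    // HYPOTHETICAL simplified address pattern; far from RFC 5321/5322.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        int last = 0;
        while (m.find()) {
            addWords(text.substring(last, m.start()), out); // plain text before the address
            out.add(m.group());                              // address kept as a single token
            last = m.end();
        }
        addWords(text.substring(last), out);                 // trailing plain text
        return out;
    }

    // Ordinary word breaking for the non-address text; keeps segments that
    // contain at least one letter or digit.
    private static void addWords(String s, List<String> out) {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        bi.setText(s);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String segment = s.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Write to bob@example.com."));
    }
}
```

Because the pattern requires at least one character after each dot in the domain, a sentence-final period is left out of the address token and then dropped as punctuation.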