Just as the lowercase token filter is a good starting point for many
languages but falls short when exposed to the entire Tower of Babel, the
asciifolding token filter needs a more effective Unicode character-folding
counterpart to deal with the many languages of the world.

The icu_folding token filter (provided by the icu plug-in)
does the same job as the asciifolding filter, but extends the transformation
to scripts that are not ASCII-based, such as Greek, Hebrew, and Han. It also
converts numbers in other scripts into their Latin equivalents and applies
various other numeric, symbolic, and punctuation transformations.

The icu_folding token filter automatically applies the same Unicode
normalization and case folding as the nfkc_cf mode, so the icu_normalizer
token filter is not required alongside it.

For example, the Arabic-Indic digits ١٢٣٤٥ are folded to their Latin
equivalents: 12345.
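
As a minimal sketch of how such an analyzer might be set up, the following is the body of an index-creation request (sent with PUT to the index URL). The analyzer name my_folder is a hypothetical name chosen for illustration; icu_tokenizer and icu_folding are assumed to be available because the icu plug-in is installed.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_folder": {
          "tokenizer": "icu_tokenizer",
          "filter": ["icu_folding"]
        }
      }
    }
  }
}
```

Only icu_folding appears in the filter chain. Because it already normalizes and case-folds, there is no separate icu_normalizer or lowercase step. To check the result, the text ١٢٣٤٥ can be run through this analyzer with the index's _analyze API.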

If there are particular characters that you would like to protect from
folding, you can use a UnicodeSet (much like a character class in regular
expressions) to specify which Unicode
characters may be folded. For instance, to exclude the Swedish letters å,
ä, ö, Å, Ä, and Ö from folding, you would specify a character class
representing all Unicode characters, except for those letters: [^åäöÅÄÖ]
(^ means everything except).

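A swedish analyzer configured this way first tokenizes words, then folds
each token by using a swedish_folding filter (an icu_folding filter
restricted by that UnicodeSet), and then lowercases each token. The final
lowercase step is needed because the excluded uppercase letters Å, Ä, and Ö
bypass folding and would otherwise keep their case.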
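
The Swedish setup described above could be configured along the following lines, again as the body of an index-creation request. This is a sketch under stated assumptions: the analyzer name swedish_analyzer is hypothetical, and the parameter is written as unicode_set_filter, the spelling used by recent versions of the plug-in. Older releases spelled it unicodeSetFilter, so check the plug-in documentation for your version.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "swedish_folding": {
          "type": "icu_folding",
          "unicode_set_filter": "[^åäöÅÄÖ]"
        }
      },
      "analyzer": {
        "swedish_analyzer": {
          "tokenizer": "icu_tokenizer",
          "filter": ["swedish_folding", "lowercase"]
        }
      }
    }
  }
}
```

The order of the filters matters. swedish_folding runs first and leaves the six protected letters untouched, and lowercase then converts any remaining Å, Ä, or Ö to å, ä, or ö without stripping their diacritics.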