This class implements UAX29URLEmailTokenizer, except with a bug
(https://issues.apache.org/jira/browse/LUCENE-3880) where "mailto:"
URI scheme prepended to an email address will disrupt recognition
of the email address.

WORD_TYPE

NUMERIC_TYPE

SOUTH_EAST_ASIAN_TYPE

public static final int SOUTH_EAST_ASIAN_TYPE

Deprecated.

Chars in class \p{Line_Break = Complex_Context} are from South East Asian
scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept
together as as a single token rather than broken up, because the logic
required to break them at word boundaries is too complex for UAX#29.

See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA

yyclose

yyreset

Resets the scanner to read from a new input stream.
Does not close the old reader.
All internal variables are reset, the old input stream
cannot be reused (internal buffer is discarded and lost).
Lexical state is set to ZZ_INITIAL.
Internal scan buffer is resized down to its initial length, if it has grown.