Splits words at punctuation characters, removing punctuation. However, a
dot that's not followed by whitespace is considered part of a token.

Splits words at hyphens, unless there's a number in the token, in which case
the whole token is interpreted as a product number and is not split.

Recognizes email addresses and internet hostnames as one token.
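The rules above can be approximated with a short sketch. This is a hedged illustration in plain Java regexes, not the actual JFlex grammar that StandardTokenizer is generated from; `RuleSketch` and its pattern are invented for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative approximation of the rules above -- NOT the real grammar.
// Emails stay whole, a dot not followed by whitespace stays inside its
// token, and hyphens split a token unless it contains a digit (which
// marks it as a "product number").
public class RuleSketch {
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+"   // email address: one token
        + "|\\w+(?:[.-]\\w+)*");             // word with inner dots/hyphens

    public static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String tok = m.group();
            boolean hasDigit = tok.chars().anyMatch(Character::isDigit);
            if (tok.indexOf('-') >= 0 && !hasDigit && tok.indexOf('@') < 0) {
                // plain hyphenated word: split at the hyphens
                for (String part : tok.split("-")) out.add(part);
            } else {
                // product number, email, or dotted token: keep whole
                out.add(tok);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Visit example.com for the AS-400 wi-fi spec."));
    }
}
```

Note how the trailing dot of "spec." is dropped (it is followed by end of input), while the dot inside "example.com" survives.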

Many applications have specific tokenizer needs. If this tokenizer does
not suit your application, please consider copying this source code
directory to your project and maintaining your own grammar-based tokenizer.

You must specify the required Version
compatibility when creating StandardAnalyzer:

As of 2.4, Tokens incorrectly identified as acronyms
are corrected (see LUCENE-1608).

getMaxTokenLength

incrementToken

Consumers (i.e., IndexWriter) use this method to advance the stream to
the next token. Implementing classes must implement this method and update
the appropriate AttributeImpls with the attributes of the next
token.

The producer must make no assumptions about the attributes after the method
has returned: the caller may arbitrarily change them. If the producer
needs to preserve the state for subsequent calls, it can use
AttributeSource.captureState() to create a copy of the current attribute state.

To ensure that filters and consumers know which attributes are available,
the attributes must be added during instantiation. Filters and consumers
are not required to check for availability of attributes in
TokenStream.incrementToken().
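The contract above can be sketched without any Lucene dependency. MiniStream and its TermAttr class are hypothetical stand-ins (not Lucene's TokenStream or a real AttributeImpl): the single attribute is created at construction time, and each call to incrementToken() overwrites it in place rather than allocating a new token object:

```java
// Self-contained sketch of the incrementToken() contract (plain Java).
public class MiniStream {
    // Stand-in for a real attribute implementation: one mutable
    // buffer that is reused for every token.
    public static final class TermAttr {
        private String term = "";
        public void setTerm(String t) { term = t; }
        public String term() { return term; }
    }

    private final String[] words;
    private int pos = 0;
    // The attribute is added during instantiation, as the contract
    // requires, so consumers know up front which attributes exist.
    public final TermAttr termAttr = new TermAttr();

    public MiniStream(String text) {
        this.words = text.trim().split("\\s+");
    }

    // Advances the stream to the next token; returns false at end of
    // stream. The one attribute instance is updated in place.
    public boolean incrementToken() {
        if (pos >= words.length) return false;
        termAttr.setTerm(words[pos++]);
        return true;
    }

    public static void main(String[] args) {
        MiniStream ts = new MiniStream("hello brave world");
        while (ts.incrementToken()) System.out.println(ts.termAttr.term());
    }
}
```

The consumer loop in main() is the usage pattern the documentation describes: it must copy the term out before the next call, because the next incrementToken() will overwrite the shared attribute.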

end

This method is called by the consumer after the last token has been
consumed, after TokenStream.incrementToken() returned false
(using the new TokenStream API). Streams implementing the old API
should upgrade to use this feature.

This method can be used to perform any end-of-stream operations, such as
setting the final offset of a stream. The final offset of a stream might
differ from the offset of the last token, e.g. when one or more whitespace
characters followed the last token but a WhitespaceTokenizer discarded them.
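A minimal sketch of why this is needed, in plain Java modeled on the pattern (OffsetStream is invented for this example, not one of Lucene's Tokenizer classes): trailing whitespace is consumed but never reported as a token, so the true final offset only becomes known after incrementToken() has returned false:

```java
// Sketch of end(): reports the final offset, which can lie past the
// end offset of the last token when trailing whitespace was skipped.
public class OffsetStream {
    private final String text;
    private int cursor = 0;
    private int tokenEnd = 0;    // end offset of the last token returned
    private int finalOffset = 0;

    public OffsetStream(String text) { this.text = text; }

    public boolean incrementToken() {
        // skip leading whitespace
        while (cursor < text.length() && Character.isWhitespace(text.charAt(cursor))) cursor++;
        if (cursor >= text.length()) return false;
        // consume one whitespace-delimited token
        while (cursor < text.length() && !Character.isWhitespace(text.charAt(cursor))) cursor++;
        tokenEnd = cursor;
        return true;
    }

    // Called once by the consumer after incrementToken() returned false.
    public void end() { finalOffset = text.length(); }

    public int lastTokenEnd() { return tokenEnd; }
    public int finalOffset() { return finalOffset; }

    public static void main(String[] args) {
        OffsetStream s = new OffsetStream("ab cd  ");
        while (s.incrementToken()) { }
        s.end();
        System.out.println(s.lastTokenEnd() + " vs " + s.finalOffset());
    }
}
```

For the input "ab cd  " the last token ends at offset 5, but the final offset reported by end() is 7, covering the two trailing spaces.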

Returns the next token in the stream, or null at end of stream (EOS). When
possible, the input Token should be used as the returned Token (this gives
the fastest tokenization performance), but this is not required and a new
Token may be returned. Callers may re-use a single Token instance for
successive calls to this method.

This implicitly defines a "contract" between consumers (callers of this
method) and producers (implementations of this method that are the source
for tokens):

A consumer must fully consume the previously returned Token
before calling this method again.

A producer must call Token.clear() before setting the fields in
it and returning it.

Also, the producer must make no assumptions about a Token after it
has been returned: the caller may arbitrarily change it. If the producer
needs to hold onto the Token for subsequent calls, it must clone()
it before storing it. Note that a TokenFilter is considered a
consumer.

reusableToken - a Token that may or may not be used to return the next
token; this parameter should never be null (the callee is not required to
check for null before using it, but it is a good idea to assert that
it is not null).
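The old-API contract can be sketched in plain Java as well. The Token and ReusableNext classes below are hypothetical stand-ins invented for this example, not Lucene's org.apache.lucene.analysis.Token:

```java
// Sketch of the older next(Token) contract: the consumer passes in one
// reusable Token, and the producer calls clear() before refilling it.
public class ReusableNext {
    public static final class Token {
        public String term;
        public int start;
        public int end;
        public void clear() { term = null; start = 0; end = 0; }
    }

    private final String[] words;
    private int i = 0;

    public ReusableNext(String... words) { this.words = words; }

    // Returns the next token, or null at end of stream. When possible the
    // caller's reusableToken is refilled and returned.
    public Token next(Token reusableToken) {
        assert reusableToken != null;            // good practice per the docs
        if (i >= words.length) return null;
        reusableToken.clear();                   // clear before setting fields
        reusableToken.term = words[i++];
        return reusableToken;
    }

    public static void main(String[] args) {
        ReusableNext p = new ReusableNext("a", "b");
        Token reusable = new Token();            // one instance, reused
        for (Token t = p.next(reusable); t != null; t = p.next(reusable)) {
            System.out.println(t.term);
        }
    }
}
```

The loop in main() shows the consumer side of the contract: it fully consumes each returned Token (here, reads its term) before calling next() again with the same instance.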