tokenizers: A Consistent Interface to Tokenize Natural Language Text

Convert natural language text into tokens. The tokenizers have a
consistent interface, and Unicode support comes from the underlying
'stringi' package. Includes tokenizers for shingled n-grams, skip
n-grams, words, word stems, sentences, paragraphs, characters, lines, and
regular expressions.
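
A minimal usage sketch in R, assuming the package is installed; each
tokenizer takes a character vector and returns a list with one element
per input document. The function names shown (tokenize_words(),
tokenize_sentences(), tokenize_ngrams()) are exported by the package:

    library(tokenizers)

    text <- "Let us go then, you and I. The evening is spread out against the sky."

    # Words, sentences, and bigrams from the same input; each call
    # returns a list of character vectors, one element per document.
    tokenize_words(text)
    tokenize_sentences(text)
    tokenize_ngrams(text, n = 2)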