The char_group tokenizer breaks text into terms whenever it encounters a
character that belongs to a defined set. It is mostly useful when simple custom
tokenization is desired and the overhead of the pattern tokenizer is not
acceptable.

The tokenize_on_chars parameter takes a list of characters to tokenize the
string on. Whenever a character from this list is encountered, a new token is
started. Each entry is either a single character, such as -, or the name of a
character group: whitespace, letter, digit, punctuation, symbol.
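
The splitting behavior described above can be illustrated with a minimal Python sketch. This is an emulation for explanation only, not the tokenizer's actual implementation: the function name char_group_tokenize and the GROUPS table are hypothetical, the mapping of group names to Unicode categories is an assumption, and the sketch assumes that boundary characters are discarded and that empty tokens are not emitted. The request body is shown as a Python dict in the shape accepted by the _analyze API.

```python
import unicodedata

# Hypothetical emulation of the character-group names listed above.
# Assumption: the real tokenizer may classify some characters differently.
GROUPS = {
    "whitespace": lambda ch: ch.isspace(),
    "letter": lambda ch: ch.isalpha(),
    "digit": lambda ch: ch.isdigit(),
    "punctuation": lambda ch: unicodedata.category(ch).startswith("P"),
    "symbol": lambda ch: unicodedata.category(ch).startswith("S"),
}


def char_group_tokenize(text, tokenize_on_chars):
    """Split text wherever a character matches the configured set."""
    # Entries that are not group names are treated as literal characters.
    single_chars = {c for c in tokenize_on_chars if c not in GROUPS}
    group_tests = [GROUPS[c] for c in tokenize_on_chars if c in GROUPS]

    tokens, current = [], []
    for ch in text:
        if ch in single_chars or any(test(ch) for test in group_tests):
            # Boundary character: close the current token and drop the character.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# Request body one might send to the _analyze API, written as a Python dict.
analyze_request = {
    "tokenizer": {
        "type": "char_group",
        "tokenize_on_chars": ["whitespace", "-"],
    },
    "text": "The QUICK brown-fox",
}

print(char_group_tokenize(
    analyze_request["text"],
    analyze_request["tokenizer"]["tokenize_on_chars"],
))
# prints ['The', 'QUICK', 'brown', 'fox']
```

Because the whole tokenizer amounts to a per-character membership test, it avoids the regular-expression matching that the pattern tokenizer performs, which is the overhead the description refers to.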