The regular expression may be prefixed with the usual context specifiers "^" for start of string
and \< for start of word,
and suffixed with "$" for end of text and \> for end of word.
Word characters are defined by the multi-character escape sequence \w.

This can be used for simple tokenizers.
It is recommended to use regular expressions that do not match the empty word.
Otherwise the output will contain many, probably useless, empty tokens.
All non-matching characters are discarded. If the given regex contains syntax errors,
nothing is returned.
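
The behaviour described above can be illustrated with a minimal sketch in Python. It is only an analogy built on Python's standard re module, not the tokenizer documented here: the function name tokenize and its argument order are assumptions made for the example, and Python's regex syntax and matching rules differ from the extended XML Schema syntax (for instance, Python has no \< or \> and tries alternatives left to right).

```python
import re
from typing import List


def tokenize(pattern: str, text: str) -> List[str]:
    """Hypothetical helper: collect every match of `pattern` in `text`.

    Characters between matches are discarded. A pattern with a syntax
    error yields an empty list instead of raising an exception.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Syntax error in the pattern: return nothing.
        return []
    # finditer scans left to right and skips text that does not match.
    return [match.group(0) for match in compiled.finditer(text)]


# A pattern that cannot match the empty word gives clean tokens.
words_and_numbers = tokenize(
    r"[a-z]{2,}|[0-9]+[.][0-9]+|[0-9]{2,}", "ab123 456.7abc"
)

# A pattern that matches the empty word produces only empty tokens here.
empty_tokens = tokenize(r"a*", "bbb")
```

In this sketch the first call separates the letter runs from the numbers and drops the blank, while the second call shows why patterns such as a* are discouraged: every token it returns for "bbb" is the empty string.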

Split a string into tokens (pairs of labels and words) by giving a regular expression
containing labeled subexpressions.

This function should not be called with regular expressions
that have no labeled subexpressions. Doing so makes no sense, because the result list
will always be empty.

The result is the list of matching subexpressions.
This can be used for simple tokenizers.
At least one character is consumed when parsing a token.
The pairs in the result list contain the matching substrings.
All non-matching characters are discarded. If the given regex contains syntax errors,
nothing is returned.
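
The idea of labeled tokens can be sketched in the same way. The example below is an analogy only: it uses Python's named groups, written (?P<name>...), which is Python syntax and not the labeled-subexpression syntax of the function documented here. The name tokenize_labeled is hypothetical, and skipping empty matches is an assumption that loosely mirrors the rule that every token consumes at least one character.

```python
import re
from typing import List, Tuple


def tokenize_labeled(pattern: str, text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (label, substring) pairs for named groups."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Syntax error in the pattern: return nothing.
        return []
    tokens = []
    for match in compiled.finditer(text):
        # groupdict() maps each group name to its matched text, or to None
        # when the group did not take part in this match.
        for label, value in match.groupdict().items():
            if value:  # skip non-participating groups and empty matches
                tokens.append((label, value))
    return tokens


labeled = tokenize_labeled(r"(?P<word>[a-z]+)|(?P<num>[0-9]+)", "ab 12 cd!")

# Without any named group there is nothing to label, so the list is empty.
unlabeled = tokenize_labeled(r"[a-z]+", "ab 12 cd!")
```

The second call shows the point made above: a pattern without labeled subexpressions still matches, but there are no labels to report, so the result list is empty.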

The syntax of the W3C XML Schema spec is extended by
further useful set operations, such as intersection, difference, and exclusive or (exor).
Subexpression matching becomes possible with "named" pairs of parentheses.
The multi-character escape sequence \a represents any Unicode character.
The multi-character escape sequence \A represents any Unicode word (\A = \a*).
All syntactically wrong inputs are mapped to the Zero expression, which represents the
empty set of words. Zero contains a string as a data field for an error message,
so errors can be detected after parsing by checking against Zero (isZero predicate).
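
The error-as-value convention described above can be sketched as follows. This is an illustration of the pattern only, assuming Python's re module as a stand-in parser: the class Zero and the functions parse_regex and is_zero are hypothetical names chosen to echo the text, and a real Zero expression is a regular expression denoting the empty set of words, not merely an error record.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Zero:
    """Hypothetical stand-in for the Zero expression: carries an error message."""
    message: str


def parse_regex(pattern: str) -> Union["re.Pattern[str]", Zero]:
    """Parse `pattern`; map a syntactically wrong input to Zero instead of raising."""
    try:
        return re.compile(pattern)
    except re.error as err:
        return Zero(str(err))


def is_zero(expression: object) -> bool:
    """Return True if parsing failed, i.e. the result is a Zero value."""
    return isinstance(expression, Zero)


good = parse_regex(r"a+b")
bad = parse_regex(r"(a+b")  # unbalanced parenthesis

if is_zero(bad):
    # The message explains what was wrong with the input.
    report = "regex rejected: " + bad.message
```

Returning a value instead of raising keeps the parser total: callers decide when, or whether, to inspect the error message.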