Offset Separator

class offset_separator

The offset_separator class is an implementation of the TokenizerFunction concept that can be used with
the tokenizer class to break text up into
tokens. The offset_separator breaks a sequence of Char values
into strings based on a sequence of offsets. For example, given the
string "12252001" and the offsets (2,2,4), it breaks the string into the tokens 12, 25,
and 2001.
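As a rough illustration of the splitting rule, and not Boost's implementation, the sketch below uses a hypothetical standard-C++ helper, split_by_offsets, that uses each offset once and cuts the string into consecutive pieces of those lengths. It assumes every offset is positive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): cuts `s` into consecutive pieces
// whose lengths are given by `offsets`, using each offset exactly once.
std::vector<std::string> split_by_offsets(const std::string& s,
                                          const std::vector<std::size_t>& offsets) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;  // start of the next token in s
    for (std::size_t len : offsets) {
        assert(len > 0 && "offsets are expected to be positive");
        if (pos >= s.size()) break;            // input exhausted
        tokens.push_back(s.substr(pos, len));  // substr clamps at the end of s
        pos += len;
    }
    return tokens;
}
```

Called as split_by_offsets("12252001", {2, 2, 4}), it yields 12, 25 and 2001. With Boost itself, the same split would be obtained by constructing boost::offset_separator f(offsets, offsets + 3) from an int array holding {2,2,4} and iterating over boost::tokenizer<boost::offset_separator> tok(s, f).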

bwrapoffsets

Tells whether to wrap around to the beginning of the offsets when
all the offsets have been used. For example, with offsets (2,2,4) and
bwrapoffsets set to true, the string "1225200101012002" parses to
12 25 2001 01 01 2002. With bwrapoffsets set to false, it parses to
12 25 2001 and then stops, because all the offsets have been used.
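The wrapping rule can be modeled with the hypothetical helper below, split_with_wrap, written in plain C++ rather than against Boost. Its wrap_offsets parameter stands in for bwrapoffsets; a short final piece is always kept, and every offset is assumed to be positive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Boost's API): splits `s` by `offsets`; once every
// offset has been used, it either starts again from the first offset
// (wrap_offsets == true) or stops (wrap_offsets == false).
std::vector<std::string> split_with_wrap(const std::string& s,
                                         const std::vector<std::size_t>& offsets,
                                         bool wrap_offsets) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;  // start of the next token in s
    std::size_t idx = 0;  // index of the offset to use next
    while (pos < s.size() && !offsets.empty()) {
        if (idx == offsets.size()) {
            if (!wrap_offsets) break;  // all offsets used: stop parsing
            idx = 0;                   // wrap around to the first offset
        }
        std::size_t len = offsets[idx++];
        assert(len > 0 && "a zero offset would never advance");
        tokens.push_back(s.substr(pos, len));  // substr clamps at the end of s
        pos += len;
    }
    return tokens;
}
```

With the string "1225200101012002" and offsets {2, 2, 4}, passing true gives 12 25 2001 01 01 2002 and passing false gives 12 25 2001, matching the two cases described above.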

breturnpartiallast

Tells whether to create a token from the remaining characters, or to
discard them, when the parsed sequence ends before supplying the number
of characters required by the current offset. For example, with offsets
(2,2,4) and breturnpartiallast set to true, the string "122501" parses
to 12 25 01. With it set to false, it parses to 12 25 and then stops,
because only 2 characters remain in the sequence instead of the 4 that
the current offset requires.
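To isolate the effect of breturnpartiallast, the hypothetical helper below, split_partial, uses each offset once (no wrapping) and exposes a return_partial_last flag standing in for breturnpartiallast. It is a plain C++ sketch of the documented behavior, not the Boost code, and assumes positive offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Boost's API): uses each offset once; if the input
// runs out before the current offset is satisfied, the leftover characters
// become a final token only when return_partial_last is true.
std::vector<std::string> split_partial(const std::string& s,
                                       const std::vector<std::size_t>& offsets,
                                       bool return_partial_last) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;  // start of the next token in s
    for (std::size_t len : offsets) {
        assert(len > 0 && "offsets are expected to be positive");
        if (pos >= s.size()) break;                     // input exhausted
        std::size_t left = s.size() - pos;              // characters remaining
        if (left < len && !return_partial_last) break;  // drop the short tail
        std::size_t take = left < len ? left : len;
        tokens.push_back(s.substr(pos, take));
        pos += take;
    }
    return tokens;
}
```

For "122501" with offsets {2, 2, 4}, passing true keeps the short tail 01, while passing false stops after 12 and 25.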

To use this class, pass an instance of it anywhere a TokenizerFunction is
required. A default-constructed object returns every character in the
parsed sequence as a separate token (that is, it uses a single offset of 1,
with bwrapoffsets set to true).
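The sketch below is a simplified model, not the Boost class, of what passing such an object "anywhere a TokenizerFunction is required" involves. The hypothetical function object offset_splitter has roughly the shape that concept expects (a call that advances an iterator, fills a token and returns false when no token is produced, plus reset()), and the hypothetical tokenize loop stands in for the tokenizer class. Its default constructor mirrors the documented default of one offset of 1 with wrapping enabled; partial last tokens are always kept, and offsets are assumed positive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical function object (not Boost's offset_separator).
class offset_splitter {
public:
    // Default: a single offset of 1 with wrapping, so every character
    // becomes its own token.
    offset_splitter() : offsets_{1}, wrap_(true) {}
    offset_splitter(std::vector<std::size_t> offsets, bool wrap)
        : offsets_(std::move(offsets)), wrap_(wrap) {
        assert(!offsets_.empty() && "at least one offset is required");
    }

    // Restart the offset sequence before a new parse.
    void reset() { current_ = 0; }

    // Reads one token starting at `next`, advances `next` past it, and
    // returns false when no token is produced.
    template <typename It>
    bool operator()(It& next, It end, std::string& tok) {
        tok.clear();
        if (next == end) return false;
        if (current_ == offsets_.size()) {
            if (!wrap_) return false;  // all offsets used: stop
            current_ = 0;              // wrap around to the first offset
        }
        std::size_t len = offsets_[current_++];
        assert(len > 0 && "offsets are expected to be positive");
        for (std::size_t i = 0; i < len && next != end; ++i) tok += *next++;
        return true;
    }

private:
    std::vector<std::size_t> offsets_;
    bool wrap_;
    std::size_t current_ = 0;
};

// Hypothetical driver standing in for the tokenizer class: collects tokens
// until the function object reports that none are left.
template <typename Func>
std::vector<std::string> tokenize(const std::string& s, Func f) {
    std::vector<std::string> out;
    f.reset();
    auto next = s.begin();
    std::string tok;
    while (f(next, s.end(), tok)) out.push_back(tok);
    return out;
}
```

Because tokenize only relies on the call operator and reset(), any object with that shape can be swapped in, which is the idea behind accepting a TokenizerFunction rather than one fixed splitting rule.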