Each of them provides the members begin(), end() and find(), which make it possible to iterate over the selected segments or boundaries in the text, or to find the location of a segment or boundary for a given iterator.

Convenience typedefs such as ssegment_index or wcboundary_point_index are provided as well. The "w", "u16" and "u32" prefixes select the character type wchar_t, char16_t or char32_t, and the "s" and "c" prefixes select whether std::basic_string<CharType>::const_iterator or CharType const * is used as the base iterator.

Iterating Over Segments

Basic Iteration

The segment_index class provides a bidirectional iterator that returns a segment object. The segment object represents a pair of iterators that define the segment, together with the rule according to which it was selected. It can be automatically converted to a std::basic_string object.

To perform boundary analysis, we first create an index object and then iterate over it:

Using Rules

By default, segment_index's iterator returns each text segment defined by two boundary points, regardless of the way they were selected. Thus, with word boundary analysis, segments such as "." or " " are returned alongside actual words.

Using the rule() member function, we can specify a binary mask of the rules we want to use for selecting boundary points, built from the word, line and sentence boundary rules.

The examples below show which segment find() returns when a rule mask that selects only words is in effect; | marks the position of the iterator passed to find().

"t|o be or " would return "to": the iterator is in the middle of the segment "to".

"to |be or " would return "be": the iterator is at the beginning of the segment "be".

"to| be or " would return "be": the iterator does not point to a segment with the required rule, so the next valid segment, "be", is selected.

"to be or| " would return end, as no valid segment is found.

Iterating Over Boundary Points

Basic Iteration

The boundary_point_index is similar to segment_index in its interface but has a different role. Instead of returning text chunks (segments), it returns a boundary_point object that represents a position in the text: a base iterator that is used for iterating over the C++ characters of the source text. The boundary_point object also provides a rule() member function that reports the rule according to which this boundary was selected.

Note:

The beginning and the ending of the text are considered boundary points, so even an empty text consists of at least one boundary point.

There is a sentence terminator: [First sentence. |Second
sentence! Third one?]
There is a sentence separator: [First sentence. Second
|sentence! Third one?]
There is a sentence terminator: [First sentence. Second
sentence! |Third one?]
There is a sentence terminator: [First sentence. Second
sentence! Third one?|]

Locating Boundary Points

Sometimes it is useful to find the boundary point that corresponds to a given iterator.