Backtracking in Spirit.Qi requires the use of the following
types of iterator: forward, bidirectional, or random access. Because of backtracking,
input iterators cannot be used. Therefore, the standard library classes
std::istreambuf_iterator and std::istream_iterator,
that fall under the category of input iterators, cannot be used. Another
input iterator that is of interest is one that wraps a lexer, such as LEX.

Note

In general, Spirit.Qi generates recursive descent
parser which require backtracking parsers by design. For this reason we
need to provide at least forward iterators to any of Spirit.Qi's
API functions. This is not an absolute requirement though. In the future,
we shall see more deterministic parsers that require no more than 1 character
(token) of lookahead. Such parsers allow us to use input iterators such
as the std::istream_iterator as is.

Backtracking can be implemented only if we are allowed to save an iterator
position, i.e. making a copy of the current iterator. Unfortunately, with
an input iterator, there is no way to do so, and thus input iterators will
not work with backtracking in Spirit.Qi. One solution
to this problem is to simply load all the data to be parsed into a container,
such as a vector or deque, and then pass the begin and end of the container
to Spirit.Qi. This method can be too memory intensive
for certain applications, which is why the multi_pass
iterator was created.

The multi_pass iterator will
convert any input iterator into a forward iterator suitable for use with
Spirit.Qi. multi_pass
will buffer data when needed and will discard the buffer when its contents
is not needed anymore. This happens either if only one copy of the iterator
exists or if no backtracking can occur.

A grammar must be designed with care if the multi_pass
iterator is used. Any rule that may need to backtrack, such as one that contains
an alternative, will cause data to be buffered. The rules that are optimal
to use are repetition constructs (as kleene and plus).

Sequences of the form a>>b
will buffer data as well. This is different from the behaviour of Spirit.Classic
but for a good reason. Sequences need to reset the current iterator to its
initial state if one of the components of a seqeunce fails to match. To compensate
for this behaviour we added functionality to the expect
parsers (i.e. constructs like a>b).
Expectation points introduce deterministic points into the grammar ensuring
no backtracking can occur if they match. For this reason we clear the buffers
of any multi_pass iterator on each expectation point, ensuring minimal buffer
content even for large grammars.

Important

If you use an error handler in conjunction with the expect
parser while utilizing a multi_pass
iterator and you intend to use the error handler to force a retry or a fail
(see the description of error handlers - FIXME:
insert link), then you need to instantiate the error handler using retry or fail,
for instance:

If you fail to do so the resulting code will trigger an assert statement
at runtime.

Any rule that repeats, such as kleene_star (*a) or positive such as (+a), will only buffer the data for the current
repetition.

In typical grammars, ambiguity and therefore lookahead is often localized.
In fact, many well designed languages are fully deterministic and require
no lookahead at all. Peeking at the first character from the input will immediately
determine the alternative branch to take. Yet, even with highly ambiguous
grammars, alternatives are often of the form *(a|b|c|d).
The input iterator moves on and is never stuck at the beginning. Let's look
at a Pascal snippet for example:

Notice the alternatives inside the Kleene star in the rule block . The rule
gobbles the input in a linear manner and throws away the past history with
each iteration. As this is fully deterministic LL(1) grammar, each failed
alternative only has to peek 1 character (token). The alternative that consumes
more than 1 character (token) is definitely a winner. After which, the Kleene
star moves on to the next.

Now, after the lecture on the features to be careful with when using multi_pass, you may think that multi_pass is way too restrictive to use.
That's not the case. If your grammar is deterministic, you can make use of
the flush_multi_pass pseudo
parser in your grammar to ensure that data is not buffered when unnecessary
(flush_multi_pass is available
from the Spirit.Qi parser Repository).

Here we present a minimal example showing a minimal use case. The multi_pass iterator is highly configurable,
but the default policies have been chosen so that its easily usable with
input iterators such as std::istreambuf_iterator.
For the complete source code of this example please refer to multi_pass.cpp.

The SpiritRepository
contains the flush_multi_pass
parser component. This is usable in conjunction with the multi_pass
iterator to minimize the buffering. It allows to insert explicit synchronization
points into your grammar where it is safe to clear any stored input as it
is ensured that no backtracking can occur at this point anymore.

When the flush_multi_pass
parser is used with multi_pass,
it will call multi_pass::clear_queue().
This will cause any buffered data to be erased. This also will invalidate
all other copies of multi_pass and they should not be used. If they are,
an boost::illegal_backtracking exception will be
thrown.

The multi_pass iterator is
a templated class configurable using policies. The description of multi_pass above is how it was originally
implemented (before it used policies), and is the default configuration now.
But, multi_pass is capable
of much more. Because of the open-ended nature of policies, you can write
your own policy to make multi_pass
behave in a way that we never before imagined.

The multi_pass class has two template parameters:

The multi_pass template parameters

Input

The type multi_pass uses to acquire it's input. This is typically an
input iterator, or functor.

Policies

The combined policies to use to create an instance of a multi_pass
iterator. This combined policy type is described below

It is possible to implement all of the required functionality of the combinded
policy in a single class. But it has shown to be more convenient to split
this into four different groups of functions, i.e. four separate, but well
coordinated policies. For this reason the multi_pass
library implements a template iterator_policies::default_policy
allowing to combine several different policies, each implementing one of
the functionality groups:

Table 12. Policies needed for default_policy template

Template Parameter

Description

OwnershipPolicy

This policy determines how multi_pass
deals with it's shared components.

CheckingPolicy

This policy determines how checking for invalid iterators is done.

InputPolicy

A class that defines how multi_pass
acquires its input. The InputPolicy
is parameterized by the Input
template parameter to the multi_pass.

StoragePolicy

The buffering scheme used by multi_pass
is determined and managed by the StoragePolicy.

The multi_pass library contains
several predefined policy implementations for each of the policy types as
described above. First we will describe those predefined types. Afterwards
we will give some guidelines how you can write your own policy implementations.

All predefined multi_pass
policies are defined in the namespace boost::spirit::iterator_policies.

Table 13. Predefined policy classes

Class name

Description

InputPolicy classes

input_iterator

This policy directs multi_pass
to read from an input iterator of type Input.

buffering_input_iterator

This policy directs multi_pass
to read from an input iterator of type Input.
Aditionally it buffers the last character received from the underlying
iterator. This allows to wrap iterators not buffering the last
character on their own (as std::istreambuf_iterator).

istream

This policy directs multi_pass
to read from an input stream of type Input
(usually a std::basic_istream).

lex_input

This policy obtains it's input by calling yylex(), which would
typically be provided by a scanner generated by Flex.
If you use this policy your code must link against a Flex
generated scanner.

functor_input

This input policy obtains it's data by calling a functor of type
Input. The functor
must meet certain requirements. It must have a typedef called
result_type which
should be the type returned from operator(). Also, since an input policy
needs a way to determine when the end of input has been reached,
the functor must contain a static variable named eof which is comparable to a
variable of result_type.

split_functor_input

This is essentially the same as the functor_input
policy except that the (user supplied) funtion object exposes separate
unique and shared sub classes, allowing
to integrate the functors unique data members
with the multi_pass
data items held by each instance and its shared
data members will be integrated with the multi_pass
members shared by all copies.

OwnershipPolicy classes

ref_counted

This class uses a reference counting scheme. The multi_pass will delete it's shared
components when the count reaches zero.

first_owner

When this policy is used, the first multi_pass
created will be the one that deletes the shared data. Each copy
will not take ownership of the shared data. This works well for
Spirit, since no dynamic
allocation of iterators is done. All copies are made on the stack,
so the original iterator has the longest lifespan.

CheckingPolicy classes

no_check

This policy does no checking at all.

buf_id_check

This policy keeps around a buffer id, or a buffer age. Every time
clear_queue()
is called on a multi_pass
iterator, it is possible that all other iterators become invalid.
When clear_queue() is called, buf_id_check
increments the buffer id. When an iterator is dereferenced, this
policy checks that the buffer id of the iterator matches the shared
buffer id. This policy is most effective when used together with
the split_std_deque
StoragePolicy. It should not be used with the fixed_size_queue
StoragePolicy, because it will not detect iterator dereferences
that are out of range.

full_check

This policy has not been implemented yet. When it is, it will keep
track of all iterators and make sure that they are all valid. This
will be mostly useful for debugging purposes as it will incur significant
overhead.

StoragePolicy classes

split_std_deque

Despite its name this policy keeps all buffered data in a std::vector. All data is stored as
long as there is more than one iterator. Once the iterator count
goes down to one, and the queue is no longer needed, it is cleared,
freeing up memory. The queue can also be forcibly cleared by calling
multi_pass::clear_queue().

fixed_size_queue<N>

This policy keeps a circular buffer that is size N+1 and stores N
elements. fixed_size_queue
is a template with a std::size_t
parameter that specified the queue size. It is your responsibility
to ensure that N
is big enough for your parser. Whenever the foremost iterator is
incremented, the last character of the buffer is automatically
erased. Currently there is no way to tell if an iterator is trailing
too far behind and has become invalid. No dynamic allocation is
done by this policy during normal iterator operation, only on initial
construction. The memory usage of this StoragePolicy
is set at N+1 bytes, unlike split_std_deque,
which is unbounded.

The beauty of policy based designs is that you can mix and match policies
to create your own custom iterator by selecting the policies you want. Here's
an example of how to specify a custom multi_pass
that wraps an std::istream_iterator<char>,
and is slightly more efficient than the default multi_pass
(as generated by the make_default_multi_pass() API function) because it uses the iterator_policies::first_owner OwnershipPolicy and the iterator_policies::no_check CheckingPolicy:

There is one other pre-defined class called look_ahead.
The class look_ahead is another
predefine multi_pass iterator
type. It has two template parameters: Input,
the type of the input iterator to wrap, and a std::size_tN, which specifies the size of the buffer
to the fixed_size_queue policy.
While the default multi_pass configuration is designed for safey, look_ahead is designed for speed. look_ahead is derived from a multi_pass
with the following policies: input_iterator
InputPolicy, first_owner
OwnershipPolicy, no_check
CheckingPolicy, and fixed_size_queue<N>
StoragePolicy.

Yet another predefined iterator for wrapping standard input streams (usually
a std::basic_istream<>)
is called basic_istream_iterator<Char,Traits>. This class is usable as a drop in replacement
for std::istream_iterator<Char,Traits>.
Its only difference is that it is a forward iterator (instead of the std::istream_iterator,
which is an input iterator). basic_istream_iterator
is derived from a multi_pass with the following policies: istream
InputPolicy, ref_counted
OwnershipPolicy, no_check
CheckingPolicy, and split_std_deque
StoragePolicy.

If you want to use the functor_input
InputPolicy, you can write your own function object that will supply the
input to multi_pass. The
function object must satisfy several requirements. It must have a typedef
result_type which specifies
the return type of its operator(). This is standard practice in the STL.
Also, it must supply a static variable called eof which is compared against
to know whether the input has reached the end. Last but not least the function
object must be default constructible. Here is an example:

// define the function object
classiterate_a2m{public:typedefcharresult_type;iterate_a2m():c('A'){}iterate_a2m(charc):c_(c){}result_typeoperator()()const{if(c_=='M')returneof;returnc_++;}staticresult_typeeof;private:charc_;};iterate_a2m::result_typeiterate_a2m::eof=iterate_a2m::result_type('\0');// create two iterators using the define function object, one of which is
// an end iterator
typedefmulti_pass<iterate_a2m,iterator_policies::functor_input,iterator_policies::first_owner,iterator_policies::no_check,iterator_policies::split_std_deque>functor_multi_pass_type;functor_multi_pass_typefirst=functor_multi_pass_t(iterate_a2m());functor_multi_pass_typelast;// use the iterators: this will print "ABCDEFGHIJKL"
while(first!=last){std::cout<<*first;++first;}

All policies to be used with the default_policy
template need to have two embedded classes: unique
and shared. The unique class needs to implement all required
functions for a particular policy type. In addition it may hold all member
data items being unique for a particular instance of
a multi_pass (hence the name).
The shared class does not
expose any member functions (except sometimes a constructor), but it may
hold all member data items to be shared between all
copies of a particular multi_pass.

structinput_policy{// Input is the same type used as the first template parameter
// while instantiating the multi_pass
template<typenameInput>structunique{// these typedef's will be exposed as the multi_pass iterator
// properties
typedef__unspecified_type__value_type;typedef__unspecified_type__difference_type;typedef__unspecified_type__distance_type;typedef__unspecified_type__pointer;typedef__unspecified_type__reference;unique(){}explicitunique(Input){}// destroy is called whenever the last copy of a multi_pass is
// destructed (ownership_policy::release() returned true)
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoiddestroy(MultiPass&mp);// swap is called by multi_pass::swap()
voidswap(unique&);// get_input is called whenever the next input character/token
// should be fetched.
//
// mp: is a reference to the whole multi_pass instance
//
// This method is expected to return a refernce to the next
// character/token
template<typenameMultiPass>statictypenameMultiPass::referenceget_input(MultiPass&mp);// advance_input is called whenever the underlying input stream
// should be advanced so that the next call to get_input will be
// able to return the next input character/token
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoidadvance_input(MultiPass&mp);// input_at_eof is called to test whether this instance is a
// end of input iterator.
//
// mp: is a reference to the whole multi_pass instance
//
// This method is expected to return true if the end of input is
// reached. It is often used in the implementation of the function
// storage_policy::is_eof.
template<typenameMultiPass>staticboolinput_at_eof(MultiPassconst&mp);// input_is_valid is called to verify if the parameter t represents
// a valid input character/token
//
// mp: is a reference to the whole multi_pass instance
// t: is the character/token to test for validity
//
// This method is expected to return true if the parameter t
// represents a valid character/token.
template<typenameMultiPass>staticboolinput_is_valid(MultiPassconst&mp,value_typeconst&t);};// Input is the same type used as the first template parameter passed
// while instantiating the multi_pass
template<typenameInput>structshared{explicitshared(Input){}};};

It is possible to derive the struct unique
from the type boost::spirit::detail::default_input_policy. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new input policy.

This class may implement a function destroy() being called during destruction of the
last copy of a multi_pass.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared.

structownership_policy{structunique{// destroy is called whenever the last copy of a multi_pass is
// destructed (ownership_policy::release() returned true)
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoiddestroy(MultiPass&mp);// swap is called by multi_pass::swap()
voidswap(unique&);// clone is called whenever a multi_pass is copied
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoidclone(MultiPass&mp);// release is called whenever a multi_pass is destroyed
//
// mp: is a reference to the whole multi_pass instance
//
// The method is expected to return true if the destructed
// instance is the last copy of a particular multi_pass.
template<typenameMultiPass>staticboolrelease(MultiPass&mp);// is_unique is called to test whether this instance is the only
// existing copy of a particular multi_pass
//
// mp: is a reference to the whole multi_pass instance
//
// The method is expected to return true if this instance is unique
// (no other copies of this multi_pass exist).
template<typenameMultiPass>staticboolis_unique(MultiPassconst&mp);};structshared{};};

It is possible to derive the struct unique
from the type boost::spirit::detail::default_ownership_policy. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new ownership policy.

This class may implement a function destroy() being called during destruction of the
last copy of a multi_pass.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared.

structchecking_policy{structunique{// swap is called by multi_pass::swap()
voidswap(unique&);// destroy is called whenever the last copy of a multi_pass is
// destructed (ownership_policy::release() returned true)
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoiddestroy(MultiPass&mp);// docheck is called before the multi_pass is dereferenced or
// incremented.
//
// mp: is a reference to the whole multi_pass instance
//
// This method is expected to make sure the multi_pass instance is
// still valid. If it is invalid an exception should be thrown.
template<typenameMultiPass>staticvoiddocheck(MultiPassconst&mp);// clear_queue is called whenever the function
// multi_pass::clear_queue is called on this instance
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoidclear_queue(MultiPass&mp);};structshared{};};

It is possible to derive the struct unique
from the type boost::spirit::detail::default_checking_policy. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new checking policy.

This class may implement a function destroy() being called during destruction of the
last copy of a multi_pass.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared.

structstorage_policy{// Value is the same type as typename MultiPass::value_type
template<typenameValue>structunique{// destroy is called whenever the last copy of a multi_pass is
// destructed (ownership_policy::release() returned true)
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoiddestroy(MultiPass&mp);// swap is called by multi_pass::swap()
voidswap(unique&);// dereference is called whenever multi_pass::operator*() is invoked
//
// mp: is a reference to the whole multi_pass instance
//
// This function is expected to return a reference to the current
// character/token.
template<typenameMultiPass>statictypenameMultiPass::referencedereference(MultiPassconst&mp);// increment is called whenever multi_pass::operator++ is invoked
//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoidincrement(MultiPass&mp);//
// mp: is a reference to the whole multi_pass instance
template<typenameMultiPass>staticvoidclear_queue(MultiPass&mp);// is_eof is called to test whether this instance is a end of input
// iterator.
//
// mp: is a reference to the whole multi_pass instance
//
// This method is expected to return true if the end of input is
// reached.
template<typenameMultiPass>staticboolis_eof(MultiPassconst&mp);// less_than is called whenever multi_pass::operator==() is invoked
//
// mp: is a reference to the whole multi_pass instance
// rhs: is the multi_pass reference this instance is compared
// to
//
// This function is expected to return true if the current instance
// is eual to the right hand side multi_pass instance
template<typenameMultiPass>staticboolequal_to(MultiPassconst&mp,MultiPassconst&rhs);// less_than is called whenever multi_pass::operator<() is invoked
//
// mp: is a reference to the whole multi_pass instance
// rhs: is the multi_pass reference this instance is compared
// to
//
// This function is expected to return true if the current instance
// is less than the right hand side multi_pass instance
template<typenameMultiPass>staticboolless_than(MultiPassconst&mp,MultiPassconst&rhs);};// Value is the same type as typename MultiPass::value_type
template<typenameValue>structshared{};};

It is possible to derive the struct unique
from the type boost::spirit::detail::default_storage_policy. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new storage policy.

This class may implement a function destroy() being called during destruction of the
last copy of a multi_pass.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared.

Generally, a StoragePolicy
is the trickiest policy to implement. You should study and understand the
existing StoragePolicy classes
before you try and write your own.