So far, the documentation of Spirit.Lex has mostly described
the features of the dynamic model,
where the tables needed for lexical analysis are generated from the regular
expressions at runtime. The big advantage of the dynamic model is its flexibility,
and its integration with the Spirit
library and the C++ host language. Its big disadvantage is the additional
runtime needed to generate the tables, which can be a limitation
especially for larger lexical analyzers. The static
model strives to build upon the smooth integration with Spirit
and C++, and reuses large parts of the Spirit.Lex
library as described so far, while avoiding the additional runtime cost
by using pre-generated tables and tokenizer routines. To keep the code
generation as simple as possible, the static model reuses the token definition
types developed for the dynamic model without any
changes. As will be shown in this section, building a code generator based
on an existing token definition type is a matter of writing three lines of
code.

Assuming you already built a dynamic lexer for your problem, there are
two more steps needed to create a static lexical analyzer using Spirit.Lex:

generating the C++ code for the static analyzer (including the tokenization
function and corresponding tables), and

modifying your main program to use the generated static lexical analyzer
instead of the dynamic one.

But first we provide the code snippets needed to follow the descriptions
below. Both the definition of the used token identifiers and the definition
of the token definition class in this example are put into a separate header
file to make them available to the code generator and the static lexical
analyzer.

enum tokenids
{
    IDANY = boost::spirit::lex::min_token_id + 1,
};

The important point here is that the token definition class is no different
from a similar class to be used for a dynamic lexical analyzer. The library
has been designed in a way that all components (dynamic lexical analyzer,
code generator, and static lexical analyzer) can reuse the very same token
definition syntax.

// This token definition class can be used without any change for all three
// possible use cases: a dynamic lexical analyzer, a code generator, and a
// static lexical analyzer.
template <typename BaseLexer>
struct word_count_tokens : boost::spirit::lex::lexer<BaseLexer>
{
    word_count_tokens()
      : word_count_tokens::base_type(
            boost::spirit::lex::match_flags::match_not_dot_newline)
    {
        // define tokens and associate them with the lexer
        word = "[^ \t\n]+";
        this->self = word
            | '\n'
            | boost::spirit::lex::token_def<>(".", IDANY);
    }

    boost::spirit::lex::token_def<std::string> word;
};

The only thing that changes between the three use cases is the template
parameter used to instantiate a concrete token definition. For the dynamic
model and the code generator you will probably use the lex::lexertl::lexer<> template, whereas for the static
model you will use the lex::lexertl::static_lexer<> type as the template parameter.

This example not only shows how to build a static lexer, but it additionally
demonstrates how such a lexer can be used for parsing in conjunction with
a Spirit.Qi grammar. For completeness, we provide
the simple grammar used in this example. As you can see, this grammar does
not have any dependencies on the static lexical analyzer, and for this
reason it is no different from a grammar used either without a lexer or
with a dynamic lexical analyzer, as described before.

// This is an ordinary grammar definition following the rules defined by
// Spirit.Qi. There is nothing specific about it, except it gets the token
// definition class instance passed to the constructor to allow accessing the
// embedded token_def<> instances.
template <typename Iterator>
struct word_count_grammar : qi::grammar<Iterator>
{
    template <typename TokenDef>
    word_count_grammar(TokenDef const& tok)
      : word_count_grammar::base_type(start)
      , c(0), w(0), l(0)
    {
        using boost::phoenix::ref;
        using boost::phoenix::size;

        // associate the defined tokens with the lexer, at the same time
        // defining the actions to be executed
        start = *(   tok.word          [ ++ref(w), ref(c) += size(_1) ]
                 |   lit('\n')         [ ++ref(l), ++ref(c) ]
                 |   qi::token(IDANY)  [ ++ref(c) ]
                 );
    }

    std::size_t c, w, l;      // counter for characters, words, and lines
    qi::rule<Iterator> start;
};

The first additional step in creating a static lexical
analyzer is to write a small stand-alone program that creates the lexer
tables and the corresponding tokenization function. For this purpose the
Spirit.Lex library exposes a special API - the function
generate_static_dfa()
. It implements the whole code generator; no further code is needed. All
it takes to invoke this function is a token definition instance,
an output stream to write the generated code to, and an optional string
to be used as a suffix for the names of the generated functions. All in all,
just a couple of lines of code.

int main(int argc, char* argv[])
{
    // create the lexer object instance needed to invoke the generator
    word_count_tokens<lex::lexertl::lexer<> > word_count; // the token definition

    // open the output file, where the generated tokenizer function will be
    // written to
    std::ofstream out(argc < 2 ? "word_count_static.hpp" : argv[1]);

    // invoke the generator, passing the token definition, the output stream
    // and the name suffix of the tables and functions to be generated
    //
    // The suffix "wc" used below results in a type lexertl::static_::lexer_wc
    // to be generated, which needs to be passed as a template parameter to the
    // lexertl::static_lexer template (see word_count_static.cpp).
    return lex::lexertl::generate_static_dfa(word_count, out, "wc") ? 0 : -1;
}

The code generator shown above produces output which should be stored in
a file for later inclusion into the static lexical analyzer as shown in
the next topic (the full generated code can be viewed here).

Note

The generated code has the version number of the current
Spirit.Lex library compiled in. This version number is checked
when your static lexer object is compiled, to ensure it is built
with exactly the same version of the Spirit.Lex
library as the one used to generate the lexer tables. If the versions
do not match you will see a compilation error mentioning an incompatible_static_lexer_version.

The second required step to convert an existing dynamic lexer into a static
one is to change your main program in two places. First, you need to change
the type of the used lexer (that is, the template parameter used while instantiating
your token definition class). While in the dynamic model we have been using
the lex::lexertl::lexer<>
template, we now need to change that to the lex::lexertl::static_lexer<> type. The second change is tightly
related to the first one and involves correcting the corresponding #include statement to:

#include<boost/spirit/include/lex_static_lexertl.hpp>

Otherwise the main program is no different from an equivalent program
using the dynamic model. This feature makes it easy to develop the lexer
in dynamic mode and to switch to the static mode after the code has
stabilized. The simple generator application shown above enables the integration
of the code generator into any existing build process. The following code
snippet provides the overall main function, highlighting the code to be
changed.

int main(int argc, char* argv[])
{
    // Define the token type to be used: 'std::string' is available as the type
    // of the token value.
    typedef lex::lexertl::token<
        char const*, boost::mpl::vector<std::string>
    > token_type;

    // Define the lexer type to be used as the base class for our token
    // definition.
    //
    // This is the only place where the code is different from an equivalent
    // dynamic lexical analyzer. We use the `lexertl::static_lexer<>` instead of
    // the `lexertl::lexer<>` as the base class for our token definition type.
    //
    // As we specified the suffix "wc" while generating the static tables we
    // need to pass the type lexertl::static_::lexer_wc as the second template
    // parameter below (see word_count_generate.cpp).
    typedef lex::lexertl::static_lexer<
        token_type, lex::lexertl::static_::lexer_wc
    > lexer_type;

    // Define the iterator type exposed by the lexer.
    typedef word_count_tokens<lexer_type>::iterator_type iterator_type;

    // Now we use the types defined above to create the lexer and grammar
    // object instances needed to invoke the parsing process.
    word_count_tokens<lexer_type> word_count;           // Our lexer
    word_count_grammar<iterator_type> g(word_count);    // Our parser

    // Read in the file into memory.
    std::string str(read_from_file(1 == argc ? "word_count.input" : argv[1]));
    char const* first = str.c_str();
    char const* last = &first[str.size()];

    // Parsing is done based on the token stream, not the character stream.
    bool r = lex::tokenize_and_parse(first, last, word_count, g);

    if (r) {    // success
        std::cout << "lines: " << g.l << ", words: " << g.w
                  << ", characters: " << g.c << "\n";
    }
    else {
        std::string rest(first, last);
        std::cerr << "Parsing failed\n" << "stopped at: \"" << rest << "\"\n";
    }
    return 0;
}

Important

The generated code for the static lexer contains the token ids as they
have been assigned, either explicitly by the programmer or implicitly
during lexer construction. It is your responsibility to make sure that
all instances of a particular static lexer type use exactly the same
token ids. The constructor of the lexer object has a second (defaulted)
parameter allowing you to designate the starting token id to be used while
assigning ids to the token definitions. The requirement above is
fulfilled by default as long as no first_id
is specified during construction of the static lexer instances.