Introduction

In my last Code Project article, An Introduction to the Boost Spirit Parser framework, I introduced some basic concepts of the boost::spirit parser framework. In that article, I created the necessary code to successfully parse the input for a simple modular arithmetic calculator. In this article, I will add in the necessary code to carry out semantic actions on the input, and therefore provide a fully functional modular arithmetic calculator.

Background

There are numerous ways to parse input from files, the command line, or elsewhere. Whatever the technique, two basic functions must be performed. The first is understanding the structure of the input, ensuring that it meets the specification laid down for that input. After that, we must carry out semantic actions. What are semantic actions? Semantics is the study of meaning, so it follows that semantic actions are actions carried out based on the meaning of something. In the context of a parser, semantic actions are the code that gets called each time you have successfully recognized part of the input.

In the Spirit framework, adding semantic actions is very simple. After any part of any rule in the definition of the grammar, you can attach a semantic action. That action will be called whenever that particular part of the rule has been successfully matched. This is a powerful extension of what is possible in other parsers such as Yacc. Those familiar with Yacc will know that each rule in the grammar can be followed by some embedded pseudo-C code. Spirit goes one step further by allowing semantic actions to be attached to any part of a rule. An example that processes a comma-separated list of signed integers is:

integers = int_p[ProcessInt()] >>
           *( ',' int_p[ProcessInt()] )
           ;

This section of code states that the integers rule is defined as a single integer followed by zero or more additional integers, separated by commas. After each integer is parsed, the framework calls the functor ProcessInt(), passing in the integer just parsed. I will deal with the details of these functors in a later section of this article.

There are many ways in which input can be processed. In the case of a simple command line calculator, it is usually possible to simply process the input as it comes in, with an evaluation stack. In fact, this concept is used by the calculators that come as demonstration projects with the Spirit library. I decided that in this case, I would demonstrate another commonly used tool associated with parsing, the conversion of input into a composite tree.

Modeling the Input Using a Composite

Many input formats can be parsed successfully by generating a composite tree out of the parser. Obvious examples such as XML or VRML parsers immediately spring to mind, but there are other file formats that can sensibly be parsed into a tree. For example: SQL scripts or mathematical expressions. Once you have the composite tree, performing a variety of actions on the tree simply becomes a problem of writing an appropriate composite visitor. A visitor can easily re-output the tree in a new file format, or output the data after it has been edited by the program. Another visitor could be used to transform the tree into a different data structure, or to evaluate the tree. The possibilities are endless.

In this article, I have used code from a previous article of mine entitled Composites and Visitors - a Templatised Approach. That article provides a generic composite base class, as well as various schemes for visiting the entire composite tree in a particular order.

To write my modular arithmetic calculator, I have written a composite tree that has a number of different nodes. The most important of these is a root node which sits at the root of the tree and has all other nodes underneath. The reason for writing this root node concerns a particular detail of writing parsers that convert input into composite trees. Often, when you are writing a parser that generates a composite tree, you tend to create the tree starting with the leaf nodes, rather than with the root node. I tend to solve this particular problem by growing the tree below a customized root node. As each node in the tree is created, it is added to the root node. Initially, leaf nodes are added, but as composite nodes are added, they detach certain leaf nodes from underneath the root node, and add them to their own children lists. This grows the tree from the root, gradually pushing other nodes further down the hierarchy. To those of you familiar with the process of building a balanced binary tree, this process is remarkably similar. The diagram below demonstrates the process:

Following this process through, we can see that the first two steps involve adding numeric leaf nodes to the root node. The next step involves adding a plus operator which takes charge of the two children and attaches itself under the root node. The root node therefore contains a function to do this operation:
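A minimal sketch of that operation, assuming a simple Node type with a child list (all names here are illustrative, not the article's actual classes):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Illustrative sketch only: a node with a list of children.
struct Node
{
    std::vector<std::shared_ptr<Node>> children;

    void add(std::shared_ptr<Node> const& child)
    {
        children.push_back(child);
    }

    // Detach the last two children from the root, hand them to the
    // new operator node, then attach that node under the root.
    void addOperatorNode(std::shared_ptr<Node> const& opNode)
    {
        std::shared_ptr<Node> rhs = children.back();
        children.pop_back();
        std::shared_ptr<Node> lhs = children.back();
        children.pop_back();
        opNode->add(lhs);
        opNode->add(rhs);
        add(opNode);
    }
};
```

With two numeric leaves under the root, adding a plus node in this way leaves the root with a single child that owns both leaves, which is exactly the re-parenting step shown in the diagram.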

You can see that this function takes the last two nodes under the root node, removes them from the root, adds them to the new node, and finally adds the new node underneath the root.

The other nodes in the composite tree represent a variety of basic entities in the expression syntax, for example, integer numbers, and the various operators such as plus or minus.

I have written two visitors for the composite tree. The first is a left to right visitor that will print the expression represented by the tree. The second is a bottom to top visitor that evaluates the expression represented by the tree.

The class definition for this visitor is fairly self-explanatory. As the composite is visited from the bottom of the tree to the top, the results from visiting each node are placed on a stack. The operator nodes work in a very similar way to a Forth interpreter: the plus node, for example, pops the top two elements off the stack, calculates their sum, and pushes on the result:
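That stack discipline can be sketched as follows (the class below is illustrative; it is not the article's CCalculateTreeVisitor, and the handling of the modular base is an assumption):

```cpp
#include <cassert>
#include <stack>

// Illustrative evaluation stack for a modular calculator.
class EvalStack
{
public:
    explicit EvalStack(unsigned int base) : m_base(base) {}

    // Visiting a number node pushes its value, reduced mod base.
    void visitNumber(unsigned int value)
    {
        m_stack.push(value % m_base);
    }

    // Visiting a plus node pops the top two elements, sums them,
    // and pushes the result, exactly like a Forth word.
    void visitPlus()
    {
        unsigned int rhs = pop();
        unsigned int lhs = pop();
        m_stack.push((lhs + rhs) % m_base);
    }

    unsigned int result() const { return m_stack.top(); }

private:
    unsigned int pop()
    {
        unsigned int v = m_stack.top();
        m_stack.pop();
        return v;
    }

    std::stack<unsigned int> m_stack;
    unsigned int m_base;
};
```

For example, evaluating 5 + 4 with a base of 7 leaves 2 on top of the stack.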

Adding Semantic Actions to a Spirit Parser

The work of defining composites and visitors for the expression tree has prepared us for the important job of adding semantic actions to our existing calculator parser. As I have already stated, this process is quite easy. It relies on using function pointers or, in a more object oriented way, functors to call back from the parser into the parent application.

Defining the Semantic Action Functors

Functors are simply classes that implement a function call operator, operator(). This means that they can act as both classes and function pointers at the same time. The Spirit framework uses either function pointers or functors to call back from the parser into your application. I personally recommend using functors because of the increased object orientation. The functor itself must follow one of a set of formats supported by the Spirit framework. Two of these are demonstrated below:
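The shape of such a functor can be sketched as follows; MatchRecorder, its sink, and its behaviour are assumptions made for illustration, and only the two operator() signatures are the point:

```cpp
#include <cassert>
#include <string>

// Illustrative stand-in for the article's functors (the name and
// the recording behaviour are assumptions, not the original code).
template <typename IteratorT>
class MatchRecorder
{
public:
    explicit MatchRecorder(std::string& sink) : m_sink(sink) {}

    // Single-iterator form: attached to single-character parsers
    // such as ch_p.
    void operator()(IteratorT where) const
    {
        m_sink += *where;
    }

    // Iterator-pair form: [first, last) delimits the matched input.
    void operator()(IteratorT first, IteratorT last) const
    {
        m_sink.append(first, last);
    }

private:
    std::string& m_sink;
};
```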

Here, we have two operator()s, one taking a single IteratorT, and the other taking a pair of IteratorTs. If you are parsing a standard string, the Spirit framework will use the char const* specialization of these. The single parameter version will be called when you attach an action to a single character parser, for example, the ch_p parser. The two parameter version will be used under most other circumstances. The iterator parameters give you access to the input just parsed, although in this example I did not need the input.

When you are designing a functor, it is important to understand its lifetime. A new functor is not created each time a semantic action fires; instead, the same functor exists for the lifetime of the grammar object. For this reason, I construct the example functor with a function pointer, createPtr, that can be used to create each new expression tree node, rather than passing in a new object on construction, which would cause the system to attempt to add the same object every time the functor is called. As a general rule of thumb, you should keep these functors lightweight. The Spirit framework can pass them around by value quite a lot, so don't load them down with member variables unless you want to take a hit on parsing speed. The functor should really be seen as a point of communication between the parser and the rest of your application.

Other functor signatures are used when you are parsing numeric values. If you use the built-in number parsers, for example uint_p, the functor must take a single unsigned int. This can be seen below:
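A sketch of that signature (the class name and the vector sink here are assumptions; the single unsigned int parameter is the part Spirit requires):

```cpp
#include <cassert>
#include <vector>

// Illustrative numeric-action functor: uint_p hands the parsed
// value straight to operator().
class PushInt
{
public:
    explicit PushInt(std::vector<unsigned int>& sink) : m_sink(sink) {}

    // Called with the value of the number just parsed.
    void operator()(unsigned int value) const
    {
        m_sink.push_back(value);
    }

private:
    std::vector<unsigned int>& m_sink;
};
```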

Communicating Between the Parser and the Application

The communication between the parser and the application usually involves two stages. Firstly, you need to place the functors in suitable locations within the grammar. Secondly, you need to ensure that the functors can pass any action onto the application appropriately. If you look back at the two functors defined above, both of them take a reference to a CParser which in the case of this application is the overall managing class. By storing this reference, the functor gets a lightweight way of communicating back into the application, in both of these cases to add new children into the composite tree.

To demonstrate the process of placing functors in the grammar, I will show you a section of the grammar:
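The following is a hypothetical fragment in the style of Spirit's classic grammar definitions; the rule and functor names are assumptions, not the article's actual listing. Each functor carries a reference back to the managing CParser, plus a creation function for the node it should add:

```cpp
// Hypothetical sketch, not the article's listing:
expression = term
             >> *( ('+' >> term)
                       [AddNode(self.m_parser, &CPlusNode::create)]
                 | ('-' >> term)
                       [AddNode(self.m_parser, &CMinusNode::create)]
                 )
             ;
```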

Once the input has been parsed, an expression tree will exist in the m_parser object. You can visit this tree to get out the results of the expression, or to print the tree in a formatted way. For example:
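As a hypothetical usage sketch (CCalculateTreeVisitor is the article's class, but the member function names here are assumptions inferred from the surrounding description):

```cpp
// Hypothetical sketch; member names are assumptions:
CCalculateTreeVisitor visitor;
visitor.SetBase(7);                    // base for the modular arithmetic
m_parser.GetTree()->Accept(visitor);   // bottom-to-top visit
unsigned int result = visitor.TopOfStack();
```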

This code sets the base of the modular arithmetic on the CCalculateTreeVisitor, runs the visitor on the expression tree, and gets the result from the top of the visitor stack.

Compatibility

The code produced for this article relies on both the boost::spirit library and the Loki library. Both of these libraries should be installed and on your include path for the code to compile. Furthermore, due to the cutting edge language features that both these libraries rely on, the code will only compile on Visual C++ 7.1 or later. Also note that many simple compile errors will be reported in strange ways, even with the latest Microsoft compilers. For example, if you use a functor that has not yet been defined, you will quite often get a C1001, Internal Compiler Error.

Conclusions

Once you have created a parser in the Spirit Framework, it is pretty easy to plumb in semantic actions. So long as you follow the rules to keep your functors lightweight, and you remember the way in which the parser calls back to the main application through these functors, you shouldn't have too many problems. The Spirit framework is a very powerful way of creating highly object oriented parsers. The complex syntax and use of cutting edge language features are not for the fainthearted, but I think that the effort required to learn the framework will be well rewarded.

History

10 Oct 04: Version 1 released.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

I started programming on 8 bit machines as a teenager, writing my first compiled programming language before I was 16. I went on to study Engineering and Computer Science at Oxford University, getting a first and the University Prize for the best results in Computer Science. Since then I have worked in a variety of roles, involving systems management and development management on a wide variety of platforms. Now I manage a software development company producing CAD software for Windows using C++.

Outside computers, I am also the drummer in a band, The Unbelievers and we have just released our first album. I am a pretty good juggler and close up magician, and in my more insane past, I have cycled from Spain to Eastern Turkey, and cycled across the Namib desert.

Comments and Discussions

> For people familiar with Yacc, you will know that each rule
> in the grammar can be followed by some embedded pseudo-C code.
> Spirit goes one step further by allowing semantic actions to
> be attached to any part of the rule.

Yacc (at least Bison, anyway) has supported this for a very long time. You can put actions (surrounded by {}s) anywhere in the rule definition. Many people, however, only put them at the end.

But thanks for the excellent article. It provided the impetus to give Spirit a try, since a usable OO, re-entrant Bison/Flex solution continues to evade the native Windows domain.

... and I'm posting this message simply because I'm tired of being in the middle of believers.

Paradigms (and patterns) are not themselves "solutions". If respecting a pattern makes a solution too tricky to be understood ... that's the time to move to another pattern.

The reason I like C++ rather than other languages is that it gives full support to both multiple inheritance and templates, letting me choose the most efficient way to organize code, depending on what I'm doing.

I find it terrible to force everything to stay either in the "generics" or in classic OOP.

And I'm dreaming of a language where templates are translatable and OOP is not forced "by definition" into single inheritance and interfaces.

And now tell me about the insanity of a simple goto error;

This is a great article. But it makes no sense to evaluate it if you are not interested in the "boost" things.

I couldn't agree more. Design patterns are great - but they are only one of the many tools in the programmer's toolbox. If you don't pick the right tool, then you won't produce something that works. The reason that C++ works for me as a language is quite simply that it does everything. If you need to get down and dirty in the memory you have the tools to do it. If you need to write high performance software then you have the tools to do it. And if you need to write highly object oriented code that will be maintainable into the future you have the tools to do it.

For me, Boost is an extension of all the things I like about C++. When I'm writing a system, and I know that memory management is going to be difficult, but that I don't need the performance benefits of completely deterministic object destruction, then rather than go to a garbage collected language like Java, I can go to boost::smart_ptr. If I want to do high performance linear algebra, I don't need to go to Fortran, I can use boost::numeric::ublas. The list goes on...

Dave Handley wrote:
> For me, Boost is an extension of all the things I like about C++. When I'm writing a system, and I know that memory management is going to be difficult, but that I don't need the performance benefits of completely deterministic object destruction, then rather than go to a garbage collected language like Java, I can go to boost::smart_ptr. If I want to do high performance linear algebra, I don't need to go to Fortran, I can use boost::numeric::ublas. The list goes on...

5 for this message (and I'll give you 5 for the article when I actually take the time to read it carefully). If C++ is going to remain a viable development tool in the 21st century, we can't stick to the old ways (raw pointers, new/delete, the rule of the big three...), and the Boost libraries offer powerful abstractions for a small price in performance. The best thing about modern C++ is the flexibility: we can easily mix high-level abstractions with low-level C (or even assembly) code when necessary.

I think there has to be an article out there by someone concerning the future of C++. I remember first using C++ in the early 90s and thinking what a good language it was. I really liked the stream libraries and templates. It took the standards committee until the late 90s to actually standardise these things, and it has taken until now (and beyond) for compilers to actually start becoming compliant. I hope that the next stages in the development of C++ will not take quite so long - especially since boost provides many of the things that could well go into an expanded standard.

My list of things that I would like to see in the standard probably includes:

1) Some advancements to templates to improve their use in libraries. The export keyword (virtually unsupported on compilers) and explicit instantiation are only a partial solution in my view. Although that largely depends on how successfully the export keyword is integrated into compilers over the next few years. I want templates to be able to exist in a separate compilation unit without massive recompilation times.

2) Something to do with general abstract base classes. The Java Object class is an example of this. I don't know exactly what I want here - but an option to make everything derive from an abstract base would be really useful - especially if that abstract base used policies to allow you to do such things as introduce optional garbage collection. I haven't really thought this one through, but the idea would be to properly introduce some of the concepts from managed C++ without losing the flexibility that makes us all love C++ so much.

I have played around a bit with the Spirit parser, and done a bit of programming with WTL. And after quite an extended acceptance period (that is, a period when I simply couldn't understand how it all hangs together), I'm now a firm believer in this programming technique. This article gets my 10.

Thanks - I've always found parsing to be one of those things for which there are loads of programming tools, but none of them are quite satisfactory. Spirit gets the closest in my view. It has some weaknesses, but I think these are outweighed by the advantages.

I guess that it is always a personal choice whether you like templates or not, but personally I could not envisage living without them - in fact I would go as far as saying that templates are the main feature of C++ that keep me from wanting to use other languages. I find Boost is generally very usable and straightforward. I've used loads of the Boost libraries in industrial strength code, including smart pointers, regex, ublas, operators and spirit. In every case the code that has been produced has been smaller, faster and more understandable as a result. The real killer reason though why I personally like spirit, is that I have yet to find a better alternative. Hand-coded parsers are impossible to change, debug, get working in the first place, etc. Lex and yacc are excellent, but the code they produce is unreadable, and you need to learn a whole new file format to use them - plus the format isn't even standardised so each time you use a new implementation you re-learn a load of code.

Spirit's biggest weakness is the fact that it is pushing current compilers to their limits, but in 5 years time, when error messages from template code are understandable, and compilers have improved as much as they have in the last 5 years, we probably won't even need to have this discussion.

I totally agree. I think a lot of the real benefits of templates and boost are obscure until you want (need) to accomplish certain things and realize how much work they would be using more 'traditional' methods. ... void* anyone? Or how about scanf(), which is useful if you want to guarantee undefined behaviour.

I don't see why templates should 'bloat' code, unless they use loop unrolling deliberately for efficiency. The compile-time evaluation of templates more typically results in compact code, with redundant branches eliminated by the compiler.

a) The problem language is complex. A simple interface can at best veil the complexity, but at some layer it is still there. (You do *need* a BNF guru to get it to work.)

b) The conversion isn't perfect (you'll encounter enough template issues of its own), and compiler diagnostics etc. are not customized. (But then, I'd rather debug my own "template-bloated" code than the output of a code generator.)

You can't help (a); you can only isolate the end-user from a specific grammar (but that works with both the Spirit and lex/yacc approaches). With (b), you have options.

I consider the approach of Spirit interesting, at least: mapping a formal language to C++ (and, on a meta-level, it is philosophically interesting: mapping a meta-language using meta-language constructs). It closes the gap between the problem domain (EBNF) and the implementation language - the ultimate goal of a library. If my instincts don't cheat me, Spirit is great if you regularly work with changing/different grammars.

I have "personal issues" with what you call bloated libraries, for mere practical purposes. (i.e. if it were just between me and the compiler, I could even like STL).

Oh, and boost contains a variety of libraries, some bloated, some good. And it's your choice which ones to use (which is fundamental for a library in my most arrogant opinion)

For the record: I'm pagan. There is magic, good and bad, and I can use it - for a price.

"We are here to help each other get through this thing, whatever it is" - Vonnegut Jr.

(a) The problem language is complex. This stems from the fact that the problem is complex. If the problem of parsing was easy we would have already seen a near-perfect solution - but we haven't. In my view we are getting closer to solving the problem - and Spirit moves us a step closer.

(b) I agree that the conversion between EBNF and C++ isn't perfect with Spirit, but I also agree that it is probably easier to try and debug a problem when you have the code that generated it in front of you - rather than going through a parser generation step over which you have no control. IMHO, we will see compilers get better and better at working with templates with debugging and error reporting getting more advanced. Under those circumstances, I think Spirit, or frameworks like it, will start to become more common.

On a final personal note, I find STL and Boost to be invaluable tools for the programmer. Whether you need them every time you start to code is another matter, but they are still great tools when the time is right.