Plugging in an operator precedence parser

A while back I wrote a post about writing a simple operator precedence parser. I did that with this in mind. Instead of writing, and maintaining a set of grammar rules for each type of expression involving operators, and implement them, I've plugged in an operator precedence parser instead. You may want to refer back to that post - I won't go into detail about this parser component itself since most of the inner workings is covered there, so this will be another short part and will rely heavily on you having read that post.

To illustrate why I prefer an operator precedence parser, lets look at a small grammar snippet that gives an idea of how it would be implemented in a recursive descent parser.

Now this looks simple, and it is: The looser an operator binds, the closer to the "top" of the expression it is, so the parser will look for tighter binding operators first, by following the rules down.

But it gets tedious, especially when you convert it to code by hand. More importantly you need to add this to the grammar for each and every priority level in the grammar. With an operator precedence parser, you instead get a table of priorities - changing how the operator binds is just a matter of changing that table. Lets look at the table for my parser as it currently stands:

Notice how "," is an operator - this is to handle function calls without any special handling in the shunting yard class (instead handling it by flattening the tree in the TreeOutput class) - adding special handling and avoiding any special checks in the TreeOutput class would also work. I didn't do any real analysis to find out which is best.

If you wanted to, you could mechanically convert this table to grammar fragments like above fairly easily, and automatically generating the parser code wouldn't be hard either. But there's not really much point - as my earlier post showed, implementing the Shunting Yard algorithm for an operator precedence parser is fairly simple, and in this case it drops straight, and adding a new operator to the parser is more or less a matter of adding to this table.

The next big chunk that needed to be added was a tokenizer, as the shunting yard class presented in the earlier mentioned post requires a stream of tokens rather than characters. The tokenizer needs to recognize the type of token, and then attempt to read it from the scanner. Here's the tokenizer as it stands at the moment:

Overall it's pretty simple. The one thing worth really paying attention to is how "end" and "def" are treated as terminating an expression. This is an important issue with operator precedence parser intermingled with something else: You need to take care to identify the tokens that means you've reached the end of an expression, and make sure they're not swallowed up by the tokenizer.

The main change from the shunting yard implementation in the earlier post is a slight change to TreeOutput to specifically handle the :call syntax used in the compiler:

Specifically the "elsif" arm which is added to make sure the node comes out as [:call,function name,[arg1,arg2,...]]. We could have output [function name, arg1, arg2] instead, but that has a distinct disadvantage: You could have a function name that would collide with a primitive built into the compiler, and there'd be no end of confusing bugs. Forcing it into the [:call, function name, [args]] style prevents that from occurring.

As for the s-expression parser, this parser component is tied into the main parser by instantiating it and referencing it from a member variable by adding this to Parser#initialize: "[email protected] = OpPrec::parser(s)". As an example of how the main parser is changed, consider #parse_condition:

At the end of this, the "testargs" example now looks almost like Ruby, apart from the "numargs" call, though only superficially so (remember - the language is still untyped; there are no objects or even type information at all; we'll get to a type system and object orientation soon, though we'll start with something much simpler than Ruby):

I'm a Norwegian technologist who has been living in London since 2000. I have a son, Tristan, born in 2009.

I've got development and management experience from a number of startups as well as more established companies, after co-founding my first company at age 19. I currently split my time between being the Technical Director of Nudge CRM Ltd, and running my consulting business, primarily related to development and devops. (More...)