Building a Parser-Generator in Erlang, Part 5

I’ve realized I gave short-shrift to the parse_transform code that I posted yesterday. I wandered around the topic without much direction. So I’m going to revisit that code and show you the real meat of it: how I convert

Like a typical iteration routine in Erlang, it pushes things on the head of a list and then reverses at the end. When it finds a matching function, it takes the clauses of that function, transforms them, and then pushes them on the head. Note that the second clause of transform_rules will only match a single-arity function (that’s what the number 1 is doing in there). Since I’m use lists:map() in the build_rules function as you will see below, I have to reverse them before I prepend them to the accumulator.

Mapping the rules to new functions

This is pretty simple. I take the clauses, and map them into their transformation, then prepend the file and parse functions to the front. If you were evil and wanted to screw up this transform, you could provide multiple definitions (not simply clauses) of the rules function. But then it wouldn’t be very useful, would it?

Now we get to an individual rule. Since the logic is a little complicated, I broke it out into several functions. First, is build_rule.

The very first line lets us record the first rule in the grammar/parser as the root rule. This is useful later in generate_parse_function. If the row with the key root is already in the ets table, it ignores the insert. Second, we wrap the “reductions” (body of the rule) in our peg:p function as we will see below. Lastly, we return a function definition with the rule-name as the function name, an arity of 2, and a single clause that contains our wrapped reductions.

Wrap it up

Remember the whole point of this exercise was to avoid typing peg:p() over and over. Here’s where we get to that.

wrap_reductions only allows a single statement (one top-level combinator). We extract the line number out of that statement, pass the statement to another wrapper, which will return an inline-function/lambda containing the parser statement. Then we generate a similar function that will take the parsed node and transform it. Last, we pop these two funs into our peg:p() call.

Before we look at wrap_fun, let’s look at semantic_fun, which is a little simpler:

Interestingly, unlike function definitions, 'fun' definitions always use clauses, even if you only have one clause! This one’s really straightforward, just putting the rulename and the line number in the right places.

Now, our final wrapper, wrap_fun.

wrap_fun(Stmts,Line)whenis_tuple(Stmts)->{'fun',Line,{clauses,[{clause,Line,[{var,Line,'I'},{var,Line,'D'}],[],[{call,Line,Stmts,[{var,Line,'I'},{var,Line,'D'}]}]}]}};wrap_fun(Stmts,Line)->io:format("peg rules must be double-arity functions or statements and cannot be sequences of statements!~n"),throw({parse_error,not_tuple,Line,Stmts}).

I engaged in a little paranoia here — I wanted to make sure someone hand-writing a parser wouldn’t try to do something that would break the convention of a single top-level combinator in each rule. Maybe a little overboard.

For the sake of keeping things short and not colliding with variables in the same scope, I used I and D as local variable instead of Input and Index. What is really curious is how the call statement comes out. In regular Erlang, in order to call a fun that is the result of some other operation, you have to wrap it in parentheses, like so:

(some_fun())(Var)

However, in AST, it looks like any old call, except that instead of being an atom, variable, or remote, it is the substatement.

Back to the top

Now that we’ve dug into the depths of the transform, let’s look again at the top-level.