Something cool about Perl 6 every day

Primary Menu

Day 21, Grammars and Actions

In today’s post, we’re going to cover several topics, but primarily grammars. I’d like to use an example inspired by some Perl 5 code I wrote for work recently.

So we have a bunch of text that we want to process and deal with. Perl’s supposed to be great at that, right? To be precise, let’s talk about the following text, describing some questions and their answers:

To parse this in Perl 6, I’ll start by defining a Grammar. A Grammar is a special type of namespace for holding regular expressions. We’ll also define several named expressions to split up our parsing a bit.

First, a brief overview of what’s going on here, from a standpoint that assumes you’re familiar with regular expressions at least a little. By default in Perl 6 grammars, whitespace is ignored and matches occur over the entire string. It’s like the /x and /s Perl 5 modifiers are turned on. TOP is the regex that’s called if we try to match against the entire grammar as a whole.

‘token’ is one of three identifiers used to declare a regex, including ‘regex’, ‘token’, and ‘rule’. ‘regex’ is the plain, unmodified version, and the second two just enable some additional options. ‘token’ disables backtracking, and ‘rule’ both disables backtracking and causes whitespace in the regex to match literal whitespace in the matched text. We won’t use ‘rule’s here.

The <foo> syntax is what’s used to call another named regex. ‘^^’ is used to match the beginning of a line, as opposed to lone ‘^’, which matches the beginning of the entire matched text. The square brackets, [], are non-capturing grouping, like (?: ) in Perl 5 regular expressions.

The = syntax is used to assign the RHS into the name specified on the LHS. You’ll see what I mean later when we use the result of this regex.

Let’s see what we get if we try matching against that grammar and printing the result we get back:

So we can see that a match object contains named items as hash values, and repetitions are stored as an array. If we had any positional captures, made with parens just like Perl 5: (), they would have been accessed through the positional interface, with postfix [], like an array.

The next step is to make some classes and then populate them from the match object. First some class definitions:

class Question::Answer {
has $.text is rw;
has Bool $.correct is rw;
}

class Question {
has $.text is rw;
has $.type is rw;
has Question::Answer @.answers is rw;
}

Building Question objects out of the match object isn’t that bad, but it’s still not pretty:

Remembering that any repetition in the regex is reflected as an array in the match object, we map over the <question> attribute, building a Question object for each. Each <question> match has an array of <answer> matches, so we map over those too, building a list of Question::Answer objects for each one. We are stringifying the values to have strings in our array, rather than a bunch of Match objects.

As you can guess, this approach doesn’t scale up very well. A much nicer way to do it is to build the objects as we go. The method used to do this is to pass an object as the :action argument to the .parse() method on the Grammar. The parsing engine will then call methods on that object with the same name as the regexes being parsed, with the evaluated match object for the rule passed as an argument. If the method calls ‘make()’ during execution, the argument to ‘make()’ is set as the ‘.ast’ (for “Abstract Syntax Tree”) attribute of the match object.

Okay, that’s a fairly abstract description. Let’s see some real code. We need to make a class with methods named the same as those three regexes:

$/ is the traditional name for match objects, and it’s as special as $_, in that there’s special syntax that accesses its attributes. Named and Positional access without a variable ($<foo> and $[1]) are translated into access to $/ ($/<foo> and $/[1]). It’s only a one-character difference, but it saves some visual noise, and helps it fill a semantic space similar to $1, $2, $3, etc. in Perl 5.

In the ‘TOP’ method, we just use a hyperoperator method call to make a list of the .ast attributes of each item in $<question>. Again, whenever we call ‘make’ in an action method, we’re setting something as the ‘.ast’ attribute of the match object that gets returned, so this is just fetching whatever we ‘make’ in the ‘question’ method.

In the ‘question’ method, we construct a new Question object, populating its attributes from the match object, and specifically set its ‘answers’ attribute as the list of objects we produce in each call to the ‘answer’ regex from the current parse of ‘question’.

In the ‘answer’ method, we do the same thing, setting the ‘correct’ attribute to the result of a comparison, so that it satisfies the ‘Bool’ type constraint on the attribute.

So, again, to use this in a parse, we instantiate this new class and pass the object as the :action parameter to the ‘.parse’ method of the grammar, and then we fetch the constructed object from the ‘.ast’ attribute of the match object it returns:

‘comb’ is kind of the opposite of ‘split’, in that we specify what we want to keep instead of what we want to discard. The advantage here is that we don’t have to choose a delimiter. The user can enter “1 2 3”, “1,2,3”, or even “1, 2, and 3”. We then use a hyperoperator method call to generate an array of Integers from the array of Matches, and then sort it.

Next, let’s generate a corresponding array of all of the correct answer indexes, and then compare them to determine correctness of the response. This isn’t the only way to do it, merely the first that occurred to me. :)

Like this:

Related

Post navigation

10 thoughts on “Day 21, Grammars and Actions”

Really interesting stuff and I understand that it is difficult to really subdivide it into smaller fragments. Still, this is the first Perl6 Advent Calendar gift I felt that I have to digest under the course of several days.

There is quite a lot of stuff to grasp here (but no wonder, the Regexes part is one of the pieces that got a major overhaul in Perl6 compared to perl5).

I think this is my favorite gift so far. It’s a bit long, but the writing is really good so the length isn’t really a problem. The examples are beautiful and manage to tie together a lot of different parts of the language quite nicely. Bravo!

Hello, Neat post. There is an issue together with your website in
web explorer, may test this? IE nonetheless is the market leader and a big component of other people will omit your magnificent writing due to this problem.