We’ve now reached the end of this year’s advent series, what will be the gift in our last box? The door opens to reveal…the Perl 6 grammar.

At first it might seem odd to cite a grammar as a significant component of a language. Obviously the syntax matters a lot to people writing programs in that language, but once a syntax has been designed, we simply use grammars to describe the syntax and build parsers, right?

Not in Perl 6, where language syntax is a dynamic thing — modifiable to accommodate new keywords and syntax not anticipated in the original design. Or, perhaps more accurately, Perl 6 explicitly anticipates and supports modules and applications changing the language’s syntax for their specific needs. Defining custom operators is just one example of a place where we change the language syntax itself, but Perl 6 also allows the dynamic addition of macros, new statement types, new sigils, and the like.

Thus a Perl 6 grammar and parser must not only parse the standard Perl 6 syntax; it must handle program-defined custom syntax as well. Language modifications must also be scoped, so that defining a new operator in one module doesn’t inadvertently change the interpretation of another module.

This is what the Perl 6 standard grammar achieves, and much of the effort that has gone into the Perl 6 specification for regexes and grammars (Synopsis 5) has been just to make this sort of thing possible. I personally believe this is one of the key features that will enable Perl 6 to remain a viable language far into the future. (On the other hand, when I first read the designs for Perl 6 in detail, I had serious doubts as to whether this could in fact be achieved. It’s nice to see that it wasn’t an impossible dream.)

The expectation is that parsers for Perl 6 will themselves be written in Perl 6, and there are several examples already available. The “standard” or “reference” grammar and parser is STD.pm; Larry has been using this to refine the Perl 6 language specification and explore the impacts of various language constructs on the writing of Perl 6 programs.

Some parts of STD.pm are still evolving in response to implementation concerns; thus Rakudo Perl maintains its own version of the language grammar that works for its environment. Many of the ideas first explored in Rakudo find their way back into the standard grammar. This is by design; our expectation is that the various grammar implementations will continue to converge over the course of the next year.

The key feature that jumps out from looking at the Perl 6 grammar is the use of protoregexes. A protoregex allows multiple regexes to be combined into a single “category”. In a more traditional grammar, we might write:
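The original code samples are not preserved here; as a sketch, with hypothetical statement constructs, the two styles contrast like this:

```perl6
# The traditional way: one monolithic alternation
rule statement {
    | 'if' <expr> <block>
    | 'while' <expr> <block>
    | 'for' <expr> <block>
}

# The protoregex way: a category, plus one rule per construct
proto rule statement {*}
rule statement:sym<if>    { 'if' <expr> <block> }
rule statement:sym<while> { 'while' <expr> <block> }
rule statement:sym<for>   { 'for' <expr> <block> }
```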

We’re still saying that a <statement> matches any of the listed statement constructs, but the protoregex version is much easier to extend. In the non-protoregex version above, adding a new statement construct (such as “repeat..until”) would require rewriting the “rule statement” declaration in its entirety to include the new statement construct. But with a protoregex, we can simply declare an additional rule:

rule statement:sym<repeat> { 'repeat' <stmt> 'until' <expr> }

This newly declared rule is automatically added as one of the candidates to the <statement> protoregex. All of this works for derived languages as well:
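For instance, assuming a hypothetical BaseGrammar whose <statement> protoregex doesn’t know about repeat..until, a derived grammar can add the candidate:

```perl6
grammar MyGrammar is BaseGrammar {
    rule statement:sym<repeat> { 'repeat' <stmt> 'until' <expr> }
}
```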

Thus MyGrammar parses everything the same as BaseGrammar, with the additional definition of the repeat..until statement construct.

The ability to dynamically replace the existing grammar with a new one that has different parse semantics is at the heart of Perl 6’s operator overloading, macro handling, and other syntax modifying features. Unlike source filters, this provides a much more nuanced approach to declaring new language constructs.

Another significant component of the standard grammar is its devotion to providing useful error diagnostics when an error is encountered. Instead of simply saying “an error occurred here”, it offers suggestions about what might have been intended instead, and places where it thinks the programmer may have been confused. It also does significant work to catch constructs that have changed between Perl 5 and Perl 6, to assist people with migration. For example, if someone writes an “unless” statement with an “else” block, the parser responds with

unless does not take "else" in Perl 6; please rewrite using "if"

Or, if a program appears to contain the question-mark-colon (?:) ternary operator, the parser says

Unsupported use of "?:"; in Perl 6 please use "?? !!"

In late October of this year, Rakudo started a significant refactor in a new branch (called “ng”) that makes use of protoregexes and the many other features of the STD.pm grammar. We still have a short way to go before this new branch can become the official released version of Rakudo, but we expect that to happen in the January 2010 release. Already this conversion has enabled us to finally add long-awaited features to Rakudo, including dynamic generation of metaoperators, lazy list handling, lexical context handling, and the like.

With Rakudo’s conversion to following STD.pm for its grammar, we’re very much on track for the Rakudo Star release in April 2010. While we expect that Rakudo Star won’t be a complete implementation of Perl 6, it will be sufficiently advanced and usable for a wide variety of applications. We’ve been quickly resolving the critical items listed in the Rakudo Star ROADMAP, and over the next couple of months will be focusing on improved error reporting (like STD.pm) and distribution / packaging issues.

…and this concludes the Perl 6 Advent series for December 2009. We hope that you’ve enjoyed reading the articles at least as much as we’ve enjoyed writing them, and we appreciate the many comments that people have made about the posts. We also hope to have conveyed our sense that many useful parts of Perl 6 are available now for experimentation, and that we’re well on the way to making them available in 2010 for a wider variety of applications. Indeed, we have high hopes and expectations for the entire Perl family in 2010 — it promises to be an exciting time for us all.

Today’s gift is a construct not often seen in other languages. It’s an iterator builder! And it’s called gather.

But first, let’s try a bit of historical perspective. Many Perl people know their map, grep and sort, convenient functions to carry out simple list transformations, filterings and rankings without having to resort to for loops.

It’s unfortunate that the functional paradigm puts the steps in reverse order of processing. The proposed pipe syntax, notably the ==> feed operator, would solve that. But it’s not implemented in Rakudo yet, only described in S03.
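As specced in S03, a feed pipeline would read left to right, in processing order; a hypothetical sketch:

```perl6
@data ==> grep { $_ > 0 }
      ==> map { $_ * 2 }
      ==> sort()
      ==> my @result;
```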

Anyway, if you’ve read the post from day 20, you know that the Schwartzian transform is built into sort nowadays:
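For example, given a hypothetical @words array, a one-argument closure sorts by a derived key, computing it only once per element:

```perl6
my @sorted = @words.sort: { $_.chars };
```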

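And here is where gather enters: a hand-rolled map, call it &mymap, is just a gather block that takes one transformed value per input element (a sketch, not the original code):

```perl6
sub mymap(&transform, @values) {
    gather for @values {
        take transform($_);
    }
}

my @doubled = mymap { $_ * 2 }, [1, 2, 3];   # 2, 4, 6
```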
(The real map can swallow several arguments at a time, making it more powerful than the &mymap above.)

Just to be clear about what happens: gather signals that within the subsequent block, we’ll be building a list. Each take adds an element to the list. You could think of it as pushing to an anonymous array if you want:
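A small illustration of that mental model (the named array here stands in for the anonymous one):

```perl6
my @values = gather {
    take 1;
    take 5;
    take 42;
};

# ...which behaves much like:
my @values2;
push @values2, 1;
push @values2, 5;
push @values2, 42;
```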

Which brings us to the first property of gather: it’s the construct you can use for building lists when map, grep and sort aren’t sufficient. Of course, there’s no need to reinvent those constructions… but the fact that you can do that, or roll your own special variants, is kinda nice.
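For example, a hypothetical &incremental-concat, which yields each successively longer prefix of the concatenation, keeps its accumulator alive between takes:

```perl6
sub incremental-concat(@strings) {
    my $string-accumulator = '';
    gather for @strings {
        $string-accumulator ~= $_;
        take $string-accumulator;
    }
}

# incremental-concat(<a b c>) yields 'a', 'ab', 'abc'
```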

The above is nicer than using map, since with map we would have to manage the $string-accumulator between iterations ourselves.

(Implementing &incremental-concat by hand is silly in an implementation which implements the [\~] operator. Just a short announcement to people who want to keep track of the extent to which Perl 6 is channeling APL. Rakudo doesn’t yet, though.)

The second property of gather is that while the take calls (of course) have to occur within the scope of a gather block, they do not necessarily have to occur in the lexical scope, only the dynamic scope. For those unfamiliar with the distinction, I think an example explains it best:
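Here is a sketch of the idea, using a simple hash-based tree (the exact structure is an assumption, not the original example):

```perl6
sub traverse-tree-inorder($tree) {
    traverse-tree-inorder($tree<left>)  if $tree<left>;
    take $tree<value>;
    traverse-tree-inorder($tree<right>) if $tree<right>;
}

my $tree = { value => 2,
             left  => { value => 1 },
             right => { value => 3 } };

my @all-nodes = gather traverse-tree-inorder($tree);
say @all-nodes;   # 1, then 2, then 3
```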

See what’s happening here? We wrap the call to &traverse-tree-inorder in a gather statement. The statement itself doesn’t lexically contain any take calls, but the called subroutine does, and the take in there remembers that it’s in a gather context. That’s what makes the gather context dynamic rather than lexical.

Just to hammer in the point, traverse-tree-inorder does lexical recursion on a tree structure, and no matter how far down the call stack we find ourselves, the values passed to take find their way back into the same anonymous array, rooted in the gather around the original call. It’s as if the anonymous array were implicitly passed around for us automatically as invisible arguments. Another way to view it is that the gather mode works orthogonally to the call stack, essentially not caring about how many calls down it is.

Should we be unfortunate enough to do a take outside of any gather block, we’ll get a warning at runtime.

I’ve saved the best till last: the third property of gather: it’s lazy.

What does “lazy” mean here? Well, to take the above tree-traversing code as an example: when the assignment to @all-nodes has executed, the tree hasn’t yet been traversed. Only when you access the first element of the array, @all-nodes[0], does the traversal start. And it stops right after it finds the leftmost leaf node. Access @all-nodes[1], and the traversal will resume from where it left off, run just enough to find the second node in the traversal, and then stop.

In short, the code within the gather block starts and stops in such a way that it never does more work than you’ve asked it to do. That’s lazy.

It’s essentially a model of delayed execution. Perl 6 promises to run the code inside your gather block, but only if it turns out that you actually need the information. Operationally, you can think of it as a separate thread that starts and stops, always doing the smallest possible amount of work to keep the main thread satisfied. But under the hood in a given implementation, it’s likely implemented by continuations — or, failing that, by painful, complicated cheating.

Now here’s the thing: most any array in Perl 6 has the lazy behavior by default, things like reading all lines from a file are lazy by default, and not only can map and grep be implemented using gather; it turns out that they actually are. So map and grep are also lazy.

Now, it’s nice to know that values aren’t unnecessarily generated when you’re doing calculations with arrays… but the really nice thing is that lazy arrays open up the door for stream-based programming and, by extension, infinite arrays.

Unfortunately, laziness hasn’t landed in Rakudo yet. We’re nearly there though, so I don’t feel too bad dangling these examples in front of you, even though they will currently cause Rakudo to spin the fans of your computer and nothing more:
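A sketch of the kind of examples meant here, with infinite lists bound via := so as not to force evaluation:

```perl6
my @naturals := 0..*;                        # all the non-negative integers
my @squares  := map { $_ * $_ }, @naturals;  # all their squares

# All numbers of the form 2**a * 3**b * 5**c, in increasing order
sub hamming-sequence() {
    my @q2 = 1; my @q3 = 1; my @q5 = 1;
    gather loop {
        my $n = [min] @q2[0], @q3[0], @q5[0];
        @q2.shift if @q2[0] == $n;
        @q3.shift if @q3[0] == $n;
        @q5.shift if @q5[0] == $n;
        push @q2, 2 * $n;
        push @q3, 3 * $n;
        push @q5, 5 * $n;
        take $n;
    }
}

say hamming-sequence()[^10];   # 1 2 3 4 5 6 8 9 10 12
```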

(That last subroutine is a Perl 6 solution to the Hamming problem, described in section 6.4 of mjd++’s Higher Order Perl. A seriously cool book, by the way. It builds iterators from scratch; we just use gather for the same result.)

Today’s gift is something I complain about a lot. Not because it’s a problem in Perl 6, but because Java doesn’t have it: operator overloading, and the definition of new operators.

Perl 6 makes it easy to overload existing operators, and to define new ones. Operators are simply specially-named multi subs, and the standard multi-dispatch rules are used to determine what the most appropriate implementation to call is.

A common example, which many of you may have seen before, is the definition of a factorial operator which mimics mathematical notation:

multi sub postfix:<!>(Int $n) {
    [*] 1..$n;
}
say 3!;

The naming convention for operators is quite straightforward. The first part of the name is the syntactic category, which is prefix, postfix, infix, circumfix or postcircumfix. After the colon is an angle bracket quote structure (the same kind of thing often seen constructing lists or accessing hash keys in Perl 6) which provides the actual operator. In the case of the circumfix operator categories, this should be a pair of bracketing characters, but all other operators take a single symbol which can have multiple characters in it. In the example above, we define a postfix operator ! which functions on an integer argument.

You can exercise further control over the operator’s parsing by adding traits to the definition, such as tighter, equiv and looser, which let you specify the operator’s precedence in relationship to operators which have already been defined. Unfortunately, at the time of writing this is not supported in Rakudo so we will not consider it further today.

If you define an operator which already exists, the new definition simply gets added to the set of multi subs already defined for that operator. For example, we can define a custom class, and then specify that its instances can be added together using a custom infix:<+>:
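A sketch, using a hypothetical Vector class (the hyperoperator »+« adds the coordinate lists element-wise):

```perl6
class Vector {
    has @.coords;
}

multi sub infix:<+>(Vector $a, Vector $b) {
    Vector.new(coords => $a.coords »+« $b.coords)
}

my $sum = Vector.new(coords => [1, 2]) + Vector.new(coords => [3, 4]);
say $sum.coords;   # elementwise sum: 4, 6
```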

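Similarly, an override of an existing comparison operator such as infix:<==>, here for a hypothetical Point class, delegates to the numeric built-in:

```perl6
class Point {
    has $.x;
    has $.y;
}

multi sub infix:<==>(Point $a, Point $b) {
    $a.x == $b.x and $a.y == $b.y
}
```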
In which case we’re really just redispatching to one of the built-in variants of infix:<==>. At the time of writing this override of == doesn’t work properly in Rakudo.

One thing you might want to do, but probably shouldn’t, with operator overloading is overloading things like prefix:<~>, the stringification operator. Why not? Well, if you do that, you won’t catch every conversion to Str. Instead, you should give your class a custom Str method, which is what would usually do the work:
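Continuing with a hypothetical Point class, such a method could look like:

```perl6
class Point {
    has $.x;
    has $.y;

    # Called whenever a Point needs to become a Str
    method Str() { "($.x, $.y)" }
}

say ~Point.new(x => 1, y => 2);   # (1, 2)
```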

This will be called by the default definition of prefix:<~>. Methods which have the names of types are used as type conversions throughout Perl 6, and you may commonly wish to provide Str and Num for your custom types where it makes sense to do so.

Thus overriding prefix:<~> makes little sense, unless you actually want to change its meaning for your type. That is not to be recommended, as programmers in C++ and other languages with operator overloading will be aware. Changing the conventional semantics of an operator for a custom type rarely ends well: it leads to confusion among the users of your library and may result in some unpleasant bugs. After all, who knows what operator behaviour the standard container types are expecting? Trample on that, and you could be in a great deal of trouble.

New semantics are best left for new operators, and fortunately, as we have seen, Perl 6 allows you to do just that. Because Perl 6 source code is in Unicode, there are a great variety of characters available for use as operators. Most of them are impossible to type, so it is expected that multicharacter ASCII operators will be the most common new operators. For an example of a Unicode snowman operator, refer back to the end of Day 17.

In today’s post, we’re going to cover several topics, but primarily grammars. I’d like to use an example inspired by some Perl 5 code I wrote for work recently.

So we have a bunch of text that we want to process and deal with. Perl’s supposed to be great at that, right? To be precise, let’s talk about the following text, describing some questions and their answers:
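For concreteness, assume input shaped like this (an illustrative stand-in: numbered questions, lettered answers, and an asterisk marking each correct answer):

```
1) Which of these is a Perl 6 compiler?
*a) Rakudo
b) GCC
c) CPython

2) Which construct builds lazy lists in Perl 6?
a) take
*b) gather
```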

To parse this in Perl 6, I’ll start by defining a Grammar. A Grammar is a special type of namespace for holding regular expressions. We’ll also define several named expressions to split up our parsing a bit.
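One possible shape for such a grammar, assuming input with numbered questions and starred correct answers (a reconstruction, with names chosen to match the classes defined later):

```perl6
grammar Question::Grammar {
    token TOP { <question>+ }

    token question {
        ^^ \d+ ')' \s* $<text>=[ \N+ ] \n
        <answer>+
        \n*
    }

    token answer {
        ^^ $<correct>=[ '*'? ] <[a..z]> ')' \s* $<text>=[ \N+ ] \n
    }
}
```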

First, a brief overview of what’s going on here, from a standpoint that assumes you’re familiar with regular expressions at least a little. By default in Perl 6 grammars, whitespace is ignored and matches occur over the entire string. It’s like the /x and /s Perl 5 modifiers are turned on. TOP is the regex that’s called if we try to match against the entire grammar as a whole.

‘token’ is one of three declarators used to introduce a regex, the others being ‘regex’ and ‘rule’. ‘regex’ is the plain, unmodified version, and the other two just enable some additional options: ‘token’ disables backtracking, and ‘rule’ both disables backtracking and causes whitespace in the regex to match literal whitespace in the matched text. We won’t use ‘rule’ here.

The <foo> syntax is what’s used to call another named regex. ‘^^’ is used to match the beginning of a line, as opposed to lone ‘^’, which matches the beginning of the entire matched text. The square brackets, [], are non-capturing grouping, like (?: ) in Perl 5 regular expressions.

The $<name>= syntax is used to assign the match of the RHS into the name specified on the LHS. You’ll see what I mean later when we use the result of this regex.

Let’s see what we get if we try matching against that grammar and printing the result we get back:
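A sketch of driving the parse (the input filename is hypothetical):

```perl6
my $text  = slurp 'questions.txt';
my $match = Question::Grammar.parse($text);

say $match<question>[0]<text>;
say $match<question>[0]<answer>[0]<correct>;
```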

So we can see that a match object contains named items as hash values, and repetitions are stored as an array. If we had any positional captures, made with parens, (), just like in Perl 5, they would have been accessed through the positional interface, with postfix [], like an array.

The next step is to make some classes and then populate them from the match object. First some class definitions:

class Question::Answer {
    has $.text is rw;
    has Bool $.correct is rw;
}

class Question {
    has $.text is rw;
    has $.type is rw;
    has Question::Answer @.answers is rw;
}

Building Question objects out of the match object isn’t that bad, but it’s still not pretty:
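A sketch of the manual approach, mapping over the match object by hand:

```perl6
my @questions = $match<question>.map: -> $q {
    Question.new(
        text    => ~$q<text>,
        answers => $q<answer>.map(-> $a {
            Question::Answer.new(
                text    => ~$a<text>,
                correct => ~$a<correct> eq '*',
            )
        }),
    )
};
```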

Remembering that any repetition in the regex is reflected as an array in the match object, we map over the <question> attribute, building a Question object for each. Each <question> match has an array of <answer> matches, so we map over those too, building a list of Question::Answer objects for each one. We are stringifying the values to have strings in our array, rather than a bunch of Match objects.

As you can guess, this approach doesn’t scale up very well. A much nicer way to do it is to build the objects as we go. The method used to do this is to pass an object as the :action argument to the .parse() method on the Grammar. The parsing engine will then call methods on that object with the same name as the regexes being parsed, with the evaluated match object for the rule passed as an argument. If the method calls ‘make()’ during execution, the argument to ‘make()’ is set as the ‘.ast’ (for “Abstract Syntax Tree”) attribute of the match object.

Okay, that’s a fairly abstract description. Let’s see some real code. We need to make a class with methods named the same as those three regexes:
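A sketch of such an actions class (the class name is an assumption; the method names must match the regexes):

```perl6
class Question::Actions {
    method TOP($/) {
        # A list of the Question objects made in the 'question' method
        make $<question>».ast;
    }

    method question($/) {
        my $q = Question.new;
        $q.text    = ~$<text>;
        $q.answers = $<answer>».ast;
        make $q;
    }

    method answer($/) {
        my $a = Question::Answer.new;
        $a.text    = ~$<text>;
        $a.correct = ~$<correct> eq '*';
        make $a;
    }
}
```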

$/ is the traditional name for match objects, and it’s as special as $_, in that there’s special syntax that accesses its attributes. Named and positional access without a variable ($<foo> and $1) are translated into access to $/ ($/<foo> and $/[1]). It’s only a one-character difference, but it saves some visual noise, and helps it fill a semantic space similar to $1, $2, $3, etc. in Perl 5.

In the ‘TOP’ method, we just use a hyperoperator method call to make a list of the .ast attributes of each item in $<question>. Again, whenever we call ‘make’ in an action method, we’re setting something as the ‘.ast’ attribute of the match object that gets returned, so this is just fetching whatever we ‘make’ in the ‘question’ method.

In the ‘question’ method, we construct a new Question object, populating its attributes from the match object, and specifically set its ‘answers’ attribute as the list of objects we produce in each call to the ‘answer’ regex from the current parse of ‘question’.

In the ‘answer’ method, we do the same thing, setting the ‘correct’ attribute to the result of a comparison, so that it satisfies the ‘Bool’ type constraint on the attribute.

So, again, to use this in a parse, we instantiate this new class and pass the object as the :action parameter to the ‘.parse’ method of the grammar, and then we fetch the constructed object from the ‘.ast’ attribute of the match object it returns:
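Putting it together (assuming the grammar and actions class discussed above, and following the :action spelling used in this post):

```perl6
my $actions   = Question::Actions.new;
my $match     = Question::Grammar.parse($text, :action($actions));
my @questions = $match.ast;
```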

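To grade a response, we first pull the chosen answer numbers out of whatever the user typed (a sketch):

```perl6
my $response = prompt 'Which answers are correct? ';
my @picked   = $response.comb(/\d+/)».Int.sort;
```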
‘comb’ is kind of the opposite of ‘split’, in that we specify what we want to keep instead of what we want to discard. The advantage here is that we don’t have to choose a delimiter. The user can enter “1 2 3”, “1,2,3”, or even “1, 2, and 3”. We then use a hyperoperator method call to generate an array of Integers from the array of Matches, and then sort it.

Next, let’s generate a corresponding array of all of the correct answer indexes, and then compare them to determine correctness of the response. This isn’t the only way to do it, merely the first that occurred to me. :)
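One possible way, assuming $question is a single Question object and @picked holds the user’s sorted choices:

```perl6
# Collect the 1-based indexes of the correct answers
my @correct = gather for $question.answers.kv -> $i, $a {
    take $i + 1 if $a.correct;
}

say @picked eqv @correct ?? 'Right!' !! 'Wrong.';
```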

There are many simple yet powerful ideas baked into Perl 6 that allow many wonderful things. One of these simple ideas is introspection. Introspection is the act of observing yourself. For a programming language, this means that there is some mechanism by which you, the programmer, can express questions about the language in the language itself. Perl 6 is a language that supports introspection in several ways. For instance, there are methods on object instances that tell to what class the object belongs, there are methods on classes to tell what methods are available on the class, and so on.

There are even methods on subroutines to ask what the subroutine’s name is:
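For example:

```perl6
sub foo { }

say &foo.name;   # foo
```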

Now that may seem slightly pointless, but keep in mind that subroutines can be assigned to scalars, aliased to other names, or generated on the fly, so a subroutine’s name may not be immediately obvious from glancing at the code.

my $bar = &foo;
# ... and then much later ...
say $bar.name; # What was this subroutine's name again?

Here are a few other items you can introspect on subroutines.

say &foo.signature.perl; # What does the subroutine signature look like?
say &foo.count; # How many arguments does this subroutine take?
say &foo.arity; # How many are required?

That last thing we introspected from the subroutine was its arity: the number of required arguments for a subroutine or method. Because Perl 6 can now figure that information out for itself via introspection, interesting new things are available that weren’t easy, or even possible, before. For instance, in Perl 5, map blocks take items from a list one at a time and transform each into one or more new items to create a new list; but because Perl 6 knows how many arguments a block expects, it can pass as many as are needed.
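So a map block that declares two parameters receives two elements per call; an illustrative sketch:

```perl6
# The block's arity is 2, so map walks the list two elements at a time
my @sums = (1, 2, 3, 4, 5, 6).map: -> $a, $b { $a + $b };
say @sums.join(', ');   # 3, 7, 11
```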

Another benefit of this ability to introspect the number of parameters a subroutine requires is a nicer mechanism for sorting arrays by some criterion other than the default string comparison. The sort method on arrays takes an optional subroutine to act as the comparator, and ordinarily this subroutine takes two parameters: the items of the array to be compared. So, if you were modeling people and their karma, and wanted to sort an array of people by karma, you’d write something similar to this:
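A sketch, with a minimal hypothetical Person class to make the example self-contained:

```perl6
class Person {
    has $.name;
    has $.karma;
}

my @people = Person.new(:name<Alice>, :karma(10)),
             Person.new(:name<Bob>,   :karma(5));

# Two-parameter comparator: called once per comparison
.say for @people.sort: { $^a.karma <=> $^b.karma };
```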

But! Since Perl 6 can introspect the comparator, we’ve got another option now. By passing a subroutine that only takes 1 parameter, Perl 6 can know to automatically do the equivalent of a Schwartzian Transform. The above sort can now be written like so:

.say for @people.sort: { $^a.karma };

But wait! There’s another small syntactic advantage now that there’s only one parameter to the subroutine: implicit aliasing to $_. So we can eliminate the auto-declared placeholder variable entirely:

.say for @people.sort: { .karma };

What this does is call the .karma method on each element of the array one time (rather than twice for each comparison as would be done with the normal comparator) and then order the array based on the results.

Another little big thing is that Perl 6 has a built-in type system. You may have noticed in the karma example above that I didn’t specify that numeric comparison should be used. That’s because Perl is smart enough to figure out that we’re using numbers. If I had wanted to force the issue, I could have used a prefix + or ~:
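Continuing with the @people example from above:

```perl6
.say for @people.sort: { +.karma };   # force numeric comparison
.say for @people.sort: { ~.name };    # force string comparison
```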

The Whatever-Star was also an inspiration for the planned Rakudo Star release. Because we know that release won’t be a complete implementation of Perl 6, we didn’t want to call it “1.0”. But other release numbers and tags also presented their own points of confusion. Finally it was decided to call it “Rakudo Whatever”, which then became “Rakudo Star”. (Our detailed plans for Rakudo Star are kept in the Rakudo repository.)