APW2014 and the Rakudo Great List Refactor

This past weekend I attended the 2014 Austrian Perl Workshop and Hackathon in Salzburg, which turned out to be an excellent way for me to catch up on recent changes to Perl 6 and Rakudo. I also wanted to participate directly in discussions about the Great List Refactor, which has been a longstanding topic in Rakudo development.

What exactly is the “Great List Refactor” (GLR)? For several years Rakudo developers and users have identified a number of problems with the existing implementation of list types — most notably performance. But we’ve also observed the need for user-facing changes in the design, especially in generating and flattening lists. So the term GLR now encompasses all of the list-related changes that seem to want to be made.

It’s a significant (“great”) refactor because our past experience has shown that small changes in the list implementation often have far-reaching effects. Almost any bit of rework of list fundamentals requires a fairly significant refactor throughout much of the codebase. This is because lists are so fundamental to how Perl 6 works internally, just like the object model. So, as the number of things that are desirable to fix or change has grown, so has the estimated size of the GLR effort, and the need to try to achieve it “all at once” rather than piecemeal.

The pressure to make progress on the GLR has been steadily increasing, and APW2014 was significant in that a lot of the key people needed for that would be in the same location. Everyone I’ve talked to agrees that APW2014 was a smashing success, and I believe that we’ve now resolved most of the remaining GLR design issues. The rest of this post will describe that.

This is an appropriate moment to recognize and thank the people behind the APW effort. The organizers did a great job. The Techno-Z and ncm.at venues were fantastic locations for our meetings and discussions, and I especially thank ncm.at, Techno-Z, yesterdigital, and vienna.pm for their generous support in providing venues and food at the event.

So, here’s my summary of GLR issues where we were able to reach significant progress and consensus.

You are now leaving flatland

(Be sure to visit our gift shop!)

Much of the GLR discussion at APW2014 concerned flattening list context in Perl 6. Over the past few months and years Perl 6 has slowly but steadily reduced the number of functions and operators that flatten by default. In fact, a very recent (and profound) change occurred within the last couple of months, when the .[] subscript operator for Parcels switched from flattening to non-flattening. To illustrate the difference, the expression

(10,(11,12,13),(14,15)).[2]

previously would flatten out the elements to return 12, but now no longer flattens and produces (14,15). As a related consequence, .elems no longer flattens either, changing from 6 to 3.

Unfortunately, this change created a inconsistency between Parcels and Lists, because .[] and .elems on Lists continued to flatten. Since programmers often don’t know (or care) when they’re working with a Parcel or a List, the inconsistency was becoming a significant pain point. Other inconsistencies were increasing as well: some methods like .sort, .pick, and .roll have become non-flattening, while other methods like .map, .grep, and .max continue to flatten. There’s been no really good guideline to know or decide which should do which.

Flattening behavior is great when you want it, which is a lot of the time. After all, that’s what Perl 5 does, and it’s a pretty popular language. But once a list is flattened it’s hard to get the original structure if you wanted that — flattening discards information.

So, after many animated discussions, review of lots of code snippets, and seeking some level of consistency, the consensus on Perl 6 flattening behavior seems to be:

List assignment and the [ ] array constructor are unchanged; they continue to flatten their input elements. (Arrays are naturally flat.)

The for statement is unchanged. for @a,@b { ... } flattens @a,@b and applies the block to each element of @a followed by each element of @b. Note that flattening can easily be suppressed by itemization, thus for @a, $@b { ... } flattens @a but does all of @b in a single iteration.

Method calls tend to not flatten their invocant. This most impacts .map, .grep, and .first… the programmer will have to use .flat.grep and .flat.first to flatten the list invocant. Notably, .map will no longer flatten its invocant — a significant change — but we’re introducing .for as a shortcut for .flat.map to preserve a direct isomorphism with the for statement.There’s ongoing conjecture of creating an operator or syntax for flattening, likely a postfix of some sort, so that something like .|grep would be a convenient alternative to .flat.grep, but it doesn’t appear that decision needs to be made as part of the GLR itself.

Argument lists continue to depend on the context in which they are bound: flattening for slurpy parameters, top-level itemizing for slice parameters, and non-flattening (or deferred flattening) for Positionals.

The above two points produce a general guideline that method call invocants are generally not-flattened, while function call arguments are more likely to be.

We think it will be very difficult to have a guideline that applies 100% of the time — there will be a few exceptions to the rule but they should generally feel natural.

The flattening behavior of operators continues to be specific to each operator — some will flatten, others will not. Fortunately, any flattening behavior should be grouped by precdence level, is generally dwimmy, and there are easy ways to use contextualizers to quickly switch to the behavior you want.

United Parcel Severance

As a result of improvements in flattening consistency and behavior, it appears that we can eliminate the Parcel type altogether. There was almost unanimous agreement and enthusiasm at this notion, as having both the Parcel and List types is quite confusing.

Parcel was originally conceived for Perl 6 as a “hidden type” that programmers would rarely encounter, but it didn’t work out that way in practice. It’s nice that we may be able to hide it again — by eliminating it altogether. 🙂

Thus infix:<,> will now create Lists directly. It’s likely that comma-Lists will be immutable, at least in the initial implementation. Later we may relax that restriction, although immutability also provides some optimization benefits, and Jonathan points out that may help to implement fixed-size Arrays.

Speaking of optimization, eliminating Parcel may be a big boost to performance, since Rakudo currently does a fair bit of converting Parcels to Lists and vice-versa, much of which goes away if everything is a List.

A few more times around the (loop) blocks

During a dinner discussion Jonathan reminded me that Synopsis 4 has all of the looping constructs as list generators, but Rakudo really only implements for at the moment. He also pointed out that if the loop generators are implemented, many functions that currently use gather/take could potentially use a loop instead, and this could be much more performant. After thinking on it a bit, I think Jonathan is on to something. For example, the code for IO::Handle.lines() currently does something like:

gather {
until not $!PIO.eof {
$!ins = $!ins + 1;
take self.get;
}
}

With a lazy while generator, it could be written as

(while not $!PIO.eof { $!ins++; self.get });

This is lazily processed, but doesn’t involve any of the exception or continuation handling that gather/take requires. And since while might choose to not be strictly lazy, but lines() definitely should be, we may also use the lazy statement prefix:

lazy while not $!PIO.eof { $!ins++; self.get };

The lazy prefix tells the list returned from the while that it’s to generate as lazily as it possibly can, only returning the minimum number of elements needed to satisfy each request.

So as part of the GLR, we’ll implement the lazy list forms of all of the looping constructs (for, while, until, repeat, loop). In the process I also plan to unify them under a single LoopIter type, which can avoid repetition and be heavily optimized.

This new loop iterator pattern should also make it possible to improve performance of for statements when performed in sink context. Currently for statements always generate calls to .map, passing the body of the loop as a closure. But in sink context the block of a for statement could potentially be inlined. This is the way blocks in most other loops are currently generated. Inlining the block of the body could greatly increase performance of for loops in sink context (which are quite common).

Many people are aware of the problem that constructs such as for and map aren’t “consuming” their input during processing. In other words, if you’re doing .map on a temporary list containing a million elements, the entire list stays around until all have been processed, which could eat up a lot of memory.

Naive solutions to this problem just don’t work — they carry lots of nasty side effects related to binding that led us to design immutable Iterators. We reviewed a few of them at the hackathon, and came back to the immutable Iterator we have now as the correct one. Part of the problem is that the current implementation is a little “leaky”, so that references to temporary objects hang around longer than we’d like and these keep the “processed” elements alive. The new implementation will plug some of the leaks, and then some judicious management of temporaries ought to take care of the rest.

I’ve got a sinking feeling…

In the past year much work has been done to improve sink context to Rakudo, but I’ve never felt the implementation we have now is what we really want. For one, the current approach bloats the codegen by adding a call to .sink after every sink-context statement (i.e., most of them). Also, this only handles sink for the object returned by a Routine — the Routine itself has no way of knowing it’s being called in sink context such that it could optimize what it produces (and not bother to calculate or return a result).

We’d really like each Routine to know when it’s being called in sink context. Perl 5 folks will instantly say “Hey, that’s wantarray!”, which we long ago determined isn’t generally feasible in Perl 6.

However, although a generalized wantarray is still out of reach, we can provide it for the limited case of detecting sink contexts that we’re generating now, since those are all statically determined. This means a Routine can check if it’s been called in sink context, and use that to select a different codepath or result. Jonathan speculates that the mechanism will be a flag in the callsite, and I further speculate the Routine will have a macro-like keyword to check that flag.

Even with detecting context, we still want any objects returned by a Routine to have .sink invoked on them. Instead of generating code for this after each sink-level statement, we can do it as part of the general return handler for Routines; a Routine in sink context invokes .sink on the object it would’ve otherwise returned to the caller. This directly leads to other potential optimizations: we can avoid .sink on some objects altogether by checking their type, and the return handler probably doesn’t need to do any decontainerizing on the return value.

As happy as I am to have discovered this way to pass sink context down into Routines, please don’t take this as opening an easy path to lots of other wantarray-like capabilities in Perl 6. There may be others, and we can look for them, but I believe sink context’s static nature (as well as the fact that a false negative generally isn’t harmful) makes it quite a special case.

The value of consistency

One area that has always been ambiguous in the Synopses is determining when various contextualizing methods must return a copy or are allowed to return self. For example, if I invoke .values on a List object, can I just return self, or must I return a clone that can be modified without affecting the original? What about .list and .flat on an already-flattened list?

The ultra-safe answer here is probably to always return a copy… but that can leave us with a lot of (intermediate) copies being made and lying around. Always returning self leads to unwanted action-at-a-distance bugs.

After discussion with Larry and Jonathan, I’ve decided that true contextualizers like .list and .flat are allowed to return self, but other method are generally obligated to return an independent object. This seems to work well for all of the methods I’ve considered thus far, and may be a general pattern that extends to contextualizers outside of the GLR.

Now it’s just a SMOPAD

(small matter of programming and documentation)

The synopses — especially Synopsis 7 — have always been problematic in describing how lists work in Perl 6. The details given for lists have often been conjectural ideas that quickly prove to epic fail in practice. The last major list implementation was done in Summer 2010, and Synopsis 7 was supposed to be updated to reflect this design. However, the ongoing inconsistencies (that have led to the GLR) really precluded any meaningful update to the synopses.

With the progress recently made at APW2014, I’m really comfortable about where the Great List Refactor is leading us. It won’t be a trivial effort; there will be significant rewrite and refactor of the current Rakudo codebase, most of which will have to be done in a branch. And of course we’ll have to do a lot of testing, not only of the Perl 6 test suite but also the impact on the module ecosystem. But now that much of the hard decisions have been made, we have a roadmap that I hope will enable most of the GLR to be complete and documented in the synopses by Thanksgiving 2014.

25 Responses to APW2014 and the Rakudo Great List Refactor

Great catch! I’ve updated the article, but used “not” instead of “!” to make it a bit more visually appealing. (I could’ve also switched back to “until”, but I think some will more readily recognize “while” as the looping construct.)

If I understand correctly, placing the loop in parentheses (or any opening bracket) will also allow return values from loops. And contextualizers like ‘lazy’ and ‘eager’ also imply getting a return value from the loop.

Surely not? thing.map(f) and map(f,thing) HAVE to do the same thing, don’t they? If I can’t rely on a core function like this to behave consistently, what hope once I need to use libraries? What’s wrong with a new function “flatmap” or just write “flat.map” when I need it?

How so? I don’t know where you’re coming from, but it shouldn’t seem broken to someone with a Perl background.

The map function (both in Perl 5 and Perl 6), treats its first argument as the code to call for each iteration, and the rest of the argument list as the list to iterate over.

The ‘$‘ sigil in $list explicitly signifies, loosely speaking, that the object referenced by that variable should be treated as a single item when it appears in list context (such as in the aforementioned argument list).

You can always refer to the actual items of the list object using one of these ways:

creates an immutable list with three items:
a) an Int
b) a read-only Scalar containing a List
c) a List
…correct?

And flattening simply means the operation of creating a duplicate of the list object, where child items which are of type List (but not in Scalar containers) have been recursively replaced with their contents?

Alternatively, instead of seven bullet points about how flattening works, you could simply make it so that when a user wants a flattened list, they call a flatten function on it…

This is one of those things that, when I think about showing Perl (both 5 and 6) to people who know another dynamic language, I know they are going to recoil in horror about. It makes dealing with nested data structures more complicated: in Perl 5 it requires convoluted syntax, whereas in Perl 6 it requires memorizing a whole lot of other stuff and their various exceptions.

Perhaps I’m just not a true Perl programmer, because I’d prefer to consistently not have information discarded rather than having to memorize all the various ways that Perl 6 makes guesses as to what I mean as it tries to “do what I mean.” Please just give me my data structure transformed the way I asked it to be transformed, using the consistent semantics I’d find in other languages.

I’m sorry to say this, because P6 offers lots of really awesome things: namely a dynamic language with great concurrency support that isn’t written in s-expressions (Clojure) or which requires the actor model all the way down (Erlang/Elixir). This is something sorely needed right now and a huge opportunity for Perl 6. But these weird decisions that require people to carry around all this context of various rules in their heads is only going to alienate non-Perlers — and last I saw, they far outnumber us. I know it’s not a numbers game, but one of the biggest strengths of P5 — the richness of CPAN — came about due to Perl’s popularity. Can we achieve that same richness in P6?

I’m with you on this one. As a Perl 5 programmer who moved to Ruby but who wishes Perl 6 would present a compellig reason to return, I’m dismayed to discover that Perl’s most fundamental data structure is still the subject of so much confusiont. I’m sure there are uses for list flattening but outside Perl the language is often disparaged as having no proper data structures, with the finger pointing squarely at flattening.

A usable, popluar Perl 6 is essential to revitalise the Perl community but if this List Refactor is considered “Great” then I really do dispare. Maybe Perl has just become too attached to list flattening to ever have a chance of moving on.

That .flat method already exists. It’s just not the only way to get flattening behavior.

Nor would it necesssarily make things more explicit to make it the only way. Think about it: If a function chooses to call .flat on an argument it is given, this will feel very “explicit” for the author of the function, but from the point of view of code that calls the function, it will effectively be implicit behavior that needs to be learned.
By letting functions encode the desire to operate on flattened arguments into their signature (by specifying a slurpy parameter), it at least becomes self-documenting and easy to discover.

Really, after coming back to this blog post again, I realize that the list of “flattening rules” given here only seems so complicated because it describes things in comparison to the pre-GLR Perl 6 design.

It’s actually reasonably simple. Only two language features I’m aware of perform implicit flattening of their inputs:

– list assignment
– for loops

And then there are ways to “explicitly” get things to flatten:

– Calling the flat method/sub on an object
– Specifying a function/method/operator parameter as “slurpy”

Some built-in functions/method/operators use such slurpy parameters, and in theory you’ll need to learn which ones do case by case, but in practice it should feel pretty obvious once you get a hang of the language. And you can always look at a routine’s signature to make sure.

@anonymous and @Anon … the list describing the flattening behavior is really only three items long (items 4. and 5. are really an extension to item 3. maybe 3a, 3b and the last two points are really just comments/discussion). Arguably item 1. just describes something that isn’t going to change so, rather than horror and confusion, the points are the start a discussion of what really is more like two broad and easy to remember general principles that are still being worked on.

For me it makes sense to have a @list.operation method call leave what ever list-ish or Array object it is called on as is (though the shorthand .|operation seems like a neat idea) since I’m treating it like an object. At the same time having a function operation( @list ) make sensible decisions about flattening gives me useful way to leverage context in the language.

Since this will still be perl, I expect any behavior will be extremely well-documented and since it is perl6 I expect it will come with deluxe informative error messages and warnings.

How difficult would it be to never implicitly flatten in any circumstance, and always require some explicit marker to cause flattening? That would make things a lot more consistent I think. The marker could even be some syntax other than .flat, it could be connected with assignment or ‘for’ but it is still something explicit and visible.