Lazy Lists and Multi-Threading

Lists are Lazy

A core concept in Perl 6 is that lists may be “lazy”. This may be behind-the-scenes and of purely academic interest in many cases, but the concept can also be harnessed directly. In particular, it serves as a natural model for implicit multi-threaded programming, which will become more important as computer technology moves to “cores as the new MHz”.

But first, let’s look at the most basic form of laziness, which can be observed using the map function†.
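The original listing is not reproduced here; the following is a plausible reconstruction, with the body of fn1 left as a placeholder (the essay never says what fn1 computes, only that the values 13, 57, and 3 come out, and compute() stands in for that unspecified work):

```perl6
# Hypothetical work function: prints its argument in brackets as a
# side-effect, then returns a value computed from it.
sub fn1 ($n) {
    print "[$n]";
    return compute($n);    # compute() is a stand-in for the real work
}

my @result = map(&fn1, ^10, :injective);

# Later, only three of the ten elements are ever used:
say @result[4];    # 13
say @result[7];    # 57
say @result[1];    # 3
```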

If the list were not lazy, as in Perl 5, you would expect this to produce

[0][1][2][3][4][5][6][7][8][9]13
57
3

That is, the side-effects of calling print in each iteration happen before the call to map returns, in the obvious order. Subsequent uses of the @result list simply access the already-computed data.

But in Perl 6, lists are lazy, and the map function can show this. The call to map does not call fn1 ten times, but rather sets up a lazy list that knows how to compute each element when (and only if) it is needed.

So, the actual output is:

[4]13
[7]57
[1]3

The function fn1 is only called three times, not ten. The injective flag tells the map function that every input value produces exactly one output value, so it can be very lazy indeed†. Without this flag, map doesn’t know whether each input will produce zero, one, or some other number of results, so it can’t simply assume that executing the body on the 5th input value will produce the 5th output value. Instead you would get:

[0][1][2][3][4]13
[5][6][7]57
3

If you really need to make map execute all the iterations immediately and in order, as in Perl 5, you can use the eager function.

my @result = eager map(&fn1, ^10);

The eager function doesn’t just apply to the map function. It works on any lazy list, and forces it to finish everything. So you could write:

my @result = map(&fn1, ^10);
eager @result; # or @result.eager;

with the same effect. Once you’ve created a lazy list, by any means, you can use functions such as eager to process that list.

The eager function will, in general, check every element in order, and realize it if it is still deferred. “One at a time”, “in order”, and “wait for it to finish” are the characteristics of eager, but you can modify them somewhat using options to the function and settings on the lazy list itself.

For example, you could say

@result.eager(:range(5..7));

and it will realize just elements @result[5], @result[6], and @result[7]. You can give it something more complex, such as

@result.eager(:range(8,5..7,1));

and it will realize the elements specified in the range list, in the order in which they appear in that list.

The eager function waits for the work to complete before returning. But you can also get out early with an exception. The ‘use fatal’ setting is respected during the call.

try {
    use fatal;
    @results.eager;
}

This behaves just as if you had coded the eager yourself by touching every element in a loop:

try {
    use fatal;
    for ^@results.elems {
        my $temp = @results[$_];
    }
}

So, an exception thrown during the lazy evaluation of one of the @results[$_] will get you out of the loop and leave the try block. When multiple threads are involved, things are more complicated, as we’ll see later.

Background Processing

The default setting for a lazy list is to realize each element only when (if ever) it is needed. At that point, the caller must wait for it. However, since the lazy list is essentially a “work list”, you can tell it to execute in the background:

@results.lazy(:background);

The call to lazy only changes the setting, and returns immediately. But now the implementation can, if it feels like it, work on realizing parts of @results any time it’s idle. So, if the program has to pause for user interaction or wait for disk I/O, the implementation can work on populating @results. Then, when @results[9] is needed, it is already there, and the program does not have to compute it at that point.
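In code, the intended usage pattern might look like this (a sketch only; it assumes the background work gets idle time to run while the program waits on the prompt):

```perl6
my @results = map(&fn1, ^10, :injective);
@results.lazy(:background);   # just changes a setting; returns immediately

prompt "Press enter to continue: ";   # while the program pauses here, the
                                      # implementation may be realizing
                                      # elements of @results in the background

say @results[9];   # with luck, already computed, so no waiting here
```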

Once you are doing things in the background, there are more options. Can you do more than one at a time? Can they overlap, working on multiple cores at the same time?

But first, what about errors? In the synchronous cases, either by forcing an element to be realized when it was needed or by using eager, exceptions worked sensibly and could propagate out of the work function. But in background execution, that will not do. Whenever elements are being realized in the background, the work function is called under ‘no fatal’, and any Failure return value is simply recorded as the realized value for that element.

This works remarkably well to do what you want, even if you prefer to work with exceptions rather than return codes in your program.

Now obviously our example fn1 will always succeed, but suppose instead we have a more complicated work function that can fail sometimes. Suppose it did indeed fail when called to realize @results[7], and that it did so in the background.

When we get to the later part of the program, @results[7] holds not a number but a poisonous Failure object, which is the same object that would have been thrown had the work function been called under ‘use fatal’ and had someplace to throw to.

The nature of “poisonous” Failure objects while under ‘use fatal’ is that any method call, other than checking to see whether it is defined, will actually throw that very Failure as an exception. So, the expression 5 + @results[7] will cause the exception to be thrown, even though it was generated earlier, in the background. Now it has someone to complain to! The Failure springs back to life as an exception as soon as that background result is used somewhere.
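The springing-back-to-life might look like this in code (a sketch, assuming as above that the background realization of @results[7] failed):

```perl6
use fatal;
# @results[7] was realized in the background and the work function
# failed, so the element holds a Failure object rather than a number.
say @results[7].defined;    # checking definedness does not throw
my $x = 5 + @results[7];    # any other use throws the recorded Failure here
```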

Platform-Specialized Options

Now let’s look at the myriad options available for background processing. The details will depend on the Abstract Machine’s implementation, the hardware platform, and the operating system. And this is certain to change with time.

So the “specialized” options are by their nature non-standardized. But, in general, some concrete classes derived from System::Background will be provided with your implementation. You create one, and within it specify all the details your platform knows about, such as core affinity, NUMA node behavior, priority, and whatever else. Setting up these objects should be done in an area separate from the main program logic. These details will vary not only by platform, but also with the user’s settings and preferences, so that set-up is best kept isolated from the bulk of the (hopefully portable) program.

Whatever concrete class and arguments you used, you will end up with objects of a type derived from System::Background. Each such object specifies the details of background threading behavior. Knowing where those objects live, in global variables for example, you can pass one as an argument to List.lazy. For example, if I set up $server_threads somewhere, I could then say

@results.lazy($server_threads);
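The set-up for $server_threads might look something like this. The concrete class name and its parameters are invented for illustration; only System::Background itself and List.lazy are named above:

```perl6
# Platform-specific set-up, kept apart from the main program logic.
use System::Background;

# Hypothetical concrete class with hypothetical parameters.
our $server_threads = System::Background::Pool.new(
    :priority<low>,
    :affinity(4..7),    # e.g. keep this work on cores 4 through 7
);
```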

Generic Options

Platform-dependent options are all-powerful but difficult to work with. Most of the time you don’t need something that specific or specialized, and can work with generic options.

Now when writing a portable program, you don’t know the capabilities of the threading model of the platform, or the hardware details. So what can you usefully set up? What you do know is the code that went into the work function, and what that code is supposed to do and how it is written.

The generic options allow you to specify the constraints that the system is held to, and the guarantees that it makes, when executing the work function.

For example, by using the :background flag, you are relaxing the constraint that the work function must only be executed synchronously when a result is needed, but you maintain the constraint that only one iteration can be run at a time.

Here is a hierarchy of useful constraints:

Only run synchronously.

That is, no background at all.

Run one iteration at a time.

If run in the background, it will wait for one to finish before doing another. Likewise, if you ask for one synchronously, it may need to wait for the current background task to finish first.

Initiate more than one at a time, allow their execution to be interleaved, but still only actually work on one at a time.

This is a simple threading model that works when calling functions (notably I/O) that can block and are set up to work with this simple protocol.

For example, consider a work function that reads from files.
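A sketch of such a work function, in the spirit of the earlier fn1 (the interleaving option name is invented for illustration):

```perl6
# Hypothetical work function that blocks on file I/O.
sub fn2 ($name) {
    print "[$name]";
    return slurp $name;    # blocks while the file is read
}

my @results = map(&fn2, @filenames, :injective);
@results.lazy(:background, :interleave);   # :interleave is an invented name
```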

When told to work on that in the background with interleaving allowed, the implementation starts realizing @results[0]. But, fn2 blocks on reading the file. The background scheduler sees that, and starts @results[1]. In our scenario, that blocks on the file read too, and likewise the scheduler starts working on @results[2] and @results[3]. When that last one blocks (as they all do), the scheduler sees that the first one is ready to proceed, so it resumes work on @results[0], which runs to completion. The scheduler will continue by resuming another one that is ready to proceed, or starting another one. It only works actively on one at a time, but can juggle several in progress.

That is, the threads are not pre-empted, and only yield when calling a function programmed to do so. If it only switches tasks when waiting for I/O or other known points, you don’t have to worry about all the other logic being “atomic” or being interleaved with another instance of the same work list.

This can be further customized in simple ways. You can specify a number other than 1 to have n non-blocked working threads, which is a useful model for server activities. If you have several related Lists (their work functions use the same global variables, for example) you can group them to use a single worker thread rather than one per List object. (The default is one group per package.)

If a normal yield point is not a good stopping point, you can suppress that by using a critical section, just like for normal threading.
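These customizations might be spelled along these lines; every name here beyond lazy and :background is invented for illustration:

```perl6
# Up to four non-blocked worker threads instead of one
# (a useful model for server activities).
@results.lazy(:background, :interleave(4));

# Two related lists sharing a single worker-thread group instead of
# the default of one group per package. WorkGroup is a made-up name.
my $group = WorkGroup.new;
@queries.lazy(:background, :interleave, :group($group));
@replies.lazy(:background, :interleave, :group($group));
```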

Run many threads at the same time.

The least constrained option is to run iterations at the same time on as many threads as the implementation wants. The code will have to be written using explicit threading features such as critical sections and thread-safe variables, if the work function would do anything that trips over itself or over the main code.

Some functions can be run fully threaded without any threading issues at all. Pure math calculations, for example, are clearly side-effect free. But anything that does not write to non-local variables or have other side effects will do:
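For instance, with get_info as a pure function (the body here is a stand-in, and the free-threading flag name is invented):

```perl6
# Pure function: reads only its argument, writes only its return value.
sub get_info ($name) {
    return $name.chars;    # stand-in for a real side-effect-free computation
}

my @results = map(&get_info, @filenames, :injective);
@results.lazy(:threaded);   # invented flag: free threading on all cores
```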

This has no trouble running in the background on all cores, and does not need any kind of mutual exclusion mechanisms, as each call to get_info works with different data and they don’t get in each other’s way.

To recap, the generic options are

synchronous only

one at a time

interleaved

free threading

To make these easy to specify, these correspond to options (named parameters) of the lazy method. If a Background object is not already installed for that lazy List, using one of these options will choose among some built-in ones supplied with the implementation. If a user-specified Background object is present, the flag is passed through to a clone of that object, which can adjust its behavior accordingly.
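Putting the four levels side by side, the calls might be spelled like this (the flag names are illustrative only, in the same spirit as :injective):

```perl6
@results.lazy(:synchronous);              # synchronous only
@results.lazy(:background);               # one at a time, in the background
@results.lazy(:background, :interleave);  # interleaved
@results.lazy(:threaded);                 # free threading
```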

Orthodoxy

This essay opens with “But first, let’s look at the most basic form of laziness, which can be observed using the map function.” The Synopses don’t explain that map produces a lazy list. In fact, I was asking around for how to create a lazy list when other members of the Perl 6 mailing list pointed out that the normal map function ought to do it. The map function is actually mentioned a few times in the Synopses, but never explained.

The injective flag to map is my own idea. This and other specific function names and parameter names are not at the level of a formal proposal, but do illustrate the design requirements.

The eager keyword, however, is indeed already in the Synopses. But it is mentioned so briefly that it leads to more questions about what it is and how to use it. I’ve assumed that it is syntactically like a function call on a list (the Synopses don’t even explain that!) and that it has the essential characteristics of “one at a time”, “in order”, and “wait for it to finish”, and explored the ramifications of that here.

Is the syntax for using flags (named arguments) to listops correct in this example?

my @results := map { get_info($_) } @filenames :injective;

If statements like this preserve laziness:

my @result = map(&fn1, ^10, :injective);

then it means that Array objects, not just List objects, have this feature, and that assigning the contents of a lazy List to an Array preserves the lazy nature.