As described in my previous blog article, Chapel is an emerging parallel programming language that we’re developing at Cray to dramatically simplify parallel programming, from your multicore laptop to the world’s most powerful Cray systems. Chapel is being developed in an open-source manner and is designed and implemented with portability as a key concern.

While the previous article gave a high-level overview of Chapel’s concepts, implementation, and next steps, in this article I’ll provide a deeper dive into one specific Chapel feature area—its support for user-defined parallel iterators. This feature is designed to give programmers full control over the parallelism and scheduling used to implement parallel loops, yet in a very productive, high-level manner.

Introduction to Chapel Iterators

Chapel’s iterators were inspired by the iterator concept pioneered in the language CLU. A number of recent languages in mainstream computing have provided a similar capability, including Python, Ruby, JavaScript, and C#. This style of iterator is sometimes called a generator to distinguish it from less productive styles of expressing iterators in languages like Java and C++.

If you’re not familiar with this style of iterator, it’s a lot like a normal procedure; however, rather than returning a single time, it can yield multiple values as it executes. Iterators are typically used to drive loops, where each yielded value corresponds to a single loop iteration. For example, the following loop invokes an iterator named myIter() that yields index values back to the index variable i.

for i in myIter() …

Iterators provide productivity benefits for loops that are similar to those provided by procedures in straight-line code:

Well-designed and named iterators can make code easier to read, understand, and maintain;

Iterator methods can be attached to a collection object to define how to enumerate its members;

A change to one iterator’s definition can affect a large number of loops rather than requiring each one to be rewritten individually;

A specific loop can alter its behavior either by changing the arguments to its iterator or by invoking a different iterator.

Although most of us have gotten by without iterators for most of our programming careers, in my experience, they’re the sort of feature where, once you start using them, you never want to go back.
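To make one of the benefits above concrete, here is a sketch of an iterator method attached to a collection. The `Stack` record and its `push()` method are hypothetical names invented for this example; in Chapel, defining an iterator named `these()` on a type lets values of that type directly drive a loop (details of method syntax have evolved across Chapel versions):

```chapel
record Stack {
  var data: [1..8] int;
  var top = 0;

  proc ref push(x: int) { top += 1; data[top] = x; }

  // 'these' is Chapel's name for a type's default iterator,
  // so a Stack can directly drive a for-loop
  iter these() {
    for i in 1..top by -1 do   // walk from the top of the stack down
      yield data[i];
  }
}

var s: Stack;
s.push(1); s.push(2); s.push(3);
for x in s do
  writeln(x);   // enumerates the stack's members, top first
```

Because the enumeration order lives in one place, changing `these()` would change the traversal of every loop over a `Stack` at once.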

To illustrate the iterator concept, consider a simple example. The following Chapel iterator computes and yields the first n values in the Fibonacci sequence, formed by repeatedly adding the two previous values together:

// generate n values from the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, …
iter fib(n) {          // declare an iterator named fib() taking n as its argument
  var current = 0,     // declare variables to store the current…
      next = 1;        //   …and next number in the sequence
  for i in 1..n {      // loop sequentially from 1 through n
    yield current;     // yield the current value back to the callsite
    current += next;   // add the next value to the current one
    current <=> next;  // swap the two values to prepare for the next iteration
  }                    // after the n iterations, return
}

The following loop invokes this iterator sequentially:

for f in fib(8) do
  writeln(f);

Executing this loop prints out the first eight values of the Fibonacci sequence:

0
1
1
2
3
5
8
13

While this example is fairly simple, more interesting iterators can be written to perform tiled iterations over arrays, to yield the leaf vertices in a graph, to extract data records from an input file, and so forth.
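As one sketch of a "more interesting" iterator, here is a hypothetical 1-D tiled iterator (the name `tiled()` and its parameters are invented for this example; real tiling is usually applied to multidimensional arrays, which this simplified version omits):

```chapel
// a sketch of a tiled iterator: traverse the range 'r'
// one fixed-size tile at a time
iter tiled(r: range, tileSize: int) {
  for lo in r by tileSize do                       // step to each tile's low bound
    for i in lo..min(lo + tileSize - 1, r.high) do // visit the tile, clipped to r
      yield i;
}

for i in tiled(1..10, 4) do
  write(i, " ");   // visits 1..10 in tiles of size 4, 4, and 2
writeln();
```

A loop written against `tiled()` could later switch tile sizes, or switch to an entirely different traversal, without touching the loop body.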

Iterators and Parallelism

Chapel’s iterators differ from those of most other languages in that they also support parallel iteration. Parallel iterators can be invoked in Chapel using forall loops. For example, the following is a parallel loop over the values 1..n:

forall i in 1..n do
  foo(i);

When iterating over an integer range in parallel like this, by default Chapel will use a number of tasks based on the amount of available hardware parallelism—for example, the number of processor cores, p. It then chunks the iteration space evenly among the tasks, giving roughly n/p contiguous iterations to each.
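As a concrete illustration of this default policy (a sketch only—the precise partitioning is an implementation detail):

```chapel
config const n = 1000;
var A: [1..n] real;

// with p cores available, the default forall over 1..n divides the
// iterations into p contiguous blocks of roughly n/p each; e.g., with
// p = 4: 1..250, 251..500, 501..750, 751..1000
forall i in 1..n do
  A[i] = i / 2.0;
```

When every iteration costs about the same, this even, contiguous split keeps all p tasks busy for roughly the same amount of time.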

This is a reasonable default policy for executing parallel loops, but what if a user wants something different? For example, if the calls to foo() in the loop above take vastly different amounts of time, some tasks could end up with far more work than others, resulting in a load imbalance for this default policy.

In situations like this, users can explicitly call a parallel iterator to get a different parallelization of the loop. For example, Chapel’s standard libraries support a dynamic() iterator that can help in imbalanced cases like this. Here’s the previous loop rewritten to use it:

forall i in dynamic(1..n, chunksize=8) do
  foo(i);

This loop will deal out chunks of eight iterations to the tasks, similar to OpenMP’s ‘dynamic’ schedule. As each task finishes its chunk, it grabs the next chunk of eight iterations until none remain. This reduces the likelihood that one task will be given all of the slow iterations and hold up the others.
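A complete, compilable version of such a loop might look like the following sketch, assuming the dynamic() iterator is provided by Chapel’s DynamicIters standard module (the loop body here is invented for illustration):

```chapel
use DynamicIters;   // standard module providing the dynamic() iterator

config const n = 100;
var results: [1..n] int;

// chunks of 8 iterations are dealt out to tasks on demand, so a task
// that happens to draw fast iterations simply comes back for more work
forall i in dynamic(1..n, 8) do
  results[i] = i * i;
```

The tradeoff relative to the default schedule is the overhead of coordinating on the shared chunk counter, which is why larger chunk sizes amortize better when individual iterations are cheap.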

User-Defined Parallel Iterators

But what if a programmer wants to invoke some parallelization strategy that isn’t included in Chapel’s standard libraries? For example, what if they’ve invented a new parallel loop scheduling heuristic that hasn’t been considered before? Or perhaps there’s domain-specific information in their application that could be leveraged in the parallelization strategy?

In such cases, Chapel programmers can write their own parallel iterators, creating the tasks to implement the loop and specifying how the iteration space should be partitioned among them. While there isn’t enough space in this article to explain how to define parallel iterators, the key characteristic is that they are written within Chapel itself, following the multiresolution design philosophy described in the previous article.
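A full treatment is beyond this article, but to give the flavor, here is a highly simplified sketch of one form of parallel iterator—Chapel’s “standalone” form, written as an overload distinguished by an iterKind tag (the name `blockIter()` is invented, and the interface details have evolved across Chapel versions):

```chapel
// sequential version, used by for-loops
iter blockIter(n: int) {
  for i in 1..n do
    yield i;
}

// parallel overload, used by forall-loops: the iterator itself creates
// the tasks and decides how to split the iteration space among them
iter blockIter(n: int, param tag: iterKind) where tag == iterKind.standalone {
  const numTasks = here.maxTaskPar;     // one task per available core
  coforall tid in 0..#numTasks {        // create the tasks
    const myLo = tid * n / numTasks + 1,
          myHi = (tid + 1) * n / numTasks;
    for i in myLo..myHi do              // each task yields its own block
      yield i;
  }
}

var A: [1..100] int;
forall i in blockIter(100) do
  A[i] = i;
```

The key point is that the task creation (coforall) and the partitioning arithmetic are ordinary Chapel code, so a different scheduling heuristic is just a different body for the parallel overload.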

The strengths of Chapel’s iterators were demonstrated by a visiting colleague of ours, Dr. Angeles Navarro from the University of Málaga. She was interested in studying work-stealing algorithms in Chapel, yet we had not implemented such policies ourselves. Rather than coding these algorithms directly in her programs, Angeles was able to encapsulate them within Chapel’s parallel iterators, permitting them to be used cleanly within forall loops. For example, she was the author of the dynamic() iterator shown in the previous section; she also wrote a work-stealing iterator which doesn’t have a counterpart in OpenMP. Her iterators were adopted into Chapel’s standard libraries for other programmers to use, learn from, and improve upon.

In addition to their software engineering benefits, Angeles’s iterators also resulted in performance that rivaled, and occasionally beat, OpenMP’s. For example, the following graph shows the speedup of her dynamic() iterator (in blue) as compared with OpenMP’s dynamic schedule (in orange) and Chapel’s default policy (in green) for a variety of loop workloads using 16 or 32 tasks per compute node. As can be seen, the Chapel and OpenMP dynamic iterators perform comparably. This is notable given that Angeles’s iterators were written in Chapel by an end-user whereas OpenMP’s are baked into the language and compiler.

Summary and Next Steps

To summarize, Chapel’s parallel iterators provide a powerful framework for expert parallel programmers like Dr. Navarro to express control over parallel scheduling policies. At the same time, they package those policies as productive abstractions that are trivial for beginning programmers to use.

To drive this point home, consider what it would take to introduce a new parallel loop schedule like Angeles’s work-stealing iterator into a traditional programming model like OpenMP. First, you would need to define the semantics of the feature and convince the governing body to adopt it into the language standard. Then you would need to wait for the feature to be implemented by the various compilers that support the programming model. And only then could you start to rely upon it as a programmer.

In contrast, since Chapel permits end-users to define new parallel iterators within the language, new parallel loop schedules can be created in an ad hoc manner, and reused across diverse loops, programs, and user communities.

Looking forward, while the current iterator framework in Chapel is sufficient for defining parallel iterators, it is not always ideal. As we have gained experience with it, we have identified cases where refinements to the framework would support additional optimizations and expressiveness. Improvements like these are part of the work that we are undertaking over the next few years as we work to improve Chapel’s performance and generality.