Parallelism

Parallel Collections

One of the simplest, easiest-to-use and
most successful forms of parallelism revolves around
programming with parallel collections. There
are many different kinds of parallel collections, including
collections that behave like sets or like dictionaries or tables
or sequences. In general, these collection data types provide an
interface that includes operations that process all objects in the
collection in parallel. For instance, map f c
is one of the most common operations. It generates a new set
(or dictionary or table or
sequence) by applying the function f to all objects
in the collection c in parallel.

Bulk parallel operations like map are simple
to use because, when the function f is effect-free, the semantics of
a parallel map is deterministic and indistinguishable (except for
performance) from the semantics of a sequential map.
So programming with parallel collections is no harder than programming with
sequential collections, and you learned how to do the latter in the second
week of class. It's easy!
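
To see why effect-freedom matters, consider these two (sequential)
maps; a parallel map must agree with the first exactly, while the
second shows how effects break that guarantee.

(* Effect-free: a parallel map must return [1; 4; 9], exactly as the
   sequential map does; only the running time may differ. *)
let squares = List.map (fun x -> x * x) [1; 2; 3]

(* Effectful: run sequentially this prints 123, but a parallel map
   could interleave the prints in any order, so parallel and
   sequential runs are now distinguishable. *)
let noisy = List.map (fun x -> print_int x; x) [1; 2; 3]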

Moreover, it is well-known how to develop highly efficient implementations
of many operations over parallel collections.
Some implementations are designed to
operate over a single multi-core machine and others over many
machines, such as in a data center. In the latter case, the implementation
will typically also handle machine failures transparently
by re-executing jobs that occurred on failed machines. When functions
are effect-free, re-executing them does no harm (another huge benefit of
functional programming).

Because there are many different implementation techniques for
parallel collections and the implementation
details are often important for achieving
high performance, you should strive to implement your algorithms
at a high level of abstraction on top of abstract parallel data types.
This will help separate the low-level parallel implementation details
involving scheduling tasks across multiple cores, processors or machines
from the high-level algorithmic details specific to your problem.
If you use abstraction carefully,
it should be possible, for example, to swap out one parallel implementation
for another one. This can help to make your algorithms more portable
and reusable. For instance, you might design
and test your code on a single machine but then deploy it in a data
center over hundreds of machines. Having said that, not all parallel
collection APIs can be implemented equally efficiently on different
types of parallel computing platforms. For instance, experience has
shown that the powerful parallel sequences API we will discuss next
can be effectively implemented on a single shared memory multicore machine,
but may be less efficient on a cluster of many machines. On the other hand,
the map-reduce collection library implemented by Google is ideal
for computing over large clusters in a data center.

Parallel Sequences

The abstract sequence data type is a
canonical example of a parallel collection.
Java has such a library and several
parallel programming languages such
as NESL
and data-parallel
Haskell have been
organized around the idea of programming with parallel sequences.

In this section, we will explore the use of a parallel sequence library
inspired by NESL
and defined by the following interface. Notice that we specify the
work and span of each operation in the interface. This is need-to-know
information for clients of the sequence library who wish to estimate
the work and span of their algorithms.
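
Here is a sketch of one such interface in OCaml. The exact operation
names and cost bounds vary from library to library; the ones below are
representative rather than definitive.

(* 'a seq is the type of parallel sequences of values of type 'a *)
module type SEQUENCE = sig
  type 'a seq

  (* tabulate f n: the sequence f 0, f 1, ..., f (n-1)
     work: O(n), span: O(1), assuming f takes constant time *)
  val tabulate : (int -> 'a) -> int -> 'a seq

  (* nth s i: the ith element of s.  work: O(1), span: O(1) *)
  val nth : 'a seq -> int -> 'a

  (* length s: the number of elements of s.  work: O(1), span: O(1) *)
  val length : 'a seq -> int

  (* map f s: apply f to every element of s in parallel
     work: O(n), span: O(1), assuming f takes constant time *)
  val map : ('a -> 'b) -> 'a seq -> 'b seq

  (* reduce f base s: combine the elements of s with the associative
     operator f.  work: O(n), span: O(log n) *)
  val reduce : ('a -> 'a -> 'a) -> 'a -> 'a seq -> 'a

  (* split s i: the pair of subsequences (s[0..i-1], s[i..n-1])
     work: O(1), span: O(1) *)
  val split : 'a seq -> int -> 'a seq * 'a seq
end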

To begin, let us examine the tabulate function. It is one of the
primary constructors for sequences. For instance, to construct an
integer sequence with elements 0..(n-1) with O(n) work and O(1) span,
we simply write:

let make (n:int) =
  tabulate (fun i -> i) n
;;

Tabulate is especially powerful in conjunction with nth. For instance,
to reverse a sequence with O(n) work and O(1) span:
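
(* A sketch, assuming nth, length, and tabulate from the interface
   above: element i of the result is element (n-1-i) of the input.
   All n lookups happen independently, giving O(n) work, O(1) span. *)
let reverse (s : 'a seq) : 'a seq =
  let n = length s in
  tabulate (fun i -> nth s (n - 1 - i)) n
;;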

The sequence data type is a fine data type for implementing many
efficient divide-and-conquer parallel algorithms because it supports
an efficient (O(1)) operation to split a sequence into subsequences.
To explore
divide-and-conquer over parallel sequences, let's take a look
at the parenthesis matching problem. The goal is to take a
sequence of parentheses like this one:
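
()(())()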

and determine whether the parentheses match. Recall that a sequence
of parens does not match if there is ever a point in the sequence when
there are more closed (i.e., right) parens than there are open (i.e., left)
parens, or if the entire sequence doesn't contain exactly the same
number of open and closed parens. Here are examples of unmatched
sequences:

(
())
())(
)

The first step in attacking this problem is to define a data type
to represent parentheses:
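
(* One natural representation: L is a left paren '(' and R is a right
   paren ')'.  The constructor names are our choice. *)
type paren = L | R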

To craft an efficient parallel solution to the parenthesis matching
problem, the following observations will come in handy:

Suppose the sequence s has () as a subsequence and suppose s'
is the sequence we obtain when we remove that subsequence ()
from s. Then s is a matching sequence of
parentheses if and only if s' is a matching sequence of parentheses.

Suppose the sequence s has no subsequences of the form ().
Then s must have the form ...)))(((.... In other words s must
be a sequence of right parens followed by a sequence of left parens.

Using those ideas, we can construct an efficient divide-and-conquer
algorithm for parenthesis matching. The core of the algorithm is
a routine that eliminates all of the matching pairs () of the
input s and returns information about the sequence ...)))(((...
of unmatched parens that remain. Specifically, if ...)))(((...
consists of i right parens and j left parens, our algorithm will
return (i,j).

This core algorithm will operate by splitting its input
sequence in half, recursively computing the number of unmatched parens (i, j)
for the left half of the sequence and the number of unmatched parens (k, l)
from the right half of the sequence. The key observation is that after
returning from the two recursive calls, we know that our sequence has
the following form:

))...i...)) ((...j...(( ))...k...)) ((...l...((

And we can see that some of the j left parens and k right parens will
cancel each other out. In fact:

if j > k then we wind up with i right parens followed by
l + j - k left parens.

if j <= k then we wind up with i + k - j right parens followed by
l left parens.

Such a computation allows us to implement the "combine" step of our
divide-and-conquer algorithm efficiently. The code, using the sequence API,
follows. Notice that we define a convenient and reusable helper function
for sequences --- one that splits a sequence into a "tree view."
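
The sketch below assumes the sequence operations from the interface
given earlier; the helper and function names (show_tree, unmatched,
matched) are illustrative rather than canonical.

(* A "tree view" of a sequence: empty, a single element, or a pair of
   halves obtained with an O(1) split. *)
type 'a treeview = Empty | One of 'a | Pair of 'a seq * 'a seq

let show_tree (s : 'a seq) : 'a treeview =
  match length s with
  | 0 -> Empty
  | 1 -> One (nth s 0)
  | n -> let (l, r) = split s (n / 2) in Pair (l, r)

(* unmatched s = (i, j), where s reduces to i unmatched right parens
   followed by j unmatched left parens after all () pairs cancel.
   In a parallel implementation, the two recursive calls execute in
   parallel, giving O(n) work and O(log n) span on balanced splits. *)
let rec unmatched (s : paren seq) : int * int =
  match show_tree s with
  | Empty -> (0, 0)
  | One R -> (1, 0)
  | One L -> (0, 1)
  | Pair (left, right) ->
    let (i, j) = unmatched left in
    let (k, l) = unmatched right in
    (* the j left parens from the left half meet the k right parens
       from the right half and cancel pairwise *)
    if j > k then (i, l + j - k) else (i + k - j, l)

(* The whole sequence matches iff nothing is left unmatched. *)
let matched (s : paren seq) : bool =
  unmatched s = (0, 0)
;;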

Google Map-Reduce

The idea of using functional programming to parallelize analysis
of massive data sets really came into vogue after
Jeffrey Dean and Sanjay Ghemawat published their influential paper
on Google's map-reduce programming platform in 2004. In order to get a sense of the
context they were working in and where their ideas came from, it is worthwhile
quoting the first couple of paragraphs of their article:

Over the past five years, the authors and many others at
Google have implemented hundreds of special-purpose
computations that process large amounts of raw data,
such as crawled documents, web request logs, etc., to
compute various kinds of derived data, such as inverted
indices, various representations of the graph structure
of web documents, summaries of the number of pages
crawled per host, the set of most frequent queries in a
given day, etc. Most such computations are conceptually
straightforward. However, the input data is usually
large and the computations have to be distributed across
hundreds or thousands of machines in order to finish in
a reasonable amount of time. The issues of how to parallelize
the computation, distribute the data, and handle
failures conspire to obscure the original simple computation
with large amounts of complex code to deal with
these issues.

As a reaction to this complexity, we designed a new
abstraction that allows us to express the simple computations
we were trying to perform but hides the messy details of
parallelization, fault-tolerance, data distribution
and load balancing in a library. Our abstraction is
inspired by the map and reduce primitives present in Lisp
and many other functional languages...
Our use of a functional model with user-specified map
and reduce operations allows us to parallelize
large computations easily and to use re-execution
as the primary mechanism for fault tolerance.

Towards the end of their article, the authors cite some
statistics concerning the adoption of map-reduce at Google at
the time (late 2004). In just its first year, over 900 separate
map-reduce programs were checked in to their main source code repository.
In August 2004, over 29,000 different map-reduce jobs were run on their
data centers, almost 80,000 machine days were used, and roughly 3,300 TB
(3.3 petabytes) of input data were read. Since then, the popularity of the
basic map-reduce paradigm
has grown greater still and
there are many implementations of the basic concept. For instance,
Hadoop is an open source
implementation developed by Apache and used by many companies (you can
download it and use it yourself too).
Facebook
reported in 2008 that one of their largest Hadoop clusters
used 2,500 cores and had 1 petabyte of storage attached. That's a lot of
computation and a lot of data!

The key to the success of map-reduce is the simplicity of its programming
model. Indeed,
the map-reduce programming model is little more than
a minor variation on the sequential map-reduce
programming paradigm that you learned in the first couple weeks of class.
However, Google has managed to hide a complex, distributed, parallel and
fault-tolerant implementation behind this simple interface.
Because the interface was so simple, many of their analysts --
researchers or programmers not necessarily
skilled in parallel programming or distributed systems -- could use it easily.
Modularity and clear, high-level abstractions are just as important
(perhaps more so) in large-scale distributed computing as in
sequential computing.
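
To make the connection concrete, here is a minimal, sequential sketch
of the programming model (the names map_reduce and word_count are
illustrative; Google's actual implementation distributes this work
across many machines): the user supplies a mapper from input records
to (key, value) pairs and a reducer that combines all values sharing
a key.

(* Apply the mapper to every record, group the resulting pairs by
   key, then apply the reducer to each group. *)
let map_reduce
    (mapper : 'a -> ('k * 'v) list)
    (reducer : 'k -> 'v list -> 'r)
    (records : 'a list) : ('k * 'r) list =
  let pairs = List.concat_map mapper records in
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (k, v) ->
       let vs = try Hashtbl.find tbl k with Not_found -> [] in
       Hashtbl.replace tbl k (v :: vs))
    pairs;
  Hashtbl.fold (fun k vs acc -> (k, reducer k vs) :: acc) tbl []

(* The classic example: counting word occurrences across documents. *)
let word_count (docs : string list) : (string * int) list =
  map_reduce
    (fun doc -> List.map (fun w -> (w, 1)) (String.split_on_char ' ' doc))
    (fun _word counts -> List.fold_left (+) 0 counts)
    docs
;;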

For more information on
the map-reduce parallel programming paradigm, see
Dean and Ghemawat's
paper on Google's map-reduce implementation.