Revision as of 16:11, 14 September 2008

This site attempts to document all our available information on
exploiting such multicore hardware with Haskell.

Throughout, we focus on exploiting shared-memory SMP systems, with the aim of lowering absolute wall-clock times. The machines we target are typical 2x to 32x desktop multicore machines, on which vanilla GHC will run.

1 Introduction

To get an idea of what we aim to do -- reduce running times by exploiting more cores -- here's the naive "hello, world" of parallel programs: a parallel, naive fib. It simply tells us whether or not the SMP runtime is working:
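A sketch of such a program, assuming the Control.Parallel module (from the parallel package) is available; compile with `ghc --make -threaded` and run with `+RTS -N2` to use two cores:

```haskell
import Control.Parallel (par, pseq)

-- Naive Fibonacci: the two recursive calls are independent,
-- so one can be sparked off in parallel with `par` while the
-- other is evaluated in the current thread with `pseq`.
fib :: Int -> Int
fib n
  | n < 2     = n
  | otherwise = x `par` (y `pseq` x + y)
  where
    x = fib (n - 1)
    y = fib (n - 2)

main :: IO ()
main = print (fib 30)
```

If the SMP runtime is working, running with more capabilities (`+RTS -N2` and up) should lower the wall-clock time.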

2 Thread primitives

For explicit concurrency and/or parallelism, Haskell implementations have a light-weight thread system that schedules logical threads on the available operating system threads. These light, cheap threads are created with forkIO. (We won't discuss full OS threads, which are created via forkOS; they have significantly higher overhead and are only useful in a few situations, such as certain FFI uses.)

forkIO :: IO () -> IO ThreadId

Let's take a simple Haskell application that hashes two files and prints the results:
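A minimal sketch of what that program might look like - here a simple byte-level checksum stands in for a real cryptographic hash, and fileA/fileB are placeholder filenames:

```haskell
import Control.Concurrent (forkIO)
import qualified Data.ByteString.Lazy as B

-- A stand-in hash (a djb2-style byte checksum); a real program
-- would use a proper hash library instead.
hash :: B.ByteString -> Int
hash = B.foldl' (\acc w -> acc * 33 + fromIntegral w) 5381

main :: IO ()
main = do
  _ <- forkIO $ do                  -- child thread hashes fileA
         contents <- B.readFile "fileA"
         print (hash contents)
  contents <- B.readFile "fileB"    -- main thread hashes fileB
  print (hash contents)
```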

Now we have a rough program with a great performance boost - which is expected, given the trivially parallel computation.

But wait! You say there is a bug? Two, actually. One is that if the main thread finishes hashing fileB first, the program will exit before the child thread is done with fileA. The second is the potential for garbled output due to two threads writing to stdout. Both of these problems can be solved with some inter-thread communication - we'll pick this example up again in the MVar section.

2.1 Further reading

3 Synchronisation with locks

Locking mutable variables (MVars) can be used to great effect, and not only for communicating values (such as the resulting string for a single thread to print): it is also common for programmers to use their locking behaviour as a signalling mechanism.

MVars are polymorphic mutable variables that might or might not contain a value at any given time. Common functions include:

newEmptyMVar :: IO (MVar a)
newMVar :: a -> IO (MVar a)
takeMVar :: MVar a -> IO a
putMVar :: MVar a -> a -> IO ()

putMVar will block until the MVar is empty, while takeMVar will block until the MVar is full. Taking an MVar will leave the MVar empty when returning the value.
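As a tiny illustration of the signalling use (a sketch - the "finished" message and the delay are just placeholders for real work), the main thread can block in takeMVar until a child thread signals completion:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Use an empty MVar purely as a signal: takeMVar blocks
-- until the child thread fills the MVar.
waitForChild :: IO String
waitForChild = do
  done <- newEmptyMVar
  _ <- forkIO $ do
         threadDelay 100000       -- pretend to do some work
         putMVar done "finished"  -- signal the waiting thread
  takeMVar done                   -- blocks until the child signals

main :: IO ()
main = waitForChild >>= putStrLn
</imports>
```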
In the forkIO example we developed a program to hash two files in parallel, but ended up with a couple of small bugs: the program terminated prematurely (the main thread would exit when done), and the threads could conflict with each other's use of stdout.

Let's now generalize the example to operate on any number of files, block until the hashing is complete, and print all the results from just one thread so that no stdout garbling occurs.
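A sketch of how this might look - hashAndReport and hashString are hypothetical helpers (a real program would call out to an actual hash library), with the filenames taken from the command line:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import System.Environment (getArgs)

-- Hypothetical stand-in for a real hash function.
hashString :: String -> String
hashString = show . foldl (\acc c -> acc * 33 + fromEnum c) 5381

main :: IO ()
main = do
  files <- getArgs
  str <- newEmptyMVar
  mapM_ (forkIO . hashAndReport str) files
  printNrResults (length files) str  -- main thread exits afterwards

hashAndReport :: MVar String -> FilePath -> IO ()
hashAndReport str file = do
  contents <- readFile file
  -- Force the expensive work here, in the worker thread,
  -- before handing the result over.
  result <- evaluate (hashString contents)
  putMVar str (file ++ ": " ++ result)

printNrResults :: Int -> MVar String -> IO ()
printNrResults 0 _   = return ()
printNrResults n str = do
  res <- takeMVar str   -- blocks until some worker is done
  putStrLn res
  printNrResults (n - 1) str
```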

The worker threads communicate each result back with putMVar - remember, this function blocks when the MVar is already full, so no hashes are dropped on account of the mutable memory. Similarly, printNrResults uses the takeMVar function, which will block until the MVar is full - or, in this case, until the next file is done being hashed.

Note how the value is evaluated before the putMVar call. If the argument were an unevaluated thunk, then printNrResults would have to evaluate the thunks before it could print the results, and our efforts would have been worthless.

Knowing the str MVar will be filled 'length files' times, we can let the main thread exit after printing the given number of results, thus terminating the program.

3.1 Further reading

4 Message passing channels

For streaming data it is hard to beat the performance of channels. After declaring a channel (newChan), you can pipe data between threads (writeChan, readChan) and tee data to separate readers (dupChan). The flexibility of channels makes them useful for a wide range of communications.

Continuing with our hashing example, let's say the names of the files needing to be hashed are becoming available over time, or need streaming for other reasons. We can fork a set of worker threads and feed them the filenames through a channel. For consistency, the program has also been modified to communicate the results from the workers to the printer via a channel.
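One way this might be sketched - nrWorkers, worker and hashString are illustrative names (not part of any library), and the same placeholder hash stands in for a real one:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever, replicateM_)
import System.Environment (getArgs)

nrWorkers :: Int
nrWorkers = 2

-- Hypothetical stand-in for a real hash function.
hashString :: String -> String
hashString = show . foldl (\acc c -> acc * 33 + fromEnum c) 5381

main :: IO ()
main = do
  files   <- getArgs
  jobs    <- newChan   -- filenames stream in here
  results <- newChan   -- hashes stream out here
  replicateM_ nrWorkers (forkIO (worker jobs results))
  mapM_ (writeChan jobs) files
  printNrResults (length files) results

-- Each worker loops forever, pulling a filename off the jobs
-- channel and pushing the hash onto the results channel.
worker :: Chan FilePath -> Chan String -> IO ()
worker jobs results = forever $ do
  file <- readChan jobs
  contents <- readFile file
  writeChan results (file ++ ": " ++ hashString contents)

-- A single printer thread owns stdout, so no garbling occurs.
printNrResults :: Int -> Chan String -> IO ()
printNrResults n results =
  replicateM_ n (readChan results >>= putStrLn)
```

When printNrResults has printed one result per input file, the main thread exits, taking the blocked workers with it.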