To compile a Haskell program for parallel execution under PVM, use the
-parallel option, both when compiling and
linking. You will probably want to import
Parallel into your Haskell modules.
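Here is a small sketch of what such a module might look like: a parallel Fibonacci function. It assumes a `par`-style spark combinator. Instead of the old `Parallel` module named above, it uses `par` and `pseq` from `GHC.Conc` in a modern GHC's `base`, so read it as an illustration of the idea rather than code for the PVM-based system. The cutoff of 15 is an arbitrary choice that keeps sparks from being too fine-grained.

```haskell
-- Sketch only: assumes `par`/`pseq` from GHC.Conc (modern base) as a
-- stand-in for the spark combinator provided by the old Parallel module.
import GHC.Conc (par, pseq)

-- Naive Fibonacci with a sequential cutoff so sparks are not too small.
pfib :: Int -> Integer
pfib n
  | n < 15    = sfib n                 -- below the cutoff, stay sequential
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = pfib (n - 1)                   -- sparked: may be evaluated elsewhere
    y = pfib (n - 2)                   -- evaluated here before the sum

-- Plain sequential Fibonacci used below the cutoff.
sfib :: Int -> Integer
sfib n = if n < 2 then fromIntegral n else sfib (n - 1) + sfib (n - 2)

main :: IO ()
main = print (pfib 25)
```

`par` only records a spark. The runtime system decides whether the sparked `x` actually gets evaluated on another processor, and the result is the same either way.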

To run your parallel program, once PVM is going, just invoke it
“as normal”. The main extra RTS option is
-N<n>, which says how many PVM
“processors” your program should run on. (For more details of
all relevant RTS options, please see Section 3.11.4.)

In truth, running Parallel Haskell programs and getting information
out of them (e.g., parallelism profiles) is a battle with the vagaries of
PVM, detailed in the following sections.

With Parallel Haskell programs, we usually don't care about the
results, only about “how parallel” the run was! We want pretty pictures.

Parallelism profiles (à la hbcpp) can be generated with the
-q RTS option. The
per-processor profiling info is dumped into files named
<full-path><program>.gr. These are then munged into a PostScript picture,
which you can then display. For example, to run your program
a.out on 8 processors, then view the parallelism profile, do:

The “garbage-collection statistics” RTS options can be useful for
seeing what parallel programs are doing. If you use either
+RTS -Sstderr or +RTS -sstderr,
you'll get mutator, garbage-collection and other timings on standard
error. The standard error of every PE (processing element) other than
the “main thread” appears in /tmp/pvml.nnn, courtesy of PVM.

Besides the usual runtime system (RTS) options
(Section 3.12), there are a few options particularly
for concurrent/parallel execution.

-N<N>:

(PARALLEL ONLY) Use <N> PVM processors to run this program;
the default is 2.

-C[<s>]:

Sets
the context switch interval to <s> seconds.
A context switch will occur at the next heap block allocation after
the timer expires (a heap block allocation occurs every 4KB of
allocation). With -C0 or -C,
context switches will occur as often as possible (at every heap block
allocation). By default, context switches occur every 20
milliseconds. Note that GHC's internal timer ticks every 20ms, and
the context switch timer is always a multiple of this timer, so 20ms
is the finest granularity available for timed context switches.
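The way -C interacts with the 20ms tick can be captured in a few lines. This is a hypothetical model and not the RTS's actual code. The names `CFlag`, `Switching` and `switching` are made up. The text above only says the timer is a multiple of 20ms, so the assumption that a requested interval is rounded *up* to a whole number of ticks is ours.

```haskell
-- Hypothetical model of how -C maps onto GHC's 20ms timer tick.
-- Assumption: a requested interval is rounded UP to a whole number of ticks.
data CFlag = NoCFlag | CFlag (Maybe Double)   -- CFlag Nothing models a bare -C

data Switching = EveryHeapBlock | EveryMs Int deriving (Show, Eq)

tickMs :: Int
tickMs = 20

switching :: CFlag -> Switching
switching NoCFlag         = EveryMs tickMs    -- no flag: default every 20ms
switching (CFlag Nothing) = EveryHeapBlock    -- bare -C: every heap block
switching (CFlag (Just s))
  | ms <= 0   = EveryHeapBlock                -- -C0: every heap block
  | otherwise = EveryMs (ticks * tickMs)
  where
    ms    = round (s * 1000) :: Int           -- requested interval in ms
    ticks = (ms + tickMs - 1) `div` tickMs    -- round up to whole ticks

main :: IO ()
main = mapM_ (print . switching)
  [NoCFlag, CFlag Nothing, CFlag (Just 0), CFlag (Just 0.01), CFlag (Just 0.05)]
```

Under this assumption, asking for -C0.01 still gives 20ms, which reflects the 20ms floor. Asking for -C0.05 gives 60ms.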

-q[v]:

(PARALLEL ONLY) Produce a quasi-parallel profile of thread activity,
in the file <program>.qp. In the style of hbcpp, this profile
records the movement of threads between the green (runnable) and red
(blocked) queues. If you specify the verbose suboption (-qv), the
green queue is split into green (for the currently running thread
only) and amber (for other runnable threads). We do not recommend
that you use the verbose suboption if you are planning to use the
hbcpp profiling tools or if you are context switching at every heap
check (with -C).
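Here is a small hypothetical model that makes the colour scheme concrete. It classifies one snapshot of thread states the way the text describes -q and -qv doing it. The `ThreadState` and `Colour` types and the `tally` function are illustrative and are not part of any GHC API.

```haskell
-- Hypothetical model of the queue colours recorded by -q and -qv.
data ThreadState = Running | Runnable | Blocked deriving (Show, Eq)
data Colour = Green | Amber | Red deriving (Show, Eq)

-- The Bool is True when the verbose suboption (-qv) is in effect.
colour :: Bool -> ThreadState -> Colour
colour _ Blocked = Red                -- red: blocked queue
colour _ Running = Green              -- green: the currently running thread
colour verbose Runnable
  | verbose   = Amber                 -- -qv: other runnable threads are amber
  | otherwise = Green                 -- -q: all runnable threads are green

-- Count how many threads in one snapshot fall into each colour.
tally :: Bool -> [ThreadState] -> [(Colour, Int)]
tally verbose ts =
  [ (c, length (filter ((== c) . colour verbose) ts)) | c <- [Green, Amber, Red] ]

main :: IO ()
main = do
  let snapshot = [Running, Runnable, Runnable, Blocked]
  print (tally False snapshot)
  print (tally True snapshot)
```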

-t<num>:

(PARALLEL ONLY) Limit the number of concurrent threads per processor
to <num>. The default is 32. Each thread requires slightly over 1K
words in the heap for thread state and stack objects. (On
32-bit machines this comes to just over 4K bytes, and on 64-bit machines
just over 8K bytes.)
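The heap cost of the thread limit is easy to estimate. This sketch assumes exactly 1024 words per thread. The text says "slightly over", so the figures it produces are lower bounds. `threadHeapBytes` is a hypothetical helper.

```haskell
-- Lower bound on heap used for thread state and stacks under -t<num>.
-- Assumption: exactly 1024 words per thread (the real figure is slightly more).
wordsPerThread :: Int
wordsPerThread = 1024

-- wordBytes: 4 on a 32-bit machine, 8 on a 64-bit machine.
threadHeapBytes :: Int -> Int -> Int
threadHeapBytes wordBytes maxThreads = maxThreads * wordsPerThread * wordBytes

main :: IO ()
main = do
  print (threadHeapBytes 4 32)   -- default -t32 on a 32-bit machine
  print (threadHeapBytes 8 32)   -- default -t32 on a 64-bit machine
```

With the default -t32, that comes to at least 128KB per processor on a 32-bit machine and 256KB on a 64-bit one.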

-d:

(PARALLEL ONLY) Turn on debugging. It pops up one xterm (or GDB, or
something…) per PVM processor. We use the standard debugger
script that comes with PVM3, but we sometimes meddle with the
debugger2 script. We include ours in the GHC distribution,
in ghc/utils/pvm/.

-e<num>:

(PARALLEL ONLY) Limit the number of pending sparks per processor to
<num>. The default is 100. A larger number may be appropriate if
your program generates large amounts of parallelism initially.
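The -e limit can be pictured as a bounded pool. The sketch below is a hypothetical model, and `SparkPool`, `emptyPool` and `addSpark` are made-up names. It assumes that sparks arriving when the pool is full are simply discarded. A spark is only a hint, so dropping one loses potential parallelism but does not affect the result.

```haskell
-- Hypothetical model of a per-processor spark pool capped by -e<num>.
-- Assumption: sparks created while the pool is full are discarded.
data SparkPool a = SparkPool
  { capacity :: Int   -- the -e<num> limit
  , sparks   :: [a]   -- pending sparks, oldest first
  , dropped  :: Int   -- sparks lost because the pool was full
  } deriving Show

emptyPool :: Int -> SparkPool a
emptyPool cap = SparkPool cap [] 0

addSpark :: a -> SparkPool a -> SparkPool a
addSpark s pool
  | length (sparks pool) < capacity pool = pool { sparks = sparks pool ++ [s] }
  | otherwise                            = pool { dropped = dropped pool + 1 }

main :: IO ()
main = do
  -- A burst of 250 sparks against the default limit of 100.
  let pool = foldl (flip addSpark) (emptyPool 100) [1 .. 250 :: Int]
  putStrLn ("kept: " ++ show (length (sparks pool))
            ++ ", dropped: " ++ show (dropped pool))
```

In this model, an initial burst bigger than the limit loses most of its sparks. That is the situation in which the text suggests a larger -e value.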

-Q<num>:

(PARALLEL ONLY) Set the size of packets transmitted between processors
to <num>. The default is 1024 words. A larger number may be
appropriate if your machine has a high communication cost relative to
computation speed.
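A crude cost model shows why bigger packets can help when communication is expensive. Everything here is an assumption for illustration. Each packet pays a fixed `latency`, each word pays a `perWord` cost, and a structure of `w` words is shipped in full packets of `p` words. The real packing of graph into packets is more involved, and `packetsNeeded` and `transferCost` are hypothetical helpers.

```haskell
-- Crude, hypothetical cost model for shipping w words in packets of p words.
-- Assumptions: fixed per-packet latency plus a per-word transfer cost.
packetsNeeded :: Int -> Int -> Int
packetsNeeded p w = (w + p - 1) `div` p          -- ceiling (w / p)

transferCost :: Double -> Double -> Int -> Int -> Double
transferCost latency perWord p w =
  fromIntegral (packetsNeeded p w) * latency + fromIntegral w * perWord

main :: IO ()
main = do
  -- High per-message latency relative to per-word cost.
  print (transferCost 1000 1 1024 8192)   -- default -Q1024: 8 packets
  print (transferCost 1000 1 4096 8192)   -- -Q4096: 2 packets
```

With these made-up numbers, raising -Q from 1024 to 4096 words cuts the cost of moving 8192 words from 16192 to 10192 units. The per-word cost is unchanged, but four times fewer packets pay the latency.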