Par can be used for specifying pure parallel computations in
which the order of the computation is not known beforehand.
The programmer specifies how information flows from one
part of the computation to another, but not the order in which
computations will be evaluated at runtime. Information flow is
described using variables called IVars, which support put and
get operations. For example, suppose you have a problem that
can be expressed as a network with four nodes, where b and c
require the value of a, and d requires the value of b and c:
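A sketch of such a network, assuming the standard monad-par API (runPar, new, fork, put, get); the concrete values here (3, +1, +2) are an illustrative assumption chosen so the result works out to 9:

```haskell
runPar $ do
    [a,b,c,d] <- sequence [new, new, new, new]
    fork $ do x <- get a; put b (x+1)
    fork $ do x <- get a; put c (x+2)
    fork $ do x <- get b; y <- get c; put d (x+y)
    put a (3 :: Int)
    get d            -- (3+1) + (3+2) = 9
```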

The result of the above computation is always 9. The get operation
waits until its input is available; multiple puts to the same
IVar are not allowed, and result in a runtime error. Values
stored in IVars are usually fully evaluated (although there are
ways provided to pass lazy values if necessary).

In the above example, b and c will be evaluated in parallel.
In practice, the work at each node in this example is too small to show any benefit from parallelism; typically each node should involve much more work. The granularity is entirely under your control: if the nodes are too small, the overhead of the Par monad will outweigh any parallelism benefits, whereas if they are too large there may not be enough parallelism to keep all the available processors busy.
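One common way to coarsen granularity is to chunk the input so that each parallel task does a substantial amount of work. A sketch, assuming monad-par's parMap; the chunk size of 10000 is an arbitrary assumption to be tuned for the actual workload:

```haskell
import Control.Monad.Par (runPar, parMap)

-- Split a list into chunks of n elements (defined here to stay
-- self-contained; Data.List.Split provides an equivalent chunksOf).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- One Par task per 10000-element chunk rather than one per element,
-- so each task does enough work to amortise the scheduling overhead.
parSum :: [Int] -> Int
parSum xs = sum (runPar (parMap sum (chunksOf 10000 xs)))
```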

Unlike Control.Parallel, in Control.Monad.Par parallelism is
not combined with laziness, so sharing and granularity are
completely under the control of the programmer. New units of
parallel work are only created by fork and a few other
combinators.

The default implementation is based on a work-stealing scheduler
that divides the work as evenly as possible between the available
processors at runtime. Other schedulers are available that are
based on different policies and have different performance
characteristics. To use one of these other schedulers, just import
its module instead of Control.Monad.Par:
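For example (the module name below is one of the scheduler variants shipped with the monad-par package; check your installed version for the exact module paths):

```haskell
-- Use the Trace scheduler instead of the default:
import Control.Monad.Par.Scheds.Trace
-- rather than:  import Control.Monad.Par
```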

put a value into an IVar. Multiple puts to the same IVar
are not allowed, and result in a runtime error.

put fully evaluates its argument, which therefore must be an
instance of NFData. The idea is that this forces the work to
happen when we expect it, rather than being passed to the consumer
of the IVar and performed later, which often results in less
parallelism than expected.
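A sketch of what this means in practice, assuming the standard monad-par API; the workload (sum over a range) is an illustrative assumption:

```haskell
import Control.Monad.Par

-- put forces its argument to normal form before filling the IVar,
-- so the (possibly expensive) sum is evaluated inside the forked
-- task, not deferred to the consumer of the IVar.
strictExample :: Int
strictExample = runPar $ do
  v <- new
  fork $ put v (sum [1 .. 100000 :: Int])  -- work happens here
  get v                                    -- receives the evaluated result
```

For cases where deferring evaluation is genuinely wanted, monad-par also provides put_, which does not fully evaluate its argument.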