Many of the paths are symbolic links. The actual paths sometimes change for
minor bug fixes and other maintenance. For MPICH and IBM's MPI, there are compilation
scripts that supply the necessary libraries and include directories. When accessing
the library files and include directories explicitly, care must be taken that
the -I and -L paths
are consistent with the MPI used. Users are encouraged to use the standard MPICH
or IBM scripts as they will automatically provide all required macro definitions,
environment settings, include and library paths, and platform-specific libraries.
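
For example, a sketch contrasting the two approaches (the explicit paths and
library name below are illustrative only and vary by platform and MPI version):

# Preferred: the script supplies all macros, paths, and libraries
mpicc code.c -o code

# Discouraged: explicit -I/-L paths must match the MPI being used
cc code.c -o code -I/usr/local/mpi/include -L/usr/local/mpi/lib -lmpich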

The vendor MPI implementations and Argonne's MPICH are not interchangeable. Most vendor
libraries yield faster communication; therefore, the vendor libraries are generally
recommended over MPICH. However, maintaining the ability to compile with MPICH can often
simplify debugging. Because the include files (in particular, mpi.h) may
be different, a complete recompile, as well as reload, may be necessary when
switching between a vendor library and MPICH. For example, the Compaq and Quadrics
MPIs are MPICH-based and appear to be compatible with only a reload, provided
no MPI-2 features are being used; your mileage may vary, of course. Care should
be taken when building applications using MPI with libraries that also use MPI
to guarantee that consistent MPI implementations are used by both.

C++ support for most of the MPIs is limited to C++ compatibility mode, so
that C++ codes may invoke the MPI C routines. The MPI-2 standard defines C++ classes
and bindings, and the class definitions developed at the University of Notre Dame are
slowly being incorporated into MPICH and the vendor MPIs.

The MPICH configurations on most of our systems are currently limited to C++
compatibility mode. The MPI-2 C++ interfaces are included with MPICH releases
starting with version 1.2.4; at present, those interfaces are available only on
the IBM SPs. The MPI-2 C++ interfaces are also available on IBM SPs running AIX 5.1
with PSSP 3.2 and above.

Two variations of the IBM MPI library are available, a threaded
library and a signal library. (Note:
The signal MPI library does not work on Power4 or Power5 systems.)
The threaded library processes MPI calls in a separate, kernel-bound thread,
while the signal library uses interrupts to ensure progress of MPI calls. The
threaded library is thread-safe and is the default library used, whether or not
the thread-safe compiler scripts (e.g., mpcc vs. mpcc_r) are used.

Note that the signal library yields slightly faster communication, but the
compiled code is not thread-safe. Performance of the threaded library is comparable
to that of the signal library if all MPI calls are performed in a single user
thread and the environment variable MP_SINGLE_THREAD is set to yes.

To link with the signal library, you must use the non-thread-safe compiler scripts
(e.g., mpcc rather than mpcc_r) and set the environment variable LLNL_COMPILE_SINGLE_THREADED=TRUE.
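
For example (csh syntax; the file name is illustrative):

setenv LLNL_COMPILE_SINGLE_THREADED TRUE
mpcc code.c -o code      # non-thread-safe script links the signal library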

Both libraries support two communication methods: User Space (US) and Internet
Protocol (IP). US is an IBM OS bypass mechanism that provides user processes
with fast access to communication hardware. Latencies with IP are about a factor
of 5 higher than US. Current configurations limit the number of US processes
per node to the number of CPUs per node.

The IBM MPI libraries will use shared memory for on-node communication and
the network interface for all off-node communication. The shared memory communication
is enabled by setting the environment variable MP_SHARED_MEMORY=yes, which is
the default setting in our current system configuration. Using the shared memory
configuration provides slightly faster on-node communication at the cost of higher
CPU overhead. Its impact on performance depends on the MPI calls that are used;
generally, it will improve performance for blocking MPI calls, while codes that
use nonblocking MPI calls can see performance degradation. Shared memory communication
can be disabled by setting MP_SHARED_MEMORY=no.
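
For example, to disable it for a code dominated by nonblocking calls (csh syntax):

setenv MP_SHARED_MEMORY no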

The signal MPI library can be used with C, C++, Fortran77, Fortran90, or Fortran95
codes. Note: The signal MPI library does not work on Power4 or Power5
systems.

The following examples show that for both the compilation and load steps, the
nonthreaded compile script must be used with the environment variable
LLNL_COMPILE_SINGLE_THREADED=TRUE set. In this case, the mp* scripts
are not equivalent to the _r versions.
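
A minimal sketch of the two steps (csh syntax; mpxlf is shown as the IBM Fortran
compile script, and the file names are illustrative):

setenv LLNL_COMPILE_SINGLE_THREADED TRUE
mpxlf -c code.f          # compilation step
mpxlf code.o -o code     # load step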

The resulting executable may be run with poe, using environment variables
or command-line arguments to set job parameters.
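
For example, job parameters may be given as poe command-line arguments (a sketch;
the flag names follow IBM POE conventions and the values are illustrative):

poe ./code -procs 8 -nodes 2 -rmpool 0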

There are many environment variables that
affect the performance tuning of the IBM MPI. The default user environment sets
MP_EUILIB=us, MP_SHARED_MEMORY=yes, and a few other environment variables most
often needed for MPI or other parallel programs. You may find that additional
settings, or overrides of the defaults, are necessary for optimal performance of
some codes.

Execution Line

To execute your code with n nodes and p processes in the indicated pool with
US communications, use the environment settings as shown in the following example:
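
A minimal sketch using standard POE environment variables (csh syntax; MP_EUILIB
is described above, the other variable names follow IBM POE conventions, and the
pool number and counts are illustrative):

setenv MP_EUILIB us       # User Space communication
setenv MP_RMPOOL 0        # the indicated pool
setenv MP_NODES 2         # n nodes
setenv MP_PROCS 8         # p processes
poe ./code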

We support three versions of MPICH: the default version, the latest installed
version, and the oldest supported version. All versions may be used with C, C++,
Fortran77, or Fortran90 through MPICH compilation and run scripts that are accessed
through symbolic links in /usr/local/bin, as described in the examples
below. All versions are installed in /usr/local because it is assumed
that users will commonly have /usr/local/bin included in their PATH environment
variable.

The most stable recent version of MPICH is the default. The default version
of MPICH is installed as /usr/local/mpi and is accessed through links
in /usr/local/bin to the standard MPICH compilation and run scripts.
The compilation scripts are mpicc, mpiCC, mpif77, and mpif90,
and the run script is mpirun, which is used to execute programs created
using the compilation scripts.
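
For example, to build and run with the default version (a sketch; the file names
are illustrative):

mpicc code.c -o code
mpirun -np 4 code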

The latest version of MPICH is installed as /usr/local/new_mpi, and
there are links in /usr/local/bin to its corresponding scripts. The compilation
scripts are new_mpicc, new_mpiCC, new_mpif77, and new_mpif90,
and new_mpirun is used to execute programs built with those scripts. When
a new MPICH release becomes available, the previous latest release will become
the default, and the new latest release will be installed as new_mpi.

The oldest version of MPICH is installed as /usr/local/old_mpi, and
there are links in /usr/local/bin to its corresponding scripts. The compilation
scripts are old_mpicc, old_mpiCC, old_mpif77, and old_mpif90,
and old_mpirun is used to execute programs built with those scripts. When
a new version of MPICH becomes the default version, the previous default becomes old_mpi.

Users should be aware that the installed versions of MPICH can vary across
platforms or machines. In general, we try to keep the versions consistent, but
there can be a lag in migrating a new version to all systems because of programmatic
requests. Other than very short lags to update links across the full set of machines,
these versions will be consistent across machines of the same platform, and,
usually, across platforms.

On the Intel Linux Cluster we only support a default version, so the remarks
here about old_ and new_ versions do not apply there.

We install the best MPICH abstract device interface (ADI) available for each
platform. On the IBM SPs, this is the MPL device, which interfaces to poe and can
make use of the SP switch in both US and IP modes.

As stated above, /usr/local/bin contains soft links to the MPICH scripts
for all the currently supported versions. The standard MPICH script names are
linked to the default MPICH path. For example, /usr/local/bin/mpicc is
a link to /usr/local/mpi/bin/mpicc. The other MPICH script names in /usr/local/bin,
formed by adding the prefixes old_ and new_ to the standard MPICH script names, are
links to the additional supported MPICH versions.

We use symbolic links so that different names can distinguish the different
versions installed, because all MPICH versions provide the same script names,
relative to their installation paths. For example, new_mpicc is a link
to /usr/local/new_mpi/bin/mpicc, while mpicc is a link to /usr/local/mpi/bin/mpicc.
Please note that the scripts for each version do differ, and cannot be used interchangeably;
e.g., you cannot use mpirun to execute a program built with the old_mpicc script.

We have made site-specific modifications to the MPICH scripts in some cases.
On the IBM SPs, the compilation scripts will automatically set the environment
variable LLNL_COMPILE_SINGLE_THREADED=TRUE to prevent unintentional mixing of
IBM's threaded MPI library with MPICH definitions.

The MPICH compilation scripts add configuration-specific macro definitions
and automatically set the appropriate include directories and link in the appropriate
libraries. Users are discouraged from accessing the MPI libraries and include
files explicitly; they are subject to change with new versions of MPICH, and the
path names and required MPI support libraries vary by platform. If explicit paths and
libraries are required, consult the information in /usr/local/docs/MPI_Use_Summary on
the platform you are using for more details on the paths and libraries needed.

Each MPICH compilation script is configured to use a specific C, C++, or
Fortran compiler, typically the native compiler on the given platform. MPICH
allows the user to change the compiler and linker/loader used by these scripts
by defining appropriate environment variables, as described below. Note that
you generally use the same command for both the compiler and the linker/loader,
which requires setting a pair of MPICH environment variables (e.g., MPICH_CC=gcc and
MPICH_CLINKER=gcc), as illustrated after the table below.

MPICH_CC           alternate C compiler
MPICH_CLINKER      alternate C loader
MPICH_CCC          alternate C++ compiler
MPICH_CCLINKER     alternate C++ loader
MPICH_F77          alternate Fortran77 compiler
MPICH_F77LINKER    alternate Fortran77 loader
MPICH_F90          alternate Fortran90 compiler
MPICH_F90LINKER    alternate Fortran90 loader
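
For example, to substitute gcc for both the compiler and the linker/loader
(csh syntax; a sketch assuming gcc is installed on the platform):

setenv MPICH_CC gcc
setenv MPICH_CLINKER gcc
mpicc code.c -o code     # now compiles and links with gcc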

To determine what other definitions or paths are provided by the version of MPICH
you are using, pass the -compile_info or -link_info option to any of the MPICH
compilation scripts, such as mpicc. If you are not using the MPICH scripts, this
can help you supply the MPICH-required options in your own compile and link
commands to guarantee compatibility with MPICH.
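
For example:

mpicc -compile_info      # print the compile command, including -I paths and macros
mpicc -link_info         # print the link command, including -L paths and libraries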

Executables built with the oldest or newest versions of MPICH should be run
using the corresponding old_mpirun or new_mpirun, respectively,
as there could be subtle differences in the runtime environments created by each.

Mellanox/OSU MVAPICH MPI is based on MPICH 1.2.7. Currently, there is no
support for MPI one-sided communications.

MPI compiler wrapper scripts are available in /usr/local/bin/, which is in the
default $PATH. These scripts mimic the familiar MPICH scripts in their functionality:
they automatically add the appropriate MPI include paths, link the necessary MPI
libraries, and pass the remaining switches to the underlying compiler.

Type [scriptname] -help for a list
of command-line options. Scripts available are:

On IBM SP platforms, MPICH uses IBM's proprietary Message Passing Library
(MPL), which supports both US and IP communication (see the IBM
MPI section). Because no thread-safe version of IBM's MPL exists, MPICH cannot
use the _r compilers. MPICH users must therefore set LLNL_COMPILE_SINGLE_THREADED=TRUE
on IBM machines. Failure to have this environment variable set can result in
missing externals at load time or inappropriate mixing of MPICH and IBM's MPI
definitions, which can generate illegal/bad communicator errors at run time. This
variable is set automatically by the MPICH scripts, but it must be set explicitly
if you are not using the scripts.

Note that the MPL MPICH mpirun provides SMP support. By default it will use
n = ceiling(p/4) nodes, placing up to 4 tasks on each node, where p is the number
of processes requested. The -nodes n option overrides this behavior, where n is
the desired number of nodes. MPL MPICH will distribute the p tasks evenly across
the n nodes, or complain if it cannot distribute them evenly. Note that mpirun
also understands several IBM environment variables, such as MP_NODES and
MP_TASKS_PER_NODE, to determine the number of nodes to use, but these must be
consistent with the -np option on the mpirun command. If no -np option is
specified, the default is 1 process.
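
For example (a sketch; the executable name is illustrative):

mpirun -np 8 -nodes 2 code    # 8 tasks distributed evenly, 4 per node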

The following examples demonstrate how to use the MPICH scripts on the IBM
SPs. Most of the examples use the default script names, but the oldest or newest
versions of MPICH supported are also available through the old_- or new_-prefixed
names, as indicated.
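
For example (a sketch; the file names are illustrative):

mpicc code.c -o code          # default version
new_mpif90 code.F -o code     # newest supported version
old_mpif77 code.f -o code     # oldest supported version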

The resulting executable is run with mpirun, old_mpirun, or new_mpirun,
as appropriate. Although MPICH-compiled executables can generally be run as serial
jobs on most platforms, it is strongly recommended that MPL MPICH jobs be run
only through the mpirun scripts.

Your $PATH must include $OMPI/bin, and your $LD_LIBRARY_PATH must include $OMPI/lib.
The ompi_info command lists various configuration settings,
compilers used, available components, and more.
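
For example (csh syntax; $OMPI here is a stand-in for the Open MPI installation
root, and the last line assumes LD_LIBRARY_PATH is already set):

setenv PATH ${OMPI}/bin:${PATH}
setenv LD_LIBRARY_PATH ${OMPI}/lib:${LD_LIBRARY_PATH}
ompi_info                # verify the installation and list its settings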

Compiler wrappers are provided. Users should never need to explicitly link
against any OMPI libraries.

C Example

mpicc code.c -o code

C++ Example

mpiCC code.C -o code     (mpic++ or mpicxx may be used equivalently)

Fortran 77 Example

mpif77 code.f -o code

Fortran 90 Example

mpif90 code.F -o code

By default, Open MPI will use the fastest networks available. On a Peloton
system with InfiniBand, for example, shared memory will be used for communication
between processes on the same node, and InfiniBand will be used for communication
across nodes. A single-node MPI job will use shared memory by default. A multinode
MPI job across nodes without InfiniBand will use TCP for communication. Networks
can be explicitly selected via an MCA parameter, which will be discussed below.

Running Open MPI under SLURM

Within a batch script or interactive allocation, Open MPI automatically detects
how many nodes are available and how many cores each node has. For example, in
a two-node allocation on Atlas (8 cores per node):

$ mpirun ./hello
atlas34 is rank 0 of 16
atlas34 is rank 1 of 16
atlas34 is rank 2 of 16
atlas34 is rank 3 of 16
atlas34 is rank 7 of 16
atlas34 is rank 4 of 16
atlas34 is rank 5 of 16
atlas34 is rank 6 of 16
atlas35 is rank 8 of 16
atlas35 is rank 9 of 16
atlas35 is rank 10 of 16
atlas35 is rank 11 of 16
atlas35 is rank 12 of 16
atlas35 is rank 13 of 16
atlas35 is rank 14 of 16
atlas35 is rank 15 of 16

Note that adjacent ranks are grouped onto the same node. The number of processes
may be explicitly specified with the -np parameter.
Also, ranks may be assigned in a round-robin fashion across available nodes using
the -bynode parameter:
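
For example (a sketch, continuing the two-node Atlas allocation above):

mpirun -np 16 -bynode ./hello    # ranks alternate between the two nodes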

Open MPI supports runtime configuration via MCA parameters.
MCA parameters may be specified on the command line, in the shell environment,
and/or in per-user and per-OMPI-installation configuration files. More information
can be found in the OMPI FAQ.

As mentioned earlier, one way in which MCA parameters are useful is to select
which network interconnects are used for communication. By default, the fastest
network available is used; however, manually selecting a slower network such as
TCP may be useful for debugging purposes.

Use shared memory and InfiniBand

mpirun -mca btl openib,sm,self ./hello

Use only InfiniBand, no shared memory (even within one node)

mpirun -mca btl openib,self ./hello

Use only TCP

mpirun -mca btl tcp,self ./hello

Several parameters are useful for running large-scale jobs. These include:

Increases the InfiniBand transmit timeout, which significantly reduces the
occurrence of code 12 errors from the InfiniBand network.

mpi_preconnect_all 1

Establishes TCP management connections and MPI-level communication connections
between all MPI processes during initialization. Generally not needed, though
may help with some applications that communicate between every process in the
MPI job.
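
For example, a parameter such as mpi_preconnect_all may be set on the command line
or in the environment (csh syntax; the environment form uses Open MPI's OMPI_MCA_
variable prefix):

mpirun -mca mpi_preconnect_all 1 ./hello
setenv OMPI_MCA_mpi_preconnect_all 1    # equivalent environment form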