MPI and OpenMP user guide

This is a short tutorial about how to use the queuing system, and how to compile and run MPI and OpenMP jobs.

Compiling and running parallel programs on UPPMAX clusters.

Introduction

These notes show, through brief examples, how to compile and run serial and parallel programs on the clusters at UPPMAX, such as milou.uppmax.uu.se and tintin.uppmax.uu.se.

Section 1 shows how to compile and run serial programs, written in Fortran, C, or Java, on the login nodes. Things work very much like on any Unix system, but the subsections on C and Java also demonstrate the use of modules.

Section 2 shows how to run serial programs on the execution nodes by submitting them as batch jobs to the SLURM queue system.

Serial programs on the login node

Fortran programs
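The example program used in this section is the classic "hello, world". The source file hello.f is not reproduced in this guide; a minimal version might look like the following sketch (any Fortran program that prints a greeting will do):

      program hello
c     print a greeting to standard output
      print *, 'hello, world'
      end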

To compile this program you must first decide which compiler to use. At UPPMAX there are three different compiler suites installed: GCC (gfortran), Intel, and Portland (PGI).

For this example we will use the Portland Group compilers installed at UPPMAX, so the pgf77 or pgf90 command can be used to compile Fortran code (pgf90 is in fact a Fortran 95 compiler). A module must first be loaded to make the compilers available:

$ module load pgi/18.3

To compile, enter the command:

$ pgf90 -o hello hello.f

to run, enter:

$ ./hello
hello, world

To compile with good optimization you can use the -fast flag. Be a bit careful with -fast, though: the compiler is sometimes overenthusiastic in its optimization, especially if your code contains programming errors (which, if you are responsible for the code, you ought to fix; if it is someone else's code, your options are often more limited). Should -fast not work for your code, try -O3 instead.
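For example, to compile the hello program with -fast (the exact optimization flags accepted depend on which compiler you have loaded):

$ pgf90 -fast -o hello hello.f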

Parallel programs using MPI

C programs

Before compiling a program for MPI we must choose which MPI implementation to use. At UPPMAX there are two: openmpi and intelmpi. For this example we will use openmpi.
To load the openmpi module, enter the command:

$ module load pgi/18.3 openmpi/3.1.3

To check that the openmpi module is loaded, use the command:

$ module list

The command to compile a C program for MPI is mpicc. Which compiler mpicc invokes depends on which compiler module was loaded before openmpi.
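The hello.c source itself is not included in this guide. A minimal MPI hello program, consistent with the output shown further down, might look like the following sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    /* start MPI and find out our rank and the total number of processes */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("From process %d out of %d, Hello World!\n", rank, size);

    MPI_Finalize();
    return 0;
}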

To compile, enter the command:

$ mpicc -o hello hello.c

You should add optimization and other flags to the mpicc command, just as you would when invoking the underlying compiler directly. So if the PGI compiler is used and you wish to compile an MPI program written in C with good, fast optimization, you should use a command similar to the following:
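$ mpicc -fast -o hello hello.c

To run the program on the execution nodes it must be submitted as a batch job. The job script used in this example, hello.sh, is not reproduced in the guide; a sketch of what it could look like is shown below (the project name after -A, the partition, the core count and the time limit are placeholders that you must adapt to your own project):

#!/bin/bash -l
# Sketch of a SLURM batch script for the MPI hello example
#SBATCH -A your_project_name
#SBATCH -p core -n 8
#SBATCH -t 15:00
#SBATCH -J hello
#SBATCH -o hello.out

module load pgi/18.3 openmpi/3.1.3

mpirun ./hello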

The last line in the script is the command used to start the program.
The last word on the last line is the program name hello.

Submit the job to the batch queue:

$ sbatch hello.sh

The program's output to stdout is saved in the file named by the -o flag in the batch script.
A test run of the above program yields the following output file:

$ cat hello.out
From process 4 out of 8, Hello World!
From process 5 out of 8, Hello World!
From process 2 out of 8, Hello World!
From process 7 out of 8, Hello World!
From process 6 out of 8, Hello World!
From process 3 out of 8, Hello World!
From process 1 out of 8, Hello World!
From process 0 out of 8, Hello World!

Fortran programs

The following example program does numerical integration to find Pi (inefficiently, but it is just an example):
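The program itself is not reproduced in this guide. A sketch of what such a program might look like is given below; the exact loop bounds, variable names and output formatting of the original differ slightly, but the structure is the same (each process integrates sqrt(1 - x**2) over its own part of [0,1], and the partial results are summed with MPI_REDUCE):

      program mpipi
c     Estimate pi by integrating sqrt(1 - x**2) over [0,1] with the
c     midpoint rule, splitting the interval between the MPI
c     processes. (Sketch only.)
      implicit none
      include 'mpif.h'
      integer, parameter :: n = 100000000
      integer :: rank, nproc, ierr, i, istart, iend, chunk
      double precision :: h, x, mysum, total

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

c     each process handles its own slice of the n intervals
      chunk  = n / nproc
      istart = rank * chunk
      iend   = istart + chunk - 1
      write(*,*) 'I am node', rank + 1, 'out of', nproc, 'nodes.'
      write(*,*) 'start is', istart
      write(*,*) 'end is', iend

      h = 1.0d0 / n
      mysum = 0.0d0
      do i = istart, iend
         x = (i + 0.5d0) * h
         mysum = mysum + sqrt(1.0d0 - x*x) * h
      end do
      write(*,*) 'Result from node', rank + 1, 'is', mysum

c     sum the partial results on process 0 and print the estimate
      call MPI_REDUCE(mysum, total, 1, MPI_DOUBLE_PRECISION,
     &                MPI_SUM, 0, MPI_COMM_WORLD, ierr)
      if (rank .eq. 0) then
         write(*,*) 'Result of integration is', total
         write(*,*) 'Estimate of Pi is', 4.0d0 * total
      end if
      call MPI_FINALIZE(ierr)
      end

Fortran MPI programs are compiled with the corresponding wrapper, mpif90 (for example: mpif90 -fast -o mpipi mpipi.f), and submitted with a batch script like the one shown for the C example. A test run of the program on eight cores gave the following output: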

I am node 8 out of 8 nodes.
start is 87499999
end is 99999999
I am node 3 out of 8 nodes.
start is 24999999
end is 37499999
I am node 5 out of 8 nodes.
start is 49999999
end is 62499999
I am node 2 out of 8 nodes.
start is 12499999
end is 24999999
I am node 7 out of 8 nodes.
start is 74999999
end is 87499999
I am node 6 out of 8 nodes.
start is 62499999
end is 74999999
I am node 1 out of 8 nodes.
start is 0
end is 12499999
I am node 4 out of 8 nodes.
start is 37499999
end is 49999999
Result from node 8 is 4.0876483237300587E-002
Result from node 5 is 0.1032052706959522
Result from node 2 is 0.1226971551244773
Result from node 3 is 0.1186446918315650
Result from node 7 is 7.2451466712425514E-002
Result from node 6 is 9.0559231928350928E-002
Result from node 1 is 0.1246737119371059
Result from node 4 is 0.1122902087263801
Result of integration is 0.7853982201935574
Estimate of Pi is 3.141592880774230
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
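
The "Warning: ieee_inexact is signaling" lines come from the Fortran runtime at program termination and simply note that inexact (rounded) floating-point results occurred during the run. For a numerical integration this is expected, and the warnings can be ignored.

OpenMP programs

The remaining examples use OpenMP threads within a single node instead of MPI processes. The OpenMP hello program and its batch script are not reproduced in this guide. A minimal Fortran sketch, consistent with the thread output shown below, could look like this (the original source and its output formatting will differ somewhat):

      program omphello
c     each OpenMP thread prints a greeting (sketch only)
      use omp_lib
      implicit none
      integer :: tid, nt
!$omp parallel private(tid, nt)
      tid = omp_get_thread_num()
      nt  = omp_get_num_threads()
      write(*,*) 'From thread', tid, 'out of', nt, ', hello, world'
!$omp end parallel
      end

OpenMP programs are compiled with the compiler's OpenMP flag (for PGI, pgf90 -mp; for gfortran, -fopenmp). The first two lines of the output below (a host name and the word "unlimited") suggest that the job script prints the node name and the stack size limit before starting the program; the following batch script sketch does the same, again with placeholder project name and limits:

#!/bin/bash -l
# Sketch of a SLURM batch script for the OpenMP hello example
#SBATCH -A your_project_name
#SBATCH -p core -n 8
#SBATCH -t 15:00
#SBATCH -J omp_hello
#SBATCH -o hello.out

module load pgi/18.3

# report where we run and raise the stack limit, then start 8 threads
hostname
ulimit -s unlimited
ulimit -s
export OMP_NUM_THREADS=8
./hello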

The program's output to stdout is saved in the file named by the -o flag in the batch script.
A test run of the above program yields the following output file:

$ cat hello.out
q33.uppmax.uu.se
unlimited
From thread 0 out of 8, hello, world
From thread 1 out of 8, hello, world
From thread 2 out of 8, hello, world
From thread 3 out of 8, hello, world
From thread 4 out of 8, hello, world
From thread 6 out of 8, hello, world
From thread 7 out of 8, hello, world
From thread 5 out of 8, hello, world
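
The second output file further below shows the same greeting with a slightly different format (a colon instead of a comma), which suggests a C version of the program; a minimal C sketch could look like this:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* each thread prints its id and the total number of threads */
    #pragma omp parallel
    {
        printf("From thread %d out of %d: hello, world\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

It is compiled with the C compiler's OpenMP flag (for PGI, pgcc -mp; for gcc, -fopenmp) and started from a batch script similar to the one sketched above.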

The last line in the script is the command used to start the program.
Submit the job to the batch queue:

$ sbatch hello.sh

The program's output to stdout is saved in the file named by the -o flag in the batch script.
A test run of the above program yields the following output file:

$ cat hello.out
q33.uppmax.uu.se
From thread 0 out of 8: hello, world
From thread 4 out of 8: hello, world
From thread 5 out of 8: hello, world
From thread 6 out of 8: hello, world
From thread 7 out of 8: hello, world
From thread 1 out of 8: hello, world
From thread 2 out of 8: hello, world
From thread 3 out of 8: hello, world