Parallel Computing Tutorial

OpenMP

Fortran and C OpenMP example codes can be found in ~/GS/*/OpenMP , where * represents the language of your choice (either Fortran or C). The program takes an image that has undergone edge detection and reconstructs the original image iteratively. The program is initially set to perform 1000 iterations. Compile and run the code using full optimisation.

For the Intel compiler, for Fortran use:

$ ifort -O2 -xSSE4.2 -openmp -o image image.f90

or for C use:

$ icc -O2 -xSSE4.2 -openmp -o image image.c

Time the execution of this code on a single processor by launching it under time. A measure of convergence is given by the final residue figure, which approaches zero. The reconstructed image is placed into the file finalimage.pgm , which can be viewed with the graphics program display .

The number of parallel threads is set through the OMP_NUM_THREADS environment variable. Increase the number of parallel threads to 2 and rerun the code. To set the number of threads to 2 via the environment variable use:

$ export OMP_NUM_THREADS=2

and to unset/delete the variable use:

$ unset OMP_NUM_THREADS

NOTE: When running the Intel compiled Fortran code on the login node, you may get a Segmentation fault. This is due to the way that the compiler handles the stack when using OpenMP. You can resolve this issue by setting the stack to an unlimited value, using
ulimit-sunlimited on the command line. This does not happen when submitting via the queues as unlimited is the default value of the stack on the computational nodes.

Use this script to obtain accurate timing information when running this code on 1, 2, 4, 6 and 8 CPU cores, for both the Intel and Portland Group compilers.
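A timing script along these lines might look like the following sketch; it assumes the OpenMP executable built earlier is named ./image (adjust the name and the list of core counts to match your build and system):

```shell
#!/bin/bash
# Sketch only: time the OpenMP image code for several thread counts.
# Assumes the executable from the compile step is named ./image.
for t in 1 2 4 6 8; do
    export OMP_NUM_THREADS=$t
    echo "=== OMP_NUM_THREADS=$t ==="
    if [ -x ./image ]; then
        time ./image
    else
        echo "./image not found - compile it first"
    fi
done
```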

MPI

MPI is available via wrapper scripts which call the relevant compiler, together with necessary include files and library calls. There are different wrapper scripts available depending upon the choice of compiler and MPI library.

Compiling MPI

All compiler options applicable to the compiler being invoked are available to the wrapper scripts. Use the MPI wrappers to compile the code and link with the standard MPI library; for Intel Fortran:

$ mpif90 -o pip pip.f90

or for C:

$ mpicc -o pip pip.c

To launch the code, use the mpirun launcher. This takes an option -np <n> , where n is the number of processes to be launched. Execute and time the code for 1, 2 and 4 processes; e.g. for 2 processes use:

$ mpirun -np 2 ./pip

The output from this program will be placed into the file output.pgm , which can be viewed with the display command, e.g.:

$ display output.pgm

To use the PGI compilers, first switch to the PGI module with module switch intel pgi . Then for PGI Fortran use:

$ mpif90 -fastsse -o pip pip.f90

or for C:

$ mpicc -fastsse -o pip pip.c

You can then run the program in the same way as described above.

MPI Job Submission

The system is set up such that the job will be placed in an optimal fashion by default. This means that the cores will be selected from the available resources in such a way as to minimise the communication hops between the different processes. This has the effect of reducing latency and should improve program performance.