Compiling and Running Codes

The aim of this practical tutorial is to ensure that users can compile and run different types of programs on ARC2.

The first part of the tutorial involves compiling and running a set of simple hello world type programs.

The second set of exercises involves compiling and running a simple matrix-vector multiplication code with different compiler options, introducing different optimisation levels and the -fast macro. This code is also linked against the Basic Linear Algebra Subprograms (BLAS) library.

You can download the exercises as a zip file, practicals.zip, or as a tarred and gzipped file, practicals.tar.gz (right-click on the link and save the file, then open the folder the file is in). These files were also used in the web page Using Linux the Basics.

The example code is provided in both C and Fortran; please choose the language you are more comfortable with.

Note about modules

The module command is installed on the system so that several compilers and their corresponding libraries can co-exist. In addition, numerous software applications (see this list of applications) are available to load via the module command. When a module is loaded or unloaded, the user environment is altered so that the desired software can be used. To check which modules are currently loaded, the command:

$ module list

can be issued at any time. To see the complete list of modules, together with a brief description of each, use:

$ module whatis

To view a list of all available modules, without descriptions, use:

$ module avail

Modules can be loaded with:

$ module load <module name>

unloaded with:

$ module unload <module name>

To switch between the currently loaded collection of modules and an alternative set of compiler-specific modules, use:

$ module switch <current_compiler> <desired_compiler>

e.g.

$ module switch intel pgi

will switch from the currently loaded environment, which uses the Intel compiler, to the PGI compiler.

Basic compilation using the Intel compiler

This is a simple exercise to introduce you to the compilation of C/C++ and Fortran 77/90 programs. The example code can be found in ~/GS/compiling. The Intel compiler is loaded by default when you log in to ARC2. If you switched to the PGI compiler in the last section, you can switch back to the Intel compiler with the module switch command:

$ module switch pgi intel

C

The C code can be found in the file hello.c. To compile it using the Intel compiler, issue the command:

$ icc -o hello hello.c

This will produce an output file called hello. To run it, issue the command:

$ ./hello

It should produce the following output:

You have successfully compiled and run a C program!

C++

Example C++ code can be found in the file greetings.cc. To compile it, issue the command:

$ icpc -o greetings greetings.cc

This will produce an output file called greetings. To run it, issue the command:

$ ./greetings

Fortran 90/77

A Fortran 90 example is included in the file easy.f90. Compile and run this by issuing the commands:

$ ifort -o easy easy.f90
$ ./easy

A Fortran 77 version can be found in the file simple.f. Invoke the Fortran 77 compiler using:

$ ifort -o simple simple.f

Basic compilation using the Portland group compilers

To set up the PGI compiler you must first switch your environment using the module command:

$ module switch intel pgi

The PGI compiler can now be used to compile the above programs using: pgcc for C, pgCC for C++, pgf90 for Fortran 90 and pgf77 for Fortran 77. Repeat the exercise above with the PGI compiler:

i.e. to compile the C code:

$ pgcc -o hello hello.c

to compile the C++ code:

$ pgCC -o greetings greetings.cc

to compile the Fortran 90 code:

$ pgf90 -o easy easy.f90

and for Fortran 77:

$ pgf77 -o simple simple.f

To switch your environment back to Intel, issue the command:

$ module switch pgi intel

Basic compilation using the GNU compilers

The GNU compilers (gcc for C, g++ for C++, g77 for Fortran 77 and gfortran for Fortran 90) are also available on the system. Although the operating system's native version of the GNU compilers is normally included in your PATH, it is best to load the module with the latest version of this compiler so that any corresponding libraries are available in your environment:

$ module switch intel gnu/4.8.1

Then, to compile the C code:

$ gcc -o hello hello.c

to compile the C++ code:

$ g++ -o greetings greetings.cc

to compile the Fortran 77 code:

$ g77 -o simple simple.f

and for Fortran 90:

$ gfortran -o easy easy.f90

Compiler Flags

In this exercise, different compiler flags are introduced and the performance of a simple (matrix * vector) code is analysed. The code can be found in ~/GS/*/flags where * represents the language of your choice, i.e. C or Fortran.

The code performs a (matrix * vector) operation in three different ways: looping over matrix columns in the innermost loop, looping over matrix rows in the innermost loop, or using the BLAS library routine DGEMV.

Please work with either the C or Fortran codes depending upon which compiler you will use the most.

Note about numerical libraries

There are several versions of numerical libraries installed on the system. Currently, there are four: Intel's Math Kernel Library (MKL), AMD's Core Math Library (ACML), Automatically Tuned Linear Algebra Software (ATLAS) and the original Netlib. All these libraries, aside from Netlib, are optimised to run on the available hardware. To load a specific library, e.g. MKL:

$ module load mkl

to switch to another version of the libraries, e.g. ACML:

$ module switch mkl acml

Initial Fortran compilation

Intel compiler

As a first step, simply compile the code as in the first exercise, i.e.:

$ ifort -o matmul matmul.f90

This will then give the error that the BLAS DGEMV routine cannot be found. Several versions of this library are installed on the system. After loading one of these libraries (see the note about numerical libraries above), you can correctly link to it using the customised environment variable $ARC_LINALG_FFLAGS.

For example, to load the MKL library, use:

$ module load mkl

and to compile the code use:

$ ifort -o matmul matmul.f90 $ARC_LINALG_FFLAGS

You can switch to another version of the numerical libraries and use the same compile line.

In each case, once the program has successfully compiled, run the executable by typing its name on the command line, and compare the execution times:

$ ./matmul

The program will print out the size of the problem, the memory used and the timing (in seconds) of each of the sections. A performance figure, expressed in megaflops, is also printed.

To obtain global timing information, the time command can be used when running the code:

$ time ./matmul

You can switch to another version of the numerical libraries, recompile with the same compile line, and run the result in the same way.

At this stage, how does the performance of the different ways the calculation is performed compare? How does this change with problem size?

Portland group compiler

This exercise can be repeated using the PGI compiler. First change the loaded modules to switch to this compiler:

$ module switch intel pgi

You can then compile the code, while linking to the loaded numerical library of your choice, using:

$ pgf90 -o matmul matmul.f90 $ARC_LINALG_FFLAGS

Run the executable by typing the file name on the command line:

$ ./matmul

Initial C compilation

Intel compiler

As an initial step, simply compile the code as in the first exercise, i.e.:

$ icc -o matmul matmul.c

This will then give the error that the BLAS DGEMV routine cannot be found. Several versions of the BLAS library are installed on the system. After loading one of these libraries (see the note about numerical libraries above), you can correctly link to it using the customised environment variable $ARC_LINALG_CFLAGS.

For example, to load the MKL library, use:

$ module load mkl

and to compile the code use:

$ icc -o matmul matmul.c $ARC_LINALG_CFLAGS

Once the program has successfully compiled, run the executable by typing its name on the command line, comparing the execution time:

$ ./matmul

The program will print out the size of the problem, the memory used and the timing (in seconds) of each of the sections. A performance figure, expressed in megaflops, is also printed.

To obtain global timing information, the time command can be used when running the code:

$ time ./matmul

At this stage, how does the performance of the different ways the calculation is performed compare? How does this change with problem size?

PGI compiler

This exercise can be repeated using the PGI compiler. First change the loaded modules to switch to this compiler:

$ module switch intel pgi

As before, use one of the available libraries. For example, to use MKL, if you have not already loaded it, issue the command:

$ module load mkl

and to compile the code use:

$ pgcc -o matmul matmul.c $ARC_LINALG_CFLAGS

Run the executable by typing the file name on the command line:

$ ./matmul

Optimisation

Until now, we have not allowed the compiler to optimise the code at all. There are many options which can be experimented with in order to get the most out of your code. A small subset of these is introduced here for the Intel and PGI compilers:

Intel compilers

By default the Intel compiler uses the -O2 optimisation flag. Optimisation can be turned off with the -O0 flag, e.g. for Fortran:

$ ifort -O0 -o matmul matmul.f90 $ARC_LINALG_FFLAGS

or for C:

$ icc -O0 -o matmul matmul.c $ARC_LINALG_CFLAGS

Now substitute the more aggressive -O3 optimisation flag. How does the runtime of the code differ with these options?

The -fast flag includes a combination of optimisation options which, in general, improve the runtime of code. There are two architecture-specific optimisation options: -xSSE4.2, which produces code specifically optimised for the current architecture, and -axSSE4.2, which produces both the specialised code and generic code to run on other processors. Experiment with these flags to see how the performance of the code alters.

Portland Group compilers

For the PGI compiler, add the -O flag for default optimisations, e.g. for Fortran:

$ pgf90 -O -o matmul matmul.f90 $ARC_LINALG_FFLAGS

or for C:

$ pgcc -O -o matmul matmul.c $ARC_LINALG_CFLAGS

Now increase the optimisation level to 3 with -O3 and observe how this affects the performance. There is also a -fastsse flag which enables specific instructions for the current architecture. Experiment with these options to see how performance is affected.