You will most probably, at some point, face a situation where you
need additional software installed. One option is to ask the System
Administrator to install the software globally on the cluster, which
makes sense mainly for popular software used by many users.

Another option is to install the software locally in your home
directory. Most of the time, this requires building the software from
the sources. If you do not have access to the sources, as is the case
with many commercial software packages, check the installation
documentation to find out how to install the software in a custom
directory.

First of all, read any README or INSTALL file you find in the source
directory.

Then, run the ./configure script. That script analyses the software
available for the compilation process (compilers, GNU tools, etc.) and
prepares the subsequent compilation scripts.

At this stage, you choose the directory in which the software will
be installed. That is done with the --prefix option of the configure
script. For example:

$ ./configure --prefix ~/.local/

You can also choose the compiler and compiler options with environment
variables passed to ./configure, e.g. to use the Intel compiler: CC=icc
./configure. Other interesting variables include CFLAGS, CPPFLAGS,
LDFLAGS. Run ./configure --help for a detailed list.

Check the output of the ./configure script; it may report missing
dependencies, which often cause some features of the program to be
disabled.

Once the ./configure script has run, you will be able to build the
software with the make command. This step may take several minutes,
depending on the complexity of the software. To speed things up, use the
-j n option of make to run the build process in parallel on n CPUs. For
instance:

$ make -j 4

The next step is to install the software in the destination directory.

$ make install

Now, you can optionally remove the source directory and the archive file
to save disk space, as the binaries will be copied to ~/.local/bin and
the libraries will go to ~/.local/lib.

If you are following instructions on a web page and the instructions are to run sudo make install, resist the urge to do so. Simply run make install.
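The ~/.local prefix chosen above is not in the default search paths, so the shell and the dynamic linker must be told where to look; a minimal sketch:

```shell
# Make the shell and the dynamic linker look in the ~/.local prefix
# chosen at the configure step (prepend so it wins over system paths).
export PATH="$HOME/.local/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/.local/lib:${LD_LIBRARY_PATH:-}"
```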

It is advisable to put the commands that prepend ~/.local/bin to your
PATH and ~/.local/lib to your LD_LIBRARY_PATH in the .bash_profile file
in your cluster home directory, which is sourced at each SSH login, so
that you do not have to type them at the start of each session.

To make sure your paths are correctly set, you can use the which command
to see specifically which binary is called when you issue a given command,
and ldd to see which dynamic libraries the binary is using.
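For instance, to inspect a standard command such as ls:

```shell
# Print the full path of the binary that runs when typing 'ls',
# then list the shared libraries that binary is linked against.
which ls
ldd "$(which ls)"
```

If a freshly installed program still resolves to an old system binary, the which output immediately shows that the PATH order is wrong.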

By default, compilers will tune the binary for the CPU of the machine they run on.
So if you compile on, say, Lemaitre2, which is equipped with Intel’s Westmere processors,
and then run on the newer SandyBridge processors of NIC4, the binaries of your program
will not use the advanced features the SandyBridge processors offer. The two most
common reasons for performance losses are reduced or inactive vectorisation
(for instance not using the AVX SIMD instructions, which allow performing up to 8
double-precision floating-point operations per clock cycle), and inefficient
code scheduling (scheduling compute and memory operations based on cache sizes
and latencies that are not the correct ones).

Worse still, if you compile on a SandyBridge processor and then run on
a Westmere processor, the program might crash because it tries to use
the AVX units, which are absent from the older processors.

With GCC, you can mitigate this problem using the -march and -mtune arguments.

With the -march argument, you can prevent software crashes by telling the compiler
to use only features that are present in earlier CPUs. If you specify -march=core2
then the resulting binary is guaranteed to work on every computer in the CÉCI clusters.
If you specify -march=westmere, it will work everywhere except on Hmem.
With -march=sandybridge, it will work only on Hercules, Dragon1, Vega and NIC4.
With -march=bdver1, it will only work on Vega.

With the -mtune argument, you can ask the compiler to optimize the binaries
for a specific architecture, while remaining within the limits imposed by the
-march option. It mainly optimizes the instruction scheduling for the given
CPU architecture. The -mtune argument accepts the same values as -march.

A safe option is thus to set -march to the CPU architecture of the oldest
cluster you plan on using, and -mtune to the CPU architecture of the
cluster you plan on using the most. The CPU architecture of each cluster can
be found on the cluster page.

In your code, if you have functions that use CPU intrinsics or optimization
features that are CPU specific, you can use a feature of GCC named
Function multi-versioning.
The idea is to write several versions of the same function, each with a
specific __attribute__ telling GCC which CPU it targets. At runtime, the
version matching the CPU the program is running on is called.

The Intel compiler has a very interesting feature called ‘Multiple code paths’.
With the -x option, you can specify which CPU features the compiler is
allowed to use, for instance -xSSE2 for CPUs that have the SSE2 feature,
namely every current Intel or AMD CPU (beware that -xSSE2 might lead
to binaries that crash on AMD processors even if they support SSE2).
If you specify -xSSE4.2, it will work everywhere except on Hmem. With
-xAVX it will work only on Hercules, Dragon1, Vega and NIC4.

With the -ax option, you can specify additional so-called ‘code paths’
that the compiler will add to the binary, using additional sets of features.
The ‘base path’ is specified with the -x parameter, and the -ax
parameter allows compiling the code several times, each time for a different
CPU architecture, and packing it all in a single binary. The Intel Compiler
runtime will then decide, when the program runs, which portion of the binary
to use based on the CPU of the machine it is running on. So you could
specify -axSSE4.2,AVX to build a binary optimized for all the CÉCI computers.

More information about these options is available on the
Intel compiler website,
where the list of valid values is given. The option names are CPU features,
rather than CPU architecture names as used by GCC. The features a CPU offers
can be found by looking at the contents of the /proc/cpuinfo file.
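For instance, on an x86 Linux machine, the SIMD-related features can be listed with:

```shell
# Extract the 'flags' line of /proc/cpuinfo and keep only the
# SSE/AVX instruction-set features, one per line.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse|avx)' | sort -u
```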