Introduction

Some of the theory of parallelisation is covered by TDT4200: Parallel Computing; the relevant parts are distributed memory and shared memory parallelisation as well as general parallelisation theory.

In addition to topics covered by TDT4200, TMA4280 covers some mathematical theory behind solutions to the problems typically solved by supercomputers and efficient ways of finding said solutions.

Shared and distributed memory parallelisation

It is not feasible to share all memory on large clusters, which contain anything from just below 10,000 cores to about 3 million cores on the world's fastest supercomputer. A hybrid model is therefore often utilised, with several cores sharing memory within each node and the nodes communicating by message passing.

Distributed memory parallelisation: MPI

MPI (Message Passing Interface) provides four main modes of communication: one-to-one, one-to-all, all-to-one and all-to-all. Processes are organised into groups and communicators, which are ordered sets of processes, potentially with virtual topologies such as a Cartesian grid.

Distributed memory forces you to think about where your data is, which is good. It also excludes race conditions, because each process works on its own separate part of memory; all data exchange must be made explicit.

Distributed file I/O

MPI I/O lets all processes in a communicator read and write a shared file in parallel, using both independent and collective operations, instead of funnelling all data through a single process.
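A minimal sketch where every process writes its own block of doubles to a shared file at a rank-based offset; the file name and block size are made up for illustration:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each process fills a local block (size chosen for illustration). */
    enum { N = 4 };
    double block[N];
    for (int i = 0; i < N; i++)
        block[i] = rank + i / 10.0;

    /* Open one shared file and write each block at a rank-based offset,
       using the collective variant of the write call. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "result.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, block, N, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```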

One-to-one

One process communicates with one other process. The MPI calls used would be MPI_Send and MPI_Recv.
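A minimal sketch of a one-to-one exchange, assuming the program is launched with at least two processes (the payload value is arbitrary):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double value = 3.14;  /* arbitrary payload */
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double value;
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %f\n", value);
    }

    MPI_Finalize();
    return 0;
}
```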

One-to-all

Send a message from one process to all processes in the communicator. Use MPI_Bcast. Useful when sending configuration parameters. If you need to send a different message to each process, use MPI_Scatter. This will send different parts of an array to different processes. Useful when all processes need to sum a part of an array, for example.
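A sketch combining both calls; the broadcast parameter and the chunk size are made up for illustration:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One-to-all, same data: broadcast a configuration parameter. */
    int n = 0;
    if (rank == 0)
        n = 1024;  /* illustrative problem size */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* One-to-all, different data: scatter chunks of an array that
       only exists on the root process. */
    enum { CHUNK = 2 };
    double *full = NULL;
    if (rank == 0) {
        full = malloc((size_t)size * CHUNK * sizeof(double));
        for (int i = 0; i < size * CHUNK; i++)
            full[i] = i;
    }
    double part[CHUNK];
    MPI_Scatter(full, CHUNK, MPI_DOUBLE, part, CHUNK, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    free(full);  /* free(NULL) is a no-op on the non-root ranks */

    MPI_Finalize();
    return 0;
}
```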

All-to-one

This category can be used to make all processes calculate a partial result, and store all parts on one process. MPI_Gather is used to collect the results on one process.
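A sketch where each process computes a partial value (here just its rank squared, for illustration) and rank 0 gathers and sums them:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process computes its partial result. */
    double partial = (double)rank * rank;

    /* Collect all partial results on rank 0. */
    double *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * sizeof(double));
    MPI_Gather(&partial, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        double sum = 0.0;
        for (int i = 0; i < size; i++)
            sum += all[i];
        printf("total = %f\n", sum);
        free(all);
    }

    MPI_Finalize();
    return 0;
}
```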

All-to-all

Every process contributes data and every process receives the combined result. MPI_Allgather is the typical call; it behaves like an MPI_Gather followed by an MPI_Bcast of the result.
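A sketch where every rank contributes one value and all ranks end up holding the full array:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every process contributes one value and receives all of them. */
    double mine = (double)rank;
    double *all = malloc((size_t)size * sizeof(double));
    MPI_Allgather(&mine, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE,
                  MPI_COMM_WORLD);
    /* Now every rank holds all[0..size-1]; no separate broadcast needed. */
    free(all);

    MPI_Finalize();
    return 0;
}
```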

Communicators

A communicator in MPI defines a group of processes that can send data to each other. We usually only use MPI_COMM_WORLD as communicator, but more complicated models with multiple communicators are possible. All processes in a group have a unique rank.
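A sketch splitting MPI_COMM_WORLD into two sub-communicators by even/odd rank, to illustrate that ranks are relative to the communicator:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Split MPI_COMM_WORLD into two groups: even and odd world ranks.
       The colour picks the group, the key orders ranks within it. */
    int color = world_rank % 2;
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    /* Each process gets a new rank within its sub-communicator. */
    int sub_rank;
    MPI_Comm_rank(subcomm, &sub_rank);
    printf("world rank %d -> group %d, sub rank %d\n",
           world_rank, color, sub_rank);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```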

Shared memory parallelisation: OpenMP

Parallelisation through threading (on the same node). All threads have access to the same memory, which makes race conditions possible. OpenMP has several directives to avoid race conditions and synchronise threads, like atomic and barrier.
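A sketch of atomic preventing a race on a shared accumulator; for this particular pattern a reduction clause would be the more idiomatic choice:

```c
#include <stdio.h>

int main(void)
{
    enum { N = 1000 };
    double sum = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        /* Without protection, concurrent updates to sum from several
           threads would race; atomic makes each update indivisible. */
        #pragma omp atomic
        sum += i * 0.5;
    }

    printf("sum = %f\n", sum);
    return 0;
}
```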

The maths used in this course

TMA4280 is a maths course, but most of the curriculum is centered around parallel computing. Nevertheless, some actual knowledge of maths is required.

The Poisson problem

The Poisson equation is an elliptic partial differential equation. The Poisson problem is the solution of the Poisson equation given boundary conditions. The Poisson equation is typically written as $$ -\nabla^2 u = f \quad \text{in}\ \Omega. $$
Here, $ u $ is the unknown, $ f $ is the load, and $ \Omega $ is the domain. $ \nabla^2 $ is the sum of the second order partial derivatives.
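Written out in two dimensions, with $ u = u(x, y) $, the equation reads
$$ -\nabla^2 u = -\left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right) = f \quad \text{in}\ \Omega. $$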

Speedup

To determine the speedup when going from $1$ to $P$ processors, we compare the run times $T_1$ and $T_P$:
$$ S_P = \frac{T_1}{T_P} $$
$S_P = P$ is the best theoretically possible speedup. More processors require more communication, and therefore achieve less speedup per processor.
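For example, if a run takes $T_1 = 100$ s on one processor and $T_{16} = 10$ s on 16 processors, the speedup is $S_{16} = 100/10 = 10$, well below the ideal value of $16$; the corresponding parallel efficiency is $S_{16}/16 = 0.625$.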