Process Grid and Scoped Operations

The processes of a parallel machine with P processes are
often presented to the user as a linear array of process IDs,
labeled 0 through (P - 1). For reasons described below,
it is often more convenient to map this 1-D array of P
processes into a logical two-dimensional process mesh, or grid. This
grid will have R process rows and C process columns, where
R * C = G <= P. A process can now be referenced
by its coordinates within the grid (indicated by the notation
{i, j}, where 0 <= i < R, and 0 <= j < C),
rather than a single number. An example of such a mapping is
shown below.
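
The mapping from a linear process ID to grid coordinates can be sketched
in a few lines. This is only an illustration: the row-major ordering and
the function names rank_to_coords and coords_to_rank are assumptions made
for the example and are not taken from the text above.

```python
def rank_to_coords(rank, R, C):
    """Map a linear process ID to grid coordinates {i, j}.

    Assumes row-major ordering (an assumption of this sketch).
    Returns None for a process that is not part of the grid,
    which can happen when R * C < P.
    """
    if not 0 <= rank < R * C:
        return None
    return divmod(rank, C)  # (i, j) = (rank // C, rank % C)


def coords_to_rank(i, j, R, C):
    """Inverse mapping: grid coordinates {i, j} back to a linear ID."""
    if not (0 <= i < R and 0 <= j < C):
        raise ValueError("coordinates outside the grid")
    return i * C + j


if __name__ == "__main__":
    P, R, C = 8, 2, 3  # G = R * C = 6 <= P, so two processes are left out
    for rank in range(P):
        print(rank, rank_to_coords(rank, R, C))
```

With P = 8, R = 2, and C = 3, processes 0 through 5 fill the grid and
processes 6 and 7 are not part of it.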

An operation that involves more than just a sender and a receiver
is called a scoped operation. All processes that participate
in a scoped operation are said to be within the operation's scope.

On a system using a linear array of processes, the only natural scope is
all processes. Using a 2-D grid, we have three natural scopes, as shown in
the following table.

SCOPE    MEANING
------   ----------------------------------------------
Row      All processes in a process row participate.
Column   All processes in a process column participate.
All      All processes in the process grid participate.
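
The three scopes can be made concrete by listing the participants of a
scoped operation as seen from one process. The sketch below is
illustrative only; the function name scope_members and the scope strings
are hypothetical and do not correspond to any library interface.

```python
def scope_members(i, j, scope, R, C):
    """Return the coordinates of all processes in the given scope of {i, j}.

    scope is one of "Row", "Column", or "All" (names chosen for this sketch).
    """
    if not (0 <= i < R and 0 <= j < C):
        raise ValueError("coordinates outside the grid")
    if scope == "Row":
        # Every process in process row i.
        return [(i, col) for col in range(C)]
    if scope == "Column":
        # Every process in process column j.
        return [(row, j) for row in range(R)]
    if scope == "All":
        # Every process in the grid.
        return [(row, col) for row in range(R) for col in range(C)]
    raise ValueError("unknown scope: " + repr(scope))


if __name__ == "__main__":
    for scope in ("Row", "Column", "All"):
        print(scope, scope_members(1, 0, scope, R=2, C=3))
```

On a 2 x 3 grid, process {1, 0} shares its row scope with three
processes, its column scope with two, and the "All" scope with six.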

These groupings of processes are of particular interest to the linear algebra
programmer, since distributed data decompositions of a 2-D array (a linear
algebra matrix) tend to follow this process mapping. For instance,
all of a distributed matrix row can be found on a process row, and all of
a distributed matrix column on a process column.
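
To see why a matrix row ends up on a single process row, consider one
possible decomposition. The sketch below assumes a 2-D block-cyclic
layout with mb x nb blocks; that layout, and the names owner and
process_rows_holding_matrix_row, are assumptions of this example rather
than something the text above prescribes.

```python
def owner(r, c, mb, nb, R, C):
    """Grid coordinates {i, j} of the process owning matrix entry (r, c).

    Assumes a 2-D block-cyclic layout with mb x nb blocks
    (an assumption of this sketch).
    """
    return ((r // mb) % R, (c // nb) % C)


def process_rows_holding_matrix_row(r, n_cols, mb, nb, R, C):
    """Set of process-row indices that hold some entry of matrix row r."""
    return {owner(r, c, mb, nb, R, C)[0] for c in range(n_cols)}


if __name__ == "__main__":
    R, C = 2, 3        # process grid shape
    mb, nb = 2, 2      # block sizes
    n_cols = 10        # number of matrix columns
    for r in range(6):
        print(r, process_rows_holding_matrix_row(r, n_cols, mb, nb, R, C))
```

The process-row index depends only on the matrix row r, so each returned
set contains exactly one process row. An operation on a whole matrix row
can therefore be carried out within the Row scope.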

Viewing the rows/columns of the process grid as essentially autonomous
subsystems provides the programmer with additional levels of parallelism.
Of course, how independent these rows and columns actually are will depend
upon the underlying machine. For instance, if the grid's processors are
connected via Ethernet, the only gain is likely to be in ease of programming.
Speed is unlikely to increase, since while one processor is communicating,
no others can. In that case, process rows or columns will not be able to
perform different distributed tasks at the same time. Fortunately, most
modern supercomputer interconnection networks are at least as rich as a
2-D grid, so these additional levels of parallelism can be exploited.