This chapter will help you, the algorithm designer, to leave the parallel algorithm design abstract enough to support portability while ensuring that it can eventually be implemented effectively for the parallel systems on which it will be executed.

4.1 INTRODUCTION

The first phase of designing a parallel algorithm consists of analyzing the problem to identify exploitable concurrency, usually
by using the patterns of the Finding Concurrency design space. The output from the Finding Concurrency design space is a decomposition of the problem into design elements:

A task decomposition that identifies tasks that can execute concurrently

A data decomposition that identifies data local to each task

A way of grouping tasks and ordering the groups to satisfy temporal constraints

An analysis of dependencies among tasks
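To make these four design elements concrete, consider a small, hypothetical example (ours, not one from the text): summing the squares of the elements of an array. The task decomposition is one partial-sum task per chunk; the data decomposition gives each task its own slice; the ordering constraint is that all partial sums complete before the final reduction; and the dependency analysis shows the reduction depends on every partial result. A minimal sketch, with illustrative names:

```python
# Hypothetical illustration of the Finding Concurrency outputs for
# "sum of squares of an array":
#   - task decomposition: independent partial-sum tasks, one per chunk
#   - data decomposition: each task reads only its own slice
#   - ordering: all partial sums finish before the final reduction
#   - dependency: the reduction depends on every partial result
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    # Each task touches only its local data (its chunk).
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, num_tasks=4):
    chunk_size = (len(data) + num_tasks - 1) // num_tasks
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Final reduction: ordered after all partial-sum tasks complete.
    return sum(partials)
```

The same decomposition could be mapped onto processes, threads, or distributed nodes; that mapping is exactly what the Algorithm Structure design space addresses.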

These elements provide the connection from the Finding Concurrency design space to the Algorithm Structure design space. Our goal in the Algorithm Structure design space is to refine the design and move it closer to a program that can execute tasks concurrently by mapping the concurrency
onto multiple UEs running on a parallel computer.

Although there are countless ways to define an algorithm structure, most parallel algorithms follow one of six basic design patterns. These patterns make up the Algorithm Structure design space. An overview of this design space and its place in the pattern language is shown in Fig. 4.1.

Figure 4.1 Overview of the Algorithm Structure design space and its place in the pattern language

The key issue at this stage is to decide which pattern or patterns are most appropriate for the problem.

First of all, we need to keep in mind that different aspects of the analysis can pull the design in different directions;
one aspect might suggest one structure while another suggests a different structure. In nearly every case, however, the following
forces should be kept in mind.

Efficiency.
It is crucial that a parallel program run quickly and make good use of the computer's resources.

Simplicity.
A simple algorithm resulting in easy-to-understand code is easier to develop, debug, verify, and modify.

Portability.
Ideally, programs should run on the widest range of parallel computers. This will maximize the "market" for a particular program.
More importantly, a program is used for many years, while any particular computer system is used for only a few years. Portable
programs protect a software investment.

Scalability.
Ideally, an algorithm should be effective on a wide range of numbers of processing elements (PEs), from a few up to hundreds
or even thousands.

These forces conflict in several ways, however.

Efficiency conflicts with portability: Making a program efficient almost always requires that the code take into account the
characteristics of the specific system on which it is intended to run, which limits portability. A design that makes use of
the special features of a particular system or programming environment may lead to an efficient program for that particular
environment, but be unusable for a different platform, either because it performs poorly or because it is difficult or even
impossible to implement for the new platform.

Efficiency can also conflict with simplicity: for example, writing efficient programs that use the Task Parallelism pattern sometimes requires complicated scheduling algorithms, which in many cases make the program very difficult to understand.
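This trade-off can be seen in a small sketch (ours, not from the text; the function names are illustrative). A static schedule divides tasks among workers up front and is easy to read; a dynamic schedule pulls tasks from a shared queue, which balances uneven task costs better but adds machinery that obscures the logic:

```python
# Hypothetical sketch: static vs. dynamic task scheduling for the
# Task Parallelism pattern. The dynamic version tolerates uneven task
# costs better, at the price of extra, harder-to-follow machinery.
import queue
import threading

def static_schedule(tasks, num_workers):
    # Simple: worker w takes tasks w, w+num_workers, w+2*num_workers, ...
    # Easy to understand, but workers that finish early sit idle.
    results = [None] * len(tasks)
    def worker(wid):
        for i in range(wid, len(tasks), num_workers):
            results[i] = tasks[i]()
    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return results

def dynamic_schedule(tasks, num_workers):
    # More efficient when task costs vary: idle workers keep pulling
    # work from a shared queue. The shared state and worker loop make
    # the code noticeably harder to understand.
    work = queue.Queue()
    for item in enumerate(tasks):
        work.put(item)
    results = [None] * len(tasks)
    def worker():
        while True:
            try:
                i, task = work.get_nowait()
            except queue.Empty:
                return
            results[i] = task()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return results
```

Both functions compute the same results; the choice between them is exactly the efficiency-versus-simplicity balance described above.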

Thus, a good algorithm design must strike a balance between (1) abstraction and portability and (2) suitability for a particular
target architecture. The challenge faced by the designer, especially at this early phase of the algorithm design, is to leave
the parallel algorithm design abstract enough to support portability while ensuring that it can eventually be implemented
effectively for the parallel systems on which it will be executed.