About the thesis

Computational devices are rapidly evolving into massively parallel systems. Multicore processors are already standard; high-performance processors such as the Cell/BE processor, graphics processing units (GPUs) featuring hundreds of on-chip processors, and reconfigurable devices such as FPGAs are all developed to deliver high computing power. They make parallelism commonplace, no longer the privilege of expensive high-end platforms. However, classical parallel programming paradigms cannot readily exploit these highly parallel systems. In addition, each hardware architecture comes with its own programming model and/or application programming interface (API), which makes it difficult to write portable, efficient parallel code. As the number of processors per chip is expected to double every other year or so, bringing parallel processing to the mass market, software needs to be parallelized and ported efficiently to massively parallel, possibly heterogeneous, architectures.

In her thesis, Burrows presents the foundations of a high-level, hardware-independent parallel programming model based on algebraic software methodologies. The model addresses two main issues of parallel computing: how to map computations efficiently to different parallel hardware architectures at a high, easy-to-manipulate level, and how to do this at a low development cost, i.e., without rewriting the problem-solving code.

The uniqueness of this framework is two-fold:

It presents the user with a programmable interface designed specifically for expressing the data dependency information of a computation as real code, in terms of a data dependency algebra (DDA). Data dependencies are thereby made explicit in the program code. The DDA interface consists of a generic point type, a generic branch index type, and two sets of generic function declarations on these types, requests and supplies, which are duals of each other.

It gives direct access, within a unified framework, to the communication layouts of various hardware architectures, or to their APIs, at a high level. This allows the embedding of the computation to be fully controlled by the programmer at a high, easy-to-manipulate level. In turn, this saves the user from the hassle of learning “the dialect” of each targeted hardware architecture. Some architectures, e.g., GPUs and FPGAs, require direct access to aspects of the hardware model. Nevertheless, the model is fully portable and not tied to any specific processor or hardware architecture, thanks to the modularisation of the data dependencies.
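As an illustration of the DDA interface described in the first point, the following is a minimal Python sketch and not code from the thesis. It assumes a butterfly-shaped dependency pattern as the example computation. All names (`Point`, `Branch`, `request`, `supply`, `has_request`, `has_supply`, `HEIGHT`) are hypothetical and chosen only for this example. The point type is a (row, column) pair, the branch index type is 0 or 1, and the last function checks that requests and supplies are duals: following a request and then the matching supply returns to the starting point, and vice versa.

```python
from itertools import product
from typing import Tuple

# Hypothetical concrete choices for the two generic DDA types.
Point = Tuple[int, int]   # (row, column) in the dependency graph
Branch = int              # branch index: 0 or 1

HEIGHT = 3                # number of butterfly stages (illustrative value)
WIDTH = 2 ** HEIGHT       # number of columns
BRANCHES = (0, 1)


def is_point(p: Point) -> bool:
    row, col = p
    return 0 <= row <= HEIGHT and 0 <= col < WIDTH


def has_request(b: Branch, p: Point) -> bool:
    # Points in row 0 are inputs and request nothing.
    return is_point(p) and p[0] > 0


def request(b: Branch, p: Point) -> Point:
    # The point that p depends on along branch b.
    # Only meaningful when has_request(b, p) holds.
    row, col = p
    return (row - 1, col) if b == 0 else (row - 1, col ^ (1 << (row - 1)))


def has_supply(b: Branch, p: Point) -> bool:
    # Points in the last row are outputs and supply nothing.
    return is_point(p) and p[0] < HEIGHT


def supply(b: Branch, p: Point) -> Point:
    # The point that p feeds along branch b.
    # Only meaningful when has_supply(b, p) holds.
    row, col = p
    return (row + 1, col) if b == 0 else (row + 1, col ^ (1 << row))


def requests_and_supplies_are_dual() -> bool:
    # Duality check: every request arc is also a supply arc, and vice versa.
    for row, col, b in product(range(HEIGHT + 1), range(WIDTH), BRANCHES):
        p = (row, col)
        if has_request(b, p):
            q = request(b, p)
            if not (has_supply(b, q) and supply(b, q) == p):
                return False
        if has_supply(b, p):
            q = supply(b, p)
            if not (has_request(b, q) and request(b, q) == p):
                return False
    return True
```

In this sketch the dependency pattern lives entirely in `request` and `supply`, so code that computes values at the points never needs to hard-code which neighbours it reads from.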

The inherent properties of DDAs lead to various execution models, depending on the chosen hardware architecture, and provide full control over the execution models’ computation time. Since spatial placements of computations are controlled from DDAs, this gives full control over space usage as well, whether sequential or parallel execution is desired. In the parallel cases, DDAs give full control over processor and memory allocation and over communication channel usage, while remaining at the abstraction level of the source program.
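One way to picture how placement in space and time might be derived from a DDA is sketched below. This is an assumption-laden illustration, not the mechanism defined in the thesis. It uses a butterfly-shaped request function as the example DDA, and a hypothetical projection `embed` that maps each point's row to a time step and its column to a processor by block distribution. From the request function alone, the sketch then lists which dependencies stay on one processor and which would need a communication channel.

```python
HEIGHT = 3            # number of butterfly stages (illustrative value)
WIDTH = 2 ** HEIGHT   # number of columns


def request(b, p):
    # Hypothetical butterfly DDA: the point that p depends on along branch b.
    # Only meaningful for points with row >= 1.
    row, col = p
    return (row - 1, col) if b == 0 else (row - 1, col ^ (1 << (row - 1)))


def embed(p, processors):
    # Assumed space-time projection: returns (processor, time step).
    # Columns are distributed over the processors in contiguous blocks.
    row, col = p
    return (col * processors // WIDTH, row)


def communication_edges(processors):
    # Dependencies whose two end points are placed on different processors.
    edges = []
    for row in range(1, HEIGHT + 1):
        for col in range(WIDTH):
            for b in (0, 1):
                p = (row, col)
                q = request(b, p)
                proc_p, time_p = embed(p, processors)
                proc_q, time_q = embed(q, processors)
                # A value must be produced before it is consumed.
                assert time_q < time_p
                if proc_p != proc_q:
                    edges.append((q, p))
    return edges
```

With a single processor `communication_edges(1)` is empty, which corresponds to sequential execution. Increasing the processor count changes only the projection; the dependency code itself is unchanged.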