Size and Access Inference for Data-Parallel Programs

Abstract:
Data-parallel programming languages have many desirable features, such
as single-thread semantics and the ability to express fine-grained
parallelism. However, it is challenging to implement such languages on
conventional MIMD multiprocessors, because these machines incur a high
overhead for small grain sizes. This paper presents compile-time
analysis techniques for data-parallel program graphs that reduce these
overheads in two ways: by increasing the grain size, and by relaxing
the synchronous nature of the computation without altering the program
semantics. The algorithms partition the program graph into "clusters"
of nodes such that all nodes in a cluster have the same loop structure,
and further refine these clusters into "epochs" based on generation and
consumption patterns of data vectors. This converts the fine-grain
parallelism in the original program to medium-grain loop parallelism,
which is better suited to MIMD machines. A compiler has been
implemented based on these ideas. We present performance results for
data-parallel kernels analyzed by the compiler and converted to
single-program multiple-data (SPMD) code running on an Encore Multimax.
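The clustering step described above can be illustrated with a minimal sketch. The node names, the `loop_nest` signature, and the dictionary-based graph representation below are illustrative assumptions, not the paper's actual intermediate representation: the idea is simply that nodes sharing an iteration-space shape can be grouped and later fused into one medium-grain loop.

```python
# Hypothetical sketch: partition data-parallel program-graph nodes into
# clusters of nodes with identical loop structure. The `loop_nest`
# tuples are assumed signatures, not the paper's real IR.
from collections import defaultdict

def cluster_by_loop_structure(nodes):
    """Group nodes whose loop-nest signature matches; each group is a
    candidate "cluster" that could be fused into one SPMD loop."""
    clusters = defaultdict(list)
    for name, loop_nest in nodes.items():
        clusters[tuple(loop_nest)].append(name)
    return dict(clusters)

# Example: two elementwise ops over a vector and one scalar reduction.
nodes = {
    "a = b + c":  ("i",),  # elementwise over index i
    "d = a * 2":  ("i",),  # same iteration space -> same cluster
    "m = sum(d)": (),      # reduction to a scalar: different structure
}
print(cluster_by_loop_structure(nodes))
```

Running the sketch places the two elementwise nodes in one cluster and the reduction in another; a further "epoch" refinement would then split clusters wherever producer and consumer access patterns of a vector diverge.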