Beauty And The Beast: Exploiting GPUs In Haskell.

Alex Cole, Alistair A. McEwan, and Geoffrey Mainland

Appeared in: Communicating Process Architectures 2012, pp. 121-134

Abstract. In this paper we compare a Haskell system that exploits a GPU back end using Obsidian against a number of other GPU/parallel processing systems. Our examples demonstrate two major results. Firstly, they show that the Haskell system allows the applications programmer to exploit GPUs in a manner that eases the development of parallel code by abstracting from the hardware. Secondly, we show that the performance of GPU code generated from Haskell is acceptably comparable to expert hand-written GPU code in most cases, and permits very significant performance benefits over single- and multi-threaded implementations whilst maintaining ease of development. Where our results differ from expert hand-written GPU (CUDA) code, we consider the reasons for this and discuss possible developments that may mitigate these differences. We conclude with a discussion of a domain-specific example that benefits directly and significantly from these results.

2011

A Comparison Of Data-Parallel Programming Systems With Accelerator.

Alex Cole, Alistair A. McEwan, and Satnam Singh

Appeared in: UK Electronics Forum (UKEF) 2011, pp. 13-24

An Analysis of Programmer Productivity versus Performance for High Level Data Parallel Programming.

Alex Cole, Alistair A. McEwan, and Satnam Singh

Appeared in: Communicating Process Architectures 2011, pp. 111-130

Abstract. Data-parallel programming provides an accessible model for exploiting the power of parallel computing elements without resorting to the explicit use of low-level programming techniques based on locks, threads and monitors. The emergence of Graphics Processing Units (GPUs) with hundreds or thousands of processing cores has made data-parallel computing available to a wider class of programmers. GPUs can be used not only for accelerating the processing of computer graphics but also for general-purpose data-parallel programming. Low-level data-parallel programming languages based on the Compute Unified Device Architecture (CUDA) provide an approach for developing programs for GPUs, but these languages require explicit creation and coordination of threads and careful data layout and movement. This has created a demand for higher-level programming languages and libraries which raise the abstraction level of data-parallel programming and increase programmer productivity. The Accelerator system was developed by Microsoft for writing data-parallel code in a high-level manner which can execute on GPUs, on multicore processors using SSE3 vector instructions, and on FPGA chips. This paper compares the performance and development effort of the high-level Accelerator system against lower-level systems which are more difficult to use but may yield better results. Specifically, we compare against the NVIDIA CUDA compiler and sequential C++ code, considering both the level of abstraction in the implementation code and the execution models. We compare the performance of these systems using several case studies. For some classes of problems, Accelerator has performance comparable to CUDA, but for others its performance is significantly reduced; however, in all cases it provides a model which is easier to use and enables greater programmer productivity.