Programming applications for modern heterogeneous systems, which comprise multi-core CPUs and multiple GPUs, is complex and error-prone. Approaches like OpenCL and CUDA are relatively low-level: they require explicit handling of parallelism and memory, and they offer no high-level support for multiple GPUs within a stand-alone computer or for distributed systems that integrate several computers. In particular, distributed systems require application developers to combine several programming models, e.g., MPI together with OpenCL or CUDA.

We propose a uniform, high-level approach for programming both stand-alone and distributed systems with many cores and multiple GPUs. The approach consists of two parts: 1) the dOpenCL runtime system, which transparently executes OpenCL programs across several stand-alone computers connected by a network, and 2) the SkelCL library for high-level application programming on heterogeneous stand-alone systems with multi-core CPUs and multiple GPUs. While dOpenCL makes arbitrary computing devices (multi-core CPUs and GPUs) transparently accessible across distributed systems, SkelCL offers a set of pre-implemented patterns (skeletons) of parallel computation and communication that greatly simplify programming these devices. Both parts are built on top of OpenCL, which makes them portable across different kinds of processors and GPUs.
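To make the notion of a skeleton concrete, the following minimal C++ sketch expresses a dot product as a composition of two patterns, each customized by a user-supplied function. It is an illustration under stated assumptions, not the SkelCL API: the names `zip_skeleton`, `reduce_skeleton`, and `dot_product` are hypothetical, and the patterns run sequentially on the CPU, whereas a skeleton library would map the same structure onto OpenCL devices.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical names for illustration only; this is not the SkelCL API.
// A skeleton is a reusable parallel pattern customized by a user function.

// Zip pattern: combines two equally sized vectors element by element.
template <typename T, typename F>
std::vector<T> zip_skeleton(const std::vector<T>& a, const std::vector<T>& b, F f) {
    assert(a.size() == b.size());
    std::vector<T> out(a.size());
    // Every iteration is independent of the others, which is what would
    // allow an implementation to spread the work across cores or GPUs.
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = f(a[i], b[i]);
    }
    return out;
}

// Reduce pattern: folds a vector with a binary operator and its identity.
template <typename T, typename F>
T reduce_skeleton(const std::vector<T>& v, T identity, F f) {
    return std::accumulate(v.begin(), v.end(), identity, f);
}

// Dot product written as zip (multiply) followed by reduce (add).
inline float dot_product(const std::vector<float>& a, const std::vector<float>& b) {
    auto products = zip_skeleton(a, b, [](float x, float y) { return x * y; });
    return reduce_skeleton(products, 0.0f, [](float x, float y) { return x + y; });
}
```

The application code in `dot_product` contains no explicit parallelism or memory management; those concerns stay inside the pattern implementations, which is the simplification the skeleton approach aims for.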

We describe dOpenCL and SkelCL, demonstrate how our approach simplifies programming for distributed systems with many cores and multiple GPUs, and report experimental results on a real-world application from the field of medical imaging.