
Hardware accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors (PHIs), and Field-Programmable Gate Arrays (FPGAs) are now ubiquitous in extreme-scale high performance computing (HPC), cloud, and big data platforms, where they are used to execute workloads that demand high energy efficiency. Each accelerator presents its own interface and programming model. This heterogeneity imposes several limitations that must be addressed before large workloads can be executed on them. No existing library provides a unifying interface that lets programmers write reusable out-of-core implementations of their data-parallel kernels that run efficiently on different mainstream accelerators such as GPUs, PHIs, and FPGAs. This paper addresses that gap. We present libhclooc, a library that provides a unifying interface for out-of-core implementations of data-parallel kernels on three mainstream accelerators (GPUs, Intel Xeon Phis, and FPGAs). We implement out-of-core matrix-matrix multiplication (MMOOC) using the libhclooc API and demonstrate its superior performance over vendor implementations. We show that, because of the abstraction, it incurs a maximum overhead of 10%, 4%, and 8% compared to the state-of-the-art optimised implementations for the Nvidia K40c GPU, the Nvidia P100 PCIe GPU, and the Intel Xeon Phi 3120P, respectively. We also show that the libhclooc API reduces the number of lines of code (LOC) by 75%, thereby drastically improving programmer productivity.
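The following minimal Python sketch illustrates what an out-of-core kernel does, using tiled matrix multiplication under a fixed device-memory budget. It is not the libhclooc API. The accelerator is only simulated, and the names `ooc_matmul` and `tile_footprint` and the parameter `device_budget` are hypothetical. At each step, only one row block of A, one column block of B, and the matching block of C are "resident". Matrices whose total size exceeds the budget can therefore still be multiplied by streaming tiles through it.

```python
"""Illustrative sketch of out-of-core tiled matrix multiplication.

Assumption: the accelerator is simulated. "Device memory" is only a budget
on how many matrix elements may be resident at once. This is NOT the
libhclooc API; all names here are hypothetical.
"""


def tile_footprint(tile_rows, tile_cols, k):
    # Elements resident on the simulated device during one step:
    # an A row block (tile_rows x k), a B column block (k x tile_cols),
    # and the C result block (tile_rows x tile_cols).
    return tile_rows * k + tile_cols * k + tile_rows * tile_cols


def matmul_tile(a_rows, b_cols):
    # Multiply a block of rows of A by a block of columns of B.
    # b_cols is column-major: each entry is one column of B.
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def ooc_matmul(A, B, tile_rows, tile_cols, device_budget):
    n, k = len(A), len(A[0])
    m = len(B[0])
    if len(B) != k:
        raise ValueError("inner dimensions of A and B must match")
    if tile_footprint(tile_rows, tile_cols, k) > device_budget:
        raise ValueError("tile does not fit in the device memory budget")
    C = [[0] * m for _ in range(n)]
    # Transpose B once on the host so column blocks are contiguous lists.
    B_cols = [list(col) for col in zip(*B)]
    for i0 in range(0, n, tile_rows):
        a_tile = A[i0:i0 + tile_rows]             # simulated host-to-device copy
        for j0 in range(0, m, tile_cols):
            b_tile = B_cols[j0:j0 + tile_cols]    # simulated host-to-device copy
            c_tile = matmul_tile(a_tile, b_tile)  # "kernel" on resident tiles
            for di, row in enumerate(c_tile):     # simulated device-to-host copy
                C[i0 + di][j0:j0 + len(row)] = row
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    B = [[7, 8, 9], [10, 11, 12]]
    print(ooc_matmul(A, B, tile_rows=2, tile_cols=2, device_budget=12))
```

A practical out-of-core implementation on a real accelerator would typically also overlap tile transfers with computation, for example with double buffering. The sketch omits this to keep the tiling logic visible.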

