Over 60 trainings all over Europe for universities and industry.
On-site trainings on the whole range of GPU computing technologies.
Each lecture is accompanied by a practical session on a remote GPU cluster.
Best recipes for GPU code optimization, based on our 5-year development experience.
We have multiple training programs and even books! Check out our catalogue here.

OpenACC enables rapid transition of serial C/C++/Fortran code into GPU-enabled parallel code. However, due to its high-level nature, OpenACC does not offer access to GPU-specific features that are useful for debugging, optimization and other purposes. In this article we demonstrate how to call CUDA device functions from within OpenACC kernels, using two examples: GPU compute grid retrieval and printf.
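As a rough illustration of the "serial code plus directive" style mentioned above, here is a minimal sketch in C. It is not taken from the article and does not show the CUDA device function calls, which depend on the compiler; the function name `saxpy` and its signature are our own assumptions. A compiler without OpenACC support ignores the pragma and runs the loop serially; with an OpenACC compiler (for example with `-acc` or `-fopenacc`, depending on the toolchain) the loop may be offloaded.

```c
#include <stddef.h>

/* Illustrative helper (hypothetical name): y[i] = a * x[i] + y[i].
 * The OpenACC directive asks the compiler to parallelize the loop and
 * to copy x to the device and y to the device and back.
 * Without OpenACC support the pragma is ignored and the loop runs serially. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

For instance, calling `saxpy(4, 2.0f, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` leaves `y = {12, 24, 36, 48}`, whether or not the loop was actually offloaded.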

The computing power of GPUs can be exposed to applications through two principal kinds of programming interfaces: manual parallel programming (CUDA or OpenCL), or directive-based extensions that rely on the compiler's capabilities for semi-automatic parallelization (OpenACC and OpenMP 4). Unlike the situation with GPUs, Intel has never offered the general public an explicit CUDA-like interface for its Xeon Phi accelerators, leaving OpenMP offloading directives as the only programming option.
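For readers unfamiliar with the directive-based option, the following is a minimal sketch of an OpenMP 4 style offloading directive in C. It is our own illustration, not code from the article: the function name `dot` is hypothetical, and whether the loop is really offloaded to an accelerator depends on the compiler, its flags and the available devices. Without OpenMP support the pragma is ignored and the loop runs serially on the host.

```c
/* Illustrative helper (hypothetical name): dot product of x and y.
 * The target directive requests offloading of the loop to a device;
 * the map clauses describe which data moves to and from the device,
 * and the reduction clause combines the per-thread partial sums. */
double dot(int n, const double *x, const double *y)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n], y[0:n]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

The explicit `map(tofrom: sum)` is a deliberate choice: later OpenMP versions treat scalars in target regions as firstprivate by default, so mapping the reduction variable explicitly keeps the result visible on the host.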

Based on liboffloadmic, we have prototyped "micrt", a programming interface for executing memory transfers and kernels, similar to the CUDA runtime. Find the code example and building instructions here.

GPUs are particularly powerful in compute-intensive image processing and may serve in many environments requiring realtime operation, such as video surveillance, traffic analysis or in-vehicle safety systems. Many image processing algorithms can be easily implemented using existing building blocks, such as OpenCV. However, additional performance and flexibility require deeper knowledge of GPU internals and programming techniques.

Applied Parallel Computing LLC offers a specialized 3-day course on Image Processing with CUDA. The first day is dedicated to the basics of GPU architecture and CUDA programming. The second and third days are dedicated to intensive guided CUDA practice in implementing different types of image filters. Where applicable, CUDA implementations are compared to OpenCV.