[TEST] GPU Computing – GeForce and Radeon OpenCL Test (Part 1)

As promised, here are the results of the OpenCL benchmarks I ran with the GPU Caps Viewer 1.8.2 OpenCL demos.

Each OpenCL demo is available in two flavors: CPU and GPU. Currently the CPU version runs only on AMD platforms, since AMD is the only manufacturer to provide a CPU implementation of OpenCL. The GPU implementation of OpenCL, on the other hand, is supported by both NVIDIA and AMD.

This article focuses on the OpenCL GPU code path.

But before diving into the benchmark results, let’s have a quick look at OpenCL and the related terminology.

Overview of OpenCL

OpenCL (Open Computing Language) is an open standard for general purpose parallel programming across CPUs, GPUs and other processors like the NPU (Network Processing Unit).

OpenCL is an API and a language. The OpenCL API is used to manage OpenCL entities (compute devices, contexts, kernels, …) whereas the OpenCL language is the programming language (based on C) used to write kernels.
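To make the API/language split a bit more concrete, here is a minimal sketch of what a kernel does. The kernel shown in the comment is an illustrative vector addition, not one of the GPU Caps Viewer demos, and the names `vec_add_work_item` and `run_vec_add` are hypothetical helpers. The C code emulates the kernel serially on the host so that it compiles without any OpenCL SDK; a real application would hand the kernel source to the OpenCL API instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For reference, the kernel as it might be written in the OpenCL language
 * (illustrative only, not taken from the GPU Caps Viewer demos):
 *
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *out)
 *   {
 *       size_t gid = get_global_id(0);
 *       out[gid] = a[gid] + b[gid];
 *   }
 */

/* Plain C stand-in for the kernel body: handles ONE work-item.
 * 'gid' plays the role of get_global_id(0). */
static void vec_add_work_item(size_t gid, const float *a,
                              const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Serial stand-in for the OpenCL runtime: runs the body once per
 * work-item. A real compute device would run many of them in parallel. */
void run_vec_add(const float *a, const float *b, float *out,
                 size_t global_size)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_work_item(gid, a, b, out);
}
```

Calling `run_vec_add(a, b, out, n)` on two arrays of `n` floats fills `out` with their element-wise sum. The point of the sketch is that the kernel body only ever deals with a single index; spreading the indices over the hardware is the job of the OpenCL runtime.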

OpenCL offers the same kind of functionality as NVIDIA CUDA or Microsoft DirectCompute. All these technologies let you use the computing power of modern GPUs, which are massively parallel, to solve general purpose problems. One nice feature of OpenCL is that it does not rely on any other API or technology. You can use OpenCL alone in a simple command line application (see Neil Trevett's explanation in this news) to perform some heavy computations…

An OpenCL platform is made up of a host and one or several compute devices. A compute device is a GPU, for example. Currently all GPU Caps OpenCL demos use only one OpenCL compute device, so if you have a GeForce GTX 295, only one of its GPUs will be used. I will update the GPU Caps demos later to use several compute devices for OpenCL computations.

Each compute device has several compute units. A compute unit is a streaming multiprocessor in a GeForce GPU or a SIMD engine in a Radeon. Each compute unit, in turn, has several processing elements.

A processing element is a scalar unit (also called an ALU, or a streaming processor in NVIDIA terminology). An OpenCL kernel is a program, written in the OpenCL language, that is executed by all compute units at the same time.

A compute unit of a Radeon HD 5000 series GPU has 80 processing elements (16 processing cores with 5 ALUs each), whereas a compute unit of a GeForce has 8 processing elements.
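The arithmetic behind this hierarchy can be sketched in a few lines of C. The `compute_device` struct and both functions are hypothetical helpers for illustration; they are not part of the OpenCL API. The per-unit figures (16 x 5 = 80 and 8) come from the paragraph above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of one OpenCL compute device.
 * Hypothetical struct for illustration, not part of the OpenCL API. */
struct compute_device {
    const char *name;
    unsigned compute_units;      /* SIMD engines / streaming multiprocessors */
    unsigned elements_per_unit;  /* processing elements per compute unit */
};

/* Radeon HD 5000 compute unit: 16 processing cores x 5 ALUs each. */
unsigned radeon_hd5000_elements_per_unit(void)
{
    return 16u * 5u;
}

/* Total processing elements = compute units x elements per compute unit. */
unsigned total_processing_elements(const struct compute_device *dev)
{
    assert(dev != NULL);
    return dev->compute_units * dev->elements_per_unit;
}
```

As a usage example, and assuming the commonly published figures of 20 compute units for a Radeon HD 5870 and 30 for a GeForce GTX 280 (these counts are not from this article, so check them against the compute unit count reported for your own card), the model gives 20 x 80 = 1600 processing elements for the Radeon and 30 x 8 = 240 for the GeForce.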

Programming an OpenCL application is like programming an OpenGL app: you need the OpenCL header files (just a few), OpenCL.lib and OpenCL.dll. OpenCL.dll is provided by NVIDIA or AMD with their drivers. Where can you find OpenCL samples? NVIDIA, AMD and Apple each have their own SDK with several samples.

To check what OpenCL support your platform offers (such as the number of compute units or the OpenCL extensions), just use the OpenCL tab of GPU Caps Viewer:

HERE you can find OpenCL details for a Radeon HD 5870 with Catalyst 9.12.
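If you want to do the same kind of check in your own code, the OpenCL API call `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS` returns the extensions as a single space-separated string. Here is a minimal sketch of how such a string could be searched. `has_extension` is a hypothetical helper, and the string in the usage note is a made-up sample, not the actual output of a Radeon HD 5870. The code does not call OpenCL itself, so it compiles without the SDK.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if 'name' appears as a whole space-separated token in
 * 'ext_list', 0 otherwise. Assumes 'name' contains no spaces.
 * Hypothetical helper, not part of the OpenCL API. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Reject partial matches such as "cl_khr_byte" inside
         * "cl_khr_byte_addressable_store". */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        ++p;  /* keep searching after this partial match */
    }
    return 0;
}
```

For example, with the sample string `"cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"`, looking up `cl_khr_byte_addressable_store` succeeds while `cl_khr_fp64` does not. The whole-token comparison matters because a plain `strstr` would also accept a name that is only the prefix of a longer extension.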