1. OpenCL: This is the Open Computing Language, an open standard for parallel programming on heterogeneous systems, currently maintained by the Khronos Group. Its advantage over other parallel programming approaches such as CUDA is that it is vendor-neutral, and as such, will work on any GPU or ASIC that implements the standard. OpenCL code also scales well on multi-core x86 CPUs (currently, running OpenCL 1.2 on the CPU requires an x86 CPU with support for SSE3, SSSE3 and beyond). By the same token, vendor neutrality means that other architectures, such as ARM, are also supported, so do not be surprised to see ARM-powered boards crunching OpenCL code with great speed and power efficiency. The future is going parallel, where heterogeneous compute environments will co-exist on the same platform, e.g. a standard CPU and one or more GPUs and/or ASICs working together on a parallel compute load in the same device.

2. NVIDIA CUDA: NVIDIA’s Compute Unified Device Architecture (CUDA) is a proprietary parallel compute platform and architecture created by NVIDIA, currently implemented on their GPUs. CUDA gives direct access to the virtual instruction set and memory of the parallel compute elements in CUDA-enabled GPUs. Since CUDA is a proprietary, vendor-locked parallel compute infrastructure, this article will NOT cover CUDA beyond this description. From here on, we’ll delve into OpenCL, the open compute standard that is vendor-neutral. CUDA will be covered in an article of its own later on.

TODAY’S TOPIC: OpenCL on Arch Linux with AMD GPUs and APUs.

To get up and running with OpenCL on Arch Linux with an AMD GPU/APU and such combinations (as may apply in your case), here are the requirements:

1. An Arch Linux installation.

2. An x86-64 Intel or AMD CPU with support for SSE3 and above.

3. An Evergreen (HD 5000+ series) AMD GPU and above. Legacy GPUs that are no longer supported by AMD Catalyst’s mainline driver will NOT be covered here.
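
The CPU requirement above can be checked from the shell. What follows is only a rough sketch: check_cpu_flags is a made-up helper name, and it assumes the usual Linux cpuinfo layout, where SSE3 is listed under the flag name pni and SSSE3 as ssse3.

```shell
#!/usr/bin/env bash
# Report whether a cpuinfo-style file lists the SSE3 and SSSE3 flags.
# check_cpu_flags is a hypothetical helper name, not part of any package.
# The file to inspect defaults to /proc/cpuinfo.
check_cpu_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local flags flag status=0
    # Take the first "flags" line and pad it with spaces for whole-word matching.
    flags=" $(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2) "
    # Linux reports SSE3 as "pni" (Prescott New Instructions).
    for flag in pni ssse3; do
        if [[ "$flags" == *" $flag "* ]]; then
            echo "$flag: present"
        else
            echo "$flag: missing"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_cpu_flags with no argument inspects the machine you are on; a non-zero exit status means at least one of the two flags was not found.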

Take note of the MAKEFLAGS="-jn" line in /etc/makepkg.conf. The value of n should be the number of logical CPUs on your system. Get this number by running:

grep -c '^processor' /proc/cpuinfo

In my case, it returns 4 because I’m on a Dual Core system with Hyperthreading. Two cores, 4 logical CPUs. Adjust as appropriate.
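
If you would rather not count by hand, the MAKEFLAGS line can be generated. This is a small sketch under the assumption that nproc (from coreutils) is available; makeflags_line is a hypothetical helper name, and you still paste its output into /etc/makepkg.conf yourself.

```shell
#!/usr/bin/env bash
# Print a MAKEFLAGS line sized to a given number of logical CPUs.
# makeflags_line is a hypothetical helper name used for illustration only.
# With no argument, it asks nproc for the logical CPU count.
makeflags_line() {
    local cpus="${1:-$(nproc)}"
    printf 'MAKEFLAGS="-j%s"\n' "$cpus"
}
```

On the dual-core Hyperthreading system described above, makeflags_line 4 prints MAKEFLAGS="-j4".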

Also, note that -mtune=generic changes to -mtune=native under ARCHITECTURE, COMPILE FLAGS. This is so that the code generated by makepkg calling up GCC is tuned to perform optimally on your CPU. Keep in mind that -mtune only affects tuning; for the generated code to utilize all the instruction sets your CPU has to offer, -march must be set to native as well.

The second edit is under the #BUILD ENVIRONMENT.

Go to the BUILDENV array and remove the ! in front of ccache, so that it looks like:

BUILDENV=(fakeroot !distcc color ccache check !sign)

Save and close /etc/makepkg.conf and follow up by installing ccache:

sudo pacman -S ccache

This will help speed up subsequent builds that rely on the same code since ccache, as the name implies, caches compiled objects, and rebuilds of the same project benefit by having the same work units skipped if no changes are detected in the source files. That’s a lot of win right there.
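
To double-check the BUILDENV edit, something along these lines could be used. It is a sketch only: ccache_enabled is a hypothetical helper name, and it assumes BUILDENV sits on a single uncommented line, as in the example above.

```shell
#!/usr/bin/env bash
# Report whether ccache is switched on in the BUILDENV array of a makepkg.conf.
# ccache_enabled is a hypothetical helper name; the path defaults to /etc/makepkg.conf.
ccache_enabled() {
    local conf="${1:-/etc/makepkg.conf}"
    local line
    # An entry counts as enabled only when it is not prefixed with "!".
    local re='[( ]ccache[) ]'
    # Only consider an uncommented BUILDENV=(...) line.
    line=$(grep -m1 '^BUILDENV=' "$conf") || return 2
    if [[ "$line" =~ $re ]]; then
        echo "ccache is enabled"
    else
        echo "ccache is disabled"
        return 1
    fi
}
```

Run ccache_enabled with no argument after saving the file; it returns 1 if the ! is still in place and 2 if no BUILDENV line was found.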

3. Let’s get to installing catalyst-total (or catalyst-total-pxp) from the AUR. Download the PKGBUILD’s tarball and extract it somewhere on your filesystem, e.g. a folder called pkgbuilds under your home directory. Once extracted, it will create a parent folder with the name of the package. In the case of catalyst-total, and assuming pkgbuilds was the extraction directory, you enter it with:

cd ~/pkgbuilds/catalyst-total

Listing the directory reveals its contents, including patches. Note that with the AUR, we do NOT package and redistribute binaries of any sort.

To build the package, run:

makepkg -c -s

The -c option tells makepkg to clean up after itself, and the -s option tells makepkg to automatically satisfy any missing dependencies via pacman so you won’t have to hunt them down yourself manually. As such, you may be prompted for your password as pacman is launched to install any missing dependencies.

Pro-tip: If a dependency is NOT found by pacman, and cannot be installed, it simply means the package in question resides on the AUR, and as such, you’ll have to build it from source before you install it.

As the package builds, it will output verbose info on the terminal and even offer guides to enabling critical services such as the DKMS Module helper included in the package. Follow the instructions on-screen and you’ll be sorted.

When the process is completed, go on and install the generated pkg.tar.xz package with pacman:

sudo pacman -U *.pkg.tar.xz

Note that we use the -U (--upgrade) flag, which installs or upgrades a package from a file on the local filesystem. If issues occur (such as being prompted to remove the open-source driver), go on and obey. The two cannot co-exist on the system.

After install, reboot the system in emergency/recovery mode (boot up the recovery kernel) and run:

Xorg -configure

WARNING: Do NOT run aticonfig --initial, as its syntax is broken and the resultant xorg.conf file will be broken. (To be exact, it breaks the way the PCI device is named.) The generated file will look as such, with small variances depending on your monitor setup:

The reason we had to download it manually is that AMD’s website is NOT wget-friendly, since it requires an auth token generated after accepting the EULA presented after clicking on the download SDK link.

When done, go into the same folder and extract the amdapp-sdk source archive from AMD. Then extract the resulting archive ending in lnx64 and change into the directory it creates:

cd AMD-APP-SDK-v2.9-RC-lnx64

cd into the include directory:

cd include

Now copy ALL the content here to /usr/include:

sudo cp -av * /usr/include/

Once this is done, you’ll have four new directories under /usr/include:

/usr/include/GL

/usr/include/CL

/usr/include/OpenVideo

/usr/include/SDKUtil

Ensure that they exist.
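
A quick way to run that check is sketched below. check_sdk_headers is a hypothetical helper name; the directory names are the four listed above, and the include root defaults to /usr/include.

```shell
#!/usr/bin/env bash
# Check that the four SDK header directories exist under an include root.
# check_sdk_headers is a hypothetical helper name; the root defaults to /usr/include.
check_sdk_headers() {
    local root="${1:-/usr/include}"
    local dir status=0
    for dir in GL CL OpenVideo SDKUtil; do
        if [[ -d "$root/$dir" ]]; then
            echo "found:   $root/$dir"
        else
            echo "missing: $root/$dir"
            status=1
        fi
    done
    # Non-zero when at least one directory is absent.
    return "$status"
}
```

Running check_sdk_headers with no argument prints one line per directory and exits non-zero if any of them is missing.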

The reason we copy these directories is so that packages that need to build GL and OpenCL code can locate these headers easily without fiddling with CFLAGS to add custom paths, which only complicates things.

Remember, in the Arch way, one of the things we aim for is SIMPLICITY. KISS Simplicity, that simple.

6. Optional:

If you develop OpenCL code AND you’re looking for a great and efficient debugger and profiler, I highly recommend AMDAPP-CodeXL, available on the AUR:

Now that you’re done, and you have a kick-ass Arch Linux box serving as an OpenCL development workstation, let us enjoy the fruit of our labor and install apps that leverage OpenCL on Linux.