Last week NVIDIA released their first set of end-user OpenCL drivers. Previously OpenCL drivers had only been available for developers on the NVIDIA side of things, and this continues to be the case on the AMD side of things. With NVIDIA’s driver release, the launch of AMD’s 5800 series, and some recent developments with OpenCL, this is a good time to recap the current state of OpenCL, and what has changed since our OpenCL introductory article from last year.

A CPU & GPU Framework

Although we commonly talk about OpenCL alongside GPUs, it’s technically a hardware-agnostic parallel programming framework. Any device implementing OpenCL should be capable of running any OpenCL kernel, so long as developers query the device ahead of time so as not to spawn too many threads at once. And while GPUs (being the parallel beasts that they are) are the primary focus, OpenCL is also intended for use on CPUs and more exotic processors such as the Cell BE and DSPs.
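To make the "query first, then launch" point concrete, here is a minimal sketch in plain C. It assumes the device limit has already been obtained, for instance by asking clGetDeviceInfo for CL_DEVICE_MAX_WORK_GROUP_SIZE; the launch_dims struct and the plan_launch helper are hypothetical names used only for illustration and are not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Launch dimensions for a one-dimensional kernel (hypothetical helper type). */
typedef struct {
    size_t local_size;   /* work-items per work-group */
    size_t global_size;  /* total work-items, a multiple of local_size */
} launch_dims;

/*
 * Pick launch dimensions that respect a device limit.
 *   device_max_wg: limit reported by the device (assumed to be queried
 *                  beforehand, e.g. CL_DEVICE_MAX_WORK_GROUP_SIZE)
 *   preferred_wg:  work-group size the developer would like to use
 *   n_items:       number of elements to process
 */
launch_dims plan_launch(size_t device_max_wg, size_t preferred_wg, size_t n_items)
{
    assert(device_max_wg > 0 && preferred_wg > 0);

    launch_dims d;
    /* Never ask for more work-items per group than the device allows. */
    d.local_size = preferred_wg < device_max_wg ? preferred_wg : device_max_wg;
    /* Round up so every element is covered; the kernel must bounds-check. */
    d.global_size = ((n_items + d.local_size - 1) / d.local_size) * d.local_size;
    return d;
}
```

With a device limit of 256, a preferred size of 512 and 1000 elements, this yields a local size of 256 and a global size of 1024, so the same host code adapts to whatever limit a given device reports.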

What this means is that when it comes to discussing the use of OpenCL on computers, we have two things to focus on. Not only is there the use of OpenCL on the GPU, but there’s the use of OpenCL on CPUs. If Khronos has their way, then OpenCL will be a commonly used framework for CPUs both to take better advantage of multi-core CPUs (8 threaded i7 anyone?) and as a fallback mechanism for when OpenCL isn’t available on a GPU.
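The fallback idea can be sketched as a simple selection rule: prefer a GPU, and use a CPU device only when no GPU is present. The code below is an illustration under assumptions, not real OpenCL host code; the device struct, the dev_type enum and pick_device are hypothetical stand-ins for what a program would learn by enumerating devices with clGetDeviceIDs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of an enumerated compute device. */
typedef enum { DEV_CPU, DEV_GPU } dev_type;

typedef struct {
    const char *name;
    dev_type    type;
} device;

/* Return the first GPU if there is one, otherwise the first CPU, otherwise NULL. */
const device *pick_device(const device *devs, size_t count)
{
    assert(devs != NULL || count == 0);

    const device *cpu = NULL;
    for (size_t i = 0; i < count; ++i) {
        if (devs[i].type == DEV_GPU)
            return &devs[i];          /* a GPU always wins */
        if (cpu == NULL && devs[i].type == DEV_CPU)
            cpu = &devs[i];           /* remember the first CPU as a fallback */
    }
    return cpu;
}
```

The kernel source stays the same in either case; only the device it is handed to changes.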

This also makes things tricky when it comes to who is responsible for what. AMD for example, in making both GPUs and CPUs, is writing drivers for both. They are currently sampling their CPU driver as part of their latest Stream SDK (even if it is a GPU programming SDK), and their entire CPU+GPU driver set has been submitted to the Khronos group for certification.

NVIDIA on the other hand is not a CPU manufacturer (Tegra aside), so they are only responsible for having a GPU OpenCL driver, which is what they have been giving to developers for months. They have submitted it to Khronos and it has been certified, and as we mentioned they have released it to the public as of last week. NVIDIA is not responsible for a CPU driver, and as such they are reliant on AMD and Intel for OpenCL CPU drivers. AMD likes to pick at NVIDIA for this, but ultimately it’s not going to matter once everyone finally gets up to speed.

Intel thus far is the laggard; they do not have an OpenCL implementation in any kind of public testing, for either CPUs or GPUs. For AMD GPU users this won’t be an issue, since AMD’s CPU driver will work on Intel CPUs as well. NVIDIA GPU users with Intel CPUs will be waiting on Intel for a CPU driver. Do note however that a CPU driver isn't required to use OpenCL on a GPU, and indeed we expect the first significant OpenCL applications to be intended to run solely on GPUs anyhow. So it's not a bad situation for NVIDIA, it's just one that needs to be solved sooner rather than later.

OpenCL ICD: Coming Soon

Unfortunately matters are made particularly complex by the fact that on Windows and Linux, writing an OpenCL program right now requires linking against a vendor-specific OpenCL driver. The code itself is still cross-platform/cross-device, but in terms of compiling and linking OpenCL has not been fully abstracted. It’s not yet at the point where it’s possible to write and run a single Windows/Linux program that will work with any OpenCL device. It would be the equivalent of requiring an OpenGL game (e.g. Quake) to have a different binary for each GPU vendor’s drivers.

The solution to this problem is that OpenCL needs an Installable Client Driver (ICD), just like OpenGL does. With an ICD developers can link against that, and it will handle the duties of passing things off to vendor-specific drivers. However an ICD isn’t ready yet, and in fact we don’t know when it will be ready. NVIDIA - who chairs the OpenCL working group - tells us that the WG is “driving to get an ICD implementation released as quickly as possible”, but with no timetable attached to that. The effort right now appears to be on getting more OpenCL 1.0 implementations certified (NV is certified, AMD is in progress), with an ICD to follow.
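Since the ICD had not been released at the time of writing, the following is only a conceptual sketch of what such a layer does: the application links against one loader, and the loader forwards each call to whichever vendor drivers are registered. Every name here (vendor_driver, VendorA, VendorB, icd_total_devices, icd_device_count) is made up for illustration and does not reflect the actual Khronos design.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-vendor entry point table; a real driver exports far more. */
typedef struct {
    const char *vendor;
    int (*device_count)(void);
} vendor_driver;

/* Stand-ins for two vendor drivers (names and counts are invented). */
static int vendor_a_device_count(void) { return 1; }
static int vendor_b_device_count(void) { return 2; }

static const vendor_driver registered[] = {
    { "VendorA", vendor_a_device_count },
    { "VendorB", vendor_b_device_count },
};
static const size_t registered_count = sizeof registered / sizeof registered[0];

/* The application links only against loader functions like these two. */
int icd_total_devices(void)
{
    int total = 0;
    for (size_t i = 0; i < registered_count; ++i)
        total += registered[i].device_count();   /* forward to each driver */
    return total;
}

/* Forward a call to one vendor's driver; -1 if that vendor is not installed. */
int icd_device_count(const char *vendor)
{
    assert(vendor != NULL);
    for (size_t i = 0; i < registered_count; ++i)
        if (strcmp(registered[i].vendor, vendor) == 0)
            return registered[i].device_count();
    return -1;
}
```

The point of the indirection is that the application binary never names a vendor library directly, so one build can see devices from every installed driver.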

Meanwhile Apple, in the traditional Apple manner, has simply done a runaround on the whole issue. When it comes to drivers they shipped Snow Leopard with their own OpenCL CPU driver, and they have GPU drivers for both AMD and NVIDIA cards. Their OpenCL framework doesn’t have an ICD per se, but it has features that allow developers to query for devices and use any they like. It effectively accomplishes the same thing, but it’s only of use when writing programs against Apple’s framework. But to Apple’s credit, they currently have the only complete OpenCL platform, offering CPU+GPU development and execution with a full degree of abstraction.

What GPUs Will Support OpenCL

One final matter is what GPUs will support OpenCL. While OpenCL is based around the hardware aspects of DirectX10-class hardware, being DX10 compliant isn’t enough. Even among NVIDIA and AMD, there will be some DX10 hardware that won’t support OpenCL.

NVIDIA: Anything that runs CUDA will run OpenCL. In practice, this means anything in the 8-series or later that has 256MB or more of VRAM. NVIDIA has a full list here.

AMD: AMD will only be supporting OpenCL on the 4000 series and later. Presumably there was some feature in the OpenCL 1.0 specification that AMD didn’t implement until the 4000 series, which NVIDIA had since the launch of the 8-series. Given that AMD is giving Brook+ the heave-ho in favor of OpenCL, this means that AMD’s pre-4000 series cards will continue to have a more limited selection of GPGPU applications than the 4000 series and later.

End-User Drivers

Finally to wrap this up, we have the catalyst of this story: drivers. As we previously mentioned, NVIDIA released their OpenCL-enabled 190.89 drivers to the public last week, which we’re happy to see even if the applications themselves aren’t quite ready. This driver release was a special release outside of NVIDIA’s mainline driver releases however, and as such it’s already out of date. NVIDIA released their 191.07 WHQL-certified driver set yesterday, and these drivers don’t include OpenCL support. So while NVIDIA is shipping an OpenCL driver for both developers and end-users, it’s going to be a bit longer until it shows up in a regular release.

AMD meanwhile is still in a developer-only beta, which makes sense given that they’re still waiting on certification. The estimate we’ve heard is that the process takes a month, so with AMD having submitted their drivers early last month, they should be certified soon if everything went well.

AMD actually DOES use stdcall, they just seem not to have used a .def file when they created the library, or something to that extent, which meant they exported the whole mangled function names, not just the clean names themselves, which is common practice with Microsoft and OpenGL.

However, nVidia does NOT use stdcall. This wasn't immediately apparent, since their exported symbols looked nice and clean. However, upon inspecting the actual code inside the DLL, the telltale retn NN callee stack cleanup was missing. So it's not stdcall, it's cdecl.
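For readers unfamiliar with the two conventions, here is a small sketch of how a C function name is decorated on 32-bit x86 Windows. Under cdecl the symbol is the name with a leading underscore and the caller cleans the stack; under stdcall the symbol also gets an @N suffix, where N is the number of argument bytes the callee pops (the retn NN mentioned above). That the "mangled" AMD exports took exactly this form is an assumption, and decorate_name and call_conv are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { CONV_CDECL, CONV_STDCALL } call_conv;

/*
 * Write the 32-bit x86 Windows decorated form of a C function name into out.
 * arg_bytes is the total size of the arguments pushed on the stack.
 * Returns what snprintf returns (a value >= out_size means truncation).
 */
int decorate_name(char *out, size_t out_size, const char *name,
                  call_conv conv, unsigned arg_bytes)
{
    assert(out != NULL && name != NULL);

    if (conv == CONV_STDCALL)
        /* Callee pops arg_bytes on return, hence the @N suffix. */
        return snprintf(out, out_size, "_%s@%u", name, arg_bytes);

    /* cdecl: the caller cleans the stack, so no byte count is encoded. */
    return snprintf(out, out_size, "_%s", name);
}
```

For example, clGetPlatformIDs takes three 4-byte arguments on a 32-bit target, so its stdcall decoration would be _clGetPlatformIDs@12, while a .def file lets a library export the clean name instead.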

Who is right in this case? I don't know yet. OpenGL uses stdcall, so that would mean nVidia is wrong here as well. On the other hand, OpenAL DOES seem to use cdecl, so it's not like there's much consistency. AMD says that stdcall was decided by Khronos. In that case, nVidia is wrong, and Khronos is wrong as well, for not catching this problem during OpenCL 1.0-conformance testing (just like Khronos didn't catch the mangled naming in AMD's CPU drivers, they both passed their tests).

At any rate, I think they all need to go back to the drawing board.

AMD's beta4 SDK has fixed the decorated naming problem. They now have clean naming and stdcall functions, analogous to OpenGL. So I think the AMD SDK in its current form is 'correct'.
nVidia has conceded that they didn't use stdcall, however, they said it wasn't really a mistake because Khronos made the decision to use stdcall at a later time.
They have said that their next release will use stdcall. Sadly they didn't comment on the exported symbol naming problem. So at this point I cannot be sure that their next release will be fully 'correct' and fully compatible with AMD's SDK on a binary level. I think nVidia will do the right thing though.

There was no actual reply from Khronos itself on the matter though. So it looks like this problem was mainly solved between AMD, nVidia and some developers who were using OpenCL and who pointed them in the right direction.

nVidia's Cuda 3.0 beta release also fixes the calling convention/function naming problems. AMD's beta4 SDK and nVidia's Cuda 3.0 beta are now binary compatible.
I've also found that the new nVidia OpenCL release solves quite a few performance issues. OpenCL now runs very well.

On the AMD side, AMD still goofed up in beta4. I tried running the CPU implementation, and it complained about missing atical*.dll files. If you don't have an ATi card in your system, you can't install the Catalyst driver that contains those files. So you have to manually extract the files and place them in the same directory as OpenCL.dll.
But after I had done that, I could run some nVidia samples on the AMD CPU driver, and I could run some AMD samples on the nVidia driver. So the binary compatibility is a fact, and developers can now test their code on two implementations.

The WinXP x64 nVidia 189.91 drivers with OpenCL support did not make an entry for the uninstaller in Add/Remove Programs.

It was a bit of a bummer because I installed them in the hope of being able to run the tech demo "NVIDIA’s ocean demo" or "DirectX compute ocean" with them (which Anand used in the 5870 review).
However, these drivers did not seem to support it...

OpenCL and DirectCompute are completely independent APIs.
The Ocean demo is DirectCompute, which is part of DirectX 11, and has nothing to do with OpenCL. As such, it's not going to work on XP anyway. You need Vista or Windows 7.

I looked through some of the OpenCL stuff I compiled with the nVidia SDK, and it just links against a generic OpenCL.dll.
Theoretically it should just work fine with any OpenCL.dll, as long as it exports the same functions (which it should, if it's OpenCL 1.0-conformant).
So I don't think there's a binary dependency on a manufacturer there. The only 'problem' is that the binary will just link against whichever OpenCL.dll it finds first. So if you have multiple OpenCL devices installed, you'd probably have to drop your preferred OpenCL.dll into the same directory as the application, to ensure it runs on the proper device, as without an ICD, each DLL will only enumerate the devices from its manufacturer.