NVIDIA pioneering OpenCL support on top of CUDA

NVIDIA, Apple's new MacBook chipset partner, is working hard to provide seamless support for OpenCL, the cross-platform API Apple developed for Snow Leopard to create a vendor-neutral, open specification for parallel programming across any compliant GPU.

Apple has spun its OpenCL API off to the Khronos Group, which maintains it as an open, royalty-free standard that any GPU maker can implement. The first operating system to support OpenCL will be Mac OS X 10.6 Snow Leopard, which debuts next year. Khronos also maintains the OpenGL graphics programming API and the OpenAL API for audio. OpenCL's name was chosen to associate it with those technologies.

Why OpenCL?

Clock rates of general-purpose CPUs are no longer rising rapidly, so Intel and other CPU makers are now using multiple cores to speed up their processors. GPUs on video cards, in contrast, are gaining tremendous new processing power along with support for multiple cores.

OpenCL is designed to let developers offload processor-intensive tasks to the often-idle GPU and take advantage of all that latent processing power. OpenCL also works with Snow Leopard's Grand Central scheduling technology, which targets multicore architectures, so that as much work as possible runs in parallel across both the CPU and any available GPUs.

An OpenCL head start for NVIDIA, via CUDA

NVIDIA has an early advantage in supporting OpenCL because of the work it has already completed on its CUDA driver interface, which implements parallel programming and "GPGPU," or general-purpose computing performed on graphics processing units.

Manju Hegde, the General Manager of CUDA at NVIDIA, explained to AppleInsider that supporting OpenCL on a GPU requires certain hardware capabilities, such as scatter write, as well as a certain generality of control flow. NVIDIA has already implemented both in its CUDA architecture.
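For readers unfamiliar with the term, "scatter write" means a thread can store its result at an address computed at run time, rather than only at its own fixed output location, as in the older gather-only model of shader-based GPGPU. The plain-C sketch below is purely illustrative: each loop iteration stands in for one GPU thread, and the functions `gather` and `scatter` are hypothetical names, not part of CUDA, OpenCL, or any other API. It assumes all three arrays have length `n`.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: "thread" i reads from a computed address idx[i]
   but writes only to its own fixed slot out[i]. */
void gather(float *out, const float *in, const int *idx, size_t n)
{
    for (size_t i = 0; i < n; i++) {          /* i stands in for a thread ID */
        assert(idx[i] >= 0 && (size_t)idx[i] < n);
        out[i] = in[idx[i]];
    }
}

/* Scatter: "thread" i writes to a computed address idx[i].
   This is the kind of write that needs scatter support in hardware. */
void scatter(float *out, const float *in, const int *idx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] >= 0 && (size_t)idx[i] < n);
        out[idx[i]] = in[i];
    }
}
```

On a real GPU the scatter version raises a question the gather version does not: if two threads compute the same target index, their writes collide, so scatter-based code typically needs unique indices or atomic operations.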

NVIDIA's CUDA ISA and hardware compute engine "were designed to support multiple entry points into the compute power of the GPU including standard computing languages (such as C, Fortran, etc) as well as API style interfaces like OpenCL," Hegde wrote in an email interview.

Hegde noted that the two are not competing technologies: "OpenCL is a layer on top of the CUDA driver interface. As such, OpenCL is one avenue to GPU computing through CUDA; C for CUDA is another."

OpenCL vs CUDA?

When asked how NVIDIA's CUDA compares with OpenCL, and whether NVIDIA plans to support both in its future products, Hegde explained, "This is probably better put by asking how C for CUDA compares with OpenCL; this is a language-to-language comparison."

Hegde added, "The answer is that the two share very similar constructs for defining data parallelism, which is generally the major task, so the code will be very similar and the porting efforts will be minor.

"As OpenCL is another method of accessing the GPU, we wholeheartedly support it. It sits seamlessly on top of our CUDA architecture, and as such, developers using NVIDIA hardware have a choice of language and programming environment.

"With regard to product support, we plan to have OpenCL supported on the CUDA architecture, which means that any NVIDIA GPU built upon the CUDA architecture will support OpenCL. This means every GPU (including GeForce, Tesla and Quadro lines) from the GeForce 8 series onwards will support OpenCL. This gives OpenCL developers an installed base of more than 100 million GPUs."
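To make the "very similar constructs" concrete: in C for CUDA, a kernel thread usually computes the index of the element it works on as `blockIdx.x * blockDim.x + threadIdx.x`, while in OpenCL C the same value is returned by `get_global_id(0)` (equal to `get_group_id(0) * get_local_size(0) + get_local_id(0)` when no global offset is used). The sketch below is plain C that emulates a one-dimensional launch on the CPU; it is not real CUDA or OpenCL code, and the function names are hypothetical. It shows that the per-element kernel body can stay identical while only the index bookkeeping differs, which is the kind of difference Hegde suggests keeps porting minor.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element "kernel" body shared by both launch styles. */
static void vec_add_body(size_t i, const float *a, const float *b,
                         float *c, size_t n)
{
    if (i < n)                 /* guard: the launch may be rounded up past n */
        c[i] = a[i] + b[i];
}

/* CUDA-style launch: grid_dim blocks of block_dim threads each.
   Index computed as blockIdx.x * blockDim.x + threadIdx.x. */
void vec_add_cuda_style(const float *a, const float *b, float *c,
                        size_t n, size_t block_dim)
{
    assert(block_dim > 0);
    size_t grid_dim = (n + block_dim - 1) / block_dim;   /* round up */
    for (size_t block_idx = 0; block_idx < grid_dim; block_idx++)
        for (size_t thread_idx = 0; thread_idx < block_dim; thread_idx++)
            vec_add_body(block_idx * block_dim + thread_idx, a, b, c, n);
}

/* OpenCL-style launch: the global size is rounded up to a multiple of the
   local (work-group) size, as OpenCL 1.x requires when a local size is
   given; each work item's index is its global ID, as from get_global_id(0). */
void vec_add_opencl_style(const float *a, const float *b, float *c,
                          size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t global_size = ((n + local_size - 1) / local_size) * local_size;
    for (size_t global_id = 0; global_id < global_size; global_id++)
        vec_add_body(global_id, a, b, c, n);
}
```

Note the bounds check in `vec_add_body`: because both launch styles round the number of threads up to a whole number of blocks or work-groups, some threads fall past the end of the data and must do nothing, a pattern common to both languages.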

Will OpenCL be Compatible?

We asked NVIDIA if it thinks OpenCL (which has also been adopted as the latest GPGPU strategy by AMD) is complete enough to give users a seamless experience when running OpenCL software across different GPU architectures (in the manner of PostScript across various vendors' laser printers), or if it anticipates problems and incompatibilities between vendors' implementations (in the manner of Java across various devices' implementations).

Hegde answered, "OpenCL is a multi-vendor standard, and so the expectation is that if a vendor has an OpenCL-compliant implementation, code written in OpenCL should run seamlessly across their architectures.

"NVIDIA has followed a very consistent and unfaltering strategy with CUDA. The C for CUDA programming model is being taught in more than 50 schools around the world. We have in excess of 25,000 developers actively working on CUDA today. If you look at www.nvidia.com/cuda, you'll see hundreds of codes and applications that are using our CUDA architecture today. Moreover, CUDA was designed to natively support all parallel computing interfaces and will seamlessly run OpenCL and future standards as they arise."

The future of OpenCL

We also asked NVIDIA for some immediate examples of applications that can take advantage of OpenCL now, and what future potential it sees in the specification.

Hegde answered, "While the OpenCL spec is being announced today, conformance tests still need to be developed, and final implementations will be released around Q2 next year. So we are still some way from having apps that can take advantage of OpenCL today. Of course, C for CUDA is available today on Mac OS, so developers who want to start developing for the GPU can get started now. As we said before, C for CUDA and OpenCL share very similar constructs for defining data parallelism, so, if they wish, porting that code to OpenCL after its full release will be easy.

"In terms of potential, it's huge! The enormous parallel processing power of the GPU has been delivering speedups of 20-200X in many codes, from oil and gas exploration and medical imaging to video transcoding. As more and more developers begin to port their apps to the GPU, you will see a new wave of applications hit the market."

OpenCL cross platform and other GPGPU standards

When asked whether any direct OS support is required to implement OpenCL on other platforms, and whether NVIDIA sees momentum clearly building behind OpenCL as the standard for GPGPU computing, Hegde noted, "Yes, the first OS to support OpenCL will be Snow Leopard."

Hegde also said, "In the world of parallel computing, a range of standards is emerging: C for CUDA from NVIDIA, OpenCL from Khronos, DX11 Compute from Microsoft, and so on. Developers like to have choice; they pick whichever programming style suits their needs and deployment. And developers will use the interface most comfortable to them, i.e., one that supports the libraries and OS they are accustomed to.

"NVIDIA will continue to invest in both its CUDA architecture and its C for CUDA programming environment, while also offering robust support for new standards as they emerge.

"To summarize, we think OpenCL is great and we support any initiative that unleashes the massive power of the GPU. We have worked extremely closely with Apple on the OpenCL spec. OpenCL was developed on NVIDIA GPUs, and we were the first to show working OpenCL code, so we are confident that our implementation of OpenCL will be second to none. The addition of OpenCL to our industry-leading toolkit for GPU computing means a fantastic array of choices for developers."

Nice to get comments right from nVidia. I guess this means that nVidia will continue to be supporting C for CUDA alongside OpenCL.

I wonder if you could get a statement from nVidia on whether they will move PhysX to OpenCL? Right now PhysX is built on CUDA and requires nVidia GPUs for hardware acceleration. If they moved to OpenCL, then AMD GPUs would support it as well. Then again, perhaps they prefer to keep PhysX to themselves as a product differentiator.

And it's important to note that nVidia isn't the only one moving aggressively to support OpenCL. AMD already has OpenCL up and running in their labs and plans to incorporate it into their next Stream SDK for developers in H1 2009.

Everyone seems to think Apple supports nVidia because nVidia is wholeheartedly behind OpenCL. While I don't doubt nVidia supports OpenCL, it seems to me that AMD has more invested in OpenCL than nVidia does.

AMD has actually announced they are abandoning their proprietary CTM GPGPU implementation and are fully moving to OpenCL. In comparison, nVidia will continue to develop their own CUDA implementation alongside OpenCL.

It'd also be interesting if you could ask AMD whether older GPUs like the X1600 and X1900 will be supported in OpenCL?

DX10 GPUs and unified shaders aren't necessarily required for GPGPU operation. nVidia requires them because they are layering OpenCL on top of CUDA, which only supports the GeForce 8xxx and up. After all, the first really popular consumer uses of GPGPU ran on the DX9.0c X1600, X1800, and X1900, which could do video transcoding using the original AVIVO Video Converter and were also the GPUs supported by the original Folding@home GPU client.

Similarly, ATI's Close to Metal platform uses Brook+, but the original BrookGPU framework can actually generate output for most OpenGL and DX9 GPUs, including the Radeon 9700 and even the GeForce FX 5200. It all depends on how the standard is defined. It's probably in the Khronos Group's best interest to try to include as many GPUs as possible for a larger installed base. Given that two generations of MacBook Pros use the X1600, the Mac Pros used the X1900, and Xserves have the X1300, it'd be in Apple's and ATI's own interest to have developed OpenCL in such a way as to support these GPUs.

If I understood correctly, around mid-2009 we can expect to have Snow Leopard and applications that take advantage of OpenCL + GPU power. I am looking forward to this. It will be extremely cool to have a MacBook Pro that can blast any overclocked desktop Windows PC junk out of the water.

I also expect Microsoft's DHS 11 (DirectHorseShit 11, as one AI member aptly put it) to fail miserably. Why? Because everything Microsoft has touched during the last couple of years has turned to dog poo. If so, we will still have C for CUDA as a nice competitor to OpenCL.

I don't think it'll be that easy. Like it or not DirectX is here to stay. And with GPGPU functionality built into DX11, when game developers write a DirectX game, which seems to be most of them, and make a DX11 code-path for graphics, they might as well just use DX11 for GPGPU since it's right there, likely has similar syntax, and likely has less compatibility/synchronization worries.

OpenCL really ought to develop a strong user base to take advantage of its early release. Hopefully drivers and development tools for all platforms come out as soon as possible.

So my question is simply - what does this mean to the end user? We understood that Quartz (Extreme) already utilized the video card for the UI. Are we now going to have Quartz Hyper Extreme or whatever? What is Apple planning to unleash on us that requires so much power - pro tools, prosumer tools, new stuff? This is an especially interesting development since Snow Leopard is supposedly going to be an optimized OS compared to the previous ones.

Much improved performance for processing-intensive apps at first, then as time goes by, even simple apps could benefit from being spun off to the otherwise idle GPU. That's it in a nutshell (from what I understand).

I think we need to see GPGPU branch out from the Medical Imaging, Oil and Gas and other verticals that don't help consumers.

I think consumers want to see video encodes that don't take forever, 1080p playback that doesn't stutter, 3D renders that don't take a quarter of a lifetime to complete, and image editing that doesn't bog down once the picture grows large.

I'm not expecting huge advances here for a couple of years. I think it's going to take 10.7 before we see some major developers using OpenCL.

P.S. OpenAL is not managed by the Khronos Group; it's managed by Creative. I think it would do better under the Khronos Group, however.
