Co-opting GPU for CPU Tasks Advanced by NVidia

Earlier this week, engineers at nVidia put the finishing touches on version 0.8 of its Compute Unified Device Architecture system for Windows and Red Hat Linux. CUDA's objective is to enable C programmers to utilize the high-throughput pipelining architecture of an nVidia graphics processor - pipelines that are typically reserved for high-quality 3D rendering, but which often sit unused by everyday applications - for compute-intensive tasks that may have nothing to do with graphics.

Today, the company announced its first C compiler - part of the CUDA SDK, which will enable scientific application developers for the first time to develop stand-alone libraries that are executed by the graphics processor, through function calls placed in standard applications run on the central processor.

NVidia's objective is to exploit an untapped reservoir on users' desktops and notebooks. While multi-core architecture has driven parallelism in computing into the mainstream, multi-pipeline architecture should theoretically catapult it into the stratosphere. But applications today are naturally written to be executed by the CPU, so any GPU-driven parallelism that's going to happen in programming must be evangelized first.

Which is why the company has chosen now to make its next CUDA push, a few weeks prior to the Game Developers' Conference in San Francisco. The greatest single repository of craftspersons among developers may be in the gaming field, so even though games already occupy the greater part of the GPU's work time, it's here where a concept such as CUDA can attract the most interest.

"The GPU is specialized for compute-intensive, highly parallel computation - exactly what graphics rendering is about," reads nVidia's latest CUDA programming guide (PDF available here), "and therefore is designed such that more transistors are devoted to data processing rather than data caching and flow control."

Huge arithmetic operations may be best suited to GPU execution, nVidia engineers believe, because they don't require the attention of all the CPU's built-in, microcoded functions for flow control and caching. "Because the same program is executed for each data element," reads the CUDA v. 0.8 guide, "there is a lower requirement for sophisticated flow control; and because it is executed on many data elements and has high arithmetic intensity, the memory access latency can be hidden with calculations instead of big data caches."

For CUDA to actually work, however, a computer must be set up with an exclusive NVidia display driver; CUDA is not an intrinsic part of ForceWare, at least not yet. In addition, programs must be explicitly written to support CUDA's libraries and custom driver; it doesn't enable the GPU to serve as a "supercharger" for existing applications. Because the GPU is such a different machine, there's no way for it to take a load off the CPU's shoulder's directly, like an old Intel 8087 or 80186 co-processor used to do.

So an application that supports CUDA thus, by definition, supports nVidia. AMD also has its own plans for co-opting GPU power, which it made immediately clear after its acquisition of ATI.

The CUDA programming guide demonstrates how developers can re-imagine a math-intense problem as being delegated to processing elements in a 2D block, like bestowing assignments upon a regiment of soldiers lined up in formation. Blocks and threads are delegated and proportioned, the way they would normally be if they were being instructed to render and shade multi-polygon objects. Memory on-board the GPU is then allocated using C-library derivatives of common functions, such as cudaMalloc() for allocating blocks of memory with the proper dimensions, and cudaMemcpy() for transferring data into those blocks. It then demonstrates how massive calculations that would require considerable thread allocation on a CPU are handled by the GPU as matrices.

"This complete development environment," read an nVidia statement this morning, "gives developers the tools they need to solve new problems in computation-intensive applications such as product design, data analysis, technical computing, and game physics."