CUDA, Supercomputing for the Masses: Part 17

By Rob Farber, April 14, 2010

CUDA 3.0 provides expanded capabilities and makes development easier

In CUDA, Supercomputing for the Masses: Part 16 of this article series, I discussed the CUDA 3.0 release. CUDA 3.0 is a major-revision release that adds enhancements valuable to all CUDA developers, making day-to-day development tasks easier, less error prone, and more consistent.

As mentioned in the previous article, expanded, consistent coverage appears to have been the thrust behind this major release, as it fills in several previous gaps and adds must-have capabilities. In a nutshell, this article discusses runtime and driver API compatibility, the new graphics interoperability API, and C++ inheritance, plus expanded functionality in CUBLAS and CUFFT. Examples are provided that demonstrate:

Consistency and interoperability between CUDA runtime and driver API codes, through two simple examples that call a runtime kernel and a CUBLAS routine.

A C++ example that:

Uses both versions of the OpenGL interoperability APIs in drawing a Mandelbrot set. The C++ source code concisely recreates the simplePBO example from Part 15 of this article series using C++ classes.

Demonstrates C++ inheritance by deriving a new class from our C++ Mandelbrot example that uses programmable shaders created with Cg. (See the extensive NVIDIA Cg homepage for more information.) While Cg compatibility is not new with the CUDA 3.0 release, mixing Cg shaders with CUDA can open up a vast collection of Cg libraries and existing software!

Be aware that the latest NVIDIA driver must be installed to use the CUDA 3.0 toolkit. As always, the latest released driver can be downloaded from CUDA ZONE and installed for a number of systems. Beta drivers and software can be downloaded from nvdevelopers but registration is required. Ubuntu users might wish to follow one of the many available guides, such as the one at Web Upd8, to see how to install the latest released or beta drivers via the Ubuntu tools.

Mixing CUDA Runtime and Driver API Codes

Previous articles in this series have focused on teaching CUDA with the runtime API (i.e., those methods that start with "cuda" as opposed to "cu") because it is fairly intuitive and not too verbose. Many developers prefer the driver API because it gives them more control and lets them make better use of existing code bases. Now programmers can utilize the best characteristics of both APIs.

The following is the source code for a driver mode CUDA program that calls a kernel via the runtime API. Please put this into a file called vectorAddDrv.cu:
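The original listing is not reproduced above; the following is a minimal sketch of such a program. The kernel name kernel(), the vector length, and the omission of error checking are my assumptions, not the original code. The host-side setup, allocation, and copies all use the driver API, yet the kernel launch uses the familiar runtime syntax:

```cuda
// vectorAddDrv.cu -- a minimal sketch, not the original listing
#include <cuda.h>          // driver API (cu* calls)
#include <cstdio>
#include <cstdlib>

#define N 1024

// an ordinary runtime-API kernel, launched from a driver-API host program
__global__ void kernel(float *a, float *b, float *c, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) c[i] = a[i] + b[i];
}

int main()
{
  // driver-API setup: initialize, pick device 0, create a context
  CUdevice dev; CUcontext ctx;
  cuInit(0);
  cuDeviceGet(&dev, 0);
  cuCtxCreate(&ctx, 0, dev);

  // host data
  float *h_A = (float*)malloc(N * sizeof(float));
  float *h_B = (float*)malloc(N * sizeof(float));
  float *h_C = (float*)malloc(N * sizeof(float));
  for (int i = 0; i < N; i++) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

  // driver-API memory allocation and host-to-device copies
  CUdeviceptr d_A, d_B, d_C;
  cuMemAlloc(&d_A, N * sizeof(float));
  cuMemAlloc(&d_B, N * sizeof(float));
  cuMemAlloc(&d_C, N * sizeof(float));
  cuMemcpyHtoD(d_A, h_A, N * sizeof(float));
  cuMemcpyHtoD(d_B, h_B, N * sizeof(float));

  // the point of the example: a runtime-style <<<...>>> launch from a
  // driver-API program -- legal as of CUDA 3.0
  int threads = 256, blocks = (N + threads - 1) / threads;
  kernel<<<blocks, threads>>>((float*)d_A, (float*)d_B, (float*)d_C, N);

  // copy the result back and verify
  cuMemcpyDtoH(h_C, d_C, N * sizeof(float));
  int ok = 1;
  for (int i = 0; i < N; i++)
    if (h_C[i] != h_A[i] + h_B[i]) ok = 0;
  printf("Test %s\n", ok ? "PASSED" : "FAILED");

  cuMemFree(d_A); cuMemFree(d_B); cuMemFree(d_C);
  free(h_A); free(h_B); free(h_C);
  cuCtxDestroy(ctx);
  return 0;
}
```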

A detailed discussion of the driver API is beyond the scope of this article, as I'm focusing on interoperability and the 3.0 release. Even so, much of this code should look familiar to runtime API developers, as only GPU setup, memory allocation, and data movement are involved. Many of these calls are similar to the runtime API calls, so the source code should be easy to follow. The specific point made with this example is that the following runtime CUDA call to kernel() works in the 3.0 release:
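The excerpted call is missing here; it is presumably a standard runtime-style launch along these lines (variable names assumed), issued even though the surrounding program used the driver API for setup:

```cuda
// a runtime-style kernel launch from inside a driver-API program;
// the CUdeviceptr allocations are cast to ordinary device pointers
kernel<<<blocks, threads>>>((float*)d_A, (float*)d_B, (float*)d_C, N);
```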

This example demonstrates that CUDA-GDB in the 3.0 release works with driver API programs. It also shows how straightforward it now is to mix driver and runtime API codes.

Similarly, the following example, blasAddDrv.cu, demonstrates that it is now possible with CUDA 3.0 to call CUBLAS library routines from driver API code. In this case, the previous vectorAddDrv.cu example code was adapted to call the saxpy() routine:

cublasSaxpy(N, 1.0f,(float*) d_A, 1,(float*) d_B, 1);

The following is the complete source code for blasAddDrv.cu. Again, I won't discuss the details of the driver API; see the NVIDIA documentation for more information.
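The complete listing is not reproduced above; the following sketch shows the general shape such a program takes (vector length, data values, and omitted error checking are assumptions). It pairs driver-API context and memory management with the legacy CUBLAS API's cublasSaxpy() call quoted earlier:

```cuda
// blasAddDrv.cu -- a sketch, not the original listing
#include <cuda.h>          // driver API
#include <cublas.h>        // legacy CUBLAS API (cublasInit et al.)
#include <cstdio>
#include <cstdlib>

#define N 1024

int main()
{
  // driver-API setup
  CUdevice dev; CUcontext ctx;
  cuInit(0);
  cuDeviceGet(&dev, 0);
  cuCtxCreate(&ctx, 0, dev);

  printf("Vector Addition (BLAS and Driver APIs)\n");

  float *h_A = (float*)malloc(N * sizeof(float));
  float *h_B = (float*)malloc(N * sizeof(float));
  for (int i = 0; i < N; i++) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

  // driver-API allocation and host-to-device copies
  CUdeviceptr d_A, d_B;
  cuMemAlloc(&d_A, N * sizeof(float));
  cuMemAlloc(&d_B, N * sizeof(float));
  cuMemcpyHtoD(d_A, h_A, N * sizeof(float));
  cuMemcpyHtoD(d_B, h_B, N * sizeof(float));

  // saxpy computes B = 1.0*A + B, i.e. the vector sum, on the GPU
  cublasInit();
  cublasSaxpy(N, 1.0f, (float*)d_A, 1, (float*)d_B, 1);

  // copy the result back and verify: expect 3*i in each element
  cuMemcpyDtoH(h_B, d_B, N * sizeof(float));
  int ok = 1;
  for (int i = 0; i < N; i++)
    if (h_B[i] != 3.0f * i) ok = 0;
  printf("Test %s\n", ok ? "PASSED" : "FAILED");

  cublasShutdown();
  cuMemFree(d_A); cuMemFree(d_B);
  free(h_A); free(h_B);
  cuCtxDestroy(ctx);
  return 0;
}
```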

Use nvcc to build the executable. The following simple script builds it on Linux. Note that this program requires the CUBLAS library; linking is specified with the -lcublas command-line option.
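The script itself is not shown above; on Linux it presumably amounts to a single nvcc invocation along these lines (file and executable names assumed):

```shell
# build blasAddDrv, linking the driver API (-lcuda) and CUBLAS (-lcublas)
nvcc -o blasAddDrv blasAddDrv.cu -lcuda -lcublas
```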

Running the program demonstrates that it does indeed work correctly as indicated by the "Test PASSED" message:

$ ./blasAddDrv
Vector Addition (BLAS and Driver APIs)
Test PASSED

Unquestionably, the ability to mix and debug driver and runtime API codes and libraries is valuable. For many, this expanded capability alone will make the CUDA 3.0 release an obvious download choice. As the CUDA library and code base expands, this transparent interoperability will continue to pay dividends in ease of use -- although it is likely that in the future most developers will utilize this capability without thought or awareness of the ease with which it occurs.
