CUDA, Supercomputing for the Masses: Part 16

By Rob Farber, March 25, 2010

CUDA 3.0 provides expanded capabilities

In CUDA, Supercomputing for the Masses: Part 15 of this article series, I discussed mixing CUDA and OpenGL within the same application by utilizing a PBO (Pixel Buffer Object) to create images with CUDA on a pixel-by-pixel basis and display them using OpenGL. In this article and the next, I discuss the CUDA 3.0 release. Far from being an update that merely supports the newest 20-series architecture (codename "Fermi"), CUDA 3.0 is a major-version release that adds enhancements valuable to all CUDA developers, making day-to-day development tasks easier, less error prone, and more consistent.

Overview

Expanded, consistent coverage appears to have been the thinking behind this major release, which fills several previous gaps with must-have capabilities. In a nutshell, there appear to be four main focus areas for the CUDA 3.0 release:

Consistency and interoperability between the CUDA runtime and driver codes, OpenGL and DirectX, as well as versioning and significant additions to CUDA-GDB including a memory checker to find misaligned and out-of-bounds errors.

C++ Class and Template inheritance.

Increased OpenCL capability, which will not be covered in this article.

Early Fermi architecture support, which will be covered in-depth in a separate article that will quickly follow this two-part series on the CUDA 3.0 release.

For many developers, CUDA 3.0 is a "must get" because this release supports CUDA driver and runtime buffer interoperability. In addition, CUDA-GDB can now debug driver API codes!

While this series has focused on teaching CUDA with the runtime API (i.e., those methods that start with "cuda" as opposed to "cu") because it is fairly intuitive and not too verbose, many developers utilize the driver API. As Chapter 3 of the CUDA Programming Guide notes:

[T]he CUDA driver API requires more code, is harder to program and debug, but offers a better level of control and is language-independent since it handles binary or assembly code.

As a result, many -- especially commercial developers -- will be overjoyed because both debugging and interoperability enhancements permit better use of existing code bases and new development.

Similarly, developers tasked with delivering production codes, or codes that need to run on a variety of systems without recompilation, will be very happy that the CUDA Toolkit libraries now support versioning. CUDA has evolved rapidly and has added valuable new features with each release. Now an application can verify that the driver and libraries on a system will provide the correct feature set. Multiple versions can also be used explicitly in case an older version is required for a particular routine, perhaps to utilize a deprecated feature or a legacy library.
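As a minimal sketch of what such a check might look like, the host-side C++ below decodes CUDA's integer version encoding (1000 * major + 10 * minor, so CUDA 3.0 is 3000) and compares a driver version against a runtime version. In a real application the two integers would come from the runtime API calls cudaDriverGetVersion() and cudaRuntimeGetVersion(); they are passed in as plain arguments here so the logic compiles without the toolkit. The names CudaVersion, decodeCudaVersion, describeCudaVersion, and versionsCompatible are hypothetical helpers, and the compatibility rule (driver at least as new as the runtime, runtime at least as new as the required feature level) is an assumption rather than a policy stated in this article.

```cpp
#include <string>

// CUDA reports versions as one integer: 1000 * major + 10 * minor.
// CUDA 3.0 is therefore 3000 and CUDA 2.3 is 2030.
struct CudaVersion {
    int majorVer;
    int minorVer;
};

// Split an encoded version number into its major and minor parts.
inline CudaVersion decodeCudaVersion(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Format an encoded version as "major.minor" for log messages.
inline std::string describeCudaVersion(int encoded) {
    const CudaVersion v = decodeCudaVersion(encoded);
    return std::to_string(v.majorVer) + "." + std::to_string(v.minorVer);
}

// Hypothetical policy (an assumption, not from the article): the driver
// must be at least as new as the runtime, and the runtime must be at
// least as new as the feature level the application requires.
// In real code, driverVersion and runtimeVersion would be filled in by
// cudaDriverGetVersion(&v) and cudaRuntimeGetVersion(&v).
inline bool versionsCompatible(int driverVersion, int runtimeVersion,
                               int requiredVersion) {
    return driverVersion >= runtimeVersion &&
           runtimeVersion >= requiredVersion;
}
```

An application could call versionsCompatible() once at startup and report describeCudaVersion() for each value before refusing to run on a system whose driver is too old.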

The CUDA 3.0 release will be a very popular download for C++ programmers because inheritance is now supported for both classes and templates! For the benefit of non-C++ programmers, inheritance is a fundamental concept in the C++ language. Without a doubt, the CUDA 3.0 release will enable more C++ classes and libraries that can efficiently use both GPU and CPU resources. This in turn can open the doors to portable software and powerful hybrid computing models, and can facilitate porting existing C++ commercial applications to GPUs along with other profit-based CUDA software development. C++ and CUDA support is a topic for a future article.
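To make the idea concrete for readers less familiar with C++, here is a small sketch of a class template that inherits from another class template and can be used on both CPU and GPU. The HOST_DEVICE macro and the BinaryOp and ScaledAdd types are hypothetical names chosen for illustration. The macro expands to the `__host__ __device__` qualifiers under nvcc and to nothing under an ordinary C++ compiler, so the same source builds either way. The example deliberately avoids virtual functions, since device-side support for those is not something this article covers.

```cpp
// Under nvcc, mark methods as callable from both host and device code.
// With an ordinary C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical base class template: holds a scale factor of type T.
template <typename T>
struct BinaryOp {
    T scale;
    HOST_DEVICE explicit BinaryOp(T s) : scale(s) {}
    HOST_DEVICE T scaled(T x) const { return scale * x; }
};

// Derived class template that inherits from the templated base and
// reuses its scaled() method to compute scale * x + y.
template <typename T>
struct ScaledAdd : public BinaryOp<T> {
    HOST_DEVICE explicit ScaledAdd(T s) : BinaryOp<T>(s) {}
    HOST_DEVICE T operator()(T x, T y) const {
        return this->scaled(x) + y;
    }
};
```

On the host, `ScaledAdd<float> op(2.0f); op(3.0f, 1.0f);` evaluates 2 * 3 + 1. Under nvcc the same object could be passed by value to a kernel and invoked per element, which is the kind of hybrid CPU and GPU reuse described above.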

A new, separate version of the CUDA C Runtime (CUDART) for emulation-mode debugging made it into this release. While worth mentioning, please note that emulation-mode debugging has been deprecated as of the CUDA Toolkit 3.0 release and will be removed in a future release (forum link). Those who want an emulated environment should look at a new project called gpuocelot, which lets CUDA programs run on x86 architecture CPUs without recompilation. More information can also be found in the paper, Translating GPU Binaries to Tiered SIMD Architectures with Ocelot. Ocelot also includes some debugging capabilities.

Be aware that you need to have the latest NVIDIA driver installed to use the CUDA 3.0 toolkit. As always, the latest released driver can be downloaded from CUDA Zone and installed for a number of systems. Beta drivers and software can be downloaded from nvdevelopers, but registration is required. Ubuntu users might wish to follow the tutorial at Web Upd8 to see how to install the latest released or beta drivers via the Ubuntu tools.
