AMD revs up Stream SDK

With Nvidia getting most of the attention when it comes to the use of graphics cards and GPU co-processors to boost the number-crunching capability of workstations and servers, it's hard for Advanced Micro Devices to get a word in edgewise. Perhaps that's why AMD waited until the holiday news dead zone to push out the second release of its Stream software development kit for GPUs and CPUs.

Stream SDK v2.0, which you can get your hands on here, went through four beta releases this summer and fall and is compliant with the OpenCL 1.0 parallel-computing spec.

OpenCL, which was originally created by Apple and has had many other companies (including Intel and Nvidia as well as IBM and AMD) kick in technology, is managed by the Khronos Group consortium, which also manages the OpenGL 3D graphics specification and API set, among many other specs. OpenCL covers not only how work is dispatched across multicore CPUs but also how it is dispatched across hybrid CPU-GPU combinations.

In the AMD world, where FireStream GPUs and Radeon graphics cards can be used to run parallelized computations in conjunction with x64 processors, the Stream SDK is analogous to Nvidia's CUDA programming environment, which is the toolset to exploit GeForce and Quadro graphics cards and Tesla GPU coprocessors that are linked to CPUs over PCI-Express links. Nvidia put its OpenCL 1.0 drivers out at the end of November, and with the Stream v2.0 SDK, AMD has caught up.

The Stream SDK v2.0 supports the OpenCL Installable Client Driver, which allows multiple OpenCL implementations to exist on the same machine and use the CPUs and GPUs in their turn. AMD warns that code written with the previous betas of the SDK will require some changes. (Hey, you don't put beta tools into production; don't blame AMD.) The SDK also supports atomic functions for 32-bit integers and the ATI Stream Profiler performance tool, which is embedded in Microsoft's Visual Studio 2008 integrated development environment.

The SDK also allows the Catalyst CAL runtime libraries for AMD graphics cards to be loaded without having to load the full Catalyst stack onto CPUs, and the OpenCL implementation cooked up by AMD allows for OpenCL and CAL APIs to be used by a single application. There are a bunch of other tweaks, nips, and tucks, which you can see here in the release notes.

The SDK is also previewing OpenCL-OpenGL interoperability, which will allow calculations made using the OpenCL stack to be rendered and displayed by the OpenGL stack without having to send the data back out across the PCI-Express bus to the CPU, where it is just dispatched back to the GPU for rendering. Skipping that round trip saves two bus transfers for every buffer that gets rendered, which can boost performance.
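For a rough sense of what that round trip costs, here is a back-of-the-envelope sketch in Python. The frame size and the bus bandwidth are assumptions chosen for illustration and are not figures from AMD: a 1920x1080 frame at four bytes per pixel, and 8 GB/s per direction, which is the theoretical peak of a PCI-Express 2.0 x16 slot. Real transfers add latency and rarely hit the peak rate, so treat the result as a lower bound.

```python
def round_trip_seconds(buffer_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to copy a buffer from GPU to CPU and back again, ignoring latency."""
    return 2 * buffer_bytes / bandwidth_bytes_per_s


# Assumptions for illustration only (not from the article):
# one 1920x1080 frame at 4 bytes per pixel, and the theoretical
# peak of a PCI-Express 2.0 x16 link in one direction.
FRAME_BYTES = 1920 * 1080 * 4
PCIE_BYTES_PER_S = 8e9

if __name__ == "__main__":
    t = round_trip_seconds(FRAME_BYTES, PCIE_BYTES_PER_S)
    print(f"{FRAME_BYTES} bytes per frame, round trip of about {t * 1e3:.2f} ms")
```

Under those assumptions the two copies take roughly two milliseconds per frame, time that interoperability would hand back to the application.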

The Stream SDK v2.0 also has a preview of Microsoft's DirectX 10 graphics and multimedia APIs running in conjunction with the OpenCL stack so that Windows can take advantage of extra GPU processing oomph without making the round trip to the CPU. (This is akin to the OpenCL and OpenGL integration above.)

The Stream SDK v2.0 software runs on Windows XP SP2 and SP3, Windows Vista SP1, or Windows 7. If Linux is your thing, the SDK is limited to openSUSE 11 and Ubuntu 9.04. (Where are proper commercially supported server operating systems, particularly the HPC variants?) The SDK can work in conjunction with Microsoft's Visual Studio 2008 Professional Edition, Intel's C compiler 11.X, or the open source GNU Compiler Collection 4.3 or later. The SDK supports FireStream 9250 and 9270 GPU co-processors and a variety of FirePro and Radeon graphics cards. The PC, workstation, or server that the Stream environment is running on has to have an x86 or x64 processor that supports SSE 3.X or higher multimedia instructions.

In a related announcement, AMD said that it has been working with SiSoftware to put together an OpenCL GPGPU benchmark suite, which is part of SiSoftware's performance-tuning utility known as Sandra 2010 - short for System Analyser, Diagnostic and Reporting Assistant. Sandra 2010 supports both AMD's and Nvidia's interpretations of the OpenCL standard, and works on systems using GPUs in conjunction with x86, x64, Itanium, or ARM processors.

Among other things, the benchmark test suite includes generating Mandelbrot sets, and yes, I just wasted 30 minutes looking at them again instead of doing useful work. ®
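
For anyone wondering why Mandelbrot sets turn up in GPU benchmarks: every pixel is computed independently of its neighbors, so the job splits naturally into one small task per pixel. The sketch below is plain CPU-side Python, not SiSoftware's benchmark code and not OpenCL; the function names, grid size, and iteration cap are all made up for illustration.

```python
def escape_time(c: complex, max_iter: int = 50) -> int:
    """Count iterations of z = z*z + c until |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render(width: int = 60, height: int = 24, max_iter: int = 50) -> list:
    """Return rows of text: '#' for points that never escaped, '.' otherwise."""
    rows = []
    for y in range(height):
        im = -1.2 + 2.4 * y / (height - 1)
        row = ""
        for x in range(width):
            re = -2.0 + 3.0 * x / (width - 1)
            # Each pixel depends only on its own coordinates, which is
            # what makes the problem easy to spread across many cores.
            inside = escape_time(complex(re, im), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Running the script prints a coarse text rendering of the set. On a GPU, the body of the inner loop is the part that would become the per-pixel kernel.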