AMD Updates GPU Co-Processor, Developer Tools

In the 1980s, PC owners frequently purchased a math co-processor for their 286- and 386-based computers to speed up calculations and number crunching.

Fast-forward 20 years, and AMD has a supercomputing math co-processor card that delivers half a teraflop of performance for $2,000.

The company plans to introduce an update to its Stream processor initiative at the Supercomputing 07 show in Reno, Nev., next week. The Stream processor is a derivative of the ATI graphics processing unit (GPU), modified to act essentially as a math co-processor.

GPUs altered to operate this way are often called General Purpose GPUs, or GP-GPUs. ATI released the first Stream card last year, based on an older-generation video processor and rather basic compiler tools. This year, both the hardware and the tools have been updated.

The AMD FireStream 9170 is based on AMD's RV670 video processor and looks like a video card, but without a monitor connector. It has 320 stream cores, the equivalent of the vertex shaders that graphics cards use to draw videogames. Here, though, the cores serve as high-powered math co-processors, all interconnected and connected to 2GB of 128-bit DDR3 memory.

At peak performance for single precision calculations, the FireStream 9170 can deliver up to 500 gigaflops of performance. The 9170 processor has half the power draw of the RV670 graphics cards, requiring less than 150 watts of power.
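For a rough sense of what those figures imply, the arithmetic below works out throughput per watt and per dollar. It is a back-of-the-envelope sketch that uses only the numbers quoted in this article (500 gigaflops peak, a 150-watt ceiling, a $2,000 price); the variable names are illustrative, and because the card is said to draw less than 150 watts, the per-watt result is a lower bound rather than a measurement.

```python
# Figures quoted in the article; peak values, not measured results.
PEAK_GFLOPS = 500.0   # single precision peak
MAX_WATTS = 150.0     # stated upper bound on power draw
PRICE_USD = 2000.0    # stated price

# Since actual draw is below MAX_WATTS, this is a lower bound.
gflops_per_watt = PEAK_GFLOPS / MAX_WATTS
gflops_per_dollar = PEAK_GFLOPS / PRICE_USD

print(f"at least {gflops_per_watt:.2f} gigaflops per watt")
print(f"{gflops_per_dollar:.2f} gigaflops per dollar")
```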

However, it's the addition of double precision calculations that semiconductor analyst Nathan Brookwood of Insight64 likes the most. "One of the things that most impressed me about the new board is that this, to the best of my knowledge, is the first of GP-GPU products that supports double precision calculations in hardware, so that's a major step forward. If you are doing stuff that requires a lot of accuracy, you want double-precision," he told InternetNews.com.

In processors, floating point numbers can be stored as 32-bit single precision or 64-bit double precision values. A single precision number carries about seven significant decimal digits, while a double precision number carries 15 or 16. So any task that needs many digits of accuracy, not just one that handles large numbers, calls for double precision.
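The difference is easy to see on an ordinary CPU. The sketch below uses Python's standard struct module to round a 64-bit value to 32 bits and back; the helper name to_float32 is made up for this illustration, and the code says nothing about how the FireStream hardware itself handles either format.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

tenth = 0.1
tenth_single = to_float32(tenth)

# The two values agree only in their leading digits.
print(f"double: {tenth:.17f}")
print(f"single: {tenth_single:.17f}")

# Single precision has a 24-bit significand, so it cannot represent
# every whole number above 2**24; the odd value below gets rounded.
big = 16777217.0  # 2**24 + 1, exact in double precision
big_single = to_float32(big)
print(f"double: {big:.1f}")
print(f"single: {big_single:.1f}")
```

The relative error introduced by the conversion is at most 2^-24, roughly 6 parts in 100 million, which is where the limit on usable decimal digits in single precision comes from.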

The other big part of the announcement is that AMD is significantly improving the development tools and libraries for building applications that utilize the GP-GPU. The original toolkit released last year, called Close To The Metal (CTM), required programming in assembly, something no programmer likes.

Brookwood said the tools had been lacking, and Patricia Harrell, director of Stream computing at AMD, acknowledged as much.

"Customers wanted AMD to build a basic toolset for them to get access to this technology," she said. "CTM was a good start. It gave people access to the hardware. The downside was it was close to the metal and you had to have a good understanding of GPU architecture to use it."

What developers needed was something at a higher level, like C. So AMD will introduce Brook+, a high-level, C-like language for the GPU, along with libraries for operations such as Fast Fourier Transform (FFT) and matrix multiply, both used heavily in graphics processing. Brook+ will be based on the open source Brook language developed at Stanford University, plus some AMD enhancements, and will itself be released as open source.
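To show the kind of routine such a library provides, here is a plain Python sketch of matrix multiplication. It is not Brook+ code and does not reflect AMD's library interface; the function name matmul and the list-of-lists layout are assumptions made for illustration. The point is that every output element is computed independently of the others, which is why the operation spreads well across hundreds of stream cores.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    # Each result[i][j] depends only on row i of a and column j of b,
    # so all rows * cols entries could be computed in parallel.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```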

AMD will also introduce CAL, the Compute Abstraction Layer, which will be used for performance tuning and for maintaining compatibility with future cards. So even as AMD introduces new GP-GPUs, applications written today won't break.

Harrell said AMD will release benchmark numbers at the Supercomputing show next week, but did say that initial benchmarks showed performance 10 to 50 times that of a standard CPU, depending on the algorithm.