ATI Stream vs. NVIDIA CUDA - GPGPU computing battle royale

Test System Configuration and Testing Methodology

Test system configuration

To thoroughly test how these transcoding applications use ATI Stream and CUDA technologies in a consumer-level environment, we built a mid-range system that consumers could purchase at a relatively affordable cost. We made one change to the test bench we used in our initial CUDA article a couple of weeks ago: for this article, we used a sub-$100 AM3 board from Gigabyte, the MA770T-UD3P. This $80 board impressed us so much that we will continue to use it in future budget test bench systems because of its overall performance, overclocking capabilities, and build quality. Every other component remains the same as in our previous CUDA review.

Test system configuration with eVGA 9800+

Test system configuration with Radeon 4770

With this in mind, we put together a moderate AMD AM3-based system with 4GB of RAM and two graphics cards: an NVIDIA 9800GTX+ and an ATI 4770. Our eVGA card is factory overclocked with a 756MHz core clock and 2246MHz memory clock, while the ATI 4770 has a 750MHz core clock and 3200MHz (GDDR5) memory clock. The 9800+ utilizes 128 stream processors compared to the 4770's 640 stream processors. The 4770 is considerably newer technology than the 9800+, but the two cards should still deliver comparable speeds and results at comparable prices.

Here’s a complete rundown of the prices and specifications of our AM3 test system (Note: All prices were compiled from Newegg.com on July 12):

The testing parameters for evaluating the performance of these ATI Stream and CUDA-enabled transcoding applications were as follows:

Evaluate CPU usage and determine how much of the computing load is handled by the CPU with ATI Stream/CUDA enabled and disabled

What performance differences will consumers notice between using ATI Stream or CUDA?

Subjectively evaluate the image quality of the output video transcoded with ATI Stream and CUDA

After we determined our test parameters, we also wanted a variety of video formats and sizes to choose from for our benchmarks. We chose everything from MPEG-4 and WMV to MOV and H.264. This gives us a broad range of video formats that should appeal to a variety of consumers.

This is a very interesting article to contribute to my PC Hardware class, as I'm currently in a Network Admin program in Vermont. Please keep up the good work, guys. I love your site, and you have been very helpful over the last several semesters.

Firstly, AMD designs GPUs with many simple ALUs/shaders (VLIW design) that run at a relatively low clock frequency (typically 1120-3200 ALUs at 625-900 MHz), whereas Nvidia's microarchitecture consists of fewer, more complex ALUs and tries to compensate with a higher shader clock (typically 448-1024 ALUs at 1150-1544 MHz). Because of this VLIW vs. non-VLIW difference, Nvidia uses more square millimeters of die space per ALU and hence can pack fewer of them per chip. Nvidia's ALUs also hit the frequency wall sooner than AMD's, which prevents Nvidia from raising the clock high enough to match or surpass AMD's performance. This translates to a raw ALU performance advantage for AMD:

This approximately 2x-3x performance difference exists across the entire range of AMD and Nvidia GPUs. It is very visible in all ALU-bound GPGPU workloads such as Bitcoin mining, password brute-forcers, etc.
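The raw ALU comparison described above can be sketched as ALU count multiplied by shader clock. This is a minimal illustration, assuming one 32-bit operation per ALU per clock; the function name and the specific ALU counts and clocks are assumptions picked from inside the ranges quoted above, not measurements of any particular card.

```python
def raw_alu_throughput(alu_count: int, shader_clock_mhz: float) -> float:
    """Peak 32-bit ALU operations per second, assuming one op per ALU per clock."""
    return alu_count * shader_clock_mhz * 1e6


# Illustrative values chosen from within the quoted ranges (assumptions for
# this sketch): many slow ALUs for AMD, fewer fast ALUs for Nvidia.
amd_ops = raw_alu_throughput(alu_count=3072, shader_clock_mhz=830)
nvidia_ops = raw_alu_throughput(alu_count=1024, shader_clock_mhz=1214)
ratio = amd_ops / nvidia_ops

print(f"AMD:    {amd_ops / 1e9:.0f} billion ops/s")
print(f"Nvidia: {nvidia_ops / 1e9:.0f} billion ops/s")
print(f"Ratio:  {ratio:.2f}x in AMD's favor")
```

With these inputs the ratio comes out at roughly 2x. This is a theoretical peak only; real workloads rarely keep every ALU busy on every clock.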

Secondly, another difference favoring Bitcoin mining on AMD GPUs over Nvidia's is that the mining algorithm is based on SHA-256, which makes heavy use of the 32-bit integer right rotate operation. This operation can be implemented as a single hardware instruction on AMD GPUs (BIT_ALIGN_INT), but on Nvidia GPUs it must be emulated with three separate hardware instructions (2 shifts + 1 add). This alone gives AMD another 1.7x performance advantage (~1900 instructions instead of ~3250 to execute the SHA-256 compression function).
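To make the "2 shifts + 1 add" emulation concrete, here is a minimal Python sketch of a 32-bit right rotate built from those three operations, together with one SHA-256 function that uses three rotates. The function names are hypothetical, and Python integers stand in for 32-bit registers via explicit masking; this models the arithmetic only, not actual GPU instructions.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like a hardware register


def rotr32_emulated(x: int, n: int) -> int:
    """Right-rotate a 32-bit value using 2 shifts + 1 add."""
    n %= 32
    low = (x & MASK32) >> n            # shift 1: upper bits move down
    high = (x << (32 - n)) & MASK32    # shift 2: lower n bits wrap to the top
    # The two halves occupy disjoint bit positions, so add equals bitwise OR.
    return (low + high) & MASK32


def big_sigma0(x: int) -> int:
    """SHA-256 Sigma0: three right rotates XORed together."""
    return rotr32_emulated(x, 2) ^ rotr32_emulated(x, 13) ^ rotr32_emulated(x, 22)


# Instruction counts quoted above for one SHA-256 compression function.
instruction_ratio = 3250 / 1900

print(hex(rotr32_emulated(0x12345678, 8)))
print(f"{instruction_ratio:.2f}x")
```

Each rotate here costs three operations where a native rotate instruction would cost one. Because functions like Sigma0 are applied many times per compression, that overhead adds up to the instruction-count gap quoted above.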

Fucking plagiarism. Copy/paste from some other source, no citation or credit. Your education should be shredded and flushed down the toilet. Here is where you copied it from for people who want to read from someone with actual knowledge and not just ctrl+c ---> ctrl+v.