Part 1. Theory and architecture

Although AMD announced the R6xx architecture in May, only the top R600-based solution reached the market at that time. The other graphics cards and GPUs based on the unified R6xx architecture were delayed, and we were given only theoretical information about them.

It has been sad to watch ATI gradually lose ground to NVIDIA since 2005. NVIDIA's period of domination began after 3dfx faded away in 2000-2001, and it ended with the arrival of the brilliant R3xx chips and RADEON 9x00 graphics cards from ATI. NVIDIA's own mistakes contributed to that fall (remember the failure of GeForce FX). From then until 2005 we watched an interesting fight between two strong competitors. Users only benefited from it: they regularly got decent products from both companies. But then ATI stagnated, and since then the company has always been late with its releases. R6xx shared the fate of the R5xx solutions. This inevitably affects the company's sales, market share, and financial health.

Now the time has come for AMD's low-end and mid-range solutions with DirectX 10 support to enter the market, not long after their initial announcement. Their main difference from the top R600 GPU is the 65 nm process technology used to manufacture RV630 and RV610. It reduces manufacturing costs, which matters for inexpensive products. This very process technology probably delayed the launch of these solutions. Now everything depends on whether enough of the new graphics cards are shipped to retailers.

As usual, before you read this article we suggest that you read our previous reviews of solutions based on unified architectures, as well as the baseline articles DX Current, DX Next and Longhorn, which describe various aspects of modern graphics cards and the architectural features of NVIDIA and ATI (AMD) products. These articles predicted the current situation with GPU architectures, and many of their assumptions about future solutions proved correct.

RV630 Specifications

120 scalar floating point ALUs (integer and floating point formats, FP32 precision according to IEEE 754)

2 texture units, support for FP16 and FP32 components in textures

16 texture address units (see the details in the baseline article)

40 TMUs (see the details in the baseline article)

8 bilinear filtering units, which can filter FP16 textures at full speed and support trilinear and anisotropic filtering for all texture formats

Dynamic branching in pixel and vertex shaders

4 ROPs supporting antialiasing with software fetches of more than 16 samples per pixel, including FP16 or FP32 frame buffer formats. Peak performance is up to 4 samples per cycle; in Z-only mode, 8 samples per cycle

8 multiple render targets (MRT)

Integrated support for two RAMDACs, two Dual Link DVI ports, HDMI, HDTV

RV610 Specifications

40 scalar floating point ALUs (integer and floating point formats, FP32 precision according to IEEE 754)

1 texture unit, support for FP16 and FP32 components in textures

8 texture address units (see the details in the baseline article)

20 TMUs (see the details in the baseline article)

4 bilinear filtering units, which can filter FP16 textures at full speed and support trilinear and anisotropic filtering for all texture formats

Dynamic branching in pixel and vertex shaders

4 ROPs supporting antialiasing with software fetches of more than 16 samples per pixel, including FP16 or FP32 frame buffer formats. Peak performance is up to 4 samples per cycle; in Z-only mode, 8 samples per cycle

8 multiple render targets (MRT)

Integrated support for two RAMDACs, two Dual Link DVI ports, HDMI, HDTV

RADEON HD 2400 XT Specifications

Core clock rate: 700 MHz

40 unified processors

4 TMUs, 4 ROPs

Effective memory frequency: 1600 MHz (2*800 MHz)

Memory type: DDR2/GDDR3

Memory size: 256 MB

Memory bandwidth: 12.8 GB/s

Maximum theoretical fillrate: 2.8 gigapixels per second

Theoretical texture sampling rate: 2.8 gigatexels per second

PCI Express x16 bus

1 × DVI-I Dual Link, 2560×1600 video output

TV-Out, HDTV-Out, HDCP support, HDMI adapter

Power consumption: about 25 W

Recommended price: $79

RADEON HD 2400 PRO Specifications

Core clock rate: 525 MHz

40 unified processors

4 TMUs, 4 ROPs

Effective memory frequency: 800 MHz (2*400 MHz)

Memory type: DDR2

Memory size: 128/256 MB

Memory bandwidth: 6.4 GB/s

Maximum theoretical fillrate: 2.1 gigapixels per second

Theoretical texture sampling rate: 2.1 gigatexels per second

PCI Express x16 bus

1 × DVI-I Dual Link, 2560×1600 video output

TV-Out, HDTV-Out, HDCP support, HDMI adapter

Power consumption: below 25 W

Recommended price: $59
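The theoretical figures in the two RADEON HD 2400 lists follow from simple arithmetic, which the sketch below reproduces. It assumes one pixel per ROP and one texel per TMU per clock, and it derives the memory bus width from the quoted bandwidth and effective memory frequency. The bus width is not stated in the lists, so the value printed here is an inference, and the function names are purely illustrative.

```python
# Sketch: recompute the theoretical peak figures quoted in the spec lists.
# Clocks, unit counts and bandwidths are taken from the lists; the helper
# names are illustrative and do not come from any vendor tool.

def fill_rate_gpix(core_mhz, rops):
    """Peak fill rate in gigapixels/s, assuming one pixel per ROP per clock."""
    return core_mhz * rops / 1000.0


def texel_rate_gtex(core_mhz, tmus):
    """Peak sampling rate in gigatexels/s, assuming one texel per TMU per clock."""
    return core_mhz * tmus / 1000.0


def implied_bus_width_bits(bandwidth_gb_s, effective_mem_mhz):
    """Bus width implied by: bandwidth = effective clock * width / 8.

    Uses 1 GB = 10^9 bytes. The result is an inference, not a quoted spec.
    """
    bytes_per_transfer = bandwidth_gb_s * 1000.0 / effective_mem_mhz
    return round(bytes_per_transfer * 8)


# (core MHz, ROPs, TMUs, effective memory MHz, bandwidth in GB/s)
CARDS = {
    "RADEON HD 2400 XT": (700, 4, 4, 1600, 12.8),
    "RADEON HD 2400 PRO": (525, 4, 4, 800, 6.4),
}

for name, (core, rops, tmus, mem, bw) in CARDS.items():
    print(name)
    print("  fill rate, Gpixel/s:", fill_rate_gpix(core, rops))
    print("  texel rate, Gtexel/s:", texel_rate_gtex(core, tmus))
    print("  implied bus width, bits:", implied_bus_width_bits(bw, mem))
```

Under these assumptions both cards come out with the same implied bus width, so the PRO model's lower bandwidth is explained entirely by its halved memory frequency.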

Features common to these solutions: unified superscalar architecture, programmable hardware tessellator, improved support for video decoding (Avivo HD), and native CrossFire support. All of the new GPUs offer full support for DirectX 10. Moreover, they support some functions that will appear in the next versions of this API.

After the announcement of the top R600 solution, we learned that, contrary to what had been announced, not all new solutions from AMD are functionally identical in hardware support for video decoding. As with NVIDIA cards, the low-end and mid-range GPUs of this AMD family have better video decoding capabilities than the top chip, because R600 either lacks the improved unified video decoder (UVD) or the decoder does not work well in it. In any case, analysis of the performance and video decoding quality of the new solutions is beyond the scope of this article. We shall test the new chips and add their results to the latest article on this subject.

So, AMD is the first to enter the market of low-end and mid-range graphics cards manufactured by the 65 nm process technology. As we have stressed many times, such upgrades are important. Advanced process technologies make it possible to design smaller cores, pack more transistors into the same die area, raise the frequency potential of GPUs, improve the yield of working chips, and reduce manufacturing costs. Another important advantage is reduced power consumption. The new mid-range and low-end GPUs from AMD consume much less power and dissipate less heat than their competitors.

Architecture

In the previous article about the R6xx architecture and RADEON HD 2900 XT, we examined all architectural features of the new family of DirectX 10 GPUs from AMD. This article contains only a brief description; you can find full information in the article mentioned above.

The R6xx architecture combines several solutions from the previous generations: R5xx and Xenos (the graphics chip of the Microsoft Xbox 360). It also adds some innovations: a more powerful dispatch processor, a superscalar architecture of shader processors with dedicated branching units, etc. The new architecture scales well in both directions, as the low-end and mid-range solutions show. The block diagram of RV630 and RV610:

We can see that RV630 differs from R600 only in the number of units: ALUs, ROPs, and TMUs. In other respects these GPUs are identical. RV610 differs more, both quantitatively (fewer ALUs and TMUs) and qualitatively: it has no hierarchical Z-buffer and no L2 texture cache, so a single cache level holds both vertex and pixel data. The key quantitative changes: RV630 is reduced to 24 shader processors (120 ALUs) and RV610 to only 8 (40 ALUs); the number of texturing units is reduced to 8 and 4 respectively; and each of these chips has only four ROPs. This was done to cut the transistor count, of course, and it will have a negative effect on performance.
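The two ways of counting shader hardware in the paragraph above are related by a fixed factor, which the short sketch below makes explicit. The factor of five scalar ALUs per shader processor is inferred from the quoted figures (120/24 and 40/8) rather than stated in this article, and the names used are illustrative.

```python
# Sketch: relate the shader processor counts to the scalar ALU counts.
# ALUS_PER_SHADER_PROCESSOR is an inference from the figures in the text
# (120 / 24 and 40 / 8), not a value quoted by the article.
ALUS_PER_SHADER_PROCESSOR = 5


def scalar_alus(shader_processors, width=ALUS_PER_SHADER_PROCESSOR):
    """Total scalar ALUs for a chip with the given number of shader processors."""
    return shader_processors * width


# Shader processor counts as given in the text.
CHIPS = {"RV630": 24, "RV610": 8}

for chip, processors in CHIPS.items():
    print(chip, "shader processors:", processors, "scalar ALUs:", scalar_alus(processors))
```

The same ratio holds for both chips, which is consistent with the claim that the architecture scales mainly by changing unit counts.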

The other architectural features of R6xx-based solutions are covered in the baseline article mentioned above. Low-end GPUs do not support the 512-bit bus, of course, but everything else in that article applies to them completely. Moreover, what we wrote about Avivo HD applies only to them, not to the top solution.

In the next part of this article we'll see how the performance of the new inexpensive RV630/RV610-based solutions compares with that of competing graphics cards from NVIDIA. We'll also see how their cut-down features affect their performance relative to the top GPU in the family, R600.