Running the numbers
We've already looked at the theoretical peak numbers for the Radeon HD 5830 compared to its closest relatives, but the table below puts them into somewhat broader perspective.

| GPU | Peak pixel fill rate (Gpixels/s) | Peak bilinear INT8 texel filtering rate (Gtexels/s) | Peak bilinear FP16 texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s) | Peak shader arithmetic, single-issue (GFLOPS) | Peak shader arithmetic, dual-issue (GFLOPS) |
|---|---|---|---|---|---|---|
| GeForce 7900 GS | 7.7 | 9.6 | - | 44.8 | - | - |
| GeForce 7900 GTX | 10.4 | 15.6 | - | 51.2 | - | - |
| GeForce 8800 GT | 11.2 | 39.2 | 19.6 | 64.6 | 392 | 588 |
| GeForce 9800 GTX | 10.8 | 43.2 | 21.6 | 70.4 | 432 | 648 |
| GeForce GTS 250 | 12.3 | 49.3 | 24.6 | 71.9 | 484 | 726 |
| GeForce GTX 260 (216 SPs) | 18.2 | 46.8 | 23.4 | 128.8 | 605 | 907 |
| GeForce GTX 275 | 17.7 | 50.6 | 25.3 | 127.0 | 674 | 1011 |
| GeForce GTX 285 | 21.4 | 53.6 | 26.8 | 166.4 | 744 | 1116 |
| Radeon HD 3870 | 13.6 | 13.6 | 13.6 | 73.2 | 544 | - |
| Radeon HD 4850 | 11.2 | 28.0 | 14.0 | 63.6 | 1120 | - |
| Radeon HD 4870 | 12.0 | 30.0 | 15.0 | 115.2 | 1200 | - |
| Radeon HD 4890 | 14.4 | 36.0 | 18.0 | 124.8 | 1440 | - |
| Radeon HD 5750 | 11.2 | 25.2 | 12.6 | 73.6 | 1008 | - |
| Radeon HD 5770 | 13.6 | 34.0 | 17.0 | 76.8 | 1360 | - |
| Radeon HD 5830 | 12.8 | 44.8 | 22.4 | 128.0 | 1792 | - |
| Radeon HD 5850 | 23.2 | 52.2 | 26.1 | 128.0 | 2088 | - |
| Radeon HD 5870 | 27.2 | 68.0 | 34.0 | 153.6 | 2720 | - |
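One pattern in the table is worth flagging up front: on every chip here that lists both rates except the Radeon HD 3870, the peak FP16 filtering rate is exactly half the INT8 rate, while the 3870 filters FP16 at full speed. A trivial sanity check, using values copied straight from the table:

```python
# Peak bilinear filtering rates from the table above, in Gtexels/s:
# GPU name -> (INT8 rate, FP16 rate). A sampling of rows for illustration.
rates = {
    "GeForce GTX 285": (53.6, 26.8),
    "Radeon HD 3870":  (13.6, 13.6),
    "Radeon HD 5830":  (44.8, 22.4),
    "Radeon HD 5870":  (68.0, 34.0),
}

for gpu, (int8, fp16) in rates.items():
    # Ratio of INT8 to FP16 peak: 2.0 means FP16 runs at half rate.
    print(f"{gpu}: INT8/FP16 ratio = {int8 / fp16:.1f}")
```

The half-rate FP16 behavior becomes relevant below, when we look at what 3DMark's texture fill test actually measures.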

These theoretical capacities don't correspond directly to performance, of course. Much depends on the quirks of the GPU architectures and their implementations. We can measure some of these things with directed tests, though, to give us a sense of how the cards compare. Sadly, we've not been able to include the older, DirectX 9-only graphics cards in these tests, because 3DMark Vantage requires DirectX 10.

I've only included partial information in the table above for the two GeForce 7-series cards, in part because of some limitations of these older architectures. For example, the G71 GPU could filter FP16 texture formats, but it couldn't do so in conjunction with multisampled antialiasing. Counting FLOPS on a non-unified shader design is also a little tricky, so I've abstained. Nonetheless, progress in the past four years has been substantial. The Radeon HD 5830 has 4.6 times the texture filtering capacity and 2.8 times the memory bandwidth of the GeForce 7900 GS. Similarly, the Radeon HD 5870 has 4.4 times the filtering rate and triple the memory bandwidth of the GeForce 7900 GTX.
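Those generational ratios can be recomputed directly from the table. Here's a quick sketch, with the peak figures copied from the rows above:

```python
# Peak figures from the table: GPU -> (INT8 filtering in Gtexels/s,
# memory bandwidth in GB/s).
specs = {
    "GeForce 7900 GS":  (9.6, 44.8),
    "GeForce 7900 GTX": (15.6, 51.2),
    "Radeon HD 5830":   (44.8, 128.0),
    "Radeon HD 5870":   (68.0, 153.6),
}

for new, old in [("Radeon HD 5830", "GeForce 7900 GS"),
                 ("Radeon HD 5870", "GeForce 7900 GTX")]:
    filt = specs[new][0] / specs[old][0]  # filtering-rate multiple
    bw = specs[new][1] / specs[old][1]    # bandwidth multiple
    print(f"{new} vs. {old}: {filt:.2f}x filtering, {bw:.2f}x bandwidth")
```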

We've often thought that GPU performance in 3DMark's color fill rate test seems to be limited primarily by memory bandwidth. Notice how much faster the Radeon HD 4870 is than the Radeon HD 5770, for instance. The 5770 has a slightly higher theoretical peak fill rate, but the 4870 has nearly twice the memory bandwidth and proves markedly faster in this directed test.

The 5830, however, breaks that trend by delivering a much lower measured fill rate than the 5850, though their memory bandwidth on paper is identical. Heck, the 4870 outscores the 5830, too, even though it has slightly less theoretical peak fill rate and memory bandwidth. Something about the way AMD pruned back the Cypress GPU's render back-ends produces unexpectedly poor results in this test.
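For a sense of where the 5830's fill-rate deficit originates, here's a back-of-the-envelope sketch of how the theoretical peaks arise: one pixel per ROP per clock. The ROP counts and core clocks below come from public board specifications rather than from this article, so treat them as assumptions, but their products reproduce the table's pixel-fill column exactly:

```python
# Peak pixel fill rate = ROPs x core clock, at one pixel per ROP per clock.
# ROP counts and clocks (GHz) are from public board specs, not the article.
boards = {
    "Radeon HD 4870": (16, 0.750),
    "Radeon HD 5770": (16, 0.850),
    "Radeon HD 5830": (16, 0.800),
    "Radeon HD 5850": (32, 0.725),
    "Radeon HD 5870": (32, 0.850),
}

for gpu, (rops, clock) in boards.items():
    print(f"{gpu}: {rops * clock:.1f} Gpixels/s")
```

On paper, then, the 5830 gives up half of Cypress's render back-ends while keeping a 5870-class clock, which is why its theoretical fill rate lands closer to the 5770's than the 5850's even before the odd measured results enter the picture.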

3DMark Vantage was released in April 2008, and only in the past few weeks has Futuremark fixed the units output by its texture fill rate test. I'm not sure why this obvious bug, about which we exchanged e-mails with Futuremark several times back in '08, took so long to squish. At least we now have our first set of 3DMark texturing results that make intuitive sense.

Those results show us something we've long known: that AMD's recent GPUs score much better than Nvidia's in this benchmark. Being able to put units to them, though, gives us some additional insight. Notice how the Radeons reach very close to their theoretical peaks for INT8 filtering, while the GeForces are just as close to their half-rate FP16 peaks. We've long thought this was a test of FP16 texture filtering rate. What's going on here?

When we asked Nvidia to explain why its GPUs were only reaching about half of their potential, we received an interesting answer. Turns out, Nvidia told us, that this test does indeed use FP16 texture formats, but it doesn't filter the textures, even bilinearly. It's just point sampled, believe it or not. The newer Radeons, it seems, can point-sample FP16 textures at their full rate, even though they can't filter them at that rate. Nvidia's GT200 samples FP16 textures at half of the INT8 rate, hence the disparity. Interestingly, Nvidia says the upcoming GF100 can sample FP16 textures at full speed, so it should perform better in this test, once it arrives. Trouble is, we'd really rather be measuring the texture filtering rates, which matter more for games, than the raw texture sampling rates of these GPUs.

For what it's worth, the Radeon HD 5830 does sample FP16 textures at a much higher rate than the Radeon HD 4870 or 5770. In theory, it should be able to filter them faster, as well.

Performance on these shader power benchmarks tends to vary quite a bit from one GPU architecture to the next. As a result, the 5830 exchanges victories with its closest rival, the GeForce GTX 260, from one test to the next. Meanwhile, the 5770 and 5850 tend to bracket the 5830 exactly as one would expect. The more interesting result may be the fact that the 5830 is between two and four times the speed of the Radeon HD 3870.