Review: The introduction of 512MiB consumer graphics hardware

Introduction

ATI are launching their first consumer 512MiB graphics board today, which I've been evaluating for the past few days. I'll cover it properly in due course. Today's exposure in this article is limited to pictures of it and an analysis of a popular performance metric, and I'm going to leave the full preview untouched for now, for a couple of reasons. Firstly, NVIDIA have a matching SKU that I'm going to wait for so I can get the numbers done in one article, rather than two (saves you some reading and me some writing time). Secondly, I'd like to take today's launch as an opportunity to discuss the basic ideas behind a 512MiB graphics board: why you'd want one, why you wouldn't, and why they're appearing at retail now rather than at any time in the past.

So I don't want to take anything away from ATI's significant launch today, but at the same time I'm going to reserve full page time for the new SKU until a later date, when I have everything I need in front of me for a complete comparison. With that all said, there's a lot more to discuss about a doubling in on-card memory than you'd think, so let's dive right in, starting at the beginning.

Why no 512MiB consumer graphics products until now?

We've had NVIDIA announce a 512MiB part in recent times, with a 6800 Ultra variant which has been branched off from the Quadro FX 4400 hardware they've had in the professional space for a wee while. With ATI's announcement today, which has no immediate pro-level pairing, and NVIDIA looking like they'll bring a half-GiB 6800 GT to the table soon, too, that's the beginnings of a new round of choices in the graphics market, starting in May 2005.

But why has there been no consumer hardware with this much memory before now? It's not for any technical reason, since I know of engineering hardware at four of the major graphics IHVs with at least that amount of memory, and the memory controller on nearly all of the previous generation of GPUs is able to support a half-GiB memory size. Indeed, going back to the idea that NVIDIA's 512MiB 6800 Ultra is little more than a Quadro FX 4400 with a new BIOS and a couple of minor hardware changes, there's been a need for very large framebuffers on professional-level hardware for quite some time.

The basic reasoning for that is memory pressure. Until PCI Express came along, you couldn't write data back out to off-card memory in anything other than entire framebuffer-sized chunks, consuming CPU resources at the same time. That has downsides for the basic class of professional-level 3D applications, given that you always want to have spare CPU time for geometry processing and preparation work that the GPU can't do for you. The CPU's time is better spent elsewhere than on processing massive frames of data in fairly unmeaningful ways.

So while PCI Express affords you much finer granularity in terms of what you can write back to an off-card memory space, since it lets you write all manner of data types into system memory in singular blocks, with AGP you can't do that. Take something like medical imaging, where you have a requirement to work with a couple of million vertices of geometry per frame, each defined as x,y,z,c0, where x, y and z define the position of each vertex in 3D space, and c0 is the vertex colour. Given that the imaging application likely wants to work in 32-bit spatial precision and 32-bit colour, that's 4 bytes (32 bits) each for x, y and z. c0, the vertex colour, is a 4-component 32-bit vector (ABGR or some other colour format), so another 4 bytes for that, too.
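To make that per-vertex layout concrete, here's a minimal sketch in Python. The names and the packing (little-endian, three 32-bit floats followed by one packed 32-bit colour, no padding) are my own assumptions for illustration, not anything a particular imaging application or driver is known to use.

```python
import struct

# Hypothetical packed layout for one vertex: x, y, z as 32-bit floats,
# then c0 as a single 32-bit packed colour (for example ABGR, 8 bits each).
# "<" asks for little-endian byte order with no padding between fields.
VERTEX_FORMAT = "<3fI"


def pack_vertex(x, y, z, c0):
    """Pack one vertex into its raw byte representation."""
    return struct.pack(VERTEX_FORMAT, x, y, z, c0)


# Size of one vertex in bytes under the layout assumed above.
vertex_size = struct.calcsize(VERTEX_FORMAT)

# An example vertex with an opaque red colour in a packed ABGR-style word.
example = pack_vertex(1.0, 2.0, 3.0, 0xFF0000FF)

print(vertex_size, len(example))
```

Under those assumptions the position takes 12 bytes and the colour another 4, which is where the per-vertex figure comes from.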

At 16 bytes per vertex for a couple of million vertices, that's around 30MiB just for the vertex data, per frame. Then, given that it's a medical imaging application you're running, you might want to look up into a cross section of the geometry using a 3D volume texture and do some sampling using a pair of smaller cubemaps. You also might want to shade the geometry too, using fragment programs which sample a decent amount of texture data. Anything larger than a 256x256x256 power-of-two volume texture is going to overflow 256MiB once your 2 million vertices' worth of vertex buffers are counted, leaving you struggling to fit texture data and the cubemaps into memory at the same time. If you're then antialiasing everything at high quality, performance is buggered.
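The arithmetic there is easy enough to check for yourself. This is a rough sketch only: the 4 bytes per texel is my assumption (the volume texture format isn't pinned down above), mipmaps, cubemaps and the framebuffer itself are ignored, and the function names are made up for the example.

```python
MIB = 1024 * 1024


def vertex_buffer_bytes(vertex_count, bytes_per_vertex=16):
    """Raw size of the per-frame vertex data."""
    return vertex_count * bytes_per_vertex


def volume_texture_bytes(edge, bytes_per_texel=4):
    """Raw size of a cubic volume texture (assumed 32-bit texels, no mipmaps)."""
    return edge ** 3 * bytes_per_texel


def fits(total_bytes, card_mib):
    """True if the data fits in a card with card_mib MiB of memory."""
    return total_bytes <= card_mib * MIB


vertices = vertex_buffer_bytes(2_000_000)
print(f"vertex data: {vertices / MIB:.1f} MiB")

# Compare a 256-cubed volume with the next power-of-two size up.
for edge in (256, 512):
    total = vertices + volume_texture_bytes(edge)
    print(f"{edge}^3 volume + vertices: {total / MIB:.1f} MiB, "
          f"fits in 256MiB: {fits(total, 256)}")
```

With those assumptions a 256-cubed volume plus the vertex data sits comfortably inside 256MiB, while the next power-of-two step up, 512 cubed, is 512MiB before you've stored anything else.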

The driver's swapping data on and off card memory all the time, producing overhead and taxing the bus interconnect the card sits on. With AGP, the CPU also has to pick apart the downloaded framebuffer data at regular intervals, all while sending frames of geometry to the card.

While that's all understood at the professional level, where a couple of million vertices per frame is conservative in many cases, regardless of whatever peripheral data you also want in card memory, that kind of memory pressure doesn't really exist in the consumer space.

Running any modern triple-A game title at 1600x1200 with a high number of samples for anti-aliasing (at least four Z samples, say, which is ~30MiB of AA sample data to store per frame if you don't mask off what you don't need to sample) and a high degree of anisotropic texture filtering is doable at very interactive framerates with the current class of high-end hardware. You're more shader limited than anything these days, with memory pressure at even that resolution and those settings not enough to tax a 256MiB board.
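For the curious, that AA figure falls out of a one-line sum. A quick sketch, assuming 4 bytes per stored Z sample (my assumption, something like 24-bit depth plus 8 bits of stencil) and no masking or compression of any kind; the function name is invented for the example.

```python
MIB = 1024 * 1024


def aa_sample_bytes(width, height, samples, bytes_per_sample=4):
    """Storage for every AA sample of every pixel, with nothing masked off."""
    return width * height * samples * bytes_per_sample


size = aa_sample_bytes(1600, 1200, 4)
print(f"{size} bytes, about {size / MIB:.1f} MiB")
```

Real hardware can do rather better than this worst case, which is the point of the "if you don't mask off" caveat above.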

The only way that's going to happen, unless people start playing games at larger resolutions with the same settings, is if the quality, size and number of in-game art assets increases fairly significantly. And is that going to happen? You can skip the next page if you don't have your technical hat on.