Nvidia's GeForce 8800 graphics processor

The green team reinvents its own reality, and rattles ours
by Scott Wasson, 1:00 PM on November 8, 2006

DURING THE LAST couple of years, whenever we've asked anyone from Nvidia about the future of graphics processors and the prospects for a unified architecture that merges vertex and pixel shaders into a single pool of floating-point processors, they've smiled and said something quietly to the effect of: Well, yes, that is one approach, and we think it's a good future direction for GPUs. But it isn't strictly necessary to have unified shader hardware in order to comply with DirectX 10, and it may not be the best approach for the next generation of graphics processors.

Nvidia's response to queries like this one was remarkably consistent, and most of us assumed that ATI would be first to market with a unified shader architecture for PC graphics. Heck, ATI had already completed a unified design for the Xbox 360, so its next GPU would be a second-generation effort. Surely ATI would take the technology lead in the first round of DirectX 10-capable graphics chips.

Except for this: Nvidia seems to have fooled almost everybody. Turns out all of that talk about unified architectures not being necessary was just subterfuge. They've been working on a unified graphics architecture for four years now, and today they're unveiling the thing to the world and selling it to consumers, starting immediately. The graphics processor formerly known as G80 has been christened the GeForce 8800, and it's anything but a conventional GPU design. Read on for more detail than you probably need to know about it.

G80: the parallel stream processor

Since we're all geeks here, we'll start at the beginning with a look at a large-scale block diagram of the G80 design. Nvidia has reworked or simply thrown out and replaced vast portions of its past GPU designs, so not much of what you see below will be familiar.

Block diagram of the GeForce 8800. Source: NVIDIA.

This is just a Google Earth-style flyover of the thing, but we'll take a closer look at some of its component parts as we go. The key thing to realize here is that you don't see many elements of the traditional graphics rendering pipeline etched into silicon. Instead, those little green blocks marked "SP" and arranged in groups of 16 are what Nvidia calls stream processors. The G80 has eight groups of 16 SPs, for a total of 128 stream processors. These aren't vertex or pixel shaders, but generalized floating-point processors capable of operating on vertices, pixels, or any manner of data. Most GPUs operate on pixel data in vector fashion, issuing instructions that operate concurrently on the multiple color components of a pixel (such as red, green, blue, and alpha), but the G80's stream processors are scalar: each SP handles one component. SPs can also be retasked dynamically to handle vertex data (or other things), according to demand. Also unlike a traditional graphics chip, whose clock frequency might be just north of 600MHz, these SPs are clocked at a relatively speedy 1.35GHz, giving the GeForce 8800 a tremendous amount of raw floating-point processing power. Most of the rest of the chip is clocked independently at a more conventional 575MHz.
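To put those numbers in perspective, here's a back-of-the-envelope sketch. It assumes, purely for illustration, that each SP retires one scalar operation per clock (our simplification, not an Nvidia spec), and it uses a toy model of a 4-wide vector unit to show what "scalar" means in practice: an instruction that touches only three components still occupies all four lanes of a vector unit, while scalar SPs issue only the three operations that are needed. The function names are ours.

```python
"""Back-of-the-envelope model of the G80 shader array.

Assumptions (ours, not Nvidia's): each SP retires one scalar operation per
clock, and a 4-wide vector unit spends a whole issue slot on every
instruction no matter how many components that instruction touches.
"""

CLUSTERS = 8           # groups of SPs in the block diagram
SPS_PER_CLUSTER = 16   # "SP" blocks per group
SP_CLOCK_HZ = 1.35e9   # shader clock; most of the chip runs at 575MHz


def total_sps() -> int:
    return CLUSTERS * SPS_PER_CLUSTER


def peak_scalar_ops_per_second() -> float:
    # One scalar op per SP per clock, per the assumption above.
    return total_sps() * SP_CLOCK_HZ


def lane_utilization(components_used: int, vector_width: int = 4) -> tuple[float, float]:
    """Return (vector, scalar) lane utilization for one instruction.

    A vector unit leaves idle the lanes an instruction doesn't use; scalar
    SPs only issue work for the components that actually exist.
    """
    vector = components_used / vector_width
    scalar = 1.0
    return vector, scalar


if __name__ == "__main__":
    print(total_sps())                                              # 128
    print(f"{peak_scalar_ops_per_second() / 1e9:.1f} billion ops/s")  # 172.8 billion ops/s
    print(lane_utilization(3))                                      # (0.75, 1.0)
```

Note that this counts scalar operations, not FLOPS; how many floating-point operations a single SP instruction performs is a separate question this sketch doesn't try to answer.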

Below the eight "clusters" of stream processors is a crossbar-style switch (the bit with all of the lines and arrows) that connects them to six ROP partitions. Each ROP partition has its own L2 cache and a 64-bit-wide interface to graphics memory (or frame buffer, hence the "FB" label). In total, that gives the G80 a 384-bit path to memory, half again as wide as the 256-bit interface on past high-end graphics chips like the G71 or ATI's R580. Contrary to what you might think, this 384-bit memory interface doesn't operate in some sort of elliptical fashion, grabbing data alternately in 256-bit and 128-bit chunks. It's just a collection of six 64-bit data paths, with no weirdness needed.
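The width arithmetic is simple enough to check in a few lines. This sketch multiplies out the six 64-bit partitions and, as an assumption of ours, treats peak bandwidth as bus width times effective transfer rate. The transfer rate is a made-up placeholder, not a GeForce 8800 memory spec, so only the ratio between the two buses means anything here.

```python
"""Width arithmetic for the G80 memory interface.

Assumption: peak bandwidth = bus width x effective transfer rate. The data
rate below is a placeholder, NOT a GeForce 8800 specification.
"""
from fractions import Fraction

ROP_PARTITIONS = 6        # each with its own L2 cache and 64-bit FB interface
BITS_PER_PARTITION = 64
PREVIOUS_BUS_BITS = 256   # G71 and R580


def bus_width_bits(partitions: int = ROP_PARTITIONS, bits_each: int = BITS_PER_PARTITION) -> int:
    # Six independent 64-bit channels side by side; no 256/128-bit alternation.
    return partitions * bits_each


def peak_bandwidth_gbs(width_bits: int, transfers_per_sec: float) -> float:
    # Bytes per transfer times transfers per second, expressed in GB/s (10^9 bytes).
    return width_bits / 8 * transfers_per_sec / 1e9


if __name__ == "__main__":
    width = bus_width_bits()
    print(width, Fraction(width, PREVIOUS_BUS_BITS))  # 384 3/2
    placeholder_rate = 1.0e9  # hypothetical effective data rate, NOT a G80 spec
    print(peak_bandwidth_gbs(width, placeholder_rate)
          / peak_bandwidth_gbs(PREVIOUS_BUS_BITS, placeholder_rate))  # 1.5
```

Whatever memory data rate you plug in, a 384-bit bus moves 1.5 times as much per transfer as a 256-bit bus at the same rate.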

Also in the G80, though not pictured above, is a video display engine that Nvidia describes as "new from the ground up." The display path now features 10 bits per color channel of precision throughout, much like what ATI claims for its Avivo display engine in the Radeon X1000 series.
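For a sense of what 10 bits per channel means: an n-bit channel can represent 2^n intensity levels, so 10 bits gives 1,024 steps per color component versus 256 at 8 bits, with each step roughly a quarter the size. The sketch below quantizes a smooth ramp both ways and counts the distinct levels; the ramp and sample count are arbitrary choices of ours, not anything from Nvidia's display engine.

```python
"""Compare 8-bit and 10-bit color channels along a smooth gradient.

Illustrative only: the ramp and sample count are arbitrary choices,
not taken from the G80's display path.
"""


def quantize(value: float, bits: int) -> int:
    # Map an intensity in [0.0, 1.0] to the nearest code an n-bit channel can hold.
    max_code = (1 << bits) - 1
    return round(value * max_code)


def distinct_levels(bits: int, samples: int = 4096) -> int:
    # Quantize an evenly spaced 0..1 ramp and count how many codes it lands on.
    ramp = (i / (samples - 1) for i in range(samples))
    return len({quantize(v, bits) for v in ramp})


if __name__ == "__main__":
    for bits in (8, 10):
        step = 1 / ((1 << bits) - 1)  # size of one code step on a 0..1 scale
        print(bits, distinct_levels(bits), f"{step:.5f}")
```

Because the ramp is sampled more finely than either channel can resolve, the count of distinct codes equals the full number of levels each bit depth offers.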

That's the 10,000-foot overview of the new GPU. As I said, we'll dive deeper into its various components, but let's first stop to appreciate the scale and scope of this thing. You may need to be at 10,000 feet elevation to see the entire surface area of the G80 at once. Nvidia estimates that the G80 packs a mind-boggling 681 million transistors. That's more than twice the roughly 278 million on the G71. ATI tends to count these things a little differently, but it pegs its R580 GPU at 384 million transistors. So the G80 is a next-gen design in terms of transistor count as well as features, an obvious tribute to Moore's Law.
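Here's how those ratios work out, using the vendors' own figures quoted above. Keep in mind that ATI doesn't count transistors quite the same way Nvidia does, so the R580 comparison is approximate at best.

```python
"""Transistor-count ratios from the vendor figures quoted in the text (millions)."""

TRANSISTORS_M = {
    "G80": 681,   # Nvidia's estimate
    "G71": 278,   # roughly
    "R580": 384,  # ATI's figure; ATI counts a little differently
}


def ratio(a: str, b: str) -> float:
    # How many times more transistors chip `a` has than chip `b`.
    return TRANSISTORS_M[a] / TRANSISTORS_M[b]


if __name__ == "__main__":
    for rival in ("G71", "R580"):
        print(f"G80 vs {rival}: {ratio('G80', rival):.2f}x")
```

That works out to about 2.45 times the G71 and about 1.77 times the R580.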

The thing is, the G80 isn't manufactured on a next-generation chip fabrication process. After some bad past experiences (read: GeForce FX), Nvidia prefers not to tackle a new GPU design and a new fab process at the same time. There's too much risk involved. So they have instead asked TSMC to manufacture the G80 on its familiar 90nm process, and the result is the single largest chip I believe I've ever seen. Here's a look at the GeForce 8800 GTX, stripped of its cooler, below an ATI Radeon X1900 XTX.

It's under a metal cap (fancy marketing term: "heat spreader") in the pictures above, but we can surmise that the G80 has the approximate surface area of Rosie O'Donnell. Nvidia isn't handing out exact die size measurements, but it claims to get about 80 chips gross per wafer. Notice that's a gross number, counted before any defective chips are thrown out. Any chip of this size has got to be incredibly expensive to manufacture, because the odds of a defect landing somewhere on such a large die are much higher than with a GPU like the G71 or R580, and the fraction of flawless dies falls off roughly exponentially as die area grows. This is what I call Nvidia "giving back to the community" after watching the success of $500 graphics cards line their pockets in recent years. No doubt the scale of this design was predicated on the possibility of production on a 65nm fab process, and I would expect Nvidia to move there as soon as possible.
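The link between die size and cost is easiest to see with the textbook Poisson yield model, which we're swapping in here purely as an illustration; it is not Nvidia's or TSMC's actual yield model. It assumes defects are scattered randomly and independently, so the fraction of defect-free dies is exp(-D * A) for defect density D and die area A. The defect density and die areas below are hypothetical round numbers, not G80, G71, or TSMC figures.

```python
"""Why big dies are expensive: the simple Poisson yield model.

Assumptions (ours): defects land randomly and independently, so the fraction
of defect-free dies is Y = exp(-D * A). The defect density and die areas
below are hypothetical round numbers, NOT G80, G71, or TSMC figures.
"""
import math


def poisson_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    # Probability that a die of the given area catches zero defects.
    return math.exp(-defects_per_cm2 * die_area_cm2)


if __name__ == "__main__":
    d = 0.5  # hypothetical defects per cm^2
    for area in (1.0, 2.0, 4.0):  # hypothetical die areas in cm^2
        print(f"{area:.1f} cm^2 -> {poisson_yield(d, area):.1%} defect-free")
```

In this model, doubling the die area squares the defect-free fraction, which is why a chip this large stings so much. Real fabs use more elaborate yield models, so treat this as intuition rather than a cost estimate.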

The GeForce 8800's discrete display chip

That's not all for GeForce 8800 silicon, either. You may have noticed the separate chip mounted on the GeForce 8800 GTX board in the pictures above, between the GPU and the display outputs. This is an external display chip that has the TMDS logic for driving LCD displays and the RAMDACs for analog monitors. This puppy can drive an HDTV-out connector and two dual-link DVI outputs with HDCP support.

Nvidia says it chose to use a separate display chip in order to simplify board routing and manufacturing. That makes some sense, I suppose, given the G80's already ample die area, but the presence of an external display chip raises some intriguing possibilities. For instance, we might see a multi-GPU graphics card with dual G80s (or derivatives) with only a single display chip. Nvidia could also offer G80-based solutions for non-graphics applications without including any display output whatsoever.

Another detail you may have spied in the pictures above is the presence of two "golden fingers" connectors for SLI multi-GPU configurations. As with ATI's new internal CrossFire connectors, the two links per board will allow staggered connections between more than two graphics cards, raising the possibility of three- and four-way SLI configurations in motherboards with enough PCIe graphics slots.