BFG GeForce 8800 GTS Review - PAGE 2

Though we received an 8800 GTS for today's review, we will be using the 8800 GTX's architecture as our primary example. We will cover the differences between the two GPUs later in the article.

We'll start with the most notable aspect of the new architecture - the unified shader pipeline. While news of a unified design from NVIDIA came as a surprise when it hit the rumor channels a number of weeks ago, the decision was rooted in the entire ethos behind DirectX 10. What's important to note about DirectX 10 is that, aside from the new geometry shader stage (which we will likely see implemented in future DX10-based games), Microsoft has not added anything drastically different from what DirectX 9 already offers. The central theme of DirectX 10 is optimization, and this extends from reduced CPU overhead to the new push for unified architectures.

Instead of being divided into separate vertex and pixel shaders as in the past, NVIDIA has unified the entire shader pipeline. The result is what they are calling their GigaThread technology. It centers on a completely different approach to GPU design, and this NVIDIA-supplied graph does a good job of contrasting the two concepts.

The Old Way: Vertex and Pixel Shaders

The diagram below shows the classic GPU architecture we have all grown quite accustomed to. This design does not maximize efficiency: at any given moment, some of the vertex shaders may sit idle while all of the pixel shaders are under maximum load, or vice versa. This leaves unused pipes idling as they wait for the other units to catch up before receiving more instructions.

The New Way: Unified Architecture!

NVIDIA's approach to a unified architecture, as detailed in the diagram below, was to do away with the vertex and pixel shader pipelines as we know them and replace them with completely decoupled "stream processors", as they are being dubbed. In the case of the 8800 GTX, the core clock speed is 575 MHz (500 MHz on the GTS). In a traditional GPU, this would imply that the vertex and pixel shading units also run at that speed. In the 8800 series architecture, however, these units (now bundled together as stream processors) run at a completely separate clock speed, which in the 8800 GTX's case is 1350 MHz (1200 MHz on the GTS).
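To put those separate shader clocks in perspective, here is a quick back-of-the-envelope calculation of theoretical shader throughput. This is a rough sketch, not an official NVIDIA figure: it assumes each stream processor retires one multiply-add (two floating-point operations) per shader clock, which ignores any extra dual-issue capability the hardware may have.

```python
def shader_gflops(num_sps, shader_clock_mhz, flops_per_clock=2):
    """Theoretical shader throughput in GFLOPS.

    flops_per_clock=2 assumes one MAD (multiply-add) per stream
    processor per cycle; this is a simplifying assumption, not a
    measured or vendor-quoted number.
    """
    return num_sps * shader_clock_mhz * flops_per_clock / 1000.0

gtx = shader_gflops(128, 1350)  # 8800 GTX: 128 SPs at 1350 MHz -> 345.6
gts = shader_gflops(96, 1200)   # 8800 GTS:  96 SPs at 1200 MHz -> 230.4
print(f"GTX: {gtx:.1f} GFLOPS, GTS: {gts:.1f} GFLOPS")
```

Even under this simplified model, the higher shader clock and extra processors give the GTX roughly a 50% raw-throughput advantage over the GTS.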

If the diagram above does not make any sense, keep reading! The concept of completely decoupled pipelines within the GPU is an odd one to grasp, but it is facilitated by a central dispatch processor (or "arbiter" in ATI/Microsoft speak) that keeps the stream processors consistently utilised. The dispatch processor essentially sends incoming data through the stream processors, which loop over that data multiple times until all the necessary operations are complete, before outputting to the Raster Operations Pipelines (ROPs) and then to memory.
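The looping behaviour described above can be modelled in a few lines. The sketch below is a deliberately simplified toy model, not NVIDIA's actual scheduler: work items circulate between the dispatcher and a pool of stream processors until each has completed all its shader passes, at which point it heads for the ROPs.

```python
from collections import deque

def dispatch(work_items, num_processors=8, passes_needed=3):
    """Toy model of the central dispatcher.

    Each item loops through a stream processor until all of its
    shader passes are done, then goes to ROP output. The processor
    count and pass count are arbitrary illustrative values.
    """
    queue = deque((item, 0) for item in work_items)  # (item, passes done)
    rop_output = []
    while queue:
        # Dispatch up to num_processors items per simulated "cycle".
        for _ in range(min(num_processors, len(queue))):
            item, done = queue.popleft()
            done += 1                        # one pass through a stream processor
            if done < passes_needed:
                queue.append((item, done))   # loop back through dispatch
            else:
                rop_output.append(item)      # finished: send to the ROPs
    return rop_output
```

The key point the model captures is that no processor is reserved for a particular shader type; whatever work is pending gets the next free unit.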

The decoupling motif extends even further, to the separation of the shader pipelines (stream processors) from the texture units. In the past, shader pipes would often be held up by the texture units' fetching and filtering, creating a bottleneck. Because these have been separated on 8800 series cards, the stream processors can perform other calculations while the texture units (which run at only 575 MHz) work through longer operations. The figure below illustrates what this might look like in some cases.
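The benefit of that separation can be illustrated with a crude cycle count. This is a toy model under stated assumptions (a fixed texture-fetch latency and perfect overlap in the decoupled case), not a description of the real hardware's timing:

```python
def coupled_cycles(alu_ops, tex_fetches, tex_latency=4):
    # Coupled pipe: every texture fetch stalls the shader for the
    # full fetch latency, so the costs simply add up.
    return alu_ops + tex_fetches * tex_latency

def decoupled_cycles(alu_ops, tex_fetches, tex_latency=4):
    # Decoupled units: ALU work proceeds while fetches are in flight,
    # so (assuming perfect overlap) only the longer of the two matters.
    return max(alu_ops, tex_fetches * tex_latency)

# Example workload: 20 ALU operations interleaved with 5 texture fetches.
print(coupled_cycles(20, 5))    # 40 cycles
print(decoupled_cycles(20, 5))  # 20 cycles
```

In this idealised example the decoupled design halves the cycle count, because the texture latency hides entirely behind the ALU work instead of stalling it.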

All these design decisions come together in the following diagram. You can see all 128 stream processors (96 in the case of the GTS) and their arrangement here.

You can see here the path that render data takes as it enters the GPU and is processed through the new shader structure. Vertex data is effectively run through multiple wash cycles as it moves through the dispatcher, through a stream processor, back through dispatch (depending on the nature of the data), and so on, before being output to the ROPs.

Some have wondered about the effectiveness of these seemingly general-purpose stream processors relative to their 'dedicated' vertex and pixel shader predecessors. Comparing pure shader-versus-shader performance, the stream processors in 8800 series cards should theoretically perform either operation just as fast as a dedicated unit. The real potential performance hold-up would be the scheduling overhead introduced by having to dispatch multiple threads to different sub-processors. Fortunately, any inefficiencies in NVIDIA's thread processor design should be offset by the fact that the 8800 has 128 pipelines available to perform either operation at any time, removing a major performance bottleneck. One final note about the unified design is that the performance benefits it brings extend to current DirectX 9 games as well as future DirectX 10 games, which should mean tangible performance deltas while we wait for DirectX 10 titles to arrive alongside Vista next year.

The final point worth mentioning here is NVIDIA's new marketing push for physics processing. This will be implemented in some DirectX 10 games and will allow physics calculations to run directly on the GPU through the stream processors. This is obviously meant to encourage the purchase of two boards in SLI (or three, as may be the case with the GTX) to maximise graphics and physics performance.

Memory Arrangement

When details initially emerged on the G80 last month, there was much surprised discussion over the memory configuration and the higher bus width. NVIDIA has spent virtually no time discussing this seemingly major enhancement (notable, seeing as this is the first external memory bus on a GPU wider than 256 bits), and really, there isn't a whole lot to discuss. We would presume that the memory subsystem functions similarly to the memory bus on 7 series cards and has simply been expanded to accommodate more memory. NVIDIA has, however, mentioned future support for GDDR4, though GDDR3 is the memory used on current 8800 cards. As seen on the previous page, the 8800 GTX has a 384-bit-wide memory bus and 768 MB of GDDR3 memory, while the GTS features a 320-bit memory bus and 640 MB of total memory. The memory clock speed on the GTX is 900 MHz, while the GTS loses some ground and operates at 800 MHz.
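Those bus widths and clocks translate directly into peak memory bandwidth. The arithmetic below is a standard back-of-the-envelope calculation: GDDR3 is double data rate, so each memory clock yields two transfers across the full bus width.

```python
def mem_bandwidth_gb_s(bus_width_bits, mem_clock_mhz):
    """Peak theoretical memory bandwidth in GB/s.

    Assumes double-data-rate memory (two transfers per clock), as
    with the GDDR3 used on the 8800 cards.
    """
    effective_mt_s = mem_clock_mhz * 2          # DDR: 2 transfers/clock
    bytes_per_transfer = bus_width_bits / 8     # bus width in bytes
    return bytes_per_transfer * effective_mt_s / 1000.0

gtx = mem_bandwidth_gb_s(384, 900)  # 8800 GTX: 86.4 GB/s
gts = mem_bandwidth_gb_s(320, 800)  # 8800 GTS: 64.0 GB/s
print(f"GTX: {gtx:.1f} GB/s, GTS: {gts:.1f} GB/s")
```

So the wider bus and higher clock together give the GTX roughly a third more raw memory bandwidth than the GTS, on top of its extra 128 MB of capacity.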

Keep reading for a look at the card and an overview of the new image quality enhancements made!
