More Power to You: nVidia's nForce 4 SLI is Fastest

Review: More 3D rendering horsepower than should be allowed by law, but be prepared to pony up big time. We take a good look at nVidia's nForce 4 SLI.

SLI. Those three little letters conjure up memories of a bygone era. If you wanted the biggest, baddest 3D performer for gaming, you'd have not one, but two 3dfx Voodoo2 cards in your system. And since Voodoo2 didn't do 2D, you'd need a third graphics card to run Windows or DOS 2D. That system would give you about 180Mpixels/sec of pixel-blasting fill-rate, which at the time was about double what any other GPU on the market could hope to deliver. Today we're going to show you a 3D sub-system whose peak fill-rate is roughly 75 times greater than the original Voodoo2 SLI configuration.

The demise of high-flyer 3dfx is one of the saddest stories in the annals of 3D GPU history. But many of 3dfx's engineers, including architect Gary Tarolli, wound up working for nVidia after that company acquired 3dfx's IP portfolio several years ago. Recently, nVidia decided to revive the SLI concept, though in name only. At a very basic level, the concept is the same: two is better than one. But how the new SLI (Scalable Link Interface) actually handles the incoming rendering workload is much more sophisticated than the old-school scan-line interleave technology that 3dfx pioneered years ago.

nVidia recently scored a major advance for its chipset business by finally securing a cross-licensing deal with CPU giant Intel. With this deal in place, nVidia chipsets can now address the much larger Intel-based side of the platform market, which comprises about 80% of the overall market. But currently, only nForce4 chipsets supporting AMD's Athlon64 processors will have SLI enabled on consumer desktops. Some Intel Xeon workstation class motherboards also ship with two PCI Express x16 slots.

Today, we're here to show you nVidia's Athlon64-based nForce 4, how its new SLI technology performs, how much more performance it buys you, and whether an SLI-based system makes sense for you.

3dfx's original SLI technology essentially split the 3D scene to be rendered into two fields (like interlaced television), and the two Voodoo/Voodoo2 cards took turns rendering scan-lines. The system worked very well, with very few games having compatibility issues, and it ran like a bat out of hell given the alternatives at the time. But it had its limits, such as a maximum resolution of 1024x768.

nVidia's revamped SLI sometimes splits the screen in half. Rather than doing it by scan-line, however, it splits the image into top and bottom halves, with each GPU handling about half of the rendered scene. The reason for all the "abouts" is that the ForceWare driver actually does load-balancing, handing more work to the less busy GPU in an effort to keep both GPUs spinning at close to peak efficiency. So while the split will likely be close to 50/50, that won't always be the case. nVidia also includes a feature in the driver that lets you see how much of the scene each GPU is drawing.
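To make the load-balancing idea concrete, here's a toy sketch in Python. The feedback rule, step size, and clamping bounds are all invented for illustration; nVidia has not published the actual ForceWare balancing algorithm.

```python
# Toy sketch of SLI-style split-frame load balancing (illustrative only).
# The split line drifts toward the slower GPU's half so that both GPUs
# tend to finish each frame at about the same time.

def rebalance(split, time_top, time_bottom, step=0.02):
    """Move the horizontal split line based on last frame's GPU timings.

    split:       fraction of the screen (0..1) rendered by the top GPU
    time_top:    milliseconds the top GPU took on the last frame
    time_bottom: milliseconds the bottom GPU took on the last frame
    """
    if time_top > time_bottom:
        split -= step   # top GPU is overloaded: shrink its share
    elif time_bottom > time_top:
        split += step   # bottom GPU is overloaded: grow the top's share
    return min(max(split, 0.1), 0.9)  # keep both GPUs doing some work

# Example: the top half of the frame (say, a complex skybox) renders
# slower, so successive frames shift work toward the bottom GPU.
split = 0.5
for t_top, t_bot in [(20.0, 12.0), (18.0, 13.0), (16.0, 14.5)]:
    split = rebalance(split, t_top, t_bot)
```

In a real driver the adjustment would be derived from actual per-GPU frame timings, but the principle is the same: the split converges to wherever the two halves cost roughly equal GPU time.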

Putting it Together

ASUS' A8N-SLI adds some components to make enabling SLI much easier. In particular, the company's EZ Switch allows the board's two PCIe slots to be quickly reconfigured to run either a single card or two cards together. In SLI mode, each PCIe slot becomes an 8-lane slot, meaning its overall system bandwidth is halved. However, with another GPU in the system to handle half of the rendering chores, this constraint shouldn't adversely impact game frame rates for quite some time.

Physically, EZ Switch vaguely resembles a memory module. There are two clips that hold the module in place, and releasing the two clips at the same time is a little tricky, but not too onerous. Once released, you can re-align the module to switch modes, and once it clicks down, you're all set.

The ASUS design implements a 12V Molex connector near the two PCIe slots to provide the needed juice when running in SLI mode. Speaking of juice, ASUS supplied us with the following power supply recommendations for running the three different SLI-capable nVidia GPUs in a "typical" high-end system:

Components/Peripherals        Heavy load         Normal load        Light load
AMD K8 939-pin CPU type:      Athlon 64 FX-55    Athlon 64 3800+    Athlon 64 3400+
PCIe x16 graphics cards:      2 6800 Ultras      2 6800 GTs         2 6600 GTs
DDR DIMMs:                    4                  2                  2
HDD:                          4                  2                  2
Optical drive (DVD/CD-RW):    2                  2                  1
PCIe x1 card:                 1                  0                  0
PCI cards:                    3                  2                  1
IEEE 1394 devices:            1                  0                  0
USB devices:                  6                  4                  3
Required +12V current:        >25A               >20A               >17A
Required wattage:             >= 500W            >= 400W            >= 350W
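ASUS' recommendations boil down to a simple pair of thresholds per load class. The little sketch below just encodes the table above as a lookup; it's a sanity check, not an electrical model, and the function name is our own.

```python
# ASUS' PSU recommendations for the three SLI load classes, straight
# from the table above: minimum +12V rail current and total wattage.
REQUIREMENTS = {
    "heavy":  {"amps_12v": 25, "watts": 500},   # 2x GeForce 6800 Ultra
    "normal": {"amps_12v": 20, "watts": 400},   # 2x GeForce 6800 GT
    "light":  {"amps_12v": 17, "watts": 350},   # 2x GeForce 6600 GT
}

def psu_ok(load_class, psu_amps_12v, psu_watts):
    """True if a PSU meets ASUS' recommendation for the given load class.

    Note the table specifies strictly more than the listed +12V current
    (">25A") but at-least the listed wattage (">= 500W").
    """
    req = REQUIREMENTS[load_class]
    return psu_amps_12v > req["amps_12v"] and psu_watts >= req["watts"]
```

So a 500W supply with a 26A +12V rail clears the "heavy" bar, while one rated at exactly 25A does not.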

Our test system most closely resembled the "Heavy Load" system, although we had only a single hard-drive and a single optical drive.

During inspection, we left 3DMark05's three game tests running for several hours with the test system's case all closed up. We were looking to see if there might be any overheating or stability issues leaving the system running in marathon mode, but fortunately, it ran like a top, and was happily looping 3DMark05's game tests when we came back to it about four hours later.

The SLI bridge connector attaches the two 3D cards together, and the cards use it primarily to stay synchronized. However, when we ran early tests using it, we found that image quality was not up to snuff. The primary visual artifact made 3D game images appear noisy, like a poorly tuned TV signal. According to nVidia company officials, the ASUS connector is "out of spec," and judging from the artifacts we were seeing, that appeared to be the case. The issue has to do with the ASUS SLI connector only making intermittent contact with the I/O pads on each 3D card. When our twin Ultra boards arrived from nVidia, they included a longer, flexible plastic connector that worked fine, and didn't exhibit any of the visual artifacts seen with the ASUS connector. At press time, it was unclear which connector ASUS will ultimately ship with the A8N-SLI, but we hope it will be the nVidia flexible connector.

Other noteworthy features found on ASUS A8N-SLI board include:

1GB/sec interconnect speed for the HyperTransport bus that connects the CPU to the South bridge.

Twin Gigabit Ethernet ports that can be configured to have the system act as a firewall for a network.

FireWire

Four USB ports

Four S-ATA ports internal to the nForce 4 chipset, and another four S-ATA ports for RAID volumes via a Silicon Image controller.

Our main testing goal with the ASUS A8N-SLI was to gauge performance scaling of two cards versus one, and to that end, we benched the GeForce 6800 Ultra and GeForce 6800 GT in single-card, and then SLI modes.

We ran a subset of our 3D GameGauge test suite, focusing on the three most fill-rate-bound games: Doom 3, Far Cry, and Half-Life 2 (HL2).

We also tested with Futuremark's 3DMark05, running its main set of three game tests, as well as vertex and pixel shader tests. We rounded out the mix by running the triangle batch test to gauge geometry processing scalability as well. As an inspection test, we also ran SPEC's ViewPerf 8.

Clearly anyone prepared to invest $800-$1200 in 3D cards doesn't want to leave any performance on the table. So to that end, we tested only at a resolution of 1600x1200 with both 4X AA and 8X AF enabled, which is what we affectionately refer to as the "can o' whup'ass" settings.

Our test system had the following components:

Components/Peripherals    Heavy load
CPU:                      Athlon 64 FX-55
System memory:            1GB PC3200 DDR-1 (2x512MB)
Audio:                    M-Audio Revolution 7.1
Hard-disk:                Western Digital 40GB P-ATA
Optical drive:            Plextor
OS:                       Windows XP Pro with SP2 installed (DirectX 9.0c)

In our Results sections we present each set of results twice. The first graph shows the raw scores for all the combinations, single card and dual card. The second graph shows the normalized score for each card, comparing the percentage performance gain in going from a single card to a dual-card configuration of the same type. So the normalized score compares only single Ultra to SLI Ultra, or single 6800 GT to dual 6800 GT. We include these numbers to give you a clear idea of how much SLI is scaling performance: if SLI scores 150 and the single card scores 100, then SLI is delivering 50% more performance.
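The normalization itself is simple arithmetic; a minimal sketch:

```python
def normalize(single_score, sli_score):
    """Express an SLI score relative to the same card's single-card
    score, with single-card performance pinned at 100."""
    return round(100.0 * sli_score / single_score)

# A single card at 40fps and SLI at 60fps normalizes to 150,
# i.e. SLI delivers 50% more performance.
```

Because each card type is normalized against itself, the normalized graphs isolate SLI scaling from the raw speed difference between the GT and the Ultra.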

Interestingly, both the GT and Ultra see about the same relative performance increase (~70%), and the SLI Ultra score obviously blows the doors off of any other 3DMark score we've seen to date.

We ran this test on the 6800 Ultra. The single-card Ultra seems to be leveling off at around 75 million triangles per second, though it appears to have a bit of headroom left. The SLI configuration however shows a nearly 2X performance leap versus the single-card configuration, and also does not appear to be leveling off, indicating even more headroom.

The scores on both the complex vertex test and the pixel shader 2.0 test are remarkable in that they both show nearly linear scaling, which is to say almost double the performance. This would seem to bode very well for shader-intensive games like Half-Life 2, and, in the more GPU-intensive timedemos, we did see considerable scaling.

Next, we would show you a SPECviewperf 8 graph, but what you'd see is identical bars for single-card and SLI test modes. We observed no scaling whatsoever for this workstation benchmark. This seemed odd considering that SLI is ostensibly application-transparent. According to nVidia, the workstation side of SLI has not yet been enabled, and probably won't be until Quadro products are ready for market. nVidia officials also noted that in a number of instances, SPECviewperf 8 may not be GPU-bound, and therefore may not get any SLI benefit. We observed this behavior on several of our Half-Life 2 timedemos where the game is CPU-bound, and using SLI yielded relatively little scaling.

We also found out from nVidia that currently, the ForceWare SLI driver needs an application profile to be able to apply SLI acceleration. These take the form of an XML file that the driver reads when an application is launched. Under this current model, if a new game shipped tomorrow that could benefit from SLI, nVidia would make available on its nZone Web site an XML-based profile that you could download, and when that game's EXE was launched, the driver would read in the new XML profile, and be able to run the game using SLI. nVidia pointed out that this profile-based architecture is a transitional solution, and that these app profiles would go away in an upcoming driver release early next year.
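In rough terms, a profile-driven driver maps a launched EXE name to a rendering mode. The sketch below illustrates that lookup; the element and attribute names are invented for illustration, since the real NVAPPS.XML schema is nVidia's own (SFR and AFR are the split-frame and alternate-frame rendering modes SLI supports).

```python
# Illustrative sketch of profile-based SLI mode selection. The XML
# structure here is hypothetical, NOT the actual NVAPPS.XML format.
import xml.etree.ElementTree as ET

SAMPLE_PROFILES = """
<profiles>
  <application exe="doom3.exe" sli_mode="AFR"/>
  <application exe="hl2.exe" sli_mode="SFR"/>
</profiles>
"""

def sli_mode_for(exe_name, xml_text=SAMPLE_PROFILES):
    """Return the SLI mode profiled for an executable, or None if the
    title has no profile (in which case SLI acceleration isn't applied)."""
    root = ET.fromstring(xml_text)
    for app in root.iter("application"):
        if app.get("exe").lower() == exe_name.lower():
            return app.get("sli_mode")
    return None
```

This also illustrates the weakness nVidia acknowledges: a brand-new title returns no profile and gets no SLI benefit until a profile is published, which is why the company plans to retire the mechanism.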

So if you were considering an SLI configuration in hopes of getting yourself a huge pickup in OpenGL workstation application geometry performance, you should hold off until nVidia has a workstation driver with all the needed app certifications ready to roll.

The ForceWare driver reads its application profiles from NVAPPS.XML.

Just out of morbid curiosity, we tried deleting the NVAPPS.XML file that lives in the \Windows\System32 directory to see if it would adversely affect SLI test results. We tried this on both 3DMark05 and Doom 3. Curiously, removing the XML file did not change the results on these two tests. The driver did create a new NVAPPS.XML file, but it was empty, and contained no information about any apps.

Finally, we rebooted the system with the NVAPPS.XML file still deleted to see if the driver was reading in the file's contents on startup, but we still saw the same SLI test results, and so we're left to wonder what exactly the driver is doing with this XML file.

The discovery of this file was something of a fortunate accident that we came across when we didn't see our ViewPerf results improving. We thought SLI was app-transparent, but according to nVidia, because some SLI-accelerated games shipped before SLI came onto the radar, the profiles are needed to "get the party started." A company official also stated that these profiles will be retired in a future version of the ForceWare driver, though he declined to give a specific version number or ETA for this new driver.

Our initial finding here was a bit disappointing, but not that surprising.

Doom 3's Ultra High quality settings mode was intended for 3D cards that have a 512MB frame-buffer, since all the game's 3D resources are loaded into graphics memory completely uncompressed. If you don't have a 512MB frame-buffer (and no one does, yet), what you wind up with is considerable cache-thrashing, since the needed resources have to be constantly paged in from system memory over the PCI-Express interface. Our twin GeForce 6800 Ultras have a combined overall frame-buffer of 512MB, but the ForceWare driver doesn't allocate the frame-buffer to act as a single 512MB block of memory.

Since we were testing a $1,200 3D rendering sub-system, we wanted to pull out all the stops, so we first tried the Ultra High quality settings, and we saw virtually no scaling whatsoever. The frame rate went from about 40fps to 42fps. The problem is that both configurations are bottlenecked by the PCI-Express bus, and as a result, whatever additional benefit the second GPU engine would have delivered was masked by the PCIe bottleneck.

When we dialed "down" to the Very High quality settings, we get the results you see in the graphs above. Again, the SLI configurations are exhibiting very good scaling. The 6800 GT actually exhibits better scaling, which probably has more to do with it being severely memory-bound when run in single-card mode. Recall that we're running at a very high pixel resolution with 4X AA and 8X AF enabled.

The results above are an average of the four timedemos we recorded for testing Half-Life 2. However, because some of our timedemos are CPU-bound, the above result is a bit misleading. Invariably a 3D application is bound somewhere, and when you put this big of a rendering sub-system at the back end of a gaming system, it's bound to be CPU-bound in some cases, even with a top-end CPU like the Athlon FX-55.
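This "bound somewhere" behavior can be captured with a back-of-the-envelope model: per-frame time is roughly the larger of the CPU's work and the GPU's work, and SLI only divides the GPU side. The numbers below are illustrative, not measured:

```python
# Toy model of SLI scaling: frame time = max(CPU time, GPU time),
# and adding a second GPU roughly halves only the GPU portion.

def sli_fps(cpu_ms, gpu_ms, gpus=1):
    """Estimated frames/sec when per-frame GPU work splits across `gpus`."""
    frame_ms = max(cpu_ms, gpu_ms / gpus)
    return 1000.0 / frame_ms

# GPU-bound timedemo (e.g. heavy water shaders): near-2x scaling.
gpu_bound_gain = sli_fps(5, 25, 2) / sli_fps(5, 25, 1)

# CPU-bound timedemo: the second GPU adds essentially nothing.
cpu_bound_gain = sli_fps(20, 15, 2) / sli_fps(20, 15, 1)
```

The model explains both results sets: shader-heavy demos like Canal scale nearly 2x because the GPU term dominates, while demos like Strider sit on the CPU term and barely move.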

If we break the scores out, you can see that the Strider timedemo in particular seems to be CPU-bound, whereas the Canal timedemo, which features a detailed water pixel shader effect, clearly exhibits very good scaling on both cards. The Courtyard timedemo is somewhat CPU-bound, and the Raven timedemo exhibits pretty good scaling.

Both cards exhibit very good scaling here, which isn't that surprising. Far Cry throws a mix of geometrically complex scenes and intense pixel shader workloads at rendering hardware. As our previous tests have shown, nVidia's SLI technology is capable of scaling both effectively, assuming the app isn't CPU-bound.

nVidia can claim bragging rights for the fastest consumer 3D rendering system on the planet, though that claim carries a big asterisk next to it. The fine print at the bottom will tell you that this system will cost you roughly $1,500 (the $180 motherboard, $1,200 for twin GeForce 6800 Ultra cards, and $100 for a 500-watt power supply). But if you have a serious power hunger for 3D performance, then this is the fix of fixes. For today's games, this system is joyous overkill, but it will obviously provide miles of headroom for titles coming down the pike over the next several years. To the credit of both nVidia and Asus, the SLI system was very stable, and didn't crash during any of our game tests, nor did it crash during our 12-hour torture test of continuously running 3DMark05 at very high resolutions.

Despite the impressive performance, and very solid scaling when we compare single-card performance versus the SLI configuration, we do have some concerns about the whole rig. For starters, there's the issue of "component matching." Although you might be able to run add-in cards from different makers in SLI mode, your best chance for success will come from buying two boards at the same time made by the same vendor. In its initial briefings, nVidia pitched SLI as having a "buy one now, buy one later" option, where you could buy a single card now, save a few shekels, and go buy another one six months later. We are reluctant however to advise such a strategy, because of issues involving in-line changes that may happen, either to the video BIOS, or possibly small tweaks to the board layout. The jury is still out on how tolerant nVidia's new SLI will be to such variations, and so for now we will err on the side of caution and advise that if you're going to do SLI, buy two boards now.

We also didn't have a chance to characterize the performance of twin GeForce 6600 GT boards, since SLI-capable boards weren't available at press time. That performance story will have to wait for another day, hopefully before year's end.

In addition to the ASUS unit we looked at here today, ABIT, DFI, EPoX, Gigabyte, Iwill, and MSI will also be bringing nForce SLI motherboards to market soon, though it remains to be seen how many will be widely available in time for the holidays. We've heard rumblings of ATI developing some kind of competitive response in this escalating arms race, though details at this point remain very sketchy. So for now, the title of "best berserker 3D rendering system" squarely belongs to nVidia.

We will be doing a full-bore motherboard review of the A8N-SLI in the coming weeks, and should have test results for twin GeForce 6600 GT boards at that time as well, so stay tuned.

Dave came to have his insatiable tech jones by way of music, and because his parents wouldn't let him run away to join the circus. After a brief and ill-fated career in professional wrestling, Dave now covers audio, HDTV, and 3D graphics technologies at ExtremeTech.

Dave came to ExtremeTech as its first hire from Computer Gaming World, where he was Technical Director and Lead (okay, the only) Saxophonist for five years. While there, he and Loyd Case pioneered the area of testing 3D graphics using PC games. This culminated in 3D GameGauge, a suite of OpenGL and Direct3D game demo loops that CGW and other Ziff-Davis publications, such as PC Magazine, still use.

Dave has also helped guide Ziff-Davis benchmark development over the years, particularly on 3D WinBench and Audio WinBench. Before coming to CGW, Dave worked at ZD Labs (now eTesting Labs) for three years as a project leader, testing a wide variety of products, ranging from sound cards to servers and everything in between. He also developed both subjective and objective multimedia test methodologies, focusing on audio and digital video. Before all that, he toured with a blues band for two years; notable gigs included opening for Mitch Ryder and appearing at the Detroit Blues Festival.