One of the 5970’s unique attributes was that while it was designed to meet a 300W TDP at default clocks and voltages, it was built to handle much more. AMD’s design called for it to be able to handle 400W, the amount of power needed to operate the card as if it were a true dual-GPU 5870. In practice it fell a bit short of that goal due to VRM temperatures, but for most games it was a workable solution.

In AMD’s case it has paid off well enough that with the 6990 they are returning with the same philosophy, differing only in implementation details. AMD’s engineers have built a card that can run its GPUs at 6970-like clocks (880MHz); you just have to do some overclocking to get there. And while AMD’s legal department will tell you that no overclock is guaranteed and that overclocking voids any warranty, the design and the binning of GPUs virtually ensure every card can hit 6970 core clocks.

AMD refers to the 6990 as a 450W card. At default clocks it has a rated TDP of 375W, but the cooler itself is designed to handle 450W, which is why AMD made so many design changes such as the dual-exhaust system and the exotic thermal compound. The result is that the card can generally keep itself cool at 6970 speeds, and in fact does a better job of this than the 5970 did at 5870 speeds. The catch is that you will need enough case cooling to deal with the heat the card dumps into the case, 225W+ to be precise. So while the 6990 already has specialized cooling requirements, an overclocked 6990 is even more demanding. Under FurMark our numbers put the card’s draw at more than 500W, so 6990 overclocking is not for the faint of heart.

With the 5970, AMD enabled overclocking by producing a quick-and-dirty utility to bump the card’s voltage up to 5870 levels, after which Overdrive could be used to reach the desired clocks. This certainly worked, but it was neither smooth nor consistent: not every vendor used AMD’s utility (particularly if they had their own in-house overclocking utility), and if you did use AMD’s utility you had to set the voltage and reapply the overclock on every boot. AMD has no intention of adding voltage controls to Catalyst Overdrive, so for the 6990 they’ve found a better way.

The 5970's ATI Overvolt Tool

Do you recall the BIOS selection switch on the 6950 and 6970? On those cards it let users safely flash a new BIOS while keeping a fallback BIOS to recover from. The 6990 takes this concept and repurposes it for its own overclocking needs. The switch is still there, but instead of choosing between identical BIOSes it chooses between two performance BIOSes. Position 2, the default, is a write-protected BIOS that runs the 6990 at its default core clock of 830MHz and default core voltage of 1.12V. Position 1 is a write-enabled BIOS that runs the 6990 at the same core clock and voltage as the 6970: 880MHz and 1.175V. Memory clocks, meanwhile, remain unchanged at a sub-6970 5GHz. AMD calls it the AUSUM switch (Antilles Unlocking Switch for Uber Mode); ignore the name and focus on the fact that the switch is what controls the core voltage on the 6990.

6950/6970 BIOS Switch

From a usability standpoint, the benefit of using the BIOS switch for this is that it’s much more consistent across vendors and doesn’t require any software interaction. Just flip the switch and you’re done. We would still expect some vendors to take things a step further and offer fine-grained voltage control for the card.

Along with the increases in core clock and voltage, AMD’s documentation also lists a higher PowerTune limit for uber mode. AMD tells us the limit here is 450W (540W with PowerTune at +20%); however, in our testing we were unable to hit it. Every test up to and including FurMark ran unthrottled, and we peg power consumption there at over 500W. If there really is no PowerTune limit, this is good news for extreme overclockers, but it also means that in uber mode PowerTune won’t be there to save your bacon if you push too hard.
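PowerTune’s adjustable limit amounts to a percentage scaling of a base power limit, and AMD describes PowerTune as estimating power from on-chip activity rather than measuring it directly. The sketch below is only a toy model of that cap, not AMD’s firmware logic: the helper names `powertune_cap` and `should_throttle` are hypothetical, the ±20% slider range is an assumption, and it assumes the slider sat at its default 0% during testing. Under those assumptions it shows why a 500W+ FurMark draw should have tripped a 450W limit.

```python
def powertune_cap(base_limit_w: float, slider_pct: float) -> float:
    """Effective PowerTune cap for a given Overdrive slider setting.

    Assumption: the slider scales the base limit within -20%..+20%.
    """
    if not -20 <= slider_pct <= 20:
        raise ValueError("PowerTune slider assumed limited to -20%..+20%")
    return base_limit_w * (1 + slider_pct / 100)


def should_throttle(board_power_w: float, cap_w: float) -> bool:
    """Hypothetical check: a card exceeding its cap would be clamped."""
    return board_power_w > cap_w


UBER_LIMIT_W = 450  # AMD's documented uber-mode limit
FURMARK_W = 500     # lower bound of the FurMark reading ("over 500W")

for pct in (0, 20):
    cap = powertune_cap(UBER_LIMIT_W, pct)
    print(f"slider {pct:+d}%: cap {cap:.0f}W, "
          f"throttling expected: {should_throttle(FURMARK_W, cap)}")
```

At the default setting the model expects throttling at 450W, which is the behavior we did not observe; only at +20% (540W) would a 500W load legitimately run unthrottled.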

Radeon HD 6990 BIOS Switch

| Position | Core Clock/Voltage | PowerTune Limit | Write-Protected |
|---|---|---|---|
| 1 | 880MHz/1.175V | None | No |
| 2 (Default) | 830MHz/1.12V | 375W | Yes |

As far as additional overclocking is concerned, we did not push our sample beyond uber clock speeds. In uber mode we were already hitting GPU temperatures of 94C in FurMark, which is as high as we’re willing to go. Better cooling would of course make further overclocking easier, and with an Overdrive limit of 1.2GHz in uber mode, the card should vanish in a puff of smoke well before Overdrive becomes the limiting factor.

Radeon HD 6990 Overdrive Limits

Of course, no discussion of overclocking would be complete without talking about power consumption. With two 8-pin PCIe power sockets, the 6990 already draws the full 150W per 8-pin connector that the PCIe specification allows (its 375W rating is exactly the slot’s 75W plus 2 × 150W); uber mode exceeds this, potentially by quite a bit. AMD has engineered the 6990 to pull most of the excess power from the PCIe power sockets rather than the slot itself (since the slot is the weakest link), so a notably overbuilt power supply would be necessary. AMD hasn’t provided any official guidance here, but a well-built power supply offering 20A (240W at 12V) per 8-pin line, with an independent rail for each line, would seem to be the minimum to get away with uber mode.
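The power-supply guidance above comes down to simple 12V arithmetic. The sketch below assumes the slot supplies its full 75W PCIe allotment (treated as a single figure, although the spec splits it across 12V and 3.3V) and that the remainder is shared evenly between the two 8-pin connectors; neither assumption comes from AMD. Under those assumptions it estimates the per-connector load and current for the stock 375W rating, the 450W uber-mode rating, and the 500W+ FurMark reading.

```python
SLOT_W = 75.0            # PCIe x16 slot allotment (assumed fully used)
EIGHT_PIN_SPEC_W = 150.0 # PCIe spec rating per 8-pin connector
RAIL_V = 12.0            # connector power assumed to be all on 12V
CONNECTORS = 2

def per_connector_load(board_w: float) -> tuple[float, float]:
    """(watts, amps) each 8-pin must supply, assuming an even split."""
    watts = (board_w - SLOT_W) / CONNECTORS
    return watts, watts / RAIL_V

for label, board_w in (("stock", 375), ("uber", 450), ("FurMark", 500)):
    watts, amps = per_connector_load(board_w)
    over = watts - EIGHT_PIN_SPEC_W
    print(f"{label:8s} {board_w}W board: {watts:.1f}W / {amps:.1f}A per 8-pin "
          f"({over:+.1f}W vs 150W spec)")
```

Even the FurMark case works out to about 17.7A per connector under this model, under the 20A (240W) figure, leaving some room for the uneven load sharing and transient spikes that this simple even-split model ignores.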

Ultimately, however, as we’ll see, the 6990OC doesn’t get nearly as large a performance bump as the 5970OC did. Because the 6990’s default clocks are already much higher, uber mode raises its core clock by only 6% and leaves the memory clock unchanged, whereas the 5970’s overclock raised the core clock by 17% and the memory clock by 20%. As a result you get much better performance out of the box, but this time flipping the magic switch doesn’t significantly increase the card’s performance. If you want a significant improvement over stock, you’ll have to do some equally significant custom overclocking on the 6990.
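The uplift percentages can be checked directly. The 6990 clocks come from this article. The 5970 and 5870 reference clocks (725MHz/4.0GHz and 850MHz/4.8GHz effective memory) are those cards’ standard specifications and are not stated here. The sketch also computes how much room the 1.2GHz Overdrive ceiling leaves above uber mode.

```python
def uplift_pct(base: float, oc: float) -> float:
    """Percentage increase going from the base clock to the overclocked one."""
    return (oc / base - 1) * 100

# (base, overclocked) pairs in MHz; memory given as effective data rate
clocks = {
    "6990 core":   (830, 880),    # default BIOS vs. uber (AUSUM) BIOS
    "6990 memory": (5000, 5000),  # unchanged in uber mode
    "5970 core":   (725, 850),    # 5970 stock vs. 5870 clocks
    "5970 memory": (4000, 4800),
}

for name, (base, oc) in clocks.items():
    print(f"{name:12s} +{uplift_pct(base, oc):.0f}%")

# Headroom between uber mode and the 1.2GHz Overdrive ceiling
print(f"Overdrive headroom over uber: +{uplift_pct(880, 1200):.0f}%")
```

This gives +6%/+0% for the 6990 and +17%/+20% for the 5970. Overdrive itself allows roughly 36% more core clock than uber mode, so cooling, not software, is the practical ceiling.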

Finally, one minor detail: unlike on the 6950/6970, it’s clear that AMD doesn’t intend for this switch to be easily accessible. The switch on the 6990 is slightly recessed. That isn’t enough to make it hard to reach, but it is enough that you’ll never flip it by accident. Flipping the switch has to be a deliberate action, which makes sense given that doing so would void the card’s warranty.

Update: After publication of this article there's been some slight confusion about the AUSUM switch and the warranty. AMD's official guidance is that overclocking the card voids the warranty, which means that AUSUM/uber mode voids the warranty. Technically speaking, just flipping the switch doesn't void the warranty (it's operating the card in uber mode that does), but retail cards will come with a sticker over the switch warning users of the potential danger of overclocking and that it violates the warranty. So breaking the sticker to flip the switch will, for all practical purposes, void the warranty. Specific policies may differ by partner, however.

130 Comments

Not even close, unless you are talking about outdated distributed computing projects like Folding@Home. Try any of the modern DC projects like Collatz Conjecture, MilkyWay@home, etc. and a single HD4850 will smoke a GTX580. This is because consumer Fermi cards are limited to double-precision throughput of 1/8 their single-precision rate.

In other words, an HD6990, which has 5,100 GFLOPS of single-precision performance, will have 1,275 GFLOPS of double-precision performance (since AMD allows 1/4 of its SP rate). In comparison, the GTX470 has 1,089 GFLOPS of SP performance, which translates into only 136 GFLOPS in DP. Therefore a single HD6990 is 9.4x faster in modern computational GPGPU tasks.
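The peak figures in the comment above follow from the usual formula: shader count × 2 FLOPs per clock (one multiply-add) × clock speed, then scaled by each architecture's double-precision ratio. The shader counts and clocks below (3,072 SPs at 830MHz for the HD 6990, 448 CUDA cores at a 1,215MHz shader clock for the GTX 470) are the cards' published specifications, not figures given in the comment. These are theoretical peaks only and say nothing about how well real code keeps those units busy.

```python
def peak_gflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak single-precision GFLOPS (a multiply-add counts as 2 FLOPs)."""
    return shaders * flops_per_clock * clock_mhz / 1000

hd6990_sp = peak_gflops(3072, 830)   # two Cayman GPUs, 1,536 SPs each
gtx470_sp = peak_gflops(448, 1215)   # GF100, shader clock

hd6990_dp = hd6990_sp / 4  # Cayman: DP at 1/4 the SP rate
gtx470_dp = gtx470_sp / 8  # GeForce Fermi: DP capped at 1/8 the SP rate

print(f"HD 6990: {hd6990_sp:.0f} SP / {hd6990_dp:.0f} DP GFLOPS")
print(f"GTX 470: {gtx470_sp:.0f} SP / {gtx470_dp:.0f} DP GFLOPS")
print(f"DP ratio: {hd6990_dp / gtx470_dp:.1f}x")
```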

Those are just theoretical performance numbers. Not all programs, even newer ones, can effectively extract ILP from AMD's VLIW4 architecture. Those that can will no doubt run faster; those that can't will be slower. As far as I'm aware, lots of programs still prefer nV's scalar arch, but that might change with time.

Well... if you can only use 1 of 4 VLIW units in DP, then you don't need any ILP. Just keep the threads in flight and it's almost like nVidia's scalar architecture, just with everything else being different ;)

It all depends on the driver and compiler implementation, and the guy/gal coding it. If you write the same code but the compilers are generations apart, the newer compiler wins out. If you've had more experience with CUDA-based OpenCL, your NVIDIA OpenCL implementation will outperform your ATI Stream implementation. Pick your card for its purpose. My homebrew stuff works great on NVIDIA, but I only code for NVIDIA; the same goes for big-league compute work.