NVIDIA GeForce GTX 680 2GB Graphics Card Review - Kepler in Motion

The Kepler Architecture

Join us today at 12pm EST / 9am PST as PC Perspective hosts a Live Review of the new GeForce GTX 680 graphics card. We will discuss the new GPU technology and important features like GPU Boost, talk about performance compared to AMD's lineup, and have NVIDIA's own Tom Petersen on hand to run some demos and answer questions from viewers. You can find it all at http://pcper.com/live!

NVIDIA fans have been eagerly waiting for the new Kepler architecture ever since CEO Jen-Hsun Huang first mentioned it in September 2010. In the interim, we have seen the birth of a complete lineup of AMD graphics cards based on its Southern Islands architecture, including the Radeon HD 7970, HD 7950, HD 7800s and HD 7700s. To the gamer looking for an upgrade it would appear that NVIDIA had fallen behind, but the company is hoping that today's release of the GeForce GTX 680 will put it back in the driver's seat.

This new $499 graphics card will directly compete against the Radeon HD 7970, and it brings quite a few "firsts" to NVIDIA's lineup. It is NVIDIA's first desktop 28nm GPU, its first to offer a clock speed over 1 GHz, its first to support triple-panel gaming on a single card, and its first to offer "boost" clocks that vary from game to game. Interested yet? Let's get to the good stuff.

The Kepler Architecture

In many ways, the new 28nm Kepler architecture is just an update to the Fermi design that was first introduced in the GF100 chip. NVIDIA's Jonah Alben summed things up pretty nicely for us in a discussion stating that "there are lots of tiny things changing (in Kepler) rather than a few large things which makes it difficult to tell a story."

The chip that the GeForce GTX 680 is built on, GK104, is seen in block diagram form above. Already, you can see a big difference between this and the GTX 580 flagship before it. There are 1536 stream processors (CUDA cores) on the GTX 680 compared to the 512 cores found on the GTX 580. The divisions of the GPU still exist in NVIDIA's design (a GPC is still a group of SMs), though they have changed as well. A GPC now includes two SMX units (seen below), where each GTX 580 GPC included four SMs.

With each SM growing from 32 cores to 192 cores, NVIDIA is claiming a 2x improvement in performance per watt, which is becoming a crucial metric as designers run up against the thermal and power limits of GPUs.

Kepler SMX Block Diagram

The SMX unit consists of 192 CUDA cores, an updated PolyMorph Engine, 16 texture units, and thread scheduling logic, among other components. Further, the cores are arranged differently than in Fermi, with six cores per special function unit (SFU) instead of four. The number of warps (groups of threads) each unit can keep in flight has gone from 48 to 64 in Kepler.
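The unit counts quoted in this review follow from multiplying down the block diagram hierarchy, which the short Python sketch below tries to make explicit. It is only an illustration: the `GpuLayout` class and its field names are made up for this example, the GPC counts are derived from the quoted totals rather than stated by NVIDIA here, and the GTX 580's four texture units per SM is inferred from the GTX 680 having twice the GTX 580's total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Unit counts for one GPU, organized the way the block diagram is.

    Hypothetical helper for illustration; not an NVIDIA data structure.
    """
    name: str
    gpcs: int              # graphics processing clusters on the chip
    sms_per_gpc: int       # SMs (Fermi) or SMX units (Kepler) per GPC
    cores_per_sm: int      # CUDA cores per SM / SMX
    tex_units_per_sm: int  # texture units per SM / SMX

    @property
    def sm_count(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def cuda_cores(self) -> int:
        return self.sm_count * self.cores_per_sm

    @property
    def texture_units(self) -> int:
        return self.sm_count * self.tex_units_per_sm


# Assumption: 4 GPCs each, derived as 1536 / (2 * 192) and 512 / (4 * 32).
GTX_680 = GpuLayout("GTX 680", gpcs=4, sms_per_gpc=2,
                    cores_per_sm=192, tex_units_per_sm=16)
# Assumption: 4 texture units per SM, inferred from 64 total / 16 SMs.
GTX_580 = GpuLayout("GTX 580", gpcs=4, sms_per_gpc=4,
                    cores_per_sm=32, tex_units_per_sm=4)

if __name__ == "__main__":
    for gpu in (GTX_680, GTX_580):
        print(f"{gpu.name}: {gpu.sm_count} SMs, "
              f"{gpu.cuda_cores} CUDA cores, "
              f"{gpu.texture_units} texture units")
```

The point of the exercise is that Kepler gets to three times the cores with half as many, much larger, SM blocks.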

With the 128 total texture units on the GTX 680 (twice what we had on the GTX 580) and an increase in cores of nearly 3x, you might be wondering how it all balances out. You may also be curious whether Kepler is really 3x as fast as Fermi.

Gone is the "hot clock" of previous NVIDIA GPUs, where the cores operated at twice the clock rate of the rest of the GPU. Kepler instead runs the entire chip at the same clock rate. The reasoning is a trade-off between die space and power consumption: engineers were able to cut clock power in half and logic power by 10% at the expense of some die area, and with the focus on power efficiency in this design it was a change they were obviously willing to make.
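A rough back-of-the-envelope calculation shows why losing the hot clock means 3x the cores does not translate into 3x the raw throughput. The sketch below is an estimate under stated assumptions, not a performance model: the clock figures (a 1544 MHz hot clock on the GTX 580, i.e. twice its 772 MHz core clock, and a 1006 MHz base clock on the GTX 680) are the published reference clocks rather than numbers taken from this article, GPU Boost is ignored, and each core is assumed to retire one fused multiply-add (2 FLOPs) per clock.

```python
def peak_gflops(cores: int, core_clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak single-precision throughput in GFLOPS.

    Assumes every core retires `flops_per_clock` operations each cycle,
    which real workloads rarely achieve.
    """
    return cores * core_clock_mhz * flops_per_clock / 1000.0


# Assumed reference clocks (not quoted in this article).
GTX_580_CORES = 512
GTX_580_HOT_CLOCK_MHZ = 1544.0   # cores run at 2x the 772 MHz base clock
GTX_680_CORES = 1536
GTX_680_BASE_CLOCK_MHZ = 1006.0  # whole chip runs at one clock, boost ignored

gtx_580_gflops = peak_gflops(GTX_580_CORES, GTX_580_HOT_CLOCK_MHZ)
gtx_680_gflops = peak_gflops(GTX_680_CORES, GTX_680_BASE_CLOCK_MHZ)

core_ratio = GTX_680_CORES / GTX_580_CORES
throughput_ratio = gtx_680_gflops / gtx_580_gflops

if __name__ == "__main__":
    print(f"Core count ratio:      {core_ratio:.2f}x")
    print(f"Peak throughput ratio: {throughput_ratio:.2f}x")
```

Under these assumptions the theoretical gap comes out a little under 2x rather than 3x, and actual game performance depends on much more than peak arithmetic rate.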

Another change in Kepler is found in the scheduling component, where much of the process is actually moved from hardware to software to be run in the NVIDIA driver. Because the software is already handling so much of the decoding process from DirectX, CUDA, OpenCL, and more, NVIDIA found it to be more power efficient to continue to increase the workload in the software rather than on the chip itself. Some items remain on die though because of latency concerns, such as texture operations.

Because of the reduction in the number of SMX units per chip, NVIDIA had to double the performance of each individual PolyMorph Engine. But because Kepler has half as many SMX units as Fermi had SMs, total chip performance hasn't changed much.

Compared to AMD's Radeon HD 7970, the GTX 680 is actually a bit slower at lower tessellation expansion factors, and it's not until we hit 11x that we start to see the advantage NVIDIA once claimed to have across the whole scale. The two companies debate which factors matter most to game developers, with AMD claiming that the lower factors are used much more often.

For the new memory design NVIDIA has gone with a 256-bit controller (compared to the 384-bit one found on Fermi), though the memory runs at a 6 Gbps data rate (1500 MHz)! The total memory bandwidth provided by this design is 192 GB/s, which is basically identical to that of the GTX 580. ROP count has decreased from 48 on the GTX 580 to 32 on Kepler/GTX 680, however.
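The 192 GB/s figure can be checked from the bus width and data rate alone. The sketch below is a simple illustration with made-up function names; it assumes GDDR5 transfers four bits per pin per command-clock cycle (which is how 1500 MHz becomes 6 Gbps), and the GTX 580's 4.008 Gbps data rate is its published reference spec, not a number from this article.

```python
def gddr5_data_rate_gbps(command_clock_mhz: float) -> float:
    """Per-pin data rate, assuming four bits per pin per command-clock cycle."""
    return command_clock_mhz * 4 / 1000.0


def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: pins times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


gtx_680_rate = gddr5_data_rate_gbps(1500.0)                # 6.0 Gbps
gtx_680_bandwidth = memory_bandwidth_gb_s(256, gtx_680_rate)

# Assumption: GTX 580 reference memory runs at 4.008 Gbps on a 384-bit bus.
gtx_580_bandwidth = memory_bandwidth_gb_s(384, 4.008)

if __name__ == "__main__":
    print(f"GTX 680: {gtx_680_bandwidth:.1f} GB/s")
    print(f"GTX 580: {gtx_580_bandwidth:.1f} GB/s")
```

In other words, the narrower bus is offset almost exactly by the faster memory, which is why the two cards end up with essentially the same peak bandwidth.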

Today's GTX 680 will ship with a 2GB frame buffer, and some users may lament that NVIDIA did not match AMD's 3GB memory configuration on the HD 7900 cards. While we are never ones to say we don't want MORE memory on our GPUs, in our testing we have not seen detrimental effects of 2GB versus 3GB of memory, even in multi-display gaming.

The GTX 680 is indeed a PCI Express 3.0 compatible card and the GPU does support DX11.1 features as well, but it isn't really anything to get excited about just yet.

One interesting change is the addition of NVENC, a dedicated video encoding engine that is built (essentially) to rival the QuickSync technology found in Intel's Sandy Bridge processors. The logic is completely fixed function now — it is no longer using the CUDA cores to encode video — and NVIDIA claims that it is even more power efficient than Intel's implementation. In fact, I was told by designers that the NVENC feature could actually be used while the GPU was powered off.

Another important change is found in the display support on Kepler as NVIDIA has finally moved away from the two display limit on single GPU cards. You can now run up to four displays on a single card, and run three of them in an NVIDIA Surround or 3DVision Surround configuration for multi-display gaming. This is obviously a feature that NVIDIA has needed for quite some time, and we are glad to see it in Kepler. DisplayPort 1.2 support is included as well.

And here she is, the Kepler die in all her glory. The 28nm GPU is built with 3.54 billion transistors and is 294mm^2.

Hey, thanks for the very detailed review. I'm a gamer and try to upgrade my rig from time to time. Still using a pair of AMD 5870s and thinking/hoping to do an upgrade sometime. I've been thinking AMD 7970, but I'm open to Nvidia too.

So I'll keep watching for more. Looking forward to Josh's review of the MXI 7970R.

Awesome! Triple monitor benches and a performance/dollar increase. Any idea why the 7970 and 7950 are dead even at 5760x1080? Do they have some common limiting factor at that res? This is going to make for a tough choice if AMD drops their pricing to be competitive with this!

I checked some benchmarks around the web. This card has an almost nonexistent GPU compute portion. It is even worse than the GTX 570 in GPU rendering and, worse still, more than 2 times slower than the Radeon HD 7870. And that doesn't surprise me, with only 3.5 billion transistors compared to the previous 3.2 billion.
This is a gaming card only. Don't make a fool of yourself by trying to use this as a card for doing work that pays you back with some green bucks. :P
I may be a bit harsh, but mostly it is true.


Great review, as usual. A little disappointed by the mins, but generally terrific absolute performance despite the low power consumption and reasonable temps. And nvidia is being reasonable with their pricing as well. Have been waiting for this for my son's rig, which is presently running two 460 1gb hawks in sli. At that price, with this performance, we are probably going to pull the trigger if they come back into stock at newegg at the same price.

Update: EVGA came back in stock and we pulled the trigger. Glad I had a $250 gift card from way back that I was saving for just such an occasion.

Good review, been waiting on a good upgrade to my GTX 560s in SLI. This should be a great card with the low power draw. I love my 560s, but in Battlefield the drops to 30 fps at random points because of SLI have been really annoying. I wonder if anyone has figured that out yet. (Ryan posted about this back in December, I think.) Doesn't matter, as I just picked up the MSI card from Newegg: as I was reading the article I kept refreshing Newegg, it jumped in stock, and I was able to get one. I can't wait. I ordered it next-day for a total of $523.22, not bad. (Glad I got in before the price gouging begins.) I am sure glad Nvidia did better on their pricing this time around. I was holding off on getting a new card till these came out to see which was the better option. I have always liked Nvidia for their drivers and control panel over AMD/ATI. I am glad I waited. Now I just need to save for another one later on and get a triple 27" monitor setup and I will be golden.

Quick question, Ryan. Will the new card and drivers allow you to configure just two monitors for Surround? Currently Eyefinity lets you group two monitors together as a single display/resolution. Can you do this with the GTX 680?

Thanks for the info. I have a very specific application in mind, which probably wouldn't apply to most people.

I've recently built a dual projector system for passive 3D. Having a 3840 x 1080p perfectly synced screen allows me to play full frame 1080p side by side 3d video. Doing this allows me to give each eye the full 1080p image.

The Eyefinity setup keeps the monitors perfectly in sync as long as I use two of the same outputs, Mini DisplayPort to HDMI for both. Mixing outputs introduced a very slight lag on one of the displays, which is barely noticeable.

There is some other software (Stereoscopic Player) which will play 3D video to two monitors for this purpose, but I had trouble getting everything to sync up.

I was hoping Nvidia would add this feature, as I greatly prefer their drivers for most games. There is one option: I could buy one of the Matrox TripleHead2Go adapters and use it with the Nvidia card, but obviously there is a little extra cost going that route. Thanks for the info!

Nowadays it isn't that hard to make a GPU just for gaming. In raytracing rendering tasks it is surpassed even by the GTX 570, not to mention the Radeon HD 7870, which is more than two times faster.
It seems that Nvidia will be adequate for productivity work only with the GK110 die. Transistors will go from 3.5 to 6 billion, but CUDA cores only from ~1500 to ~2000, so it could be due to a higher percentage of computing cores on a given die.

"Because the software is already handling so much of the decoding process from DirectX, CUDA, OpenCL, and more NVIDA found it to be more power efficient to continue to increase the workload in the software rather than on the chip itself. Some items remain on die though because of latency concerns, such as texture operations."

How much of a burden does this place on the CPU? 1%, 5%, 10%, or more? Will the driver sense which CPU core is being utilized the most and offload this processing to a core that is not being utilized for the game (knowing that most games probably don't use more than two cores)?

I asked this question and was told that the move to the CPU was "very minimal" and that overall it was "the most efficient way" to get things done.

Though, like you, I wonder how much of this makes the GPU more efficient while making the CPU LESS efficient. If it were a big deal, though, it would have shown up either in our power consumption testing, since that monitors total system power consumption, or as drops in our performance results.

Minimal is probably a good term here. With the ability to lay hands on inexpensive multi-core CPUs that often are not stressed in any game at standard resolutions, this software overhead looks to be a non-factor. There might be some corner cases, but you would expect the bottleneck to be somewhere else.

Ryan, I know you were extremely busy or rushed in your review, I hope you read this and consider it deeply. I read the review at work, finally get to comment. I really think the review is lacking in a few key areas, and not to say it is biased, but it leaves out some critical information.

1. The OC of the 7970 is VASTLY superior to the Nvidia card. The MSI Lightning stock OC goes to 1070, and adds 5-7 AVG FPS in gaming. When you OC that to around 1150 as just about every reference card has been able to do, it is another 5-7 AVG FPS. That isn't including the "good" cards that can OC to ~1300 and have 20-25% more frames over the stock ref designs.

2. The skyrim bench especially as well as batman appear to simply be games where the Nvidia support at development clearly gives them the edge. It would be nice to see some other games added to the benchmarking roster of games.

Specifically, R.U.S.E., Alan Wake, Wings of Prey and even Crysis (not 2) would be some good games to add. The first three have some very nice "effects" settings which really push the high end cards. I have a 4850, and the only one of those I can run well is Crysis, but I have to turn everything down to middle of the road.

3. We ALL KNOW that the ATI 79xx cards are underclocked. To the point where the 7950 beats the 7970 with a slight OC. It is extremely interesting how Metro 2033 was the single benchmark where the ATI cards pulled ahead, but contrasting that with the skyrim results, it is clear as a reader that something isn't right here. BF3 has a slight nvidia edge, Batman and Skyrim appear to have a clear Nvidia edge, but it would seem that every other game might be a flip-flop type scenario, where one card does better on one thing and the others do things better on a different game. It appears to be a driver issue, and we all know that ATI has had dramatic driver issues with the 7 series, to the point where they barely came out weeks ago.

4. The speedboost, in my opinion, is a very poor "feature". I get the idea behind it, but it seems to be clearly for power savings only and seems very much like a marketing feature or something that would make sense on a laptop rather than a desktop. I would have loved to see a feature or setting where that can be disabled, and traditional OC is possible. I see it as a negative that this isn't available.

5. The OC being limited to around 2% is a very poor limit and shows just how much the turbo affects everything. I have seen the MSI Lightning 7970, as mentioned above, OC well beyond the 10-15% normal range, upwards of adding 20-25 AVG FPS to some games.

6. Even though we all know ATI is dropping prices, it is clear that the review was tilted in a way which let it seem like nvidia was doing "a favor" by undercutting the price. If you add up all of the above, it isn't simply as clear as that. Whoever comes out first gets the high price, it always, always drops. I think if you visit the price in a week, let alone a month, it will make a lot more sense, and pricing shouldn't be handled as a major factor when we have news that the competitor will be dropping prices in response. It comes off as irresponsible to the reader when I see that, or when I see things not mentioned like that in reviews.

It's a lot to digest, but I hope it makes sense.

I would love to see an "OC" comparison when you do SLI testing, as well as some new games added to the roster for testing.

1. That's good to know, but we aren't testing overclocked cards. We are testing reference cards in this review. We do review OC model cards (like the XFX model 7970 we tested) and will have a test of the Lightning next week. I think your estimate of 25% better improvement is an overestimate.

2. That's crap. Metro 2033 was an "NVIDIA game" and Dirt 3 was an "AMD game" yet those titles don't show any bias. Both vendors can write driver fixes to ANY game.

3. Fair enough - but we can only test based on the real user experience that you get when you install the cards and play the games. Saying that the HD 7970 is underclocked to me simply means that AMD didn't have the balls to release a card at the CORRECT speed then.

4. I think you misunderstand the feature then - I think it is really a compelling addition to get users the full capability of their GPU regardless of the game/application being played. When our video interview with Tom Petersen is online, you should take a listen.

5. I was able to hit a 150 MHz offset on the GTX 680 which is basically a 15% overclock.

6. Pricing is THE MAJOR factor other than performance. Yes, it is always changing, but to pretend it isn't a crucial factor is simply insane.

Thanks for your comments man, I love to hear what others are thinking, even if we disagree! :D

1. It was just one of those things that glared out to me, and the 25% is probably 20% based on the HardOCP review, but that is a bunch more than 2%, which would make things extremely close in the end.

2. I don't know if AMD helped with Metro, or if all AMD cards run better on it, but the point is, Nvidia had clearly helped with PhysX and some filtering stuff in BF and others. It would be a good idea to add some variety to the benches. I would add R.U.S.E. and WoP for sure (they are different types of games than the ones you have listed, and as I said, they have some very nice high end DX10/11 features).

3. I completely agree, but it would have been nice as a reader to see some "by the way" type text.

4. I didn't say there wasn't a point to it, but I just don't get it, or better put, agree with it. Why, with 1500+ cores, would you want to OC them on the fly? I would want to be able to test things and make them stable. I see benchmarks varying vastly between games, and it could be related to this "feature". Maybe heat of the case/card forces some bad benches for some sites and better scores for others? IDK, just something to look into. I know for power specifically people have "proof" of shenanigans, especially with 3DMark, so who really knows, but I know there is more than meets the eye with this.

5. But the performance was only 2% better, not much higher. The high-binned 7970s go to 1250-1300+, which is around 35-40% (even at 1150 it is 25%).

6. It has more to do with the fact that we already know that AMD is going to reduce their pricing, and it is going to be within a week more than likely. If they don't, the review stands, but if they do, then it really comes off as overly harsh for no reason in the long run. Let me put it this way: have you ever gone back and revised this for the people who look at your review AFTER the prices drop? Nope, because it ALWAYS happens and it always has to do with either a competitor releasing something, or a new card being out. In my opinion, it always seems like after a week the pricing in reviews is completely off.

Like I said, I didn't want to come off as a jerk or w/e, but it was stuff I wanted you to discuss or look into. The point with the OC stuff was to bring it up before you do SLI/OC testing, and I know Josh is writing the Lightning review, but it would be a good idea to toss those numbers in the mix and get an idea of which one, OC'd or at its best let's say, actually performs the best. It seems very straightforward to me that the 7970 OCs a lot better.

Actually, I know of this phenomenon but I didn't experience it with the GTX 680. I have with some HD 7000 cards though. In my opinion, some kind of odd mix of PSU and GPU causes it. I have never been able to pinpoint one PSU architecture as the culprit.

Ryan, can you put that fancy SPL meter on this beast and provide some noise measurements?

I run a Define R3 chassis and use "reference" blower-style fans on my GPUs in order to keep the noise/heat down during idle periods and provide more effective heat dissipation under load without adding more than the 2 fans I currently have. It's loud under load, but I can deal with that since I game with a headset.

Would be interesting to see how this compares to, say, my current 570 blower. Especially considering every new card release comes along with a marketing line regarding lower heat and noise.

Wow, just got the GTX 680 installed and all I can say is wow! I have bought many top-of-the-line video cards over the years, and this is the first time I have ever been completely satisfied with one. This one is amazing. So powerful, quiet, and only running 65C under load in an NVIDIA Cooler Master Stacker case. Battlefield 3 runs so smooth; even on a 32-inch 1080p display on Ultra with vsync enabled it never drops below 60 fps. Crysis 2 is smooth as butter on Extreme with the hi-res textures. Now I think it's time to look into getting a Surround setup, if I could only get some good 27-inch monitors with small bezels for a good price.

The reason for frame rate capping is to lower the temps of the card, which also automatically reduces the fan speed and consumes less power overall.

It works. I tried it in Battlefield 3 with my two GTX 580s in SLI. Normally I get about 100 FPS on Ultra, with the GPUs at 82C max. With an FPS cap of 80, temps drop down into the 70s. That's a 10 degree savings and quieter operation.