Introduction

It has been an abnormal week for us here at PC Perspective. Our typical review schedule has pretty much flown out the window, and the past seven days have been filled with learning, researching, retesting, and publishing. That might sound like the norm, but in these cases the process was initiated by tips from our readers. Last Saturday (24 Jan), a few things were brewing:

The huge (now 168 page) overclock.net forum thread about the Samsung 840 EVO slowdown was once again gaining traction.

Someone got G-Sync working on a laptop integrated display.

We had to do a bit of triage here of course, as we can only research and write so quickly. Ryan worked the GTX 970 piece as it was the hottest item. I began a few days of research and testing on the 840 EVO slowdown issue reappearing on some drives, and we kept tabs on that third thing, which at the time seemed really farfetched. With those first two items taken care of, Ryan shifted his efforts to GTX 970 SLI testing while I shifted my focus to finding out if there was any credence to this G-Sync laptop thing.

A few weeks ago, an ASUS Nordic Support rep inadvertently leaked an interim build of the NVIDIA driver. This was a mobile driver build (version 346.87) aimed at their G751 line of laptops. One recipient of this driver link posted it to the ROG forum back on the 20th. A fellow by the name of Gamenab, owning the same laptop cited in that thread, presumably stumbled across this driver, tried it out, and was more than likely greeted by this popup after the installation completed:

Now I know what you’re thinking, and it’s probably the same thing anyone would think. How on earth is this possible? To cut a long story short, while the link to the 346.87 driver was removed shortly after being posted to that forum, we managed to get our hands on a copy of it, installed it on the ASUS G751 that we had in for review, and wouldn’t you know it we were greeted by the same popup!

Ok, so it’s a popup, could it be a bug? We checked NVIDIA control panel and the options were consistent with that of a G-Sync connected system. We fired up the pendulum demo and watched the screen carefully, passing the machine around the office to be inspected by all. We then fired up some graphics benchmarks that were well suited to show off the technology (Unigine Heaven, Metro: Last Light, etc), and everything looked great – smooth steady pans with no juddering or tearing to be seen. Ken Addison, our Video Editor and jack of all trades, researched the panel type and found that it was likely capable of 100 Hz refresh. We quickly created a custom profile, hit apply, and our 75 Hz G-Sync laptop was instantly transformed into a 100 Hz G-Sync laptop!

Ryan's Note: I think it is important here to point out that we didn't just look at demos and benchmarks for this evaluation but actually looked at real-world gameplay situations. Playing through Metro: Last Light showed very smooth pans and rotation, Assassin's Creed played smoothly as well, and flying through Unigine Heaven manually was a great experience. We also tried Crysis 3, Battlefield 4, etc. This was NOT just a couple of demos that we ran through - the variable refresh portion of this mobile G-Sync enabled panel was working and working very well.

At this point in our tinkering, we had no idea how or why this was working, but there was no doubt that we were getting an experience similar to what we have seen with G-Sync panels. As I digested what was going on, I thought surely this can’t be as good as it seems to be… Let’s find out, shall we?

At the heart of all this technical debate is a performance question: does the GTX 970 suffer from lower performance because of the 3.5GB/0.5GB memory partitioning configuration? Many forum members and PC enthusiasts have been debating this for weeks, with many coming away with an emphatic yes.

The newly discovered memory system of the GeForce GTX 970

Yesterday I spent the majority of my day trying to figure out a way to validate or invalidate these types of performance claims. As it turns out, finding specific game scenarios that will consistently hit targeted memory usage levels isn't as easy as it might first sound and simple things like the order of start up can vary that as well (and settings change orders). Using Battlefield 4 and Call of Duty: Advanced Warfare though, I think I have presented a couple of examples that demonstrate the issue at hand.

Performance testing is a complicated story. Lots of users have attempted to measure performance on their own setup, looking for combinations of game settings that sit below the 3.5GB threshold and those that cross above it, into the slower 500MB portion. The issue for many of these tests is that they lack access to both a GTX 970 and a GTX 980 to really compare performance degradation between cards. That's the real comparison to make - the GTX 980 does not separate its 4GB into different memory pools. If it has performance drops in the same way as the GTX 970 then we can wager the memory architecture of the GTX 970 is not to blame. If the two cards perform differently enough, beyond the expected performance delta between two cards running at different clock speeds and with different CUDA core counts, then we have to question the decisions that NVIDIA made.

There has also been concern over the frame rate consistency of the GTX 970. Our readers are already aware of how deceptive an average frame rate alone can be, and why looking at frame times and frame time consistency is so much more important to guaranteeing a good user experience. Our Frame Rating method of GPU testing has been in place since early 2013 and it tests exactly that - looking for consistent frame times that result in a smooth animation and improved gaming experience.
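To illustrate why an average frame rate alone can be deceptive, here is a minimal Python sketch. The frame times are made-up illustrative numbers, not output from our Frame Rating capture system, and the helper name `summarize` is hypothetical. Both runs average exactly 50 FPS, but one of them spends a tenth of its frames in long hitches.

```python
import statistics

def summarize(frame_times_ms):
    """Return (average FPS, 99th percentile frame time in ms, frame time std dev in ms)."""
    avg_fps = 1000.0 / statistics.mean(frame_times_ms)
    ordered = sorted(frame_times_ms)
    # Nearest-rank 99th percentile, computed with integer math: ceil(0.99 * n)
    rank = (99 * len(ordered) + 99) // 100
    p99_ms = ordered[rank - 1]
    jitter_ms = statistics.pstdev(frame_times_ms)
    return avg_fps, p99_ms, jitter_ms

# Hypothetical data: 100 frames each, both totaling 2000 ms of animation.
smooth = [20.0] * 100                  # every frame takes 20 ms
stutter = [15.0] * 90 + [65.0] * 10    # mostly fast, with periodic 65 ms hitches

for name, run in (("smooth", smooth), ("stutter", stutter)):
    fps, p99, jitter = summarize(run)
    print(f"{name:8s} avg {fps:.1f} FPS, 99th pct {p99:.1f} ms, std dev {jitter:.1f} ms")
```

The average is identical for both runs, while the 99th percentile frame time and the standard deviation separate them immediately, which is the kind of difference frame time analysis is meant to expose.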

We will be applying Frame Rating to our testing today of the GTX 970 and its memory issues - does the division of memory pools introduce additional stutter into game play? Let's take a look at a couple of examples.

UPDATE 1/29/15: This forum post has since been edited and basically removed, with statements made on Twitter that no driver changes are planned that will specifically target the performance of the GeForce GTX 970.

The story around the GeForce GTX 970 and its confusing and shifting memory architecture continues to update. On a post in the official GeForce.com forums (on page 160 of 184!), moderator and NVIDIA employee PeterS claims that the company is working on a driver to help improve performance concerns and will also be willing to "help out" for users that honestly want to return the product they already purchased. Here is the quote:

Hey,

First, I want you to know that I'm not just a mod, I work for NVIDIA in Santa Clara.

I totally get why so many people are upset. We messed up some of the stats on the reviewer kit and we didn't properly explain the memory architecture. I realize a lot of you guys rely on product reviews to make purchase decisions and we let you down.

It sucks because we're really proud of this thing. The GTX970 is an amazing card and I genuinely believe it's the best card for the money that you can buy. We're working on a driver update that will tune what's allocated where in memory to further improve performance.

Having said that, I understand that this whole experience might have turned you off to the card. If you don't want the card anymore you should return it and get a refund or exchange. If you have any problems getting that done, let me know and I'll do my best to help.

--Peter

This makes things a bit more interesting - based on my conversations with NVIDIA about the GTX 970 since this news broke, it was stated that the operating system had a much stronger role in the allocation of memory from a game's request than the driver. Based on the above statement though, NVIDIA seems to think it can at least improve on the current level of performance and tune things to help alleviate any potential bottlenecks that might exist simply in software.

As far as the return goes, PeterS at least offers to help this one forum user but I would assume the gesture would be available for anyone that has the same level of concern for the product. Again, as I stated in my detailed breakdown of the GTX 970 memory issue on Monday, I don't believe that users need to go that route - the GeForce GTX 970 is still a fantastic performing card in nearly all cases except (maybe) a tiny fraction where that last 500MB of frame buffer might come into play. I am working on another short piece going up today that details my experiences with the GTX 970 running up on those boundaries.

NVIDIA is trying to be proactive now, that much we can say. It seems that the company understands its mistake - not in the memory pooling decision but in the lack of clarity it offered to reviewers and consumers upon the product's launch.

Yes, that last 0.5GB of memory on your GeForce GTX 970 does run slower than the first 3.5GB. More interesting than that fact is the reason why it does, and why the result is better than you might have otherwise expected. Last night we got a chance to talk with NVIDIA’s Senior VP of GPU Engineering, Jonah Alben, on this specific concern and got a detailed explanation of why gamers are seeing what they are seeing, along with new disclosures on the architecture of the GM204 version of Maxwell.

Let’s start with a new diagram drawn by Alben specifically for this discussion.

GTX 970 Memory System

Believe it or not, every issue discussed in any forum about the GTX 970 memory issue is going to be explained by this diagram. Along the top you will see 13 enabled SMMs, each with 128 CUDA cores for the total of 1664 as expected. (Three grayed out SMMs represent those disabled from a full GM204 / GTX 980.) The most important part here is the memory system though, connected to the SMMs through a crossbar interface. That interface has 8 total ports to connect to collections of L2 cache and memory controllers, all of which are utilized in a GTX 980. With a GTX 970 though, only 7 of those ports are enabled, taking one of the combination L2 cache / ROP units along with it. However, the 32-bit memory controller segment remains.

You should take two things away from that simple description. First, despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. Before people complain about the ROP count difference as a performance bottleneck, keep in mind that the 13 SMMs in the GTX 970 can only output 52 pixels/clock and the seven segments of 8 ROPs each (56 total) can handle 56 pixels/clock. The SMMs are the bottleneck, not the ROPs.
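The arithmetic behind those numbers can be sketched in a few lines of Python. The per-unit constants are derived from the figures quoted above; the even split of L2 across the eight units and the function name `gpu_config` are assumptions made for illustration, not something NVIDIA provided.

```python
# Per-unit figures derived from the numbers in the text.
CUDA_CORES_PER_SMM = 128
ROPS_PER_UNIT = 8                     # "seven segments of 8 ROPs each"
L2_KB_PER_UNIT = 2048 // 8            # assumption: 2048 KB split evenly over 8 units
PIXELS_PER_CLOCK_PER_SMM = 52 // 13   # implied by 13 SMMs -> 52 pixels/clock

def gpu_config(smms, l2_rop_units):
    """Derive headline specs from enabled SMMs and enabled L2/ROP units."""
    rops = l2_rop_units * ROPS_PER_UNIT
    smm_pixels = smms * PIXELS_PER_CLOCK_PER_SMM
    if smm_pixels < rops:
        bottleneck = "SMMs"
    elif smm_pixels > rops:
        bottleneck = "ROPs"
    else:
        bottleneck = "balanced"
    return {
        "cuda_cores": smms * CUDA_CORES_PER_SMM,
        "rops": rops,
        "l2_kb": l2_rop_units * L2_KB_PER_UNIT,
        "smm_pixels_per_clock": smm_pixels,
        "rop_pixels_per_clock": rops,   # one pixel per ROP per clock
        "bottleneck": bottleneck,
    }

gtx_970 = gpu_config(smms=13, l2_rop_units=7)
gtx_980 = gpu_config(smms=16, l2_rop_units=8)
print("GTX 970:", gtx_970)
print("GTX 980:", gtx_980)
```

For the GTX 970 this reproduces 1664 CUDA cores, 56 ROPs and 1792 KB of L2, with the SMMs (52 pixels/clock) rather than the ROPs (56 pixels/clock) as the limiting factor.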

UPDATE 1/26/15 @ 12:10am ET: I now have a lot more information on the technical details of the architecture that cause this issue and more information from NVIDIA to explain it. I spoke with SVP of GPU Engineering Jonah Alben on Sunday night to really dive into the questions everyone had. Expect an update here on this page at 10am PT / 1pm ET or so. Bookmark and check back!

UPDATE 1/24/15 @ 11:25pm ET: Apparently there is some concern online that the statement below is not legitimate. I can assure you that the information did come from NVIDIA, though it is not attributable to any specific person - the message was sent through a couple of different PR people and is the result of meetings and multiple NVIDIA employees' input. It is really a message from the company, not any one individual. I have had several 10-20 minute phone calls with NVIDIA about this issue and this statement on Saturday alone, so I know that the information wasn't from a spoofed email, etc. Also, this statement was posted by an employee moderator on the GeForce.com forums about 6 hours ago, further proving that the statement is directly from NVIDIA. I hope this clears up any concerns around the validity of the below information!

Over the past couple of weeks users of GeForce GTX 970 cards have noticed and started researching a problem with memory allocation in memory-heavy gaming. Essentially, gamers noticed that the GTX 970 with its 4GB of graphics memory was only ever accessing 3.5GB of that memory. When it did attempt to access the final 500MB of memory, performance seemed to drop dramatically. What started as simply a forum discussion blew up into news that was being reported at tech and gaming sites across the web.

NVIDIA has finally responded to the widespread online complaints about GeForce GTX 970 cards only utilizing 3.5GB of their 4GB frame buffer. From the horse's mouth:

The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.

We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.

Here’s an example of some performance data:

| Game / setting | GTX 980 | GTX 970 |
|---|---|---|
| Shadow of Mordor | | |
| <3.5GB setting = 2688x1512 Very High | 72 FPS | 60 FPS |
| >3.5GB setting = 3456x1944 | 55 FPS (-24%) | 45 FPS (-25%) |
| Battlefield 4 | | |
| <3.5GB setting = 3840x2160 2xMSAA | 36 FPS | 30 FPS |
| >3.5GB setting = 3840x2160 135% res | 19 FPS (-47%) | 15 FPS (-50%) |
| Call of Duty: Advanced Warfare | | |
| <3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off | 82 FPS | 71 FPS |
| >3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on | 48 FPS (-41%) | 40 FPS (-44%) |

On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.
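The percentages in NVIDIA's statement can be re-derived from the FPS figures it supplied. This is just a quick Python check of that arithmetic; the dictionary layout and the function name `percent_drop` are my own for illustration.

```python
# (FPS below 3.5GB, FPS above 3.5GB) for each card, as quoted by NVIDIA.
results = {
    "Shadow of Mordor": {"GTX 980": (72, 55), "GTX 970": (60, 45)},
    "Battlefield 4": {"GTX 980": (36, 19), "GTX 970": (30, 15)},
    "Call of Duty: Advanced Warfare": {"GTX 980": (82, 48), "GTX 970": (71, 40)},
}

def percent_drop(before_fps, after_fps):
    """Performance lost when moving to the heavier setting, in percent."""
    return (before_fps - after_fps) / before_fps * 100.0

drops = {}
for game, cards in results.items():
    drop_980 = percent_drop(*cards["GTX 980"])
    drop_970 = percent_drop(*cards["GTX 970"])
    drops[game] = (round(drop_980), round(drop_970))
    print(f"{game}: GTX 980 -{drop_980:.1f}%, GTX 970 -{drop_970:.1f}%, "
          f"gap {drop_970 - drop_980:.1f} points")
```

The rounded drops match the quoted table. Note that the "difference" NVIDIA cites is a gap in percentage points between the two cards' drops, not a relative performance difference.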

So it would appear that the severing of a trio of SMMs to make the GTX 970 different than the GTX 980 was the root cause of the issue. I'm not sure if this is something that we have seen before with NVIDIA GPUs that are cut down in the same way, but I have asked for clarification from NVIDIA on that. The ratios fit: 500MB is 1/8th of the 4GB total memory capacity and 2 SMMs is 1/8th of the total SMM count. (Edit: The ratios in fact do NOT match up...odd.)
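The mismatch noted in that edit is easy to see with exact fractions. A small Python sketch, using the SMM and crossbar port counts from the GM204 description elsewhere in this coverage (13 of 16 SMMs enabled, 7 of 8 L2/memory ports enabled); the variable names are mine:

```python
from fractions import Fraction

slow_memory_share = Fraction(512, 4096)    # 0.5GB of 4GB
disabled_smm_share = Fraction(16 - 13, 16) # three SMMs disabled, not two
disabled_port_share = Fraction(8 - 7, 8)   # one crossbar port disabled

print("slow memory share:  ", slow_memory_share)    # 1/8
print("disabled SMM share: ", disabled_smm_share)   # 3/16
print("disabled port share:", disabled_port_share)  # 1/8
print("matches SMM cut?    ", slow_memory_share == disabled_smm_share)
print("matches port cut?   ", slow_memory_share == disabled_port_share)
```

The 1/8 share of slow memory does not line up with the 3/16 of SMMs that are disabled, but it does line up with the single disabled port out of eight.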

The full GM204 GPU that is the root cause of this memory issue.

Another theory presented itself as well: is this possibly the reason we do not have a GTX 960 Ti yet? If the patterns were followed from previous generations a GTX 960 Ti would be a GM204 GPU with fewer cores enabled and additional SMs disconnected to enable a lower price point. If this memory issue were to be even more substantial, creating larger differentiated "pools" of memory, then it could be an issue for performance or driver development. To be clear, we are just guessing on this one and that could be something that would not occur at all. Again, I've asked NVIDIA for some technical clarification.

Requests for information aside, we may never know for sure if this is a bug with the GM204 ASIC or a predetermined characteristic of the design.

The question remains: does NVIDIA's response appease GTX 970 owners? After all, this memory concern is really just a part of a GPU's story and thus performance testing and analysis already incorporates it essentially. Some users will still likely make a claim of a "bait and switch" but do the benchmarks above, as well as our own results at 4K, make it a less significant issue?

Our own Josh Walrath offers this analysis:

A few days ago when we were presented with evidence of the 970 not fully utilizing all 4 GB of memory, I theorized that it had to do with the reduction of SMM units. It makes sense from an efficiency standpoint to perhaps "hard code" memory addresses for each SMM. The thought behind that would be that 4 GB of memory is a huge amount for a video card, and the potential performance gains of a more flexible system would be pretty minimal.

I believe that the memory controller is working as intended and that this is not a bug. When designing a large GPU, there will invariably be compromises made. From all indications NVIDIA decided to save time, die size, and power by simplifying the memory controller and crossbar setup. These things have a direct impact on time to market and power efficiency. NVIDIA probably figured that a couple of percentage points of lost performance was an acceptable trade against the added complexity, power consumption, and engineering resources that it would have taken to gain those few percentage points back.

With the release of GTX 960 yesterday NVIDIA also introduced a new version of the GeForce graphics driver, 347.25 - WHQL.

NVIDIA states that the new driver adds "performance optimizations, SLI profiles, expanded Multi-Frame Sampled Anti-Aliasing support, and support for the new GeForce GTX 960".

While support for the newly released GPU goes without saying, the expanded MFAA support will help provide better anti-aliasing performance to many existing games, as “MFAA support is extended to nearly every DX10 and DX11 title”. In the release notes three games are listed that do not benefit from the MFAA support, as “Dead Rising 3, Dragon Age 2, and Max Payne 3 are incompatible with MFAA”.

347.25 also brings additional SLI profiles to add support for five new games, and a DirectX 11 SLI profile for one more:

SLI profiles added

Black Desert

Lara Croft and the Temple of Osiris

Nosgoth

Zhu Xian Shi Jie

The Talos Principle

DirectX 11 SLI profile added

Final Fantasy XIV: A Realm Reborn

The update is also the Game Ready Driver for Dying Light, a zombie action/survival game set to debut on January 27.

UPDATE 2: If you missed the live stream you missed the prizes! But you can still watch the replay to get all the information and Q&A that went along with it as we discuss the GTX 960 and many more topics from the NVIDIA universe.

UPDATE (1/22): Well, the secret is out. Today's discussion will be about the new GeForce GTX 960, a $199 graphics card that takes power efficiency to a previously unseen level! If you haven't read my review of the card yet, you should do so first, but then be sure you are ready for today's live stream and giveaway - details below! And don't forget: if you have questions, please leave them in the comments!

Get yourself ready, it’s time for another GeForce GTX live stream hosted by PC Perspective’s Ryan Shrout and NVIDIA’s Tom Petersen. Though we can’t dive into the exact details of what topics are going to be covered, intelligent readers that keep an eye on the rumors on our site will likely be able to guess what is happening on January 22nd.

On hand to talk about the products, answer questions about technologies in the GeForce family including GPUs, G-Sync, GameWorks, GeForce Experience and more will be Tom Petersen, well known on the LAN party and events circuit. To spice things up as well Tom has worked with graphics card partners to bring along a sizeable swag pack to give away LIVE during the event, including new GTX graphics cards. LOTS of graphics cards.

NVIDIA GeForce GTX 960 Live Stream and Giveaway

10am PT / 1pm ET - January 22nd

Here are some of the prizes we have lined up for those of you that join us for the live stream:

3 x MSI GeForce GTX 960 Graphics Cards

4 x EVGA GeForce GTX 960 Graphics Cards

3 x ASUS GeForce GTX 960 Graphics Cards

Thanks to ASUS, EVGA and MSI for supporting the stream!

The event will take place Thursday, January 22nd at 1pm ET / 10am PT at http://www.pcper.com/live. There you’ll be able to catch the live video stream as well as use our chat room to interact with the audience, asking questions for me and Tom to answer live. To win the prizes you will have to be watching the live stream, with exact details of the methodology for handing out the goods coming at the time of the event.

Tom has a history of being both informative and entertaining and these live streaming events are always full of fun and technical information that you can get literally nowhere else. Previous streams have produced news as well – including statements on support for Adaptive Sync, release dates for displays and first-ever demos of triple display G-Sync functionality. You never know what’s going to happen or what will be said!

If you have questions, please leave them in the comments below and we'll look through them just before the start of the live stream. Of course you'll be able to tweet us questions @pcper and we'll be keeping an eye on the IRC chat as well for more inquiries. What do you want to know and hear from Tom or me?

So join us! Set your calendar for this coming Thursday at 1pm ET / 10am PT and be here at PC Perspective to catch it. If you are a forgetful type of person, sign up for the PC Perspective Live mailing list that we use exclusively to notify users of upcoming live streaming events including these types of specials and our regular live podcast. I promise, no spam will be had!

"NVIDIA is today launching a GPU aimed at the "sweet spot" of the video card market. With an unexpectedly low MSRP, we find out if the new GeForce GTX 960 has what it takes to compete with the competition. The MSI GTX 960 GAMING reviewed here today is a retail card you will be able to purchase. No reference card in this review."

A new GPU, a familiar problem

Editor's Note: Don't forget to join us today for a live streaming event featuring Ryan Shrout and NVIDIA's Tom Petersen to discuss the new GeForce GTX 960. It will be live at 1pm ET / 10am PT and will include ten (10!) GTX 960 prizes for participants! You can find it all at http://www.pcper.com/live

There are no secrets anymore. Calling today's release of the NVIDIA GeForce GTX 960 a surprise would be like calling another Avengers movie unexpected. If you didn't just assume it was coming, chances are the dozens of leaks of slides and performance would get your attention. So here it is, today's the day: NVIDIA finally upgrades the mainstream segment that was being fed by the GTX 760 for more than a year and a half. But does the brand new GTX 960 based on Maxwell move the needle?

As you'll soon see, the GeForce GTX 960 is a bit of an odd duck in terms of new GPU releases. As we have seen several times in the last year or two with a stagnant process technology landscape, the new cards aren't going to be wildly better performing than the current cards from either NVIDIA or AMD. In fact, there are some interesting comparisons to make that may surprise fans of both parties.

The good news is that Maxwell and the GM206 GPU will price out starting at $199 including overclocked models at that level. But to understand what makes it different than the GM204 part we first need to dive a bit into the GM206 GPU and how it matches up with NVIDIA's "small" GPU strategy of the past few years.

The GM206 GPU - Generational Complexity

First and foremost, the GTX 960 is based on the exact same Maxwell architecture as the GTX 970 and GTX 980. The power efficiency, the improved memory bus compression and new features all make their way into the smaller version of Maxwell selling for $199 as of today. If you missed the discussion on those new features, including MFAA, Dynamic Super Resolution and VXGI, you should read that page of our original GTX 980 and GTX 970 story from last September for a bit of context; these are important aspects of Maxwell and the new GM206.

NVIDIA's GM206 is essentially half of the full GM204 GPU that you find on the GTX 980. That includes 1024 CUDA cores, 64 texture units and 32 ROPs for processing, a 128-bit memory bus and 2GB of graphics memory. This results in half of the memory bandwidth at 112 GB/s and half of the peak compute capability at 2.30 TFLOPS.
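Those bandwidth and compute figures follow from the bus width and core count. The Python sketch below shows the arithmetic; the 7.0 Gbps effective memory data rate and the 1127 MHz base clock are not stated in this section and are assumed here from the GTX 960's published reference specifications, and the function names are hypothetical.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps

def peak_fp32_tflops(cuda_cores, clock_mhz):
    """Peak single-precision rate, counting one fused multiply-add as 2 FLOPs per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1e6

# Assumed reference values (not given in the text above).
GM206_DATA_RATE_GBPS = 7.0
GM206_BASE_CLOCK_MHZ = 1127

bandwidth = memory_bandwidth_gbs(128, GM206_DATA_RATE_GBPS)
tflops = peak_fp32_tflops(1024, GM206_BASE_CLOCK_MHZ)
print(f"GM206 memory bandwidth: {bandwidth:.0f} GB/s")
print(f"GM206 peak compute:     {tflops:.2f} TFLOPS")
```

Under those assumptions the bandwidth works out to 112 GB/s and the compute rate to roughly 2.3 TFLOPS at base clock, in line with the figures quoted above.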