Driver software to be tweaked to reduce Radeon frame latencies in series of updates

Portion of AMD's driver code needs love to perform optimally.

This story was brought to you by our friends at The Tech Report. You can view the original story here.

To our surprise, we recently found that the GeForce GTX 660 Ti generally outperforms the Radeon HD 7950 in our latency-focused tests in many of the latest games, despite the fact that the Radeon is based on decidedly beefier hardware. Although the Radeon cranked out conventionally respectable FPS averages, it often produced a number of long-latency frames interspersed throughout our testing sessions. Follow-up testing confirmed the problem isn't confined to Windows 8, and we even posted a slow-motion video illustrating the issue. We concluded that AMD has work to do in optimizing its drivers for the latest games.
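
For readers curious what "latency-focused" means in practice, the idea is to look at individual frame times rather than the FPS average. The sketch below is a hypothetical illustration of that kind of analysis (the function name, threshold, and sample data are invented for this example), not TR's actual tooling:

```python
# Sketch of frame-latency analysis from per-frame render times (in ms).
# Hypothetical data; frame-time logging tools produce similar lists.

def latency_metrics(frame_times_ms, threshold_ms=50.0):
    """Return average FPS, a 99th-percentile frame time, and the total
    time spent on the portions of frames longer than `threshold_ms`."""
    total_ms = sum(frame_times_ms)
    avg_fps = 1000.0 * len(frame_times_ms) / total_ms
    ordered = sorted(frame_times_ms)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    time_beyond = sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)
    return avg_fps, p99, time_beyond

# 99 smooth frames plus one long hitch: the FPS average still looks
# healthy, but the percentile and long-frame metrics expose the stutter.
frames = [16.7] * 99 + [116.7]
avg_fps, p99, beyond = latency_metrics(frames)
```

The point of the toy data: a single long-latency frame barely dents the average, which is exactly why per-frame metrics catch problems that FPS averages hide.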

Earlier today, in my blog post, I noted that AMD's David Baumann had posted in a thread at Beyond3D, stating that a host of different software-related issues are potentially responsible for the Radeon's latency issues. He claimed the slowdowns in Borderlands 2 are a buffer-sizing issue that could be addressed via a Catalyst Application Profile (CAP) update.

After seeing my blog post, Baumann contacted us to provide some additional insight into the situation, including word of a series of driver updates in the works intended to smooth out frame latencies. He writes:

The comment that you quote was just one update that highlights that some things can be tweaked fairly easily (although since coming back today I learn that it is not quite as easy as the BL2 fix does actually need to be implemented in the driver so we will have to QA a new build rather than releasing a CAP). Over the early part of the year you'll see a few driver updates help this across a variety of games.

We're pleased to see that AMD will be addressing these issues soon, even if Borderlands 2 can't be patched via a CAP update.

The most intriguing revelation in Baumann's correspondence, though, concerns one specific technical contributor to the frame latency problems on HD 7000-series Radeons based on the GCN architecture: less-than-optimal memory management in software.

Additionally, when we switched from the old VLIW architecture to the GCN core... significant updates to all parts of the driver was needed – although not really spoken about the entire memory management on GCN is different to prior GPU's and the initial software management for that was primarily driven by schedule and in the meantime we've been rewriting it again and we have discovered that the new version has also improved frame latency in a number of cases so we are accelerating the QA and implementation of that.

So a specific portion of AMD's driver code needs some additional attention in order to perform optimally on the year-old GCN architecture—and AMD has accelerated an overhaul of it after discovering that the new revision can alleviate frame latency issues. Wow.

Although we're not happy about the situation facing current Radeon owners, we're gratified to see that AMD has taken notice of the problems and is working to resolve them. We're also thrilled by the possibility that our latency-focused game testing may have helped nudge one of the major GPU makers into making changes that could result in improved gameplay fluidity for PC gamers going forward. Stay tuned to TR for additional updates on this situation as they become available.

That's nice and all, but I really haven't noticed any issues with my Radeons, and I spent a fair bit of time trying after I saw the older article over at Tech Report. It's good that they're accelerating the schedule, but hopefully that doesn't result in any unintended consequences. Since it's already working fine, I'd rather they take their time to stabilise it than rush out a fix for something that's not really a problem.

"...they don't prevent the card from overheating when overclocked and overvolting..."

Perhaps I'm old-fashioned but I always assumed hardware failure was the risk you ran when you overclocked something? Granted "bursting into flames" is a pretty dramatic species of failure, but come on now.

That's nice and all, but I really haven't noticed any issues with my Radeons, and I spent a fair bit of time trying after I saw the older article over at Tech Report. It's good that they're accelerating the schedule, but hopefully that doesn't result in any unintended consequences. Since it's already working fine, I'd rather they take their time to stabilise it than rush out a fix for something that's not really a problem.

This is why I've disliked both of the recent GPU articles from Tech Report: they make it sound like they've revolutionized benchmarking by pointing out blips that most people don't even notice. Most people who notice a problem take steps to fix it, like using RadeonPro or trying beta drivers, and I'm pretty sure they did neither in the last article Ars posted. I'm glad AMD is focused on improving, though; it means they should stay competitive in years to come.

Luckily for you, if you're not having an issue you can simply not upgrade, which means you'll be safe from any unintended consequences.

AMD driver issues have never killed cards, and certainly not two separate drivers

Killed my really nice Asus 8800GTS, and the warranty replacement was a really shitty GTS250, which was as fast or faster but had a horribly noisy fan. Your point certainly stands.

After that debacle I switched over to ATi with a 5850 and had generally good times with it, certainly a good card and I had very few actual driver problems.

But it's stuff like this that annoys me. The 7 series AMD chips are a year old, and they can't possibly have started working on a memory management issue for a year. That tells me TR clearly demonstrated a problem AMD may or may not have known about, so they either have shitty testing, ignored the problems or didn't have the engineering resources. None of those enthuse me.

I don't know why I keep trying out ATI/AMD cards. I had issues with my 9800, and now I have issues with my 6850. It shouldn't be this hard; I just want my games rendered right. Borderlands 2 is a great example, where I always have texture-loading delay issues. And Jurassic Park has all sorts of performance and stuttering problems.

I've owned and/or tested a ton of cards from Voodoo, Matrox, AMD, nVidia, and so on.

I've had issues, from the sublime to the ridiculous, with almost all of them. I had a VGA clone card that somehow wanted to access memory where the WD-1007 MFM hard drive controller lived: the PC would boot and run fine until you loaded an app that hit the right VGA mode to crash the HD. Yay.

More recently, I've flipped back and forth between AMD and nVidia numerous times. I'm currently using GTX 5X0 and 6X0 cards and generally have no issues. That said, the latest driver from nVidia doubled my performance in the World of Warplanes 3.4 beta. With the previous driver, AA/AF settings of any kind killed the card: 20-30 fps average and single digits on occasion. With the current driver, 60-90 fps, no stuttering or slowdowns.

Nice article, but hardly big news. Everyone cranks out crap on occasion - the good ones clean it up eventually.

I upgraded from a 6870 to a 7870 and was unlucky enough to get a Sapphire ("we're too cheap to buy quality caps"), and after several RMAs I ended up with a GTX 660. I was amazed that a card with less "hw muscle" performed so much better, so I'm not surprised that AMD has some severe driver performance problems...

"...they don't prevent the card from overheating when overclocked and overvolting..."

Perhaps I'm old-fashioned but I always assumed hardware failure was the risk you ran when you overclocked something? Granted "bursting into flames" is a pretty dramatic species of failure, but come on now.

Given that that was the driver they shipped to the tech media who were expected to overclock the cards as part of their jobs, that's still kinda fail. It's like shipping reviewers a developer version of Android that deletes everything on the phone when you try to take a screenshot.

TechPowerUp ended up with a bricked card despite no overvolting whatsoever and a very meager overclock.

That tells me TR clearly demonstrated a problem AMD may or may not have known about, so they either have shitty testing, ignored the problems or didn't have the engineering resources. None of those enthuse me.

I'm guessing the third, since AMD has been pruning its staff recently due to their financial situation and can't really afford to spend much time on things that don't directly impact their sales. Of course, now that the issue has been brought up, it does directly impact their sales.

Honestly, I have not noticed the latency issue. I have a Sapphire HD 7950 OC and have tried to see what has been pointed out, but haven't seen it. Maybe it's really dependent on settings and the actual FPS being seen. I do run with vsync on, so maybe it's not as noticeable.

The card and drivers have actually been very stable for me. I haven't had first-hand experience with a driver issue since Bad Company 2 came out, and that horrible driver issue was fixed somewhat quickly.

Hey - at least the problem may be fixable. I'm on a GTX 570, which has some of the worst micro-stutter and frame latency issues of any card, and they are not fixable by a driver update. Cut AMD some slack; Nvidia's no better.

Cut them some slack? Their competition charges nearly $25-$50 more for the same-performance card... maybe there is something to it? Maybe not competing for the high end and sticking with the middle of the road got them where they are today. If they aren't going to try, why should I?

"...they don't prevent the card from overheating when overclocked and overvolting..."

Perhaps I'm old-fashioned but I always assumed hardware failure was the risk you ran when you overclocked something? Granted "bursting into flames" is a pretty dramatic species of failure, but come on now.

Given that that was the driver they shipped to the tech media who were expected to overclock the cards as part of their jobs, that's still kinda fail. It's like shipping reviewers a developer version of Android that deletes everything on the phone when you try to take a screenshot.

TechPowerUp ended up with a bricked card despite no overvolting whatsoever and a very meager overclock.

Well, most things do have shut-off temps. Your PC and video card should, in theory, shut down before they cause themselves damage.

Cut them some slack? Their competition charges nearly $25-$50 more for the same-performance card?

Hmm? The 7950's price had to be slashed significantly to be on par in performance per dollar with the 660 Ti, a technically weaker card, and they released a new BIOS that overclocks existing 7950s to keep up with the 660 Ti, all because the drivers just fail to extract the hardware's potential. Let's keep it honest.

The game will run normally for a random amount of time, perhaps 20-30 minutes. After that, the game becomes a slideshow and the whole PC becomes incredibly sluggish. Closing and restarting the game brings the issue back immediately; only a reboot resolves it. Task Manager will show almost all physical RAM in use, and there will be a lot of miscellaneous hard drive activity as well (presumably writing to the paging file).

Cut them some slack? Their competition charges nearly $25-$50 more for the same-performance card?

Hmm? The 7950's price had to be slashed significantly to be on par in performance per dollar with the 660 Ti, a technically weaker card, and they released a new BIOS that overclocks existing 7950s to keep up with the 660 Ti, all because the drivers just fail to extract the hardware's potential. Let's keep it honest.

Uh, no. The 660 goes against the 7850 (7850 is cheaper), and the 660Ti goes against the 7870 (7870 is cheaper).

When the 7950 first came out, it went up against the $500 GTX 580 (and beat it in most cases). After the 600-series chips were released, AMD cut their prices, as the 680 was faster at the time. AMD then released the GHz Edition of their chips, which brought performance back in line, but still kept their prices lower than nVidia's. This pretty much follows the pattern of previous GPU cycles: the card maker that's first out the door with a new generation charges more than before, until the other releases theirs; then things balance out.

When it comes to performance currently, it completely depends on the game: in some games AMD wins hands down, and in others nVidia wins hands down. Both have had driver issues, though, with nVidia having some more severe ones that actually bricked cards.

I'm guessing the third, since AMD has been pruning its staff recently due to their financial situation and can't really afford to spend much time on things that don't directly impact their sales. Of course, now that the issue has been brought up, it does directly impact their sales.

Most companies don't prune their successful divisions when another is clearly floundering, so I doubt you're right. This seems more like business as usual, continually improving their product.

AMD cards have terrible OpenGL drivers for custom-made engines such as source ports of Doom and Quake. As a player of old games, I find AMD still fails, after failing in these games for 15 years or more. To this end, I cannot support them. Nvidia just works out of the box with all OpenGL apps, including custom engines like GZDoom and Quake 2 enhanced, whereas AMD cards either throw massive in-game errors or straight up will not boot into the game.

Cut them some slack? Their competition charges nearly $25-$50 more for the same-performance card?

Hmm? The 7950's price had to be slashed significantly to be on par in performance per dollar with the 660 Ti, a technically weaker card, and they released a new BIOS that overclocks existing 7950s to keep up with the 660 Ti, all because the drivers just fail to extract the hardware's potential. Let's keep it honest.

I like how you're arguing with someone who's on the same side of the old fanboy farm as you. Adorable!

Cut them some slack? Their competition charges nearly $25-$50 more for the same-performance card?

Hmm? The 7950's price had to be slashed significantly to be on par in performance per dollar with the 660 Ti, a technically weaker card, and they released a new BIOS that overclocks existing 7950s to keep up with the 660 Ti, all because the drivers just fail to extract the hardware's potential. Let's keep it honest.

I like how you're arguing with someone who's on the same side of the old fanboy farm as you. Adorable!

They are not arguing; they are discussing.

And the interesting line above: " ... AMD may or may not have known about, so they either have shitty testing, ignored the problems or didn't have the engineering resources."

Just want to point out that it's not an OR; the last one, meager resources, kinda guarantees the first two. Especially the ignoring part: when prioritizing the work, you only have the team you have. And I can see a "the dev team feels we may be losing some undefined amount of performance through unknown problems in the driver that can only be verified with extensive testing" issue getting pushed down the list. Given the list of priorities they must have at AMD right now, I bet most of us would have agreed too, had we been in the meeting.

Well, until a journalist does it for free and then publicly calls them out on it. Now it's a priority.

I'm still on the fence with this one. TechReport is one of my favorite sites and I've been visiting since its origin (when it used to be tech-report, IIRC.) They're often short-fused if you disagree with them publicly, but generally they are a bunch of OK folks in my book. The site provides some great services, info, and hardware reviews. It should be on your daily short list as it is on mine.

Still, I have a genetic aversion to large bar charts portending massive differences between products when those charts are magnified way out of proportion. We all remember the infamous frame-rate bar charts in which one GPU was 2 frames per second faster than its competitor (82 fps vs. 80 fps, etc.), but the 2 fps difference was portrayed by a line ten times longer than the *entire* frame-rate chart line; i.e., the difference was 2 fps, but at a glance the bar chart made it appear to be 1000% faster...! That sort of thing.

TR is often talking about milliseconds of latency, which can be "charted" but which in reality are such infinitesimally tiny time slices that when you play the game you have no sense that such latencies exist, for the entirely understandable reason that the average human player simply cannot perceive events of a few milliseconds' duration. Take, for instance, an eye blink: about 150 milliseconds or so. These spans are often too short to register significantly in our perception. Enter the millisecond latency bar chart: it cannot be perceived, mostly, so we design a chart to illustrate plainly what we would be perceiving--if only we had the ability to perceive it. I sort of think that's getting awfully close to making mountains out of molehills. It's also of at least passing interest that higher latencies often (but not always) produce lower frame-rate averages when testing the same software, which would tend to legitimize the average frame-rate count as the only directly perceivable event in the entire process worth mentioning.

The comparison videos, as I mentioned in a follow-up post on TR, left me unconvinced, as it appeared to me that the angle of the camera intersecting the scenery rolling beneath it was different in each video--which would make them incomparable for these purposes. I.e., the "jerkier" video, because it was "filmed" at a slightly different angle than the "smoother" video, does not stand as proof of lower latencies in one product but not the other.

Anyway, I've come to the personal conclusion that after 60 frames per second it's all a wash, and so I might not be the ideal person to proffer opinions on such issues... Latency spikes of mere milliseconds' duration could theoretically be caused by a number of system factors having nothing to do with the GPU itself.

AMD driver issues have never killed cards, and certainly not two separate drivers

Killed my really nice Asus 8800GTS, and the warranty replacement was a really shitty GTS250, which was as fast or faster but had a horribly noisy fan. Your point certainly stands.

After that debacle I switched over to ATi with a 5850 and had generally good times with it, certainly a good card and I had very few actual driver problems.

But it's stuff like this that annoys me. The 7 series AMD chips are a year old, and they can't possibly have started working on a memory management issue for a year. That tells me TR clearly demonstrated a problem AMD may or may not have known about, so they either have shitty testing, ignored the problems or didn't have the engineering resources. None of those enthuse me.

On the other side of things, what soured me on AMD, after giving them a chance, was the way they would acknowledge issues on their own forum but never issue corrections. The 5770 I had suffered from a problem with Flash video causing the card to lock at the lower 2D clock rates until rebooted (which could be worked around somewhat with a custom firmware forcing all card clocks to the advertised 3D rates, at the cost of higher power consumption at the desktop/idle), as well as issues with audio over HDMI, traceable to a problem with how AMD handled EDID, particularly for devices not already turned on at system boot.

Both of these had numerous complaints all over their forums, and even some posts which involved fairly deep investigation into the issues, clearly pointing to drivers being the culprit. Every so often an AMD post would crop up saying "oh, well, thanks for letting us know, we'll see about a fix" and sometimes a fix would even be claimed to have been implemented, but nothing ever got better. Over the course of multiple years.

Quote:

Anyway, I've come to the personal conclusion that after 60 frames per second it's all a wash, and so I might not be the ideal person to proffer opinions on such issues... Latency spikes of mere milliseconds' duration could theoretically be caused by a number of system factors having nothing to do with the GPU itself.

I would agree, if you meant a constant 60+ fps. Personally, the most annoyed I get with any graphics card is when it runs fine 95% of the time but drops to unplayable rates the other 5%. Stuttering might not be quite the same as a drop to 3 fps for half a minute, but having had stutter issues before, I'll definitely say it's noticeable and annoying, even when it's just 150 ms or so (keep in mind that one frame at 60 fps is ~16.7 ms, so 150 ms is the equivalent of 9 dropped frames at 60 fps). It's a lot like running video conversion (let's call it 30 fps) and having dropped frames: even if you can't tell exactly which frame was dropped while the video plays, you can often see a jerky quality across the frame drop.

It wouldn't be a big deal (maybe) if it weren't for the fact that smooth video is one of the main reasons to buy such a high end card. Much like one of my main draws at the time to the 5770 was the built in audio over hdmi combined with multiple video outs, which turned out to be a massive inconvenience (one of the recommended "fixes" was to reboot whenever you turned on an hdmi connected tv/receiver).

When you focus on pumping out a big spec for the max of your range, all to boost your average, but ignore where your min and sigma are (much less signal-processing issues like long latency spikes), you sour people on your product. These things aren't, on one scale, as bad as a card going up in smoke, but just because they aren't that bad doesn't mean they don't hurt impressions (relative to expectations) or cause dissatisfaction, especially with such an expensive purchase. In that sense, a bad user experience is simply a bad experience to that particular user: subjective or even objective differences in quantifying "how bad" in comparison to another user's bad experiences rarely actually matter, and if anything can serve as a red herring. "Well, your card didn't go up in smoke" does not salve "I'm pissed because I paid $xxx for something I was led to believe would do xyz, but it doesn't do it that well, and I feel that for $xxx it more than should."

In terms of UI/UX, even small signal latency issues can cause fairly definite degradation in user satisfaction. Even when you're just talking about the duration of an eyeblink or two. And expectation is absolutely key in terms of reported satisfaction with relation to that... regardless of whether you would tend to consider it realistic or not.
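
The frame-time arithmetic in the comment above can be checked in a couple of lines. This is just a back-of-the-envelope sketch with the same round numbers the commenter used:

```python
# At 60 fps, each frame has a ~16.7 ms budget, so a single 150 ms stall
# swallows roughly nine frames' worth of time.

FPS = 60
frame_budget_ms = 1000 / FPS         # ~16.67 ms per frame at 60 fps
stall_ms = 150                       # one long-latency frame
frames_lost = stall_ms * FPS / 1000  # same as stall_ms / frame_budget_ms
```

In other words, a stall the length of an eye blink costs about nine consecutive frames, which is exactly the kind of gap that reads as a visible hitch.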

Quote:

TR is often talking about milliseconds of latency, which can be "charted" but which in reality are such infinitesimally tiny time slices that when you play the game you have no sense that such latencies exist, for the entirely understandable reason that the average human player simply cannot perceive events of a few milliseconds' duration. Take, for instance, an eye blink: about 150 milliseconds or so. These spans are often too short to register significantly in our perception.

You might want to check your science. Perceptual experience of intervals from ~20 ms to 100+ ms has a great deal to do with what those intervals represent. When they are gaps in an otherwise smooth experience, they can definitely become noticeable--in fact, they can be noticed to a higher degree when the overall experience is, relative to those gaps, quite smooth. Small delays in the end-to-end processing of action to result usually drop below noticeability (see: just-noticeable difference) because of an already expected environmental delay (output of action, with latency from thought to action -> resulting blocking and turnaround of the action's effect -> input and processing of environmental stimuli, especially in any system where such delays are considered commonplace), but a lack of congruity within a signal, or between different signal sources, is often much more noticeable.

Killed my really nice Asus 8800GTS, and the warranty replacement was a really shitty GTS250, which was as fast or faster but had a horribly noisy fan. Your point certainly stands.

Whoa, that was drivers? I had no idea. I had an 8800GTS that failed, but it wasn't an Asus, and I had to RMA it three times, with them repeatedly sending me 9800s that had the same problem, until I finally got a GTS250 too.

By the time I RMA'd the card it was definitely damaged, as it showed graphics corruption during boot. The GTS250 worked fine for a year or two before I replaced it with an AMD 6950, which has worked fine since I got it.