Monday, May 12, 2014

The Truth on OpenGL Driver Quality

The driver landscape is something that any practicing GL dev must face unless you like having only a fraction of potential customers able to enjoy your product. (These are the drivers you'll have to work with in order to actually ship a product today or within the next year or so. If you're just a dev playing at home with one driver, you probably won't have to deal with any of this gritty real-world stuff.)

If all you've ever done is use D3D, then you'd better strap yourself in, because the available GL drivers for Windows/Linux are all over the map. Here's my current opinion on driver quality:

Vendor A

What most devs use, because this vendor has the most capable GL devs in the industry and the best testing process. It's the "standard" driver, it's pretty fast, and when given the choice, this vendor's driver devs choose sanity (making things work) over absolute GL spec purity. Devs playing at home use this driver because it has the sexiest, most fun-to-play-with extensions and GL support. Most of what you hear about the amazing things GL will be able to do to compete against D3D12/Mantle comes from devs playing with this driver. Unfortunately, we can't just target this driver, or we miss out on large amounts of market share.

Even so, until Source1 was ported to Linux and Valve devs totally held the hands of this driver's devs, they couldn't even update a buffer (via a Map or BufferSubData) the D3D9/11-style way without constantly stalling the pipeline. We're talking "driver perf 101" stuff here, so it's not without its historical faults. Also, when you hit a bug in this driver it tends to just fall flat on its face and either crash the GPU or (on Windows) TDR your system. Still, it's a very reliable/solid driver.

Vendor A supports a zillion extensions (some of them quite state of the art) that more or less work, but as soon as you start to use some of the most important ones you're off the driver's safe path and in a no man's land of crashing systems or TDR'ing at the slightest hiccup.

This vendor's tools historically completely suck, or only work for some period of time and then stop working, or only work if you beg the tools team for direct assistance. They have enormous, perhaps Dilbert-esque tools teams that do who knows what. Of course, these tools only work (when they do work) on their driver.

This vendor is extremely savvy and strategic about embedding its devs directly into key game teams to make things happen. This is a double-edged sword, because these devs will refuse to debug issues on other vendors' drivers, and they view GL only through the lens of how it's implemented by their driver. These embedded devs will purposely do things that they know are performant on their driver, with no idea how these things impact other drivers.

Historically, this vendor will do things like internally replace entire shaders for key titles to make them perform better (sometimes much better). Most drivers probably do stuff like this occasionally, but this vendor will stop at nothing for performance. What does this mean to the PC game industry or graphics devs? It means you, as "Joe Graphics Developer", have little chance of achieving the same technical feats in your title (even if you use the exact same algorithms!) because you don't have an embedded vendor driver engineer working specifically on your title making sure the driver does exactly the right thing (using low-level optimized shaders) when your specific game or engine is running. It also means that, historically, some of the PC graphics legends you know about aren't quite as smart or capable as history paints them to be, because they had a lot of help.

Vendor A is also jokingly known as the "Graphics Mafia". Be very careful if a dev from Vendor A gets embedded into your team. These guys are serious business.

Vendor B

A complete hodgepodge, inconsistent performance, very buggy, inconsistent regression testing, dysfunctional driver threading that is completely outside of the dev's official control. Unfortunately this vendor's GPU is pretty much standard and is quite capable hardware-wise, so you can't ignore these guys even though as an organization they are idiots with software. Basic stuff like glTexStorage() crashes (on a shipped title) for months on end with this driver. B's driver devs try to follow the spec more closely than Vendor A, but in the end this tends to do them no good, because most devs just use Vendor A's driver for development, and when things don't work on Vendor B they blame the vendor, not the state of GL itself.

Vendor B driver's key extensions just don't work. They are play or paper extensions, put in there to pad resumes and show progress to managers. Major GL developers never use these extensions because they don't work. But they sound good on paper and show progress. Vendor B's extensions are a perfect demonstration of why GL extensions suck in practice.

This vendor can't get key stuff like queries or syncs to work reliably, so any extension that relies on syncs for CPU/GPU synchronization isn't workable. The driver devs remaining at this vendor pine to work at Vendor A.

Vendor B can't update its driver without breaking something. They will send you updates or hotfixes that fix one thing but break two other things. If you single step into one of this driver's entrypoints you'll notice layers upon layers of cruft tacked on over the years by devs who are no longer at the company. Nobody remaining at vendor B understands these barnacle-like software layers enough to safely change them.

I've occasionally seen bizarre things happen on Vendor B's driver when replaying GL call streams of shipped titles into this driver using voglreplay. The game itself will work fine, but when the GL callstream is replayed we'll see massive framebuffer corruption (that goes away if we flush the GL pipeline after every draw). My guess: this driver is probably using app profiles to just turn off entire features that are just too buggy.

Interestingly, Vendor B has a tiny tools team that makes some pretty useful debugging tools that actually work much of the time - as long as you are using Vendor B's GPU. Without Vendor B's tools, togl and Source1 Linux would have taken much longer to ship.

This could be a temporary development, but Vendor B's driver seems to be on a downward trend on the reliability axis. (Yes, it can get worse!)

On the bright side, and believe it or not, Vendor B knows the OpenGL spec inside and out - to the syllable. If you can get them to assist you, their advice is more or less reasonable about plain GL matters (not extensions).

Vendor C - Driver #1

It's hard to ever genuinely get angry at Vendor C. They don't really want to do graphics; it's really just a distraction from their historical core business, but the trend is to integrate everything onto one die and they have plenty of die space to spare. They are masters at hardware, but they aren't all that interested in software. They are the leaders in the open source graphics driver space, and their hardware specs are almost completely public. These folks actually have so much money and their org charts are so deep and wide that they can afford two entirely different driver teams! (That's right - for this vendor, on one platform you get GL driver #1, and on another you get GL driver #2, and they are completely different codebases and teams.)

Anyhow, this vendor's HR team is smart: it directly hires open source wiz kids to keep driver #1 plodding forward. This driver is the least advanced of the major drivers, but it more or less works as long as you don't understand or care what "FPS" means. If it doesn't work and you're really motivated you can git your hands dirty and try to fix it and submit a patch. If you're really good at fixing this driver and submitting patches then you may get a job offer from this vendor.

Anyhow, driver #1 is unfortunately pretty far behind on the GL standard, but maybe in 1-2 years they'll catch up and implement the spec as of last year. But you can't ignore this driver because they have a significant and strategically growing market share. So as a developer who wants to reach this market, you can't afford to use those fancy extensions or the latest trendy "modern" GL supported by vendors A and B. You must do a min() operation across all the drivers and in many cases this driver gates what you can do.
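To make the "min() operation" concrete, here is a minimal C sketch. It assumes you track, per driver, the GL version you can actually rely on and a bitmask of features you've verified really work (not merely what the driver advertises). The `DriverCaps` struct, the `caps_min()` function, and the `FEAT_*` bits are hypothetical names for illustration; the extension names in the comments are real ARB extensions, but which drivers support them reliably is exactly what your own testing has to determine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; a real engine would track whatever
   extensions and limits its renderer actually depends on. */
enum {
    FEAT_TEX_STORAGE    = 1u << 0, /* e.g. ARB_texture_storage */
    FEAT_SYNC_OBJECTS   = 1u << 1, /* e.g. ARB_sync */
    FEAT_BUFFER_STORAGE = 1u << 2  /* e.g. ARB_buffer_storage */
};

/* Hypothetical per-driver record filled in from your own testing. */
typedef struct {
    int major, minor;   /* GL version the driver reliably supports */
    uint32_t features;  /* features verified to actually work on it */
} DriverCaps;

/* Returns what every driver in the list has in common: the lowest
   GL version and the intersection (bitwise AND) of working features. */
DriverCaps caps_min(const DriverCaps *drivers, size_t count)
{
    assert(count > 0);
    DriverCaps result = drivers[0];
    for (size_t i = 1; i < count; ++i) {
        const DriverCaps *d = &drivers[i];
        /* Keep the lower of the two versions (compare major, then minor). */
        if (d->major < result.major ||
            (d->major == result.major && d->minor < result.minor)) {
            result.major = d->major;
            result.minor = d->minor;
        }
        /* A feature survives only if every driver has it working. */
        result.features &= d->features;
    }
    return result;
}
```

The point of the design is that one lagging driver drags down both axes at once: a driver stuck at an older GL version lowers the version floor, and a driver where a feature crashes clears that bit for everyone, which is why a single driver can end up gating what the whole renderer is allowed to use.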

Vendor C has no GL tools at all for either platform. Sorry - want to debug that graphics problem you're having? Welcome to 1999.

Vendor C - Driver #2

A complete disaster. This team's driver is barely used by any titles because GL on this platform is totally a second class citizen, so many codepaths in there just don't work. They can't update a buffer without massive, random corruption. This team will do stuff like give you a different, unique, buggy driver drop for every title in your back catalog for perf analysis or testing. This team will honestly ask you if "perf" or "correctness" is more important.

I've seen one well-known engine team spend over a year attempting to get their latest GL 4.x+trendy extensions backend working at all on this team's driver. Hey guys - this driver just doesn't work, just move on already and implement a plain GL 3.x backend with workarounds (just like togl and other shipping titles do today).

On the bright side, Vendor C feeds this driver team more internal information about their hardware than the other team. So it tends to be a few percent faster than driver #1 on the same title/hardware - when it works at all.

Other drivers:

In addition to the above major drivers, there are several open source drivers, mostly developed by the community, for hardware from vendors A and B. They tend to be behind the times from a GL perspective, but I hear they mostly work. I don't have any real experience or hard data with these drivers, because I've been fearful that working with these open source/reverse engineered drivers would have pissed off each vendor's closed source teams so much that they wouldn't help.

Vendor A hates these drivers because they are deeply entrenched in the current way things are done. These devs have things like mortgages and college funds (or whatever) to keep funding, so there's a massive amount of inertia from this camp. There's no way they are going to release their Top Secret GPU Specs to the public, or (gasp!) open source their driver. Vendor A will have to jump on the open source driver bandwagon soon in order to better compete against Vendor C's open model, whether they like it or not.

Vendor B halfheartedly helps their open source driver by funding a tiny team to keep the thing working. At some point, the open source driver for Vendor B's GPU may be a more viable path forward than their half-functional closed source driver.

Conclusion

To ship a major GL title you'll need to test your code on each driver and work around all the problems. May the "GL Gods" help you if you experience random GPU corruption, heap corruption, lockups, or TDRs. Be very nice to the driver teams and their managers/execs, because without them your chances aren't nearly as good.

81 comments:

Vendor B is the one that irks me the worst. They have absolutely no way to file driver bug reports--you post it to their forum, and hope (pray) it doesn't collect too much dust. At least vendor B's drivers are better than they were 5+ years ago...

Thank you for making this post. I've had the exact same experience with all three vendors, and it is comforting to know that I'm not alone.

A Vendor B open source driver (independent!) developer created a patch for me in, IIRC, ~15 minutes to work around a hardware (sic!) bug. The same bug was visible in the proprietary driver; they fixed it... in 3 months.

Vendor B's attitude towards Linux is a bit mixed, though... since they dropped support for their still quite broadly used older GPU series, they basically left all those users in the dark with their closed drivers (no updates at all for about 2 years). The open source drivers for the same GPUs, however, are probably the best ones out there, apart from maybe driver #1 from Vendor C.

Or what Vendor D does on the desktop... They don't only provide mobile drivers, they build the driver front-ends on the desktop as well - for hardware from vendors A, B and C! So if you think what A, B and C do with their drivers is bad, wait until you see what a third party (which isn't really into the graphics business) does with that hardware... On the supported GL version side they play catch-up, and what they ship has a long list of known bugs. You have to wait a very long time for fixes and even longer for feedback on bug reports (if you get any feedback at all).

This being said: buggy drivers are one thing, with more real-world applications using OpenGL those bugs will get detected even quicker. What really bugs me if I have no sane way of filing a bug report and don't get a reply on the reported bugs :-(

D's drivers are OK for our purposes on a system with a yellow-animal prime-number version and newer, but not for the still very popular version before that, the one with the black-and-white animal. And there is no remedy for the user except upgrading the system, a truly dreaded step, because this WILL break some other software! And it still terribly lowers the bar for the "lowest common denominator", and it takes screenfuls upon screenfuls of hacky workarounds of its own that are needed on no other system.

Vendor A's drivers on Windows, Linux/BSD/Solaris share large parts of their codebase, so I'm pretty positive that it's one single driver team.

Oh, I know another tidbit about Vendor A: in the late 1990s a certain German hardware manufacturer, let's call them E, very successfully built and sold graphics cards based on Vendor A's GPUs. However, the drivers shipping with those graphics cards were not the original Vendor A drivers, but drivers on which E did intense profiling and reverse engineering and which they patched with their own fixes. Eventually E went bankrupt, but before that happened a lot of E's software development team moved away to work at A.

dude, remove that. Any dumbass can see it, it's intentional. I'm sure the quality of discussion can only be better when you don't name the Ones That Should Not Be Named... it calls their attention... bad juju...

I'm going to play devil's advocate for a second. Is this avoidable though? Can one balance "cutting edge" whilst maintaining stability across the board? Also what would be the cost impacts to increase driver support to smaller players? (would something like a community support stack-overflow-type work?).

Vendor B FLOSS driver team is good showcase of how it's VERY beneficial to have open DEVELOPMENT (and not only discussion) of GPU driver.

1) Source code is available (debugging into the kernel right through the GPU driver!)
2) Docs are available
3) People who know how the beast works are available on IRC and the mailing list
4) Everybody can get involved without any NDAs or other fishy stuff
5) Reaction time is quick, and help may even come from a different vendor (because on that platform there is a LARGE front end shared between the vendor and community FLOSS drivers) or from independent devs.
6) An OpenGL-in-all-but-trademarked-name Conformance Test Suite is publicly available to all, and everybody can contribute. (In turn, drivers validated by it have successfully obtained OpenGL ES 2.0/3.0 certificates.)

We are talking about driver stuff here, not a Learn-OpenGL/101-of-OpenGL/How-to-do-that-fancy-technique discussion, so Stack Overflow would be of no direct use to driver devs. Hence it would not improve the state of OpenGL drivers/specs much.

I couldn't agree more. I have one of Vendor B's products, and in my experience their open-source driver works better with most titles. It's not too late to throw out that shitty closed one and use the open-source one as a base.

I remember years ago buying a card from Vendor B because they'd just announced they were open-sourcing their drivers... but then year after year the open source driver never really came to do much and the closed source one also sucked...

When Steam came out for Linux (actually, it was originally to play Super Meat Boy), I bought a card from Vendor A and haven't looked back.

Vendor A is just the best user experience for gaming on Linux. Still get tearing though, even with v-sync and overclocking. UGH

I never had tearing with Vendor A on the desktop or in games as long as I didn't use a hobbyist desktop (everything but KDE, Gnome or Unity). With Vendor A's open source driver, which isn't developed by people from Vendor A, I get tearing in Gnome, but not when running Gnome 3.12 on Wayland. So X11 might also play a role.

Vendor B's open source driver is essentially all the good parts of Vendor C's driver #1 and Vendor B's closed source driver -- it's already eclipsing Vendor B's closed source driver in games (even Source games). The trend seems to be that open source is the best way to develop graphics drivers and software.

Can't AMD just take a team of superhackers and rewrite the whole thing in one go? This keeps giving them a very bad rep despite some goodwill. I know it costs money, but now that SteamOS will take off in some form or shape, they are in clear danger of being left out. Maybe they don't care, with all the console sales, but heh....

The primary issue with that is AMD doesn't have much money to hire and educate new programmers. Rather than redevelop an entirely new driver from scratch, though, it's better if they just invest more hands in working on the open source drivers and helping the Mesa team get OpenGL compliance -- maybe transition some of the other driver developers to improving the open source drivers.

Rewriting from scratch is not as simple as it sounds; an OpenGL driver is an incredibly complex piece of software, and that's the main reason why most of them suck. AMD actually tried this approach in the past; read this post for more information: http://www.humus.name/index.php?ID=351

Yep, the problem is GL itself. I'm working on and off on a personal open source radeon (SI for the moment) driver for Linux. Bare-metal programming is really easy once the hw is properly initialized. GL implementation complexity is *disgusting* when you see how easy bare-metal programming is. Khronos must reboot GL for GL5... *for good* --> 99% of the API is useless/a performance hog on modern GPUs.

What about apitrace? The Vendor C driver #1 team certainly contributed to it. We don't want more private vendor tools; we want standard tools that work on every vendor's driver. Though I know apitrace wasn't good enough for Valve, which just did their own thing anyway. Duplicated effort, even in open tools: a failure.

The interesting part here is that Vendor B recently announced that they will replace their closed-source Linux kernel layer with the open-source counterpart. Ergo, the open-source driver will become the default, with a closed-source "addon" for additional performance.

This just might fix the single biggest hurdle when porting games to Linux.

Of course, this analysis is missing Vendor D and their popular line of feline-themed operating systems. Their drivers are a totally different beast, with their own sets of bugs. They only gained 4.1 support in late 2013 after all...

Yes, but the bad thing is that the closed source addon is the entire OpenGL implementation. Using the open source kernel module doesn't help anything, except that it makes the drivers easier to install, since you don't have to compile the module against your local kernel headers. So it doesn't require the system to have a working compiler toolchain, and it no longer matters whether you install them on Ubuntu, which has older drivers, or on some bleeding-edge distro like Arch Linux. Vendor A's and B's closed source drivers already have open source kernel modules. Vendor B will simply try to use the one that is curated in the mainline kernel tree and therefore automatically works with whatever kernel version. It has nothing to do with how the driver will do 3D.

Replacing the shim between the kernel and the closed source driver won't change anything. It'll just make it a bit easier to install the closed source driver. The open source driver is already the default.

So here you have the exact reason why free software drivers are much better. It's a pity that Vendor A is so free-software unfriendly and Vendor B doesn't try harder on it. Most GL problems described in a previous post would go away.

And of course if you aren't a game developer, you can't even use D3D drivers widely since it doesn't work over remote desktop and in some cheap virtual environments. It's 2014, even your old netbook that can't turn on anymore has a capable GPU, and you still have to worry about the no GPU case.

Don't forget Vendor D repackages drivers from everyone else, adds their own custom tweaks, does little or no testing, and only issues real updates about once a year. Despite their own heavily GPU dependent code, it almost seems like they think drivers are just an item to check off a list and not something that need to be maintained, debugged and tuned.

Yeah, the GPU driver world is still a mess - more so if you are using them for something that isn't a game.

Vendor P does not support WGL/GLX_ARB_create_context or EXT_geometry_shader4, which means they are limited to OpenGL 2.1 with functionality equivalent to GPUs from ca. 2003. Also good luck installing their drivers on anything other than Windows or Ubuntu 12.04.

There are two Vendor V's. One of them has pretty terrible 3d acceleration support in general; the other I have little experience with.

Disclaimer: the following concerns Linux drivers only. I have no experience with the respective Windows drivers of these vendors.

There are two "Vendor V"s, both have open-source drivers.

The vendor V with mostly open-source hypervisor has a pretty messy driver that mostly works as of late, but has a long history of crashing the kernel, causing memory corruption in totally random places, having unpleasant artifacts, and overall it's not something a sane person would like to have loaded in their kernel.

The vendor V with a proprietary hypervisor has the best VM driver around. Not only is it open source, this vendor actually employs a team of excellent graphics developers that has made tremendous contributions to the open-source driver landscape in general. Vendor A's and Vendor B's open-source drivers rely on this work heavily, and without it they'd be much, much worse off. In fact, an important library (or even framework?) written by these guys is used in every open-source driver there is! (Except Vendor C's, who made their driver long before the library was created.)

I don't have much experience with Vendor P's guest driver, but it's proprietary and it totally freaked out the one time I tried to run a compositing window manager on it.

What about all the ARM stuff? It'd be nice if you only had to support 3 drivers, but with mobile there are like 5 more. I guess it's good that GL is the standard there (even Windows Phone 8.1 has a wrapper now), but still, fragmentation BS hits us all the time.

I use Vendor A and C's products and for the most part, I've never experienced any issues with Vendor A's products on Linux.

As for Vendor C, I had a bug report filed in December about a problem that meant I couldn't use Linux at all on a specific piece of hardware. This bug was fixed 6 months later and still hasn't gone into the mainline kernel. Don't even ask about other attempts to get the hardware to work. Their only saving grace is that their hardware video decoding on Linux is up to scratch in the majority of cases.

B's Linux driver is so terrible it makes me cry. It's made even worse by the fact that it doesn't support any GPUs more than a year or two old. They're so ruthless about deprecating support it's not funny.

It is sooooooo much worse in the Android world. Literally zillions of HW and driver combinations, very brief support for driver updates. It truly is horrible. Here is an exercise: name all the different GPU IPs found in Android, then name the variants used in different SoCs, and then name which drivers are used for different devices using the same SoC.

One thing to note, although I don't know if it changes things that much: both Vendor A and B appear to have separate code bases (or driver packages) for their OpenGL support for consumer (Ge-Wiz branded) cards and their professional cards (IcePoof and Quintic).

Last I checked vendor A's professional card with their closed source drivers had about 80% of hardware features supported under Linux (I used them for 2D not OpenGL, but my favorite was OpenGL acceleration only on one display), while vendor B was the same train wreck under Linux for both their consumer and professional cards.

No, it's extremely consistent. It also uses the same Microsoft front-end for all drivers, while with OpenGL most driver issues are front-end issues and every driver is its own front-end on Windows and Linux. On OS X on the other hand, all OpenGL drivers share the common front-end, but it's a TERRIBLE, TERRIBLE one.

Here's my take on it. I'm not a coder (so I can't fix stuff) and I've got one of Vendor B's older cards. I got really annoyed when Vendor B deprecated my card, which dates from around 2008. I don't have much money, but when I splashed out and bought a more recent Vendor B card, I found out to my extreme disappointment that it won't work on my older NAPA-based motherboard. Hence, I'm also stuck with a machine upgrade in the near future. Until then, I'm stuck with low fps on games because I can no longer use Vendor B's proprietary driver, which gave me significant boosts on the few graphics game titles I play.

Yes, the open source Linux driver works. It's not fast, but it works well enough to show me pictures on the screen. For example, Minecraft (a game I like to play) only plays properly on Vendor B's open source Linux driver, and occasionally has weird errors on Vista's driver from Vendor B. Now that could be simply Mojang's coding, or it could be an actual bug in Vendor B's Vista driver, as I'm not a coder, I can't tell.

So, TL;DR: I really do wish that Vendor B had continued supporting the older cards. Yes, I don't expect that an R7000 would keep up, but heck, surely a Radeon HD 3450 could be brought somewhat up to snuff.

What's really fun is when vendor C thinks they can license graphics cores from vendor D without bothering to make sure a proper device driver is written for an operating system or graphics API that those graphics cores were never meant to be used with. I'm talking about D3D here - no idea if they even tried to make OpenGL work at all. F U vendors C and D!

About Me

Back in the day I worked for several years at Digital Illusions on things like the first shipping deferred shaded game ("Shrek" - 2001), software renderers, and game AI. Then, after working for Microsoft at Ensemble Studios for 5 years as engine lead on Halo Wars, I took a year off to create "crunch", an advanced DXTc texture compression library. I then worked 5 years at Valve, where I contributed to Portal 2, Dota 2, CS:GO, and the Linux versions of Valve's Source1 games. I was one of the original developers on the Steam Linux team, where I worked with a (somewhat enigmatic) multi-billionaire on proving that OpenGL could still hold its own vs. Direct3D. I also started the vogl (Valve's OpenGL debugger) project from scratch, which I worked on for over a year. In my spare time I work on various open source lossless and texture compression projects: crunch, LZHAM, miniz, jpeg-compressor, and picojpeg.