Monthly Archives: July 2012

A few years ago, memory management for the graphics card was performed as a single static allocation in the X server. That allocation was then carved up into surfaces used for the scanout and for important pixmaps, such as the renderbuffers of its DRI clients. The upside of this simple scheme was that all locations were known, allocation was very fast, and we could always tell the GPU where its surfaces were. The downside was that the amount of video memory was fixed in advance and could not be resized, not even if you added a second monitor and needed a new framebuffer, or if you were running a game that wanted lots of textures. So X was relieved of its role in memory management, and the task was given to the kernel under the guise of the Graphics Execution Manager (GEM). Now userspace has no idea where its surfaces are, and so it needs to ask the kernel to patch up its command buffers with the correct addresses. This relocation of command buffers is a bottleneck in the new design, and from the outset people complained about the performance lost in going from XAA to GEM/UXA.
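To make the relocation step concrete, here is a minimal Python sketch of the kind of work the kernel does on each batch. It is loosely modelled on the i915 relocation entry, which carries (among other fields) the offset to patch, a target handle, a delta and a presumed offset. The `Reloc` class, the `apply_relocations` helper and the 32-bit little-endian address format are simplifying assumptions for illustration, not the real kernel code.

```python
import struct
from dataclasses import dataclass


@dataclass
class Reloc:
    offset: int           # byte offset in the batch where an address must be written
    target_handle: int    # GEM handle of the buffer being referenced
    delta: int            # byte offset within the target buffer
    presumed_offset: int  # GPU address userspace guessed when it built the batch


def apply_relocations(batch: bytearray, relocs, bound_offsets) -> int:
    """Patch 32-bit addresses in `batch` using the GPU offsets chosen for each handle.

    `bound_offsets` maps a GEM handle to the address the (hypothetical) kernel
    actually placed that buffer at. Returns the number of entries rewritten.
    """
    patched = 0
    for r in relocs:
        actual = bound_offsets[r.target_handle]
        if actual == r.presumed_offset:
            # Userspace already wrote presumed_offset + delta; nothing to rewrite.
            continue
        struct.pack_into("<I", batch, r.offset, (actual + r.delta) & 0xFFFFFFFF)
        patched += 1
    return patched
```

The per-entry work scales with the number of relocations in each batch, which is presumably why the step shows up as a cost for 2D workloads that submit many small batches. The presumed-offset check is what lets a good guess from userspace skip the rewrite entirely.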

A few years have passed and we’ve been gradually tuning all the parties, but the question remains: have we managed to recover the speed we threw away so long ago?

I compiled Xorg-1.5 and xf86-video-intel-2.6 for my 965GM (a ThinkPad T61), discarding anything that no longer compiled, and was left with a very light shell for investigating XAA in its heyday. I then ran x11perf under the ancient XAA and EXA, and under the modern UXA and SNA, to see how things had changed.

By looking at the geometric mean of all the x11perf tests, we can get a rough feel for the overall performance change relative to XAA:

EXA 1.117 (12% faster than XAA)

UXA -1.108 (11% slower than XAA)

SNA 2.284 (128% faster than XAA)

As with all averages of micro-benchmarks, take this with an extremely large pinch of salt.
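The geometric mean is the natural average for speed ratios, because it treats a 2x win and a 2x loss symmetrically. A minimal sketch of the calculation follows, assuming each test has already been reduced to a ratio of candidate throughput over XAA throughput. The ratios in the usage lines are made up for illustration, and `signed_ratio` merely reproduces the notation above, where a negative value such as -1.108 means "slower by that factor".

```python
import math


def geometric_mean(ratios):
    """Geometric mean of per-test speed ratios (candidate ops/s divided by baseline ops/s)."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    # Average in log space so that a 2x win and a 2x loss cancel out exactly.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def signed_ratio(g):
    """Express a ratio in the post's notation: >= 1 means faster, negative means slower by that factor."""
    return g if g >= 1.0 else -1.0 / g


if __name__ == "__main__":
    # Hypothetical per-test ratios, not measured data.
    ratios = [1.5, 0.8, 1.1]
    print(f"{signed_ratio(geometric_mean(ratios)):.3f}")
```

A single pathological test can still drag the mean a long way, which is one reason to treat the summary numbers with suspicion.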

So Michel Dänzer just pushed some patches to enable using glamor from within the xf86-video-ati driver. As a quick recap, glamor is a generic Xorg driver library that translates the 2D requests used to render X into OpenGL commands. The argument is that driver teams then need only concentrate on bringing up the OpenGL stack, and gain a functioning display server in the process. The counter-argument is that this compromise, made to save engineering time, penalises the performance of the display server.
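As an illustration of the kind of translation glamor performs, the RENDER Porter-Duff operators map onto fixed-function OpenGL blending over premultiplied colours. The sketch below uses the standard factor pairs; `BLEND_FACTORS`, `factor_value` and the software `composite` emulation are hypothetical helpers for demonstration, not glamor's actual code. It also ignores complications such as destinations without an alpha channel and component-alpha masks.

```python
# Porter-Duff factors for premultiplied RGBA, named after the OpenGL blend factors
# a GL backend would pass to glBlendFunc(src_factor, dst_factor).
BLEND_FACTORS = {
    "Clear":       ("GL_ZERO", "GL_ZERO"),
    "Src":         ("GL_ONE", "GL_ZERO"),
    "Dst":         ("GL_ZERO", "GL_ONE"),
    "Over":        ("GL_ONE", "GL_ONE_MINUS_SRC_ALPHA"),
    "OverReverse": ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE"),
    "In":          ("GL_DST_ALPHA", "GL_ZERO"),
    "InReverse":   ("GL_ZERO", "GL_SRC_ALPHA"),
    "Out":         ("GL_ONE_MINUS_DST_ALPHA", "GL_ZERO"),
    "OutReverse":  ("GL_ZERO", "GL_ONE_MINUS_SRC_ALPHA"),
    "Atop":        ("GL_DST_ALPHA", "GL_ONE_MINUS_SRC_ALPHA"),
    "AtopReverse": ("GL_ONE_MINUS_DST_ALPHA", "GL_SRC_ALPHA"),
    "Xor":         ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE_MINUS_SRC_ALPHA"),
    "Add":         ("GL_ONE", "GL_ONE"),
}


def factor_value(name, src_alpha, dst_alpha):
    """Evaluate a GL blend factor name to a scalar, given the source and destination alpha."""
    return {
        "GL_ZERO": 0.0,
        "GL_ONE": 1.0,
        "GL_SRC_ALPHA": src_alpha,
        "GL_ONE_MINUS_SRC_ALPHA": 1.0 - src_alpha,
        "GL_DST_ALPHA": dst_alpha,
        "GL_ONE_MINUS_DST_ALPHA": 1.0 - dst_alpha,
    }[name]


def composite(op, src, dst):
    """Emulate the blend for one premultiplied RGBA pixel: src*Fs + dst*Fd, clamped to 1.0."""
    fs_name, fd_name = BLEND_FACTORS[op]
    fs = factor_value(fs_name, src[3], dst[3])
    fd = factor_value(fd_name, src[3], dst[3])
    return tuple(min(1.0, s * fs + d * fd) for s, d in zip(src, dst))
```

On real hardware the backend would enable GL_BLEND and set the pair from the table with glBlendFunc before drawing the composite rectangles; the per-pixel arithmetic above is what the blender then does for it.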

To highlight that last point, we can look at the performance of the intel driver with and without glamor, and rendering directly with OpenGL:

The centre baseline is the performance of simply using the CPU and pixman to render; bars above it are faster and bars below it are slower. The first bar is the performance of using OpenGL directly; in theory this should provide the best performance of all, limited only by the hardware. Sadly, the graph shows the stark reality that undermines using glamor: one needs an OpenGL driver that has been optimized for 2D usage in order to maximise GPU performance with the Xorg workload. Notice the areas where glamor does better than using OpenGL directly in cairo-gl? Those are where glamor itself attempts to mitigate poor buffer management in the driver.

Enter a different GPU and a different driver. The whole balance of CPU to GPU power shifts along with the engineering focus. Everything changes.

Taking a look at the same workloads on the same computer, but using the discrete Radeon HD5770 rather than the integrated processor graphics:

Perhaps the first thing that we notice is the raw power of the discrete graphics as exposed by using OpenGL directly from within cairo. Secondly, we notice the lacklustre performance of the existing EXA driver for the Radeon chipset. Remember that everything below the line implies that the GPU driver in Xorg is behaving worse than could be achieved just through client-side software rendering, in other words that its RENDER "acceleration" is nothing of the sort. And then our attention turns to the newcomer, glamor on radeon. It is still notably slower than both the CPU and using OpenGL directly. However, its performance is very similar to that of the existing EXA driver, sometimes slower, sometimes faster (the relative x11perf results reveal some areas where the EXA driver could be improved considerably).

Not bad for a first patch against an immature library, and it demonstrates that glamor can be used to reduce the development cost of bringing up a new chipset, yet it does not reach the full potential of the system. Judging by the last graph, one does wonder whether, for the time being at least, glamor is even preferable to using xf86-video-modesetting on a high-performance multicore system. 😉