iN fAiRy dUsT wE tRuSt


This is the first article in a series of blog posts on Mir’s and XMir’s performance. The idea is to provide further insights into the overall performance work, point out existing bottlenecks and how the team is addressing them.

Our overall goal for Mir and XMir is to provide an absolutely fluid user experience, both in typical desktop usage and in more demanding scenarios like 3D gaming. Beyond that, our efforts to provide a fluid user experience on the desktop should have at most a minimal impact on overall 3D application performance.

Over the last weeks and months, a lot of people have asked whether, and to what degree, the introduction of a system-level compositor impacts graphical performance. The short answer is: Yes, any additional layer between the GPU and the actual rendering process has an impact on the overall performance characteristics of the system. However, there are ways to avoid most of the overhead, and this blog post is the not-so-short answer to the initial question.

As its name implies, a compositor is responsible for taking multiple buffer streams or surfaces and assembling (a.k.a. compositing) a final image that is then scanned out to the connected monitors. In the general case, composition requires buffering of the final image, and it requires GPU resources to render the individual surfaces to the destination buffer in preparation for scanout. Here, the destination buffer is the framebuffer. The overhead of a system-level compositor is essentially this additional rendering step in the overall graphics pipeline. In exchange, we gain control over the final output and enable flicker-free boot, shutdown, resume, suspend and session-switching.

Both internally and externally, people have been measuring the overall performance impact of XMir as available from the archive today. Roughly speaking, they report a performance impact of ~20% in the Phoronix Test Suite, and the question becomes: How can we significantly decrease the impact in the specific case of XMir while still keeping all the aforementioned benefits in place? The underlying idea is straightforward: If the compositor is clever enough, it can recognize situations where an opaque client surface covers a complete output (XMir matches exactly this configuration). In that case, composition can be avoided, and the client is provided with a framebuffer as its rendering target instead of the usual graphics memory buffer. Moreover, the server-side composition strategy can completely skip the final composition step and scan out the framebuffer as soon as the client signals “done”.

Luckily, Mir’s composition engine and the associated buffer allocation/swapping infrastructure allow us to implement this behavior easily and transparently to the client. The respective implementation has been living in https://code.launchpad.net/~vanvugt/mir/bypass for some time now, and we have been testing it in parallel to trunk.

Our primary test and benchmarking platform was Intel, and we haven’t seen any issues with the patch on that platform. There is a graphical glitch on ATI cards that we are actively working on. Nouveau is giving us some headaches, as it is currently quite slow both on X and on XMir. However, we are confident that we won’t see any major issues in XMir once the underlying cause in the Nouveau driver is fixed.
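To make the bypass idea described above more concrete, here is a minimal sketch of the kind of check a compositor could perform for a single output. It is not Mir's actual code: the types Rect and Surface and the function find_bypass_candidate are hypothetical names invented for this illustration, and the sketch assumes that all surfaces in the list belong to the output in question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; these are NOT Mir's classes.
struct Rect {
    int x, y, width, height;
};

struct Surface {
    Rect bounds;   // position and size in output coordinates
    bool opaque;   // true if the surface needs no alpha blending
    bool visible;  // false if the surface is currently hidden
};

// Returns true if `outer` fully contains `inner`.
constexpr bool covers(const Rect& outer, const Rect& inner) {
    return outer.x <= inner.x && outer.y <= inner.y &&
           outer.x + outer.width >= inner.x + inner.width &&
           outer.y + outer.height >= inner.y + inner.height;
}

// Returns the surface whose buffer could be scanned out directly, or
// nullptr if regular composition is required.
// Assumption: `surfaces` is ordered bottom-to-top and contains only
// surfaces that belong to `output`.
const Surface* find_bypass_candidate(const Rect& output,
                                     const std::vector<Surface>& surfaces) {
    // Walk from the topmost surface downwards and skip hidden ones.
    for (std::size_t i = surfaces.size(); i-- > 0;) {
        const Surface& s = surfaces[i];
        if (!s.visible) {
            continue;
        }
        // The topmost visible surface decides: bypass is only possible
        // if it is opaque and covers the complete output. Anything else
        // (translucent, or smaller than the output) needs composition.
        if (s.opaque && covers(s.bounds, output)) {
            return &s;
        }
        return nullptr;
    }
    return nullptr;  // nothing visible, nothing to bypass
}
```

In this sketch, a fullscreen opaque surface such as the one XMir presents would be returned as the bypass candidate, while a translucent surface on top of it, or a surface smaller than the output, forces the regular composition path.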

Results

Measuring graphical performance and developing meaningful benchmarks is a complex task in its own right. Luckily, we have some pretty capable tooling available in the open-source world. During development and evaluation of the bypass feature, we have been relying on selected test cases of the Phoronix Test Suite and on glmark2 to continuously evaluate performance gains and overall impact. We are going to publish the results across Intel, NVIDIA and AMD GPUs as part of our regular QA reporting at http://reports.qa.ubuntu.com/graphics/ as soon as we hit trunk. In summary, we are able to reduce XMir’s total overhead to ~6% on Nexuiz and OpenArena (see section “Conclusions & Future Work” for the reasons behind the remaining overhead and approaches to reduce it further). Please also note that we are still investigating the results for the “QGears2: OpenGL + Image scaling” test case:

[Benchmark charts: GUI Toolkits on Intel 2500, Intel 3000 and Intel 4000; Nexuiz with HDR off and HDR on; OpenArena]

GLMark2 numbers are not yet reported via the public dashboard but we are actively working on wiring them up as part of our daily quality efforts, too. However, the numbers are quite promising as can be seen from this preview (Lenovo x220, i7 vPro, Intel(R) HD Graphics 3000):

Conclusions & Future Work

Today, we are landing an important GPU-bound optimization for the XMir use case with the bypass feature, and we see significant performance improvements in our benchmarking scenarios. Everyday users will hardly notice any difference in graphical performance, but they should notice a decrease in power usage on laptops, because the system compositor requires fewer GPU and CPU cycles to carry out its tasks.

However, this is only the first step, and we still see some overhead in the benchmarks. Our GLMark2 numbers for raw Mir, compared to X as in Saucy today, suggest that there is still GPU-bound optimization potential that we should leverage in the XMir case. The unity-system-compositor is not the bottleneck in this specific scenario, so we need to become smarter on the X side of things. In summary, we need to propagate the bypass approach further down into the X world and its clients, with X/Compiz handing out the raw buffer provided by Mir to fullscreen, opaque X clients. Luckily, Compiz already knows about the notion of composite bypass, too, so the remaining optimization potential lies mostly within X itself: it needs to become more aware of the fact that it is now living in a world of nested compositors. Quite likely, though, Mir will require adjustments as well to expose composition bypass end-to-end in the XMir scenario. Stay tuned, we will keep you posted within this series of blog posts.

[Update] Michael of Phoronix found out that some games, when run in fullscreen mode but not at native resolution, do not benefit from composition bypass. As mentioned in one of the comments, we are now starting to investigate this sort of issue and will come back with updates once we have identified the root causes. At any rate: Thanks for bringing it up, we will make sure that the respective benchmark/setup is present in our benchmarking setup, too.

SHARK is a modular C++ library for the design and optimization of adaptive systems. It provides methods for linear and nonlinear optimization, in particular evolutionary and gradient-based algorithms, kernel-based learning algorithms and neural networks, and various other machine learning techniques. SHARK serves as a toolbox to support real world applications as well as research in different domains of computational intelligence and machine learning. The sources are compatible with the following platforms: Windows, Solaris, MacOS X, and Linux.

The library has been in active development for over 10 years now and is in use by scientists all over the world. Last year, we, the core SHARK developers, decided that a rewrite of the library was necessary to support future use cases and to provide a solid platform for users and contributors alike. Our goals were simple:

Unify and simplify the library structure.

Rely on established components wherever feasible.

Documentation, documentation and, again, documentation.

Focus on quality.

In this post, I would like to dive a little deeper into the topic of quality and the processes that we established to ensure a constant and high level of quality. We decided to address quality both from a technical (read: testable) and from an API point of view.

In terms of API quality, we want the programming interface to be consistent, convenient to use and easy to extend. Just as we care about the user experience, we want potential developers to find a welcoming and friendly environment. As we are a geographically distributed team of developers and scientists, we decided to go for a pre-commit code review approach implemented with the help of ReviewBoard. Despite initial concerns among the developers, the review process proved to be one of the most useful tools while rewriting the library, and developers quickly came to like the final “Ship It”.

In terms of “technical” quality, we decided to go for continuous integration of all (reviewed) commits to the rewrite branch on all of our supported platforms. With the help of Jenkins and a bunch of virtual machines, we finally realized our idea of continuous integration testing to prevent regressions. Our unit test suite is implemented with the unit testing framework provided by Boost. Test execution is handled by CTest. Static and dynamic analysis of the library is carried out with the help of cppcheck and valgrind, respectively. Code coverage metrics are calculated with the help of gcov. Finally, we integrate all of the testing results in the job-specific views of our Jenkins instance, giving developers a single source of information on the state of the library.