QML vs. C++ for application startup time

After 5.8 reduced startup times with its QML compilation caching, object creation remains one of the largest startup time consumer for Qt Quick applications. One can use Loaders to avoid loading parts of the application that are not currently visible, but some scenes with higher complexity can still be taking quite some time to initialize when they urgently need to be displayed, especially on embedded systems.

At QtCon 2016 Andrew Knight presented a few benchmarks showing a situations where C++ can improve Qt Quick object creation times over QML. A conclusion of the session was that those benchmark results would need to be taken with a grain of salt since the specific benefits could be smaller in a real application relatively to everything else happening around.

What we’re presenting today are results of converting parts of a well-known QML example to C++ using public and private APIs, measuring the improvements on loading and execution times to have a better estimate of the opportunity for real world applications.

Preparation

We’re going to use the samegame QML example as reference. This gives us a complete application handling events and producing visual outputs. Like many QML applications reaching initial maturity, focus hasn’t been put on performance optimizations in samegame yet.

Let's start by profiling the application in order to fix the low hanging fruits first. This involves the QML Profiler, but also a regular C++ profiler (like macOS Instruments or Linux perf) together with a "-release -force-debug-info" configured Qt (don't pass -developer-build since it changes the way QML caching behaves). This leads us to remove Qt Quick Particles, QQmlFileSelector, and we’ll also convert property behaviors and states into imperative animations. These changes already considerably improve the performances by reducing the number of QObjects in the scene and by preventing the use of a few less scalable features of Qt Quick.

Most importantly for now, this allows us to focus performance measurements on application code, making sure that both our QML and C++ implementations use the same Qt Quick code paths and total number of QObjects.

Benchmarks

Samegame will be launched in fullscreen when using the eglfs platform plugin, this means that the game area will fill the screen, and on a 1080p display will give us a grid of 60x31 game blocks. We use a Nitrogen6X quad core i.MX6 for this project, running Yocto 2.2 and a standard cross-compiled build of Qt 5.9.1. Running those benchmarks on desktop gave us times around 20x faster, but the relative improvements were similar so we'll focus on i.MX6 results.

Two actions in the application will be uses as reference for our benchmarks:

Starting a new game, which uses a thin JavaScript loop to fill the game area with created Block objects.

Clicking at a fixed position inside the game area, which is a heavier JavaScript loop that updates the position of all blocks on the right of the clicked column (since the game has been hardcoded to fill each column with the same color, clicking removes one column and move following columns to the left)

The StartGameBenchmark measures creating this scene, the HandleClickBenchmark measures clicking on the left-most row and iterating over all blocks to move them left.

As above, but doing direct C++ method calls rather than going through meta-objects:

// C++
class Block : public QQuickItem { void fadeIn() { /*...*/ } };

// C++
block->fadeIn();

Benchmark 1: Starting a new game

Our implementation is using 5 QObjects per block, 2 of which are QQuickItems, for a total of 9300 QObjects. Creating a screen full of blocks for a new game takes 1.153 seconds on our i.MX6 (definitely user-noticeable), but we can reduce it to 0.388 seconds if we create the same item and object structure in C++ (197% faster). The largest gain comes from reimplementing Block.qml in C++ which alone gives us a speed improvement of 124%, mostly because we avoid the evaluation of QML properties and the creation of property bindings during instantiation.

Benchmark 2: Handling a click

This benchmark iterates over the blocks that need to be moved, starts an animation on the x and y property of each block’s root item, and update the blocks board position in the index array. Our C++ implementation is 120% faster but since this action doesn’t create any new QML object, the largest gain comes from reimplementing the JavaScript for-loop in C++, which alone is 57% faster.

The QML JIT implementation usually competes acceptably well with C++ and one thing to note is that the total time for handling this click in our case also has a considerably smaller performance footprint on our application than creation times covered by the Start Game benchmark. So while relatively put this is faster, the user will not notice the 117 ms reduction for Handle Click as much as the 764 ms reduction of the Start Game case.

Why is C++ faster?

Qt Quick and QML are very efficient when it comes to update a scene frame after frame to produce a smooth animation. Properties are consistently and efficiently updated according to animations by property bindings, layout will involve only the necessary elements and the scene graph will reuse as many OpenGL objects as possible to render the new frame, most of the time, with optimal state changes in system and graphics memory.

However, having this structure in place requires some runtime initialization to be done at creation time, and a mostly-dynamically-typed language like QML can’t really compete in this sport with a language like C++, whose design makes performance and compile-time optimizations a top-priority over its other language characteristics.
On top of having to create the same backing C++ QObject and QQuickItem, QML must resolve dynamic types, create data-structures to represent property bindings and go through the Qt meta-object type system to access any native member. Almost all of which C++ performs equivalently at compile time.

One of the reasons why QML can choose to offer conciseness and development speed at the cost of some performance is because it’s also designed to interface well with C++ code. This is an opportunity that, for example, Qt Quick Controls 2 has taken to reduce compilation and creation times with gains comparable to what is presented here. Qt Quick Controls 2 has an advantage over applications though: it’s part of Qt and it has access to private symbols like QQuickText, QQuickRectangle, QQuickFlickable, and even QQuickItemPrivate.

While our C++ implementation is tailored for measuring the potential performance gain of those two specific benchmarks, an ideal method would be if applications could do layout in QML, but handle events and state changes from C++ where the performance gain would be valuable. This could take advantage of strengths of both languages while reducing the number of dynamically-typed property bindings and JavaScript blocks to perform at runtime.
There is still some work to be done in Qt Quick to be able to conveniently harness C++ in UI code. We've already been able to mix the two languages using some hacks for this exercise, but the real blocker at the moment is that we need to be able to create Qt Quick visual primitives like animations, images and rectangles from C++ just like from QML.

Give up binary compatibility?

For this experiment, we had to expose private API that isn’t dynamically exported (i.e. QQuickSpringAnimation).
qmake makes it relatively easy to access exported private symbols.
This comes at a hefty price: those APIs are not supported and source and binary compatibility are not guaranteed.
An application could theoretically use private APIs without major problems if they are recompiled for any update of Qt.
Changes to Qt private API would otherwise likely result in application crashes.

The question is then: should more of the Qt Quick API be publically exposed to C++ application code?

Conclusion

While we demonstrate how using some C++ private Qt Quick APIs can allow you to improve the creation times today, this is unsupported it’s not something that we recommend.

It should be clear that even though C++ is faster in this case, declarative QML will usually remain a better choice for core UI code because of its flexibility and conciseness. QML is also getting faster every release, this gap will continue shrinking up to a point. As for non-UI business logic, C++ can already be used comfortably without requiring using private Qt Quick APIs.

Long creation times are also not a problem if it can happen while the user is not waiting, this can be accomplished with asynchronous Loaders or by pre-loading components.

To conclude we are asking the community: Are those loading time improvements a great enough opportunity to justify improving the C++ brigde for UI code and extending the public Qt Quick API to classes like animations, QQuickRectangle and QQuickImage? A set of base classes that would give applications a freedom similar to what Qt Quick Controls 2 has.

Contribute to the discussion by letting us know in social medias or comments below if you have experience using C++ for Qt Quick UI code, and/or could benefit from having those functionalities publicly available.

Woboq is a software company that specializes in development and consulting around Qt and C++. Hire us!