QStringLiteral explained

QStringLiteral is a new macro introduced in Qt 5 to create a QString from a string literal.
(String literals are strings enclosed in "" in the source code.)
In this blog post, I explain its inner workings and implementation.

Summary

Let me start with a guideline on when to use it.
If you want to initialize a QString from a string literal in Qt 5,
you should use:

In most cases: QStringLiteral("foo"), if it will actually be converted to a QString.

QLatin1String("foo"), if it is used with a function that has an overload
for QLatin1String (such as operator==, operator+, startsWith, replace, ...).

I have put this summary at the beginning for those who don't want to read the technical details that follow.

Read on to understand how QStringLiteral works.

Reminder on how QString works

QString, like many classes in Qt, is implicitly shared.
Its only member is a pointer to the 'private' data, a QStringData.
The QStringData is allocated with malloc, with enough room after it to hold the
actual string data in the same memory block.
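
As a rough illustration of this layout, here is a minimal standalone sketch in plain C++ (no Qt). The names MiniString and MiniStringData are hypothetical, and the header is far simpler than the real QStringData (for instance, its reference count is a plain int rather than an atomic). The class holds a single pointer, and one malloc call provides room for the header followed by the UTF-16 characters.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header, much simpler than Qt's QStringData.
struct MiniStringData {
    int ref;   // number of MiniString objects sharing this block
    int size;  // UTF-16 code units, excluding the terminating zero
    // The characters are stored right after the header, in the same block.
    char16_t *data() { return reinterpret_cast<char16_t *>(this + 1); }
};

class MiniString {
public:
    explicit MiniString(const char16_t *s) {
        int n = 0;
        while (s[n]) ++n;
        // A single malloc: header + n characters + terminating zero.
        void *block = std::malloc(sizeof(MiniStringData) + (n + 1) * sizeof(char16_t));
        if (!block) throw std::bad_alloc();
        d = new (block) MiniStringData{1, n};
        std::memcpy(d->data(), s, (n + 1) * sizeof(char16_t));
    }
    // Implicit sharing: a copy shares the block and bumps the count.
    MiniString(const MiniString &other) : d(other.d) { ++d->ref; }
    MiniString &operator=(const MiniString &) = delete; // omitted for brevity
    ~MiniString() {
        if (--d->ref == 0) std::free(d); // last owner releases the block
    }

    int size() const { return d->size; }
    int refCount() const { return d->ref; }
    const char16_t *utf16() const { return d->data(); }

private:
    MiniStringData *d; // the only member: a pointer to the shared data
};
```

Copying a MiniString only increments the shared count; the expensive step is creating the data block in the first place, which is what the rest of this post is about.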

Consider a call such as setObjectName("MyObject"), which calls the function
QObject::setObjectName(const QString&).
There is an implicit conversion from const char* to QString, via its constructor.
A new QStringData is allocated with enough room to hold "MyObject", and then the string is
copied and converted from UTF-8 to UTF-16.

The same happens in a call such as str.replace("%FileName%", fileName), where the function
QString::replace(const QString &, const QString &) is called.
A new QStringData is allocated for "%FileName%".
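
The cost of this implicit conversion can be made visible with a small standalone model. MiniString, MiniObject and the g_conversions counter are hypothetical stand-ins for QString, QObject and a QStringData allocation, and the byte-wise widening only handles ASCII, where QString decodes UTF-8. Every literal passed to a const MiniString& parameter builds a fresh temporary; the same would apply to each literal argument of a replace-style function.

```cpp
#include <cassert>
#include <string>

// Hypothetical counter standing in for "a new QStringData was allocated".
static int g_conversions = 0;

struct MiniString {
    std::u16string text;

    MiniString() = default;
    // Implicit, like QString(const char *): every literal passed where a
    // const MiniString & is expected goes through this constructor.
    MiniString(const char *s) {
        ++g_conversions;
        // Simplification: byte-wise widening is only correct for ASCII;
        // QString decodes UTF-8 at this point.
        for (const char *p = s; *p; ++p)
            text.push_back(static_cast<unsigned char>(*p));
    }
};

struct MiniObject {
    MiniString name;
    // Mirrors the shape of QObject::setObjectName(const QString &).
    void setObjectName(const MiniString &n) { name = n; }
};
```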

Is there a way to prevent the allocation of QStringData and copy of the string?

Yes. One way to avoid the costly creation of a temporary QString object is to provide overloads of
common functions that take a const char* parameter.
For instance, operator== has overloads that compare a QString directly with a const char*.

These overloads do not need to create a new QString object for our literal and can operate directly
on the raw char*.
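
Continuing with a hypothetical model (again not Qt code), an extra operator== taking const char* wins overload resolution over the generic one because it needs no user-defined conversion, so the literal is compared in place and no temporary is built. The comparison only handles ASCII; a real UTF-8-aware overload would have to decode as it goes, which is the cost discussed in the next section.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical counter of temporaries built from char literals.
static int g_conversions = 0;

struct MiniString {
    std::u16string text;

    explicit MiniString(const char16_t *s) : text(s) {}
    // Implicit conversion from a literal: builds a temporary (counted).
    MiniString(const char *s) {
        ++g_conversions;
        for (const char *p = s; *p; ++p)
            text.push_back(static_cast<unsigned char>(*p)); // ASCII only
    }
};

// Generic comparison: a literal operand must first become a MiniString.
bool operator==(const MiniString &a, const MiniString &b) {
    return a.text == b.text;
}

// Overload for raw literals: needs no user-defined conversion, so overload
// resolution prefers it, and it compares in place without a temporary.
// Simplification: ASCII only; a UTF-8-aware version would decode as it goes.
bool operator==(const MiniString &a, const char *b) {
    std::size_t i = 0;
    for (; b[i]; ++i)
        if (i >= a.text.size() || a.text[i] != static_cast<unsigned char>(b[i]))
            return false;
    return i == a.text.size();
}
```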

Encoding and QLatin1String

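In Qt 5, we changed the default decoding for char* strings to UTF-8.
But many algorithms are much slower with UTF-8 than with plain ASCII or Latin-1:
UTF-8 is a variable-length encoding whose bytes must be decoded, whereas each Latin-1 byte
maps directly to the Unicode code point with the same value.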

Hence you can use QLatin1String, which is just a thin wrapper around a char*
that specifies the encoding. Functions that can operate on the raw Latin-1 data directly,
without conversion, have overloads taking a QLatin1String.
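
Here is a sketch of what a QLatin1String-style wrapper buys, assuming the other string is already stored as UTF-16 (as QString's is). Latin1View and equals are hypothetical names. The view is only a pointer and a length, and because Latin-1 byte values coincide with the first 256 Unicode code points, the comparison simply widens each byte instead of decoding.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for QLatin1String: a non-owning pointer + length
// tagged with its encoding. Nothing is copied or allocated.
struct Latin1View {
    const char *data;
    std::size_t size;
    explicit Latin1View(const char *s) : data(s), size(std::strlen(s)) {}
};

// Every Latin-1 byte value equals its Unicode code point (U+0000..U+00FF),
// so comparing with UTF-16 needs no decoding: widen and compare one
// code unit per byte.
bool equals(const std::u16string &utf16, Latin1View latin1) {
    if (utf16.size() != latin1.size) return false;
    for (std::size_t i = 0; i < latin1.size; ++i)
        if (utf16[i] != static_cast<unsigned char>(latin1.data[i]))
            return false;
    return true;
}
```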

The good news is that QString::replace and operator== have overloads for QLatin1String,
so a call such as str.replace(QLatin1String("%FileName%"), fileName) is much faster now.

In a call such as setObjectName(QLatin1String("MyObject")), we avoid the conversion from UTF-8,
but there is still an implicit conversion from QLatin1String to QString, which has to allocate
the QStringData on the heap.

Introducing QStringLiteral

Is it possible to avoid the allocation and copy of the string literal even in cases like setObjectName?
Yes, that is what QStringLiteral does.

This macro tries to generate the QStringData at compile time, with all the fields initialized.
The data is even located in the .rodata section, so it can be shared between processes.

We need two language features to do that:

The possibility to generate UTF-16 at compile time:
On Windows we can use the wide string literal L"String", since wchar_t is 16 bits there.
On Unix we use the new C++11 Unicode literal u"String".
(Supported by GCC 4.4 and Clang.)

The ability to create static data from expressions.
We want to be able to put QStringLiteral anywhere in the code.
One way to do that is to put a static QStringData inside a C++11 lambda expression.
(Supported by MSVC 2010 and GCC 4.5.)
(We also made use of the GCC statement expression extension __extension__ ({ }).
Update: support for the GCC extension was removed before the beta because it does not work
in every context where lambdas work, such as in default function arguments.)
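
The two features can be seen in isolation in the following sketch. MY_UTF16_LITERAL is a hypothetical macro, much simpler than QStringLiteral: it only returns the UTF-16 characters, not a QString. Concatenating u"" with the argument makes the compiler produce UTF-16, and the immediately invoked lambda lets a function-local static appear inside an expression, including a default argument.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro (not Qt's code) combining the two features:
//  1. u"" str: the compiler emits the string as UTF-16 (C++11).
//  2. An immediately invoked lambda: lets a function-local static live
//     inside an expression, so the macro works wherever an expression does.
#define MY_UTF16_LITERAL(str)                        \
    ([]() -> const char16_t * {                      \
        static const char16_t data[] = u"" str;      \
        return data;                                 \
    }())

const char16_t *greeting() { return MY_UTF16_LITERAL("Hello"); }

// Usable in a default argument, a context where, per the text,
// GCC statement expressions did not work.
std::size_t length(const char16_t *s = MY_UTF16_LITERAL("default")) {
    std::size_t n = 0;
    while (s[n]) ++n;
    return n;
}
```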

Implementation

We need a POD structure that contains both the QStringData and the actual string.
Its layout depends on the method we use to generate UTF-16 (wchar_t from L"" on Windows,
char16_t from u"" elsewhere).

The corresponding code lives in qstring.h.
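
The actual qstring.h code is not reproduced here; the following is a simplified standalone sketch of the same idea, assuming C++11 u"" literals. StringHeader, StaticStringData and MY_STRING_LITERAL are hypothetical names, the header has fewer fields than the real QStringData, and the real macro returns a QString built around this data, a step the sketch skips. The object is a POD: a header whose reference count is -1 and whose offset field gives the distance from the header to the UTF-16 text, with every field a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical header with fewer fields than Qt's real QStringData.
struct StringHeader {
    int ref;               // -1: static data, never counted, never freed
    int size;              // UTF-16 code units, excluding the terminating zero
    std::ptrdiff_t offset; // bytes from the header to the text: an offset,
                           // not a pointer, so nothing needs relocation
    const char16_t *data() const {
        return reinterpret_cast<const char16_t *>(
            reinterpret_cast<const char *>(this) + offset);
    }
};

// POD block: the header immediately followed by the UTF-16 payload.
template <std::size_t N>
struct StaticStringData {
    StringHeader header;
    char16_t text[N + 1]; // N code units plus the terminating zero
};

// Every initializer is a constant expression, so the compiler can emit the
// object fully initialized instead of running code at startup.
#define MY_STRING_LITERAL(str)                                          \
    ([]() -> const StringHeader * {                                     \
        enum { Size = sizeof(u"" str) / sizeof(char16_t) - 1 };         \
        static const StaticStringData<Size> literal = {                 \
            { -1, Size, offsetof(StaticStringData<Size>, text) },       \
            u"" str };                                                  \
        return &literal.header;                                         \
    }())
```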

The reference count is initialized to -1.
A negative value is never incremented or decremented,
because the data lives in read-only memory.
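
The rule for the reference count can be sketched as follows. RefCount is a hypothetical helper; Qt's own counter uses atomic operations, which are left out here. A negative value short-circuits both operations, so the read-only data is never written and never freed.

```cpp
#include <cassert>

// Hypothetical reference-count helper (not Qt's API); plain int for brevity.
struct RefCount {
    int value; // negative: static data in read-only memory

    bool isStatic() const { return value < 0; }

    void ref() {
        if (!isStatic()) ++value; // never write to read-only data
    }

    // Returns true when the last owner released heap-allocated data.
    bool deref() {
        if (isStatic()) return false; // static data is never freed
        return --value == 0;
    }
};
```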

This shows why it is so important to store an offset (qptrdiff) to the string data rather than
a pointer to it (ushort*), as Qt 4 did.
Pointers cannot be put in the read-only section because they might need to be relocated
at load time:
each time an application or library is loaded,
the OS needs to rewrite the pointer addresses using the relocation table.
An offset from the header to the string, in contrast, stays valid wherever the block is loaded.

Results

For fun, we can look at the assembly generated for a very simple call to QStringLiteral:
there is almost no code, and the data is laid out directly in the .rodata section.

Note the overhead in the binary: the string takes twice as much memory since it is encoded in UTF-16,
and there is also a header of sizeof(QStringData) = 24 bytes (on a 64-bit platform).
This memory overhead is why it still makes sense to use QLatin1String
when the function you are calling has an overload for it.
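
To put numbers on this overhead, here is a back-of-the-envelope calculation written as constexpr functions. It assumes the 24-byte header quoted above, counts the terminating zero in both cases, and ignores alignment padding, so the figures are estimates rather than measurements of a real binary.

```cpp
#include <cassert>
#include <cstddef>

// Assumed header size: sizeof(QStringData) on a 64-bit build, per the text.
constexpr std::size_t kHeaderBytes = 24;

// QStringLiteral: header + (n + 1) UTF-16 code units of 2 bytes each.
constexpr std::size_t stringLiteralBytes(std::size_t n) {
    return kHeaderBytes + (n + 1) * 2;
}

// QLatin1String: only the (n + 1) bytes of the char literal itself.
constexpr std::size_t latin1Bytes(std::size_t n) { return n + 1; }

static_assert(stringLiteralBytes(8) == 42, "\"MyObject\": 24 + 9 * 2");
static_assert(latin1Bytes(8) == 9, "\"MyObject\": 8 chars + terminator");
```

For a short literal such as "MyObject", the estimate is 42 bytes versus 9, which is why the QLatin1String overloads remain worth using where they exist.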

Conclusion

I hope that, having read this, you now have a better understanding of where to use QStringLiteral
and where not to.
There is also another macro, QByteArrayLiteral, which works on exactly the same principle but creates a QByteArray.