Chromium (the browser) and DRI3

I got a note on IRC a week ago that Chromium was crashing with
DRI3.

The Google team working on Chromium eventually sent me a link to the
bug report. That's secret Google stuff, so you won't be able to
follow the link, even though it's a bug in a free software application
running on free software drivers.

In both cases, the recommended “fix” was to switch from DRI3 back to
DRI2. That's not exactly a great plan, given that DRI3 offers better
security between GPU-using applications, which seems like a pretty
nice thing to have when you're running random GL applications from the
web.

Chromium Sandboxing

I'm not entirely sure how it works, but Chromium creates a process
separate from the main browser engine to talk to the GPU. That process
has very limited access to the operating system via some fancy library
adventures. Presumably, the hope is that security bugs in the GL
driver would be harder to leverage into a remote system exploit.

Debugging in this environment is a bit tricky, as you can't simply run
Chromium under gdb and expect to be able to set breakpoints in the GL
driver. Instead, you have to run Chromium with three flags: a magic one
that causes the GPU process to pause before loading the driver so you
can connect to it with gdb and debug from there, one that lets you see
crashes within the GPU process, and the usual one that causes Chromium
to ignore the GPU blacklist, which seems to always include the Intel
driver for one reason or another.

Once Chromium starts up, it will print out a message telling you to
attach gdb to the GPU process and send that process a SIGUSR1 to
continue it. Now you can happily debug and get a stack trace when the
crash occurs.

Locating the Bug

The bug manifested as a segfault at the first access to a
DRI3-allocated buffer within the application. We've seen this problem
in the past: whenever buffer allocation fails for some reason, the
driver ignores the failure and attempts to dereference the
(NULL) buffer pointer, causing a segfault. In this case, Chromium
called glClear, which tried (and failed) to allocate a back buffer,
and the i965 driver subsequently segfaulted.

We should probably go fix the i965 driver to not segfault when
buffer allocation fails, but that wouldn't provide a lot of additional
information. What I have done is add some error messages in the DRI3
buffer allocation path which at least tell you why the buffer
allocation failed. That patch has been merged to Mesa master, and
should also get merged to the Mesa stable branch for the next stable
release.
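
To make the failure mode concrete, here is a minimal sketch in C. It is
not the actual i965 or DRI3 code; the struct and both function names are
hypothetical stand-ins. It shows the two ideas from the paragraphs
above: the allocation path says why it failed, and the caller checks for
NULL instead of dereferencing it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a driver-side buffer; not a Mesa type. */
struct back_buffer {
    size_t size;
    unsigned char *pixels;
};

/* Allocate a buffer, printing the reason on failure rather than
 * silently handing back NULL. */
static struct back_buffer *back_buffer_alloc(size_t size)
{
    struct back_buffer *buf;

    if (size == 0) {
        fprintf(stderr, "back buffer allocation failed: zero size requested\n");
        return NULL;
    }

    buf = malloc(sizeof *buf);
    if (buf == NULL) {
        fprintf(stderr, "back buffer allocation failed: out of memory\n");
        return NULL;
    }

    buf->pixels = malloc(size);
    if (buf->pixels == NULL) {
        fprintf(stderr, "back buffer allocation failed: out of memory\n");
        free(buf);
        return NULL;
    }

    buf->size = size;
    return buf;
}

/* Fill the buffer with one value. Returns false when there is no
 * buffer to clear, instead of dereferencing a NULL pointer. */
static bool back_buffer_clear(struct back_buffer *buf, unsigned char value)
{
    if (buf == NULL) {
        fprintf(stderr, "clear skipped: no back buffer\n");
        return false;
    }
    memset(buf->pixels, value, buf->size);
    return true;
}
```

A caller would pass the result of back_buffer_alloc() straight to
back_buffer_clear() and treat a false return as a failed draw, which is
a lot friendlier than a segfault.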

Once I had added the error messages, it was pretty easy to see what
happened:

The first two errors were just the sandbox preventing Mesa from using
my GL configuration file. I'm not sure how that's a security problem,
but it shouldn't harm the driver much.

The last error is where the problem lies. In Mesa, the DRI3
implementation uses a chunk of shared memory to hold a fence object
that lets Mesa know when buffers are idle without using the X
connection. That shared memory segment is allocated by creating a
temporary file using the O_TMPFILE flag:

fd = open("/dev/shm", O_TMPFILE|O_RDWR|O_CLOEXEC|O_EXCL, 0666);

This call “cannot fail”, as /dev/shm is used by glibc for shared memory
objects and must therefore be world-writable on any glibc
system. However, with the Chromium sandbox enabled, it fails with
EPERM.
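
For anyone who wants to try this outside of Mesa, the following is a
small sketch built around the same open() call. The helper name
shm_fence_fd_alloc and the ftruncate() sizing step are my own
assumptions for illustration, not a copy of the Mesa code. It reports
errno when the call fails, so a sandbox denial would show up as
“Operation not permitted” rather than as a later crash.

```c
#define _GNU_SOURCE /* exposes O_TMPFILE in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unnamed file in /dev/shm of the given
 * size. Returns a file descriptor, or -1 with errno set. */
static int shm_fence_fd_alloc(off_t size)
{
    int fd = open("/dev/shm", O_TMPFILE | O_RDWR | O_CLOEXEC | O_EXCL, 0666);

    if (fd < 0) {
        int saved = errno; /* fprintf may clobber errno */
        fprintf(stderr, "open(\"/dev/shm\", O_TMPFILE) failed: %s\n",
                strerror(saved));
        errno = saved;
        return -1;
    }

    /* Give the file a size so that it could be mapped later. */
    if (ftruncate(fd, size) < 0) {
        int saved = errno;
        fprintf(stderr, "ftruncate failed: %s\n", strerror(saved));
        close(fd);
        errno = saved;
        return -1;
    }

    return fd;
}
```

On an ordinary glibc system, shm_fence_fd_alloc(4096) should hand back
a usable descriptor; inside a process that is denied access to /dev/shm,
it returns -1 and leaves the reason in errno.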

Running Without a Sandbox

Now that the bug appears to be in the sandboxing code, we can re-test
with the GPU sandbox disabled:

$ chromium --ignore-gpu-blacklist --disable-gpu-sandbox

And, indeed, without the sandbox getting in the way of allocating a
shared memory segment, Chromium appears happy to use the Intel driver
with DRI3.

Final Thoughts

I looked briefly at the Chromium sandbox code. It looks like it needs
to know intimate details of the OpenGL implementation for every
possible driver it runs on; it seems to contain a fixed list of all
possible files and modes that the driver will pass to open(2). That
seems incredibly fragile to me, especially when used in a general
Linux desktop environment. Minor changes in how the GL driver operates
can easily cause the browser to stop working.