
Optimising compilers as adversaries

Suppose that you want to handle some secret data in C and, in the wake of some high-profile vulnerability or other, want to take precautions against your secret being leaked. Perhaps you’d write something along these lines:
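Something like the following, say (a sketch rather than the post's exact listing: `wipe_secret` and the use of `memset` are from the post, while `get_secret`, `handle_secret` and the buffer size are invented for illustration):

```c
#include <stddef.h>
#include <string.h>

/* Zero a buffer so the secret doesn't linger in memory. */
void wipe_secret(void *secret, size_t len) {
    memset(secret, 0, len);
}

/* get_secret is an invented stand-in for whatever fills the
 * buffer with sensitive data. */
void get_secret(char *buf, size_t len) {
    memset(buf, 'x', len);
}

void handle_secret(void) {
    char secret[64];
    get_secret(secret, sizeof secret);
    /* ... use the secret ... */
    wipe_secret(secret, sizeof secret);
    /* at -O2, clang may inline this call and then remove the
     * resulting memset as a dead store: secret is never read
     * again after this point */
}
```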

I think you could be forgiven for assuming that this does what it says. However, if you have what John Regehr calls ‘a proper sense of paranoia’, you might actually check. When I ran clang -S -O2 -emit-llvm on this example, the call to wipe_secret was simply gone from the emitted IR.

If you want to know where it went, you can pass -mllvm -print-after-all to clang, and it will print the LLVM IR after each optimisation pass. If your system is anything like mine, you’ll see that clang starts off by calling wipe_secret normally, then at some point inlines the memset, before eventually removing it during a pass called “Dead Store Elimination”.

In other words, what has happened is that the combination of the following perfectly reasonable optimisations has produced something quite surprising:

If a function is small enough, then inline it rather than calling it

If a memory location gets written to but then never read from again, don’t bother actually performing the write
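The second of these, dead store elimination, is easy to see in isolation (a minimal invented example, not from the post):

```c
/* With optimisation enabled, a compiler is free to drop the
 * final store to x, because nothing ever reads x afterwards. */
int compute(void) {
    int x = 1;
    int y = x + 1;
    x = 0;        /* dead store: may be removed entirely */
    return y;
}
```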

How can we stop this happening? Perhaps the most obvious solution is to turn down the optimisation level: if you try this example with -O1 you should see that the call to wipe_secret is preserved. There are several problems with this approach:

It’s not remotely portable

There’s no guarantee that the authors of clang won’t tweak the passes for -O1 in the future in such a way that wipe_secret will be removed again

Users can’t be expected to predict that setting their default CFLAGS to include -O2 will have security implications. The clang docs describe -O2 as ‘moderate’, for example, rather than ‘dangerous’.

In an ideal world, everyone would have a modern compiler and libraries, and we could just use memset_s from C11, whose entire purpose in life is to avoid this problem. I strongly suspect that in reality, someone would ‘port’ the code to older systems by adding -Dmemset_s=memset to their CFLAGS, thereby breaking the code again. (This is exactly what has happened with explicit_bzero, the OpenBSD equivalent of memset_s.)
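For comparison, here is a sketch of what that would look like. memset_s is the real C11 (Annex K) function, which an implementation is forbidden to optimise away; Annex K is optional, though, so the fallback shown here uses volatile-qualified stores, a common workaround that is not discussed in the post:

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <stddef.h>
#include <string.h>

void wipe_secret(void *secret, size_t len) {
#ifdef __STDC_LIB_EXT1__
    /* The standard requires these stores to be performed even
     * if the buffer is never read again. */
    memset_s(secret, len, 0, len);
#else
    /* Fallback for platforms without Annex K: the compiler must
     * assume writes through a volatile pointer are observable. */
    volatile unsigned char *p = secret;
    while (len--) *p++ = 0;
#endif
}
```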

I’ve been unable to find a satisfying solution to this problem in the wild. I was disturbed to discover that OpenSSH tries to do it by putting its equivalent of wipe_secret in a separate file, presumably in the hope that the compiler isn’t doing whole-program optimisation. I hope that I’m missing something, but I’m quite worried that the problem would come back if I happened to pass -flto to clang.

By far the most horrifying attempt I’ve seen, though, and the one that inspired this post, is from a certain infamous SSL library. Their implementation keeps a kind of running checksum of all memory ever wiped in a global variable, performing useless work in an attempt to trick the compiler into not optimising it away. What happens when compilers become smart enough to identify that it’s useless work? My guess is that no-one will notice at first, and the library will silently gain (another) vulnerability. When the vulnerability is exploited, perhaps the arms race against the optimiser will escalate.
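The scheme described is roughly this (an invented sketch of the idea, not the library's actual code):

```c
#include <stddef.h>

/* Global accumulator: mixing every wiped byte into it makes the
 * stores look observable, so the optimiser keeps them... for now. */
static unsigned char cleanse_ctr;

void cleanse(void *v, size_t len) {
    unsigned char *p = v;
    while (len--) {
        *p = cleanse_ctr;           /* overwrite with junk */
        cleanse_ctr += *p + 1;      /* feed the byte back in */
        p++;
    }
}
```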

Perhaps the simplest solution to this problem is just to minimise the use of C. Higher-level languages have their own side-channel attacks, for sure, but for example, RabbitMQ was not vulnerable to Heartbleed because it uses SSL handshake logic written in Erlang: buffer over-reads are simply not possible in Erlang.

After more than 40 years, why do we still use C at all? I’ll be considering that in my next post.

RabbitMQ was not vulnerable to Heartbleed simply because it wasn’t using OpenSSL. The bug itself was enabled by a “feature” implemented to improve performance: OpenSSL effectively implements its own memory management, because malloc was considered too slow.

To suggest it wouldn’t have happened in Erlang because “buffer over-reads are simply not possible” is to completely miss why the bug occurred. To over-simplify: if I create my own block of memory in the form of an array of size n, then fill the first half with data from one user and the second half with data from a second user, nothing prevents a read of the whole array. So if the first user can control the number of elements returned by a read from the array, that user can ask for all n elements, no buffer over-reads required.
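The commenter's scenario can be sketched in C (invented names throughout; note that every access below stays in bounds, yet one user can read another's data):

```c
#include <stddef.h>
#include <string.h>

/* One arena shared by two users: home-grown memory management
 * of the kind the comment describes. */
#define ARENA_SIZE 64
static char arena[ARENA_SIZE];

void store_user_data(int user, const char *data, size_t len) {
    size_t half = ARENA_SIZE / 2;
    if (len > half) len = half;
    memcpy(arena + (user ? half : 0), data, len);
}

/* The caller controls n. The bounds check is honoured, so no
 * buffer over-read occurs, but n can still reach into the
 * other user's half of the arena. */
size_t read_user_data(char *out, size_t n) {
    if (n > ARENA_SIZE) n = ARENA_SIZE;
    memcpy(out, arena, n);
    return n;
}
```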

As I say, my example is way over-simplified, but it has a kernel of truth. The moral of the story is that all the safe and secure mechanisms, such as memory protection, will not help if a programmer explicitly sets out to bypass them.