Posts Tagged ‘branch prediction’

Introduction

I predicted that 2018 would be a very bad year for data breaches and security problems, and we have already started the year with the Intel x86 specific Meltdown exploit and the Spectre exploit that works on all sorts of processors and even on some JavaScript systems (like Chrome). Since my last article was on Assembler programming and most of these type exploits are created in Assembler, I thought it might be fun to look at how Spectre works and get a feel for how hackers can retrieve useful data out of what seems like nowhere. Spectre is actually a large new family of exploits so patching them all is going to take quite a bit of time, and like the older buffer overrun exploits, are going to keep reappearing.

I’ve been writing quite a bit about the Raspberry Pi recently, so is the Raspberry Pi affected by Spectre? After all it affects all Android and Apple devices based on ARM processors. The main Raspberry Pi operating system is Raspbian which is variant of Debian Linux optimized for the Pi. A recent criticism of Raspbian is that it is still 32-Bit. It turns out that running the ARM in 32-bit mode eliminates a lot of the Spectre attack scenarios. We’ll discuss why this is in the article. If you are running 64-Bit software on the Pi (like running Android) then you are susceptible. You are also susceptible to the software versions of this attack like those in JavaScript interpreters that support branch prediction (like Chromium).

The Spectre hacks work by exploiting how processor branch prediction works coupled with how data is cached. The exploits use branch prediction to access data it shouldn’t and then use the processor cache to retrieve the data for use. The original article by the security researchers is really quite good and worth a read. Its available here. It has an appendix at the back with C code for Intel processors that is quite interesting.

Branch Prediction

In our last blog post we mentioned that all the data processing assembler instructions were conditionally executed. This was because if you perform a branch instruction then the instruction pipeline needs to be cleared and restarted. This will really stall the processor. The ARM 32-bit solution was good as long as compilers are good at generating code that efficiently utilize these. Remember that most code for ARM processors is compiled using GCC and GCC is a general purpose compiler that works on all sorts of processors and its optimizations tend to be general purpose rather than processor specific. When ARM evaluated adding 64-Bit instructions, they wanted to keep the instructions 32-Bit in length, but they wanted to add a bunch of instructions as well (like integer divide), so they made the decision to eliminate the bits used for conditionally executing instructions and have a bigger opcode instead (and hence lots more instructions). I think they also considered that their conditional instructions weren’t being used as much as they should be and weren’t earning their keep. Plus they now had more transistors to play with so they could do a couple of other things instead. One is that they lengthed the instruction pipeline to be much longer than the current three instructions and the other was to implement branch prediction. Here the processor had a table of 128 branches and the route they took last time through. The processor would then execute the most commonly chosen branch assuming that once the conditional was figured out, it would very rarely need to throw away the work and start over. Generally this larger pipeline with branch prediction lead to much better performance results. So what could go wrong?

Consider the branch statement:

if (x < array1_size) y = array2[array1[x] * 256];

This looks like a good bit of C code to test if an array is in range before accessing it. If it didn’t do this check then we could get a buffer overrun vulnerability by making x larger than the array size and accessing memory beyond the array. Hackers are very good at exploiting buffer overruns. But sadly (for hackers) programmers are getting better at putting these sort of checks in (or having automated or higher level languages do it for them).

Now consider branch prediction. Suppose we execute this code hundreds of times with legitimate values of x. The processor will see the conditional is usually true and the second line is usually executed. So now branch prediction will see this and when this code is executed it will just start execution of the second line right away and work out the first line in a second execution unit at the same time. But what if we enter a large value of x? Now branch prediction will execute the second line and y will get a piece of memory it shouldn’t. But so what, eventually the conditional in the first line will be evaluated and that value of y will be discarded. Some processors will even zero it out (after all they do security review these things). So how does that help the hacker? The trick turns out to be exploiting processor caching.

Processor Caching

No matter how fast memory companies claim their super fast DDR4 memory is, it really isn’t, at least compared to CPU registers. To get a bit of extra speed out of memory access all CPUs implement some sort of memory cache where recently used parts of main memory are cached in the CPU for faster access. Often CPUs have multiple levels of cache, a super fast one, a fast one and a not quite as fast one. The trick then to getting at the incorrectly calculated value of y above is to somehow figure out how to access it from the cache. No CPU has a read from cache assembler instruction, this would cause havoc and definitely be a security problem. This is really the CPU vulnerability, that the incorrectly calculated buffer overrun y is in the cache. Hackers figured out, not how to read this value but to infer it by timing memory accesses. They could clear the cache (this is generally supported and even if it isn’t you could read lots of zeros). Then time how long it takes to read various bytes. Basically a byte in cache will read much faster than a byte from main memory and this then reveals what the value of y was. Very tricky.

Recap

So to recap, the Spectre exploit works by:

Clear the cache

Execute the target branch code repeatedly with correct values

Execute the target with an incorrect value

Loop through possible values timing the read access to find the one in cache

This can then be put in a loop to read large portions of a programs private memory.

Summary

The Spectre attack is a very serious new technique for hackers to hack into our data. This will be like buffer overruns and there won’t be one quick fix, people are going to be patching systems for a long time on this one. As more hackers understand this attack, there will be all sorts of creative offshoots that will deal further havoc.

Some of the remedies like turning off branch prediction or memory caching will cause huge performance problems. Generally the real fixes need to be in the CPUs. Beyond this, systems like JavaScript interpreters, or even systems like the .Net runtime or Java VMs could have this vulnerability in their optimization systems. These can be fixed in software, but now you require a huge number of systems to be patched and we know from experience that this will take a very long time with all sorts of bad things happening along the way.

The good news for Raspberry Pi Raspbian users, is that the ARM in the older 32-Bit mode isn’t susceptible. It is only susceptible through software uses like JavaScript. Also as hackers develop these techniques going forwards perhaps they can find a combination that works for the Raspberry, so you can never be complacent.