Fact 1: speculation

Putting a lot of chips in parallel costs a lot, so the old von Neumann model (1 chip, 1 bus, 1 RAM, etc.) was still the winning move in 1990.

To push speed, we can add more ALUs (arithmetic logic units) to the chip. Then we can try to execute some operations in parallel, even though they are presented serially, if and only if the final result is the same.
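To make the "if and only if the final result is the same" condition concrete, here is a minimal sketch in Python. It is a deliberately conservative check, not how any real chip decides: the `Instr` type and the register names are made up for illustration, and real processors use register renaming to remove some of these conflicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: writes one register, reads some others."""
    dest: str
    srcs: tuple


def can_run_in_parallel(first: Instr, second: Instr) -> bool:
    """True if running both at once gives the same result as running them in order."""
    read_after_write = first.dest in second.srcs   # second needs first's result
    write_after_read = second.dest in first.srcs   # second overwrites first's input
    write_after_write = first.dest == second.dest  # both write the same register
    return not (read_after_write or write_after_read or write_after_write)


a = Instr("r1", ("r2", "r3"))  # r1 = r2 + r3
b = Instr("r4", ("r5", "r6"))  # r4 = r5 + r6
c = Instr("r7", ("r1", "r4"))  # r7 = r1 + r4, depends on a and b

print(can_run_in_parallel(a, b))  # True: no shared registers
print(can_run_in_parallel(a, c))  # False: c must wait for a
```

Here `a` and `b` can go to two different ALUs in the same cycle, while `c` has to wait for both of them.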

Reordering sequential instructions is a powerful way to recover more instruction-level parallelism, but as processors become wider (able to triple- or quadruple-issue instructions) it becomes harder to keep all those pipes busy. Modern processors have therefore grown the ability to speculate. Speculative execution lets us issue instructions which might turn out not to be required (because they may be branched over): this keeps a pipe busy (use it or lose it!), and if it turns out that the instruction isn’t executed, we can just throw the result away.

Intel chips speculatively execute the code behind an if before its condition is known, and then throw away the results they do not need. But if that code tries to access an illegal location (like a protected kernel area), the chip performs the access anyway and raises an exception only if the branch is actually executed (FACT 1).

But when the illegal access is executed and the result is thrown away, the chip still “caches” the memory location in its fast internal caches.

It is called “implicit caching”.

Fact 2: caching

Implicit caching occurs when a memory element is made potentially cacheable, although the element may never have been accessed in the normal von Neumann sequence. Implicit caching occurs on the P6 and more recent processor families due to aggressive prefetching, branch prediction, and TLB miss handling. Implicit caching is an extension of the behavior of existing Intel386, Intel486, and Pentium processor systems, since software running on these processor families also has not been able to deterministically predict the behavior of instruction prefetch.

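Now, because of caching, you can trick the chip into reading one of two distant uncached memory areas depending on a bit stored in a protected kernel area. By timing which of the two areas is now fast to access, you learn that bit… and boom, you have just discovered a way to dump the protected part of your address space, one bit at a time.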
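The trick can be sketched with a toy simulation in Python. This is not a real exploit and does not touch real hardware: `ToyCPU`, its methods, the probe addresses and the latencies are all invented for illustration. The only point is to show that a result that was thrown away can still be recovered from what is left in the cache.

```python
FAST, SLOW = 4, 300              # made-up latencies, in clock cycles
PROBE0, PROBE1 = 0x1000, 0x9000  # two distant addresses owned by the attacker


class ToyCPU:
    """Toy model: protected memory plus a cache that survives a rollback."""

    def __init__(self, kernel_memory: bytes):
        self._kernel = kernel_memory  # stands in for the protected kernel area
        self.cache = set()            # addresses currently cached

    def flush(self, addr: int) -> None:
        self.cache.discard(addr)

    def timed_load(self, addr: int) -> int:
        """A legal load by the attacker; returns its simulated latency."""
        latency = FAST if addr in self.cache else SLOW
        self.cache.add(addr)
        return latency

    def speculate(self, kaddr: int, bit: int) -> None:
        """Speculative illegal read: the value is discarded, the cache is not."""
        secret_bit = (self._kernel[kaddr] >> bit) & 1
        self.cache.add(PROBE1 if secret_bit else PROBE0)
        # Nothing is returned: architecturally the read never happened.


def leak_byte(cpu: ToyCPU, kaddr: int) -> int:
    """Recover one protected byte using only flushes and timed loads."""
    value = 0
    for bit in range(8):
        cpu.flush(PROBE0)
        cpu.flush(PROBE1)
        cpu.speculate(kaddr, bit)
        # Whichever probe is fast was touched during speculation.
        if cpu.timed_load(PROBE1) < cpu.timed_load(PROBE0):
            value |= 1 << bit
    return value


cpu = ToyCPU(b"key")
print(bytes(leak_byte(cpu, i) for i in range(3)))  # b'key'
```

Note that `leak_byte` never reads `cpu._kernel` directly: it only looks at how long its own loads take.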

Because nowadays a lot of servers run in cloud environments, this exposes cloud providers to the risk that one customer can read the sensitive data of other customers, as far as we can understand.

The fix was rolled out after six months of hard work, and today the “solution” is a slow software workaround at the operating system level.

Also, the Spectre attack cannot be patched, but it is difficult to use (so keep calm and continue reading).

Is this a bug?

Difficult to say. For sure, speculation and instruction reordering are very complex techniques, and some humble ARM chips do not have them.
But some advanced ARM, AMD and Intel chips do: it is a “common” technology nowadays, like fast-carry addition.
The Raspberry Pi is totally untouched by this vulnerability; this is the only good news.
But a lot of chips can be attacked in this way.

For example, when the program’s control flow depends on an uncached value located in the physical memory, it may take several hundred clock cycles before the value becomes known. Rather than wasting these cycles by idling, the processor guesses the direction of control flow, saves a checkpoint of its register state, and proceeds to speculatively execute the program on the guessed path. When the value eventually arrives from memory the processor checks the correctness of its initial guess. If the guess was wrong, the processor discards the (incorrect) speculative execution by reverting the register state back to the stored checkpoint, resulting in performance comparable to idling. In case the guess was correct, however, the speculative execution results are committed, yielding a significant performance gain as useful work was accomplished during the delay.
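
The checkpoint, guess, then commit-or-revert cycle described above can be sketched in a few lines of Python. This is only a model of the idea, under the assumption that the register state is a plain dictionary; the function and parameter names are hypothetical and say nothing about how real hardware is built.

```python
def speculate(regs, guess, resolve_condition, if_true, if_false):
    """Run the guessed path early, then commit it or roll it back."""
    checkpoint = dict(regs)                  # save the register state
    (if_true if guess else if_false)(regs)   # execute the guessed path
    actual = resolve_condition()             # the slow value finally arrives
    if actual == guess:
        return "committed"                   # the early work is kept
    regs.clear()
    regs.update(checkpoint)                  # revert to the checkpoint
    (if_true if actual else if_false)(regs)  # now run the correct path
    return "rolled back"


def taken(regs):
    regs["r0"] += 10


def not_taken(regs):
    regs["r0"] -= 1


good = {"r0": 1}
print(speculate(good, True, lambda: True, taken, not_taken), good)
# committed {'r0': 11}

bad = {"r0": 1}
print(speculate(bad, True, lambda: False, taken, not_taken), bad)
# rolled back {'r0': 0}
```

In both cases the registers end up exactly as if the program had run in order. What this model leaves out on purpose is the cache: on real chips the rollback restores the registers but not the cache contents, which is the leftover the attack relies on.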