In the near future – in all likelihood, later this month – at least Windows and Linux will get security updates that change the way those operating systems manage memory on Intel processors.

There’s a lot of interest, excitement even, about these changes: they work at a very low level and are likely to affect performance.

The slowdown will depend on many factors, but one report suggests that database servers running on affected hardware might suffer a performance hit around 20%.

“Affected hardware” seems to include most Intel CPUs released in recent years; AMD processors have different internals and are affected, but not quite as broadly.

So, what’s going on here?

On Linux, the forthcoming patches are known colloquially as KPTI, short for Kernel Page Table Isolation, though they have jokingly been referred to along the way as both KAISER and F**CKWIT.

The latter is short for Forcefully Unmap Complete Kernel With Interrupt Trampolines; the former for Kernel Address Isolation to have Side-channels Efficiently Removed.

Here’s an explanation.

Inside most modern operating systems, you’ll find a privileged core, known as the kernel, that manages everything else: it starts and stops user programs; it enforces security settings; it manages memory so that one program can’t clobber another; it controls access to the underlying hardware such as USB drives and network cards; it rules and regulates the roost.

Everything else – what we glibly called “user programs” above – runs in what’s called userland, where programs can interact with each other, but only by agreement.

If one program could casually read (or, worse still, modify) any other program’s data, or interfere with its operation, that would be a serious security problem; it would be even worse if a userland program could get access to the kernel’s data, because that would interfere with the security and integrity of the entire computer.

One job of the kernel, therefore, is to keep userland and the kernel carefully apart, so that userland programs can’t take over from the kernel itself and subvert security, for example by launching malware, stealing data, snooping on network traffic and messing with the hardware.

The CPU itself provides hardware support for this sort of separation: the x86 and x64 processors provide what are known as privilege levels, implemented and enforced by the chip itself, that can be used to segregate the kernel from the user programs it launches.

Intel calls these privilege levels rings, of which there are four; most operating systems use two of them: Ring 0 (most privileged) for the kernel, and Ring 3 (least privileged) for userland.

Loosely speaking, processes in Ring 0 can take control over processes and resources in higher-numbered rings, but not the other way around.

In theory, then, the processor itself blocks Ring 3 programs from reading Ring 0 memory, thus proactively preventing userland programs from peeking into the kernel’s address space, which could leak critical details about the system itself, about other programs, or about other people’s data.

In technical terms, a sequence of machine code instructions like this, running in userland, should be blocked at step 1:

mov rax, [kernelmemory] ; this will get blocked - the memory is protected
mov rbx, [usermemory] ; this is allowed - the memory is "yours"

Likewise, swapping the instructions, this sequence would be blocked at step 2:

mov rbx, [usermemory] ; this is allowed - the memory is "yours"
mov rax, [kernelmemory] ; this will get blocked - the memory is protected

Now, modern Intel and AMD CPUs support what is called speculative execution, whereby the processor figures out what the next few instructions are supposed to do, breaks them into smaller sub-instructions, and processes them in a possibly different order to how they appear in the program.

This is done to increase throughput, so a slow operation that doesn’t affect any intermediate results can be started earlier in the pipeline, with other work being done in what would otherwise be “dead time” waiting for the slow instruction to finish if it ran at the end of the list.

Above, for example, the two instructions are computationally independent, so it doesn’t really matter what order they run in, even though swapping them round changes the moment at which the processor intervenes to block the offending instruction (the one that tries to load memory from the kernel).

Order does matter!

Back in July 2017, a German security researcher did some digging to see if order does, in fact, matter.

He wondered what would happen if the processor calculated some internal results as part of an illegal instruction X, used those internal results in handling legal instruction Y, and only then flagged X as disallowed.

Even if both X and Y were cancelled as a result, would there be a trace of the internal result left over from the speculative execution of the illegal instruction X?

If so, could you figure out something from that left-over trace?

The example that the researcher started with looked like this:

1. mov rax, [K] ; K is a kernel address that is banned
2. and rax, 1
3. mov rbx, [U+rax] ; U is a user address that is allowed

Don’t worry if you don’t speak assembler – what this code does is:

Load the A register from kernel memory.

Change A to 0 if it was even or 1 if it was odd (this keeps the thought experiment simple).

Load register B from memory location U+0 or U+1, depending on A.
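The three steps above can be modelled in a few lines of Python. To be clear, this is purely a toy simulation of the logic (the “cache” is just a Python set, and the kernel value and addresses are invented for illustration), not real exploit code:

```python
# Toy model of the banned-load gadget: the "cache" is just a set of
# addresses that our simulated CPU has touched.
cache = set()

K_SECRET = 0x41   # pretend: the privileged value stored at kernel address K
U = 0x1000        # U: a user address the program may legally read

def speculative_gadget():
    # 1. Load RAX from kernel memory - speculatively executed before
    #    the CPU notices the privilege violation.
    rax = K_SECRET
    # 2. AND with 1: RAX becomes 0 (secret was even) or 1 (secret was odd).
    rax &= 1
    # 3. Load from U+RAX, pulling that user address into the cache.
    cache.add(U + rax)
    # The CPU now flags step 1 as disallowed and cancels the results...
    # ...but the cache entry from step 3 survives as a telltale.

speculative_gadget()
print((U + 1) in cache)  # True: the secret's low bit must have been 1
```

The point of the model is that step 3’s side-effect (which address got cached) outlives the cancellation of step 1.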

In theory, speculative execution means that the CPU could finish working internally on instruction 3 before finishing instruction 1, even though the whole sequence of instructions would ultimately be invalidated and blocked because of the privilege violation in 1.

Perhaps, however, the side-effects of instruction 3 could be figured out from elsewhere in the CPU?

After all, the processor’s behaviour would have been slightly different depending on whether the speculatively-executed instruction 3 referenced memory location U or U+1.

For example, this difference might, just might, show up in the CPU’s memory cache – a list of recently-referenced memory addresses plus their values that is maintained inside the CPU itself for performance reasons.

In other words, the cache might act as a “telltale”, known as a side channel, that could leak secret information from inside the CPU – in this case, whether the privileged value of memory location K was odd or even.

(Looking up memory in CPU cache is some 40 times faster than fetching it from the actual memory chips, so enabling this sort of “short-circuit” for commonly-used values can make a huge difference to performance.)
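That timing difference is exactly what makes the side channel readable. Here’s a self-contained toy model in Python (the 40x ratio is modelled as a fixed latency, and the addresses are invented for illustration):

```python
# Toy model: recover which address a cancelled instruction touched,
# purely from how long the memory appears to take to read.
CACHE_HIT_COST = 1     # modelled cost of a cached read
CACHE_MISS_COST = 40   # cache is roughly 40x faster than main memory

U = 0x1000
cached = {U + 1}       # suppose speculative execution touched U+1

def access_time(addr):
    """Modelled read latency: fast if the address is in the cache."""
    return CACHE_HIT_COST if addr in cached else CACHE_MISS_COST

# The attacker times both candidate addresses; whichever reads
# quickly must be the one the CPU touched, revealing the secret bit.
secret_bit = 0 if access_time(U) < access_time(U + 1) else 1
print(secret_bit)  # 1
```

On real hardware the attacker would time actual memory reads (with an instruction such as RDTSC) rather than call a helper function, but the inference step is the same.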

The long and the short of it is that the researcher couldn’t measure the difference between “A is even” and “A is odd” (or, to put it another way, between “the CPU peeked at U” and “the CPU peeked at U+1”) in this case…

…but the thought experiment worked out in the end.

The researcher found other similar code constructions that allow you to leech information about kernel memory using address calculation tricks of this sort.

In other words, Intel CPUs suffer from a hardware-level side channel that could leak privileged memory to unprivileged programs.

The rest is history

And the rest is history.

Patches are coming soon, at least for Linux and Windows, to deliver KAISER: Kernel Address Isolation to have Side-channels Efficiently Removed, or KPTI, to give the fix its politically correct name.

Now you have an idea where the name KAISER came from: the patch keeps kernel and userland memory more carefully apart so that side-effects from speculative execution tricks can no longer be measured.

This security fix is especially relevant for multi-user computers, such as servers running several virtual machines, where individual users or guest operating systems could use this trick to “reach out” to other parts of the system, such as the host operating system, or other guests on the same physical server.

However, because CPU caching is there to boost performance, anything that reduces the effectiveness of caching is likely to reduce performance, and that is the way of the world.

Sometimes, the price of security progress is a modicum of inconvenience, in much the same way that 2FA is more hassle than a plain login, and HTTPS is computationally more expensive than vanilla HTTP.

In eight words, get ready to take one for the team.

What next?

A lot of the detail behind these patches is currently [2018-01-03T16:30Z] hidden behind a veil of secrecy.

This secrecy seems to be down to non-disclosure clauses imposed by various vendors involved in preparing the fixes, an understandable precaution given the level of general interest in new ways to pull off data leakage and privilege escalation exploits.

We expect this secrecy to be lifted as patches are officially published.

However, you can get and try the Linux patches for yourself right now, if you wish. (They aren’t finalised yet, so we can’t recommend using them except for testing.)

So far as we know at the moment, the risk of this flaw seems comparatively modest on dedicated servers such as appliances, and on personal devices such as laptops: to exploit it would require an attacker to run code on your computer in the first place, so you’d already be compromised.

On shared computers such as multiuser build servers or hosting services that run several different customers’ virtual machines on the same physical hardware, the risks are much greater: the host kernel is there to keep different users apart, not merely to keep different programs run by one user apart.

So, a flaw such as this might help an untrustworthy user to snoop on others who are logged in at the same time, or to influence other virtual machines hosted on the same server.

This flaw has existed for years and has been documented for months at least, so there is no need to panic; nevertheless, we recommend that you keep your eyes out for patches for the operating systems you use, probably in the course of January 2018, and that you apply them as soon as you can.

UPDATES

[2018-01-04T01:00Z]

Google’s Project Zero bug hunting team has now published a detailed description of the behind-the-scenes research that’s been going on for the past few months. It’s both technical and jargon-heavy, but the main takeaways are:

In theory, various Intel, AMD and ARM processors have features related to speculative execution and caching that can be exploited as described above.

AMD chips have so far only been exploited when using Linux with a non-default kernel feature enabled.

Intel chips have been exploited so that an unprivileged, logged-in user can read out kernel data slowly but steadily.

Intel chips have been exploited so that a root user in a guest virtual machine can read out host kernel data slowly but steadily.

(“Slowly” means that an attacker could suck out on the order of 1000 bytes per second, or approximately 100MBytes per day.)
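As a quick sanity check on that figure:

```python
# 1000 bytes/second, kept up around the clock for one day:
bytes_per_day = 1000 * 60 * 60 * 24
print(bytes_per_day)  # 86400000, i.e. about 86MB - roughly 100MB in round figures
```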

Even if you assume that an attacker didn’t know where to focus his attempts, but could do no better than to grab live kernel data at random, you can consider this issue to be a bit like Heartbleed, where an attacker would often end up with garbage but might occasionally get lucky and grab hold of secret data such as passwords and private decryption keys.

Unlike Heartbleed, the attacker already needs a footprint on a vulnerable server, for example as a logged-in user with a command shell open, or as the owner of a virtual machine (VM) running on a hosting server. (In both cases the user ought to be constrained entirely to his own account or to his own VM.)

Intel has published a brief official comment entitled Intel responds to security research findings. There isn’t much in this statement, so don’t get too excited; the salient points are:

“Intel believes these exploits do not have the potential to corrupt, modify or delete data.” Indeed, the attacks and exploits reported so far can suck data out of the kernel, but not put any data back into kernel space.

“Recent reports that these exploits are caused by a ‘bug’ or a ‘flaw’ and are unique to Intel products are incorrect.” We used the word ‘flaw’ in our headline, and we’ll stick with it. In our opinion, an ideal implementation of speculative execution would ensure that there were no detectable side-effects left behind after a speculative execution was found to have violated security.

“Contrary to some reports, any performance impacts are workload-dependent, and, for the average computer user, should not be significant.” You will have to interpret that for yourself.

[2018-01-04T17:40Z]

AMD has issued a statement headlined An update on AMD processor security. Like the Intel statement, it doesn’t say an awful lot, but it does confirm that AMD CPUs are not entirely immune to these attacks.

There are three CVE vulnerability numbers attached to the various F**CKWIT exploits: CVE-2017-5753, CVE-2017-5715 and CVE-2017-5754.

AMD claims that it is vulnerable to -5753, immune to -5754, and that although it is in theory at risk from -5715, “differences in AMD architecture mean there is a near zero risk of exploitation of this variant.”

[2018-01-05T00:30Z]

Firefox just pushed out a browser update to mitigate these attacks. Firefox now moves to version 57.0.4.

This update makes it much harder for JavaScript running in the browser to measure short time intervals accurately – timing memory access speeds is necessary in these attacks so you can figure out which memory addresses ended up cached and which ones didn’t.

A memory address that is currently cached must have been accessed recently, a trick that helps you figure out what happened when an instruction was speculatively executed, even if it got cancelled in the end.
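The principle of the mitigation can be sketched in Python. (Mozilla’s actual change coarsens performance.now() inside the browser; the 20-microsecond granularity is what Mozilla announced, but the function below is just our own illustration of the idea.)

```python
import time

RESOLUTION = 20e-6  # 20 microseconds

def coarse_now():
    """A deliberately coarsened clock: timestamps are rounded down to a
    20-microsecond grid, so smaller differences become invisible."""
    t = time.perf_counter()
    return t - (t % RESOLUTION)

# Two events a few hundred nanoseconds apart - say, a cached read
# versus an uncached one - now frequently receive the *same*
# timestamp, making cache-timing measurements far noisier.
a = coarse_now()
b = coarse_now()
print(b - a)
```

Coarser timers don’t remove the side channel, but they force an attacker to average over many more measurements to see anything.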

You can get closer than that quickly enough: Google just released a detailed explanation of this stuff and how it has unfolded over the past few months. I’m about to update the article with relevant links, for completeness [2018-01-04T00:10Z].

As far as I think I know so far, there’s no PoC that shows that ARM can be “data leeched” in this way *between two processes* (i.e. across a protection boundary). It’s just a theoretical possibility on CPUs that have: memory protection, out-of-order-execution, and an on-chip cache that leaks information about instructions that were attempted internally but subsequently not actually executed. And it’s a practical proposition on Intel (and to a much lesser extent, AMD) chips.

The place to track where we are officially is our knowledgebase article (KBA) here:

https://community.sophos.com/kb/en-us/128053

We haven’t been able to provoke any blue screens with our product in testing (but remember that the official patch only appeared very recently!), and we can’t see why we would, given that Microsoft attributes any blue screens to “unsupported calls into Windows kernel memory”. AFAIK, we carefully avoid undocumented kernel hacks, for precisely the reason that they can be problematic after updates.

Sadly, the Microsoft document attributes the blue screens to “a small number of products” but without identifying them, so we’re all tarred with the same brush. In short, my own opinion is that our product is just fine, but I can’t prove it; Microsoft is implying it might not be, but without giving any indication of whether it had problems with our product – or, indeed, even tried it.

All I can offer you at 2018-01-04T13:00Z is this sentence from our KBA: “[we are] currently testing this patch and registry key, with initial results showing no compatibility issues.”

In short, *we* can’t get our product to cause problems after the patch, and we don’t think you will have problems, but there’s no easy way to offer you *proof*.

…as of 2018-01-04T13:00Z, “[we plan] to automatically add the registry key early next week, once all tests have been completed.” (See above comment about blue screens of death.)

In the meantime, setting the registry entry yourself will do the trick.

BTW, if this sounds slow (you’re only going to set the magic registry entry next WEEK?!?), please bear in mind: that the final patch and registry entry was announced very recently; that Microsoft has thrown the cat amongst the pigeons by alluding to blue screens of death due to “a small number of products” without naming them; and that the incumbent anti-virus is expected to set this registry key *on behalf of every other piece of software on the computer for all Windows versions out there*.

PS. Donations of sympathy, pizza and energy drinks for our Windows coders and QA team gratefully accepted. (Not really – postal regulations make this tricky. But expressions of sympathy and “virtual pizzas” would at least give them something to smile about :-)

I would advise waiting a bit before making any such decisions. At the moment AMD CPUs seem to be susceptible to one variant (out of three) of the speculative execution flaw – see this statement by AMD: http://www.amd.com/en/corporate/speculative-execution. I guess we need to wait and see, as at the moment there are still many more questions than answers.

What I don’t quite understand is the “leave it to others” approach by AMD, which has admitted to waiting for the issue to be resolved with software/OS updates. Hopefully they are actually working hard behind the scenes to fully assess the problem and to come up with a plan for how to improve CPU design in the future. I think that the hardware market (especially CPUs) has been focused mainly on performance for too long, and as a result security needs to be retrofitted in scenarios like this one. Security should be THE top priority, with others like performance and compatibility playing a secondary role.

There is another interesting article by Business Insider with information that Amazon and Google have now patched most of their cloud systems and are reporting “negligible impact on performance.” What is worth noting is that these are not just synthetic numbers but real-life data from some of the largest environments, running behemoths like YouTube or Netflix. So it is quite a relief that they are not seeing the previously expected performance hits.

What concerns me the most is all those smart thingies, meters, sensors, toys, cameras and other IoT devices, with no chance of seeing any updates. Branded devices that are old enough to be outside of support, but not old enough to have unaffected CPUs, could also be a significant risk factor, purely due to the number of these devices still in use. This also includes ARM-based devices, as according to Google that platform is also potentially at risk, although not many details have been released so far.

No reason to be worried for the most part – that IoT device is only vulnerable if it allows a user to supply a program to it. If you supply a program to your own camera then possibly you can get out data from it – but that doesn’t really hurt you. It’s only when someone else is able to supply a program to the IoT device that life gets interesting.

It sounds like the current resolution is to universally kill speculative processing.

I am curious how the CPU makers might allow this out-of-order processing to “come back”. Maybe they will “hide” cache results until they are both allowed by the process and called-upon, which would almost entirely eliminate the benefit of speculative processing, while adding tracking-and-security (data-locking) overhead. However, the speed of on-chip cache might be enough of a boost to still make this hidden-and-protected-speculative-processing routine worthwhile.

Even after the CPUs are altered in some way, to allow this out-of-order processing to “come back”, it may be a long time before kernels are programmed to take advantage of the new caching. It’s the classic chicken-and-the-egg problem: Nobody wants to re-implement a potentially insecure algorithm until someone popular tests it with their software. However, until someone takes that risk, nobody will.

Yes! I saw that – click on the KBA link at the top of the article. Great result given the amount of notice we were given by Microsoft.

(First time I can remember that we have had to push out an update for the purpose of updating someone else’s software. If the patch were to *our* software that would make sense…but this seems weird. Let’s hope it doesn’t become Microsoft’s regular MO!)

Duck, sorry to be pedantic here, but the behavior you are describing is “Out-of-Order Execution”, not Speculative Execution. (I see you did use the correct term in your January 4, 2018 at 12:24 pm response.)

Out of order execution means executing instructions in a different order than the programmer wrote them, when it can be done without affecting the results.

Speculative execution means executing instructions after a conditional branch before the condition has been evaluated.

Examples:
Out of Order Execution
——————————-
Programmer codes:
mov rax, [25] ; this load of RAX from memory location 25 will take some time
add rbx, rax ; so the processor stalls until RAX can be added to RBX
mov rcx, 0 ; direct load of RCX with 0

Processor executes:
mov rax, [25] ; this load of RAX from memory location 25 will take some time
mov rcx, 0 ; so the processor runs this unrelated instruction instead of stalling
add rbx, rax ; then the processor does the addition. At this point, state is the
; same as above, but faster

Speculative Execution
——————————
label1: mov rax, 3 ; set RAX to 3
; some code here modifying BX or CX
sub rcx, rbx ; subtract RBX from RCX, result in RCX (signed integers)
jl label1 ; if RBX>RCX, loop back to label1, else set RAX to 5 and continue
; Processor stalls until the condition from the subtraction is stable
; before executing the jump (which could include a cache reload)
mov rax, 5 ; Set RAX to 5
; more code here

Speculative execution serves to keep the processor from stalling while the condition is being evaluated, before the jump decision can be made. Speculative execution could set RAX to 5, and then back the result out if the jump is taken. Conversely, the processor could set RAX to 3 and back the result out if the jump is not taken. Modern processors keep track of what happened the previous time the condition was executed and do the same thing. (Some processors execute both paths in parallel and discard one, thus optimizing performance at the expense of more hardware.)
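The “do what happened last time” behaviour described above amounts to a one-bit branch predictor per branch. A minimal Python model of the idea (real predictors are far more elaborate, with multi-bit counters and history tables):

```python
class OneBitPredictor:
    """Predicts each branch will go the same way it went last time."""
    def __init__(self):
        self.last_taken = {}  # branch address -> last observed outcome

    def predict(self, branch_addr):
        # Default to "not taken" for a branch we've never seen.
        return self.last_taken.get(branch_addr, False)

    def update(self, branch_addr, taken):
        self.last_taken[branch_addr] = taken

predictor = OneBitPredictor()
# A loop branch that is taken 9 times, then finally falls through:
outcomes = [True] * 9 + [False]
hits = 0
for taken in outcomes:
    if predictor.predict(0x400) == taken:
        hits += 1
    predictor.update(0x400, taken)
print(hits)  # 8 of 10 predictions correct
```

Even this crude scheme guesses right 8 times out of 10 on a simple loop, which is why speculating down the predicted path is such a big performance win on average.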

I originally used “out-of-order” for my example, but there are so many articles out there just talking about “speculative execution” that I decided to use the term in a sort of general sense.

So: execute more than one instruction at the same time “just in case that saves time”. Cancel the instructions if a security problem happens along the way and remove all side-effects that might leak what happened inside the CPU…

…but some of those side-effects are not reverted and can be used to figure out data that wasn’t supposed to escape.

Indeed, the issue is not whether instructions get reordered but that multiple instructions *with a flow of data between them* can be processed in parallel, as a bunch of sub-instructions, until one of them is blocked for a security violation. At this point the CPU is supposed to cancel everything that has been computed internally in order to erase all traces that might have been left by the blocked instruction and the operands associated with it.

I think the word “speculative” is perfectly suggestive there – the CPU has done the work “in case” it might be useful, and if it is, there’s a performance win.

My two cents.
Well. I’m sure glad Intel patched their AMT Management Engine flaw. Because I bet one could use that management engine flaw to get this delicious side-channel attack code in the system and like the number of licks to the center of a tootsie pop, “the world may never know.”

“We are also evaluating our products such as XG Firewall, UTM and other appliances, that run on Linux and Intel hardware to ensure that they are appropriately protected against this vulnerability and will update this article with further information and advice once we have it.”

When the evaluation is complete the Knowledge Base article will be updated.
https://community.sophos.com/kb/en-us/128053

In theory, any program could sneakily steal data out of memory on your computer as you work, even data that’s looked after by the operating system itself, which ought to be off-limits to everyone. So update as soon as your operating system offers you the chance.