Massive Intel CPU Bug Leaves Kernel Vulnerable, Slows Performance

Intel’s CPU security took some whacks a few months ago, with well-publicized problems with the Intel Management Engine. If rumors are to be believed, 2018 could kick off an even worse year for the company. There’s growing speculation that a major bug in Intel CPUs requires a wholesale change in how Linux, Windows, and macOS map page tables, with the apparent goal of preventing Intel x86 CPUs from disclosing the layout of the kernel address space to an attacker. A similar patch is in the works for ARM systems as well; AMD CPUs are (as of this writing) not affected by this issue.

Here’s what we know so far: An initial article at LWN.net lays out a new set of patches for the Linux kernel that began appearing in late October and have continued through the present day. These efforts focus on implementing kernel page-table isolation, or KPTI, which splits page tables (currently shared between kernel space and user space) into two sets of data, one for each side. Microsoft is apparently prepping its own fix and is expected to launch it in the not-too-distant future.
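To make the split concrete, here is a deliberately simplified toy model in Python. It is a sketch under stated assumptions, not how any kernel implements KPTI: real page tables are multi-level structures walked by hardware, and every name below (the page names, `ENTRY_STUB`, the helper functions) is hypothetical and chosen only for illustration.

```python
# Toy model only: page tables are represented as plain dicts mapping a
# page name to its permission level. All names here are hypothetical.
KERNEL_PAGES = {"kernel_text": "supervisor", "kernel_data": "supervisor"}
USER_PAGES = {"app_code": "user", "app_heap": "user"}
# Assumption: a small entry stub stays mapped so the CPU can still
# switch into the kernel on a system call or interrupt.
ENTRY_STUB = {"entry_trampoline": "supervisor"}


def shared_tables():
    """Without isolation: one table serves both user and kernel mode."""
    table = {**USER_PAGES, **KERNEL_PAGES}
    return {"user_mode": table, "kernel_mode": table}


def isolated_tables():
    """KPTI-style split: the user-mode table drops the kernel's pages."""
    return {
        "user_mode": {**USER_PAGES, **ENTRY_STUB},
        "kernel_mode": {**USER_PAGES, **KERNEL_PAGES},
    }


def kernel_pages_visible_in_user_mode(tables):
    """List kernel pages that are still mapped while user code runs."""
    return sorted(name for name in tables["user_mode"] if name in KERNEL_PAGES)


if __name__ == "__main__":
    print("shared:  ", kernel_pages_visible_in_user_mode(shared_tables()))
    print("isolated:", kernel_pages_visible_in_user_mode(isolated_tables()))
```

In the shared layout the kernel's pages are mapped (though marked supervisor-only) whenever user code runs; in the isolated layout they are simply absent from the user-mode table, which is why each transition into the kernel now has to switch tables.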

We don’t yet know how attackers exploit the hardware bug in Intel (and apparently ARM) CPUs. All we know is that the flaw reportedly makes it possible to discern the contents of protected kernel memory. There may be some conceptual similarities to Rowhammer, the DDR memory attack technique that we’ve discussed before, in how this attack is carried out. Rowhammer can be used to change the data stored in certain memory locations by “hammering” adjacent rows of DRAM until the electrical charge in the target cells flips.

The blog Python Sweetness has published a fairly good discussion of what we know and don’t know about this security issue, though the author of the post also links to an erroneous report suggesting that AMD CPUs take a 50 percent performance hit when the software fix is enabled (AMD CPUs, as of this writing, are not expected to need patching). The solution to the problem is to enable a capability known as page table isolation (PTI), but this apparently causes significant performance degradation in some Intel CPUs running some workloads. PostgreSQL tests suggest slowdowns of 7 percent to 23 percent, depending on which Intel CPU you test.

Recent Intel CPUs may not be affected by this issue to the same extent as older chips, but I haven’t been able to confirm that personally. There are references to using the “nopcid” kernel boot option to disable PCID (process-context identifiers), a feature Intel built into its Core microarchitecture that can mitigate the performance hit from separating the kernel and user memory space, but no clear demarcation on when the mitigating features were themselves introduced. PCID itself predates Haswell, but the companion INVPCID instruction arrived with Haswell, alongside AVX2 support, which would seem to imply that Intel CPUs pre-Haswell might face larger penalties than chips post-Haswell.
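On Linux, the CPU features behind the “nopcid” discussion show up as the `pcid` and `invpcid` flags in `/proc/cpuinfo`. The short Python sketch below checks for them. The helper names are hypothetical, it assumes the x86 layout of that file (a `flags` line per processor), and it reports only what the CPU advertises, not whether a given kernel actually uses the feature.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (for example, on a non-x86 system).
    return set()


def pcid_support(cpuinfo_text):
    """Report whether the CPU advertises the pcid and invpcid flags."""
    flags = cpu_flags(cpuinfo_text)
    return {"pcid": "pcid" in flags, "invpcid": "invpcid" in flags}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(pcid_support(path.read_text()))
    else:
        print("/proc/cpuinfo not found; this sketch is Linux-only")
```

A chip that reports `pcid` but not `invpcid` would fall into the pre-Haswell group discussed above.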

Right now, the list of what we don’t know is longer than what we do. There are implications for cloud vendors and developers across the entire spectrum where ARM and x86 are deployed, but until we know more about the security flaw and at-risk systems, we’d counsel against any quick conclusions. Hat-tip to Hot Hardware, where we first saw the story.