Kernel panic! What are Meltdown and Spectre, the bugs affecting nearly every computer and device?

If you’re confused by the avalanche of early reports, denials, and conflicting statements about the massive security issues announced today, don’t worry — you’re far from the only one. Here’s what you need to know about Meltdown and Spectre, the two huge bugs that affect practically every computer and device out there.

What are these flaws?

Short answer: Bugs at a fundamental level that allow critical information stored deep inside computer systems to be exposed.

It’s not a physical problem with the CPUs themselves, or a plain software bug you might find in an application like Word or Chrome. It’s in between, at the level of the processors’ “architectures,” the way all the billions of transistors and logic units work together to carry out instructions.

In modern architectures, there are supposedly inviolable spaces through which data passes in raw, unencrypted form, such as the memory used by the kernel (the core of the operating system) or regions of system memory carefully set aside from other applications. This data has powerful protections to prevent it from being interfered with or even observed by other processes and applications.

Meltdown and Spectre are two techniques researchers have discovered that circumvent those protections, exposing nearly any data the computer processes, such as passwords, proprietary information, or encrypted communications.

Meltdown primarily affects Intel processors, and works by breaking through the barrier that prevents applications from accessing arbitrary locations in kernel memory. Segregating and protecting memory spaces prevents applications from accidentally interfering with one another’s data, and prevents malicious software from seeing and modifying that data at will. Meltdown makes this basic safeguard fundamentally unreliable.
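For readers who want to see the idea in motion, here is a deliberately simplified Python simulation of the Meltdown pattern. It assumes a tiny “kernel” byte string and uses a Python set as a stand-in for the CPU cache; everything in it (`ToyCPU`, `leak_byte`, the 4,096-byte slot spacing) is a hypothetical illustration, not the researchers’ code. The kernel byte gets used before the permission check fires, the fault is thrown away, and the attacker then reads the secret back out of which cache slot was touched, a trick known as flush+reload.

```python
# Toy simulation only: a Python set stands in for the CPU cache, and an
# exception stands in for the hardware permission fault. Real Meltdown
# depends on out-of-order execution and nanosecond timing, which this
# sketch does not reproduce.
from dataclasses import dataclass, field

PAGE = 4096  # hypothetical spacing: one probe slot per byte value, a page apart


@dataclass
class ToyCPU:
    kernel_memory: bytes                       # data user code must never read
    cache: set = field(default_factory=set)    # probe addresses currently "cached"

    def transient_kernel_read(self, offset: int) -> None:
        # The byte is fetched and used to touch a probe slot *before*
        # the permission check catches up...
        secret = self.kernel_memory[offset]
        self.cache.add(secret * PAGE)
        # ...then the fault arrives and the result is discarded, but the
        # cached probe slot is left behind.
        raise PermissionError("user code touched a kernel address")


def leak_byte(cpu: ToyCPU, offset: int) -> int:
    cpu.cache.clear()                      # "flush": empty the probe array
    try:
        cpu.transient_kernel_read(offset)
    except PermissionError:
        pass                               # the attacker simply swallows the fault
    for value in range(256):               # "reload": which slot is cached?
        if value * PAGE in cpu.cache:
            return value
    raise RuntimeError("no probe slot was cached")


cpu = ToyCPU(kernel_memory=b"hunter2")
print(bytes(leak_byte(cpu, i) for i in range(7)))  # b'hunter2'
```

The real attack recovers the same information by timing memory accesses, since a cached slot loads measurably faster than the rest; the set lookup here simply skips that measurement step.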

Spectre affects Intel, AMD, and ARM processors, broadening its reach to include mobile phones, embedded devices, and pretty much anything with a chip in it. Which, of course, is everything from thermostats to baby monitors now.

It works differently from Meltdown: Spectre essentially tricks applications into accidentally disclosing information that would normally be inaccessible, safe inside their protected memory area. This one is trickier to pull off. But because it exploits speculative execution, an established practice in multiple chip architectures in which the processor works ahead on instructions it predicts it will need, it’s going to be even trickier to fix.
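Spectre’s trick can be sketched the same way. The Python model below is an assumption-laden toy: a one-bit branch predictor, a byte string holding a small array followed by secret data, and a set standing in for the cache. The victim follows the bounds-check pattern described in the Spectre paper; the attacker first trains the predictor with legal inputs, then passes an out-of-range index that the processor speculatively treats as legal. `ToySpeculativeCPU` and `spectre_leak` are hypothetical names.

```python
# Toy simulation only: a one-bit branch predictor and a Python set stand in
# for real hardware. The victim mirrors the bounds-check pattern
#   if (x < array1_size) y = array2[array1[x] * 4096];

PAGE = 4096  # hypothetical probe-slot spacing


class ToySpeculativeCPU:
    def __init__(self, memory: bytes, array1_size: int) -> None:
        self.memory = memory              # array1, followed by secret bytes
        self.array1_size = array1_size
        self.predict_in_bounds = False    # what the branch predictor expects
        self.cache = set()                # probe addresses currently "cached"

    def victim(self, x: int) -> None:
        in_bounds = x < self.array1_size
        if in_bounds or self.predict_in_bounds:
            # Legitimate when in_bounds is True. When the predictor guessed
            # wrong, this runs speculatively and is rolled back, except for
            # the cache line it touched.
            self.cache.add(self.memory[x] * PAGE)
        self.predict_in_bounds = in_bounds  # predictor learns the last outcome


def spectre_leak(cpu: ToySpeculativeCPU, x: int, training_rounds: int = 5) -> int:
    for _ in range(training_rounds):
        cpu.victim(0)                     # train: "the check usually passes"
    cpu.cache.clear()                     # flush the probe array
    cpu.victim(x)                         # out of bounds, but predicted in bounds
    hits = [v for v in range(256) if v * PAGE in cpu.cache]
    return hits[0]                        # the one cached slot is the secret


cpu = ToySpeculativeCPU(memory=bytes([1, 2, 3, 4]) + b"SECRET", array1_size=4)
print(bytes(spectre_leak(cpu, 4 + i) for i in range(6)))  # b'SECRET'
```

Note that the victim never hands the secret back; on real hardware the mispredicted work is discarded, and only the cache footprint, recovered by timing, gives it away. That is part of why the flaw is so hard to patch: the vulnerable code looks perfectly correct.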

Who is affected?

Short answer: Pretty much everybody.

Chips going back to 2011 were tested and found vulnerable, and theoretically it could affect processors as far back as those released in 1995. One would hope there aren’t too many of those in use, but we may be unpleasantly surprised on that count.

Because Meltdown and Spectre are flaws at the architecture level, it doesn’t matter whether a computer or device is running Windows, OS X, Android, or something else — all software platforms are equally vulnerable.

A huge variety of devices, from laptops to smartphones to servers, are therefore theoretically affected. Going forward, any untested device should be assumed to be vulnerable.

Not only that, but Meltdown in particular could conceivably be applied to and across cloud platforms, where huge numbers of networked computers routinely share and transfer data among thousands or millions of users and instances.

The good news is that the attack is easiest to carry out with code running on the machine itself; it’s not easy to pull off remotely. So there’s that, at least.

Can this be fixed?

Short answer: Only partially, and it’s going to take a while.

Many, many devices are “affected by” or “vulnerable to” these flaws, but that’s not the same thing as saying they’re totally open to attack. Intel, AMD, ARM and others have had months to create workarounds and “mitigations,” which is a polite way of saying “band-aids.”

Meltdown can essentially be fixed by building a stronger wall around the kernel; the technical term is “kernel page table isolation.” This solves the issue, but at a cost. Modern CPU architectures assume certain things about the way the kernel works and is accessed, and changing those things (for instance, forcing the processor to switch page tables every time a program calls into the kernel) means that they won’t be able to operate at full capacity.
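A toy model also helps show why the fix costs speed. In the Python sketch below (the page names and the `ToyAddressSpace` class are hypothetical), the classic layout keeps kernel pages mapped in every program’s page table, guarded only by a permission bit; with isolation turned on, user-mode code sees only its own pages plus a tiny entry stub, and every system call has to swap page tables on the way in and again on the way out.

```python
# Toy model of kernel page table isolation; page names are hypothetical labels.
USER_PAGES = frozenset({"user_code", "user_data"})
KERNEL_PAGES = frozenset({"kernel_code", "kernel_data"})
TRAMPOLINE = frozenset({"entry_trampoline"})  # tiny stub needed to enter the kernel


class ToyAddressSpace:
    def __init__(self, kpti: bool) -> None:
        self.kpti = kpti
        self.table_switches = 0

    def mapped_in_user_mode(self) -> frozenset:
        if self.kpti:
            # Isolation: the kernel is simply absent from the user-mode
            # table, so a Meltdown-style read has nothing to reach.
            return USER_PAGES | TRAMPOLINE
        # Classic layout: the kernel is mapped everywhere, protected only by
        # a permission check, which is exactly what Meltdown races past.
        return USER_PAGES | KERNEL_PAGES

    def system_call(self) -> None:
        if self.kpti:
            # Entering and leaving the kernel now means swapping page
            # tables twice per call.
            self.table_switches += 2


fast, safe = ToyAddressSpace(kpti=False), ToyAddressSpace(kpti=True)
for _ in range(1000):
    fast.system_call()
    safe.system_call()
print("kernel_data" in fast.mapped_in_user_mode(), fast.table_switches)  # True 0
print("kernel_data" in safe.mapped_in_user_mode(), safe.table_switches)  # False 2000
```

On real processors, those extra table swaps, and the cached address translations they can throw away, are where much of the slowdown comes from, which is why programs that call into the kernel often tend to feel it most.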

The Meltdown fix may reduce the performance of Intel chips by as little as 5 percent or as much as 30 — but there will be some hit. Whatever it is, it’s better than the alternative.

Spectre, on the other hand, is not likely to be fully fixed any time soon. The fact is that the practice that leads to this attack being possible is so hard-wired into processors that the researchers couldn’t find any way to totally avoid it. They list a few suggestions, but conclude:

While the stop-gap countermeasures described in the previous section may help limit practical exploits in the short term, there is currently no way to know whether a particular code construction is, or is not, safe across today’s processors – much less future designs.

What will actually happen is hard to say, but there will likely be a flurry of updates that carry out various software hacks to protect against the most obvious and damaging attacks. Microsoft has already issued one for Windows; ARM has a set of mitigations for its affected chips; Amazon is updating its many servers.

How broadly and quickly will these mitigation patches be applied, though? How many devices are out there, vulnerable, right now? These updates may not be pretty, perhaps requiring changes that will break other software, drivers, and components. And all will likely involve degrading performance.

A more permanent fix will require significant changes across the board — the circuit board, that is. Basic architecture choices that have been baked into our devices for years, even decades, will have to be rethought. It won’t be easy, and it won’t be fun.

In the meantime, companies are working at full capacity to minimize the apparent threat: “mitigations” that may or may not prevent some or all of the variant attacks. As usual, these patches will likely reach only a small subset of new, fast-updating users and devices, or those the company can update directly on its own. We will only know the efficacy of these measures by their performance in the real world.

It’s worth noting that there won’t be a “recall.” If this flaw affected a single device, like the battery problems in Samsung’s phones a while back, a recall would make sense. But this is an issue that affects millions, perhaps billions of devices. A recall isn’t an option.

Why are we only just hearing about this?

Short answer: A planned joint disclosure was preempted by reporters.

It’s always a bit odd to hear that companies were informed of a major security flaw like this one months ago, as was the case with Meltdown and Spectre. This particular exploit has been under investigation for some time by researchers, and word of it trickled out in the form of small updates to various operating systems addressing a hitherto-undocumented security flaw.

If the researchers had simply tweeted out the details when they discovered them, they would essentially have given attackers access to that information at the same time as the companies that can fix the problem. Generally, security investigators practice what’s called responsible disclosure, contacting affected companies secretly, either as a simple courtesy or in order to collaborate on a solution.

In this case, Google contacted Intel several months ago, and no doubt others knew to some degree as well, since Microsoft issued patches to insiders well ahead of the public announcement, and Linux distributions were likewise addressing the issue even though the papers describing the flaws were not out yet.

The plan would normally be that the affected company or companies would come up with a solution, quietly apply it, then announce both the flaw and the solution at the same time. And in fact that seems to be what was planned in this case.

But smart reporting by The Register, which among others put together the disparate pieces, seems to have forced the hands of several billion-dollar companies. The companies scrambled to finalize their statements, addressing “inaccurate” media reports and hastily issuing patches and explanations that likely weren’t due until next week.

While some may suggest that El Reg should have let things take their course, there’s a great deal to be said for not allowing the billion-dollar companies in question to completely control the narrative around a major issue like this. If the only version of the story we ever heard was one approved by their joint committee, things would likely have been painted in a different light.

As the researchers put it at the end of the Spectre paper:

The vulnerabilities in this paper, as well as many others, arise from a longstanding focus in the technology industry on maximizing performance. As a result, processors, compilers, device drivers, operating systems, and numerous other critical components have evolved compounding layers of complex optimizations that introduce security risks. As the costs of insecurity rise, these design choices need to be revisited, and in many cases alternate implementations optimized for security will be required.