Anatomy of a Security Flaw Announcement: The Strange Timeline of Spectre and Meltdown

Spectre and Meltdown are two major processor vulnerabilities that represent a serious security issue inherent in millions of devices. Here's a look at the unusual way these issues were revealed to the public and what we can expect going forward.

On January 3rd, news broke that Intel’s chips were vulnerable to two major bugs, dubbed Meltdown and Spectre, that have evidently been around since the 90s. Since the initial announcement, information has been coming out at a rapid-fire pace, sometimes contradictory and other times downright confusing.

At best, what the average person might understand is that these flaws impact a large majority of computing devices ranging from smartphones to data centers. Logical questions tend to follow, directed at those in the tech industry: Are my devices affected? What's at stake? What should I do?

Of course, another question may also crop up: How long have you known about this?

Here's a look at Spectre, Meltdown, and the journey from discovering a security vulnerability to telling the public about it.

A Brief Look at Speculative Execution

To understand the timeline for addressing these vulnerabilities, we'll first need a basic understanding of what they entail.

There has been quite a bit of coverage on the topic already, but the gist is that Meltdown and Spectre are two recently identified vulnerabilities that arise from “speculative execution.” Speculative execution is a tactic in which a processor predicts which instructions it will need next, such as which way a branch will go, and executes them ahead of time. If the prediction is right, the work is already done; if it is wrong, the results are discarded. This shortcut makes processing much faster, but it comes at a cost.

The security issue arises because discarded speculative work still leaves traces in shared hardware state, most notably the CPU cache. An attacker can measure those traces through a side-channel attack, such as timing memory accesses, and infer otherwise privileged data.
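
To make the idea concrete, here is a minimal toy model in Python. It is a sketch under loud assumptions, not an exploit and not how any real processor is implemented: the `ToyCPU` class, its method names, and the "one cache line per byte value" layout are all hypothetical, and a simple set lookup stands in for the access timing a real attack would measure.

```python
# Toy model only: real attacks time cache accesses on actual hardware.
# Every name here (ToyCPU, victim_read, is_fast, leak_byte) is hypothetical.

class ToyCPU:
    def __init__(self, memory, public_len):
        self.memory = memory          # public bytes followed by secret bytes
        self.public_len = public_len  # architectural bound on legal reads
        self.cached_lines = set()     # cache state; survives a rollback

    def victim_read(self, index):
        """Bounds-checked read whose check resolves only after speculation."""
        # Speculation: the predictor assumes the bounds check will pass, so
        # the load runs early and warms a probe line chosen by the byte read.
        speculative_value = self.memory[index]
        self.cached_lines.add(speculative_value)
        # Resolution: the check completes. On a misprediction the result is
        # thrown away, but the warmed cache line is not.
        if index < self.public_len:
            return speculative_value
        return None

    def flush(self):
        """Attacker evicts all probe lines before each attempt."""
        self.cached_lines.clear()

    def is_fast(self, line):
        """Stand-in for timing a probe access (cached lines load faster)."""
        return line in self.cached_lines


def leak_byte(cpu, index):
    """Recover one out-of-bounds byte purely from cache state."""
    cpu.flush()
    result = cpu.victim_read(index)   # architecturally, nothing is returned
    assert result is None
    for candidate in range(256):      # probe every possible byte value
        if cpu.is_fast(candidate):
            return candidate
    return None


public = b"hello"
secret = b"key!"
cpu = ToyCPU(public + secret, len(public))
recovered = bytes(leak_byte(cpu, len(public) + i) for i in range(len(secret)))
print(recovered)
```

In this model the bounds check never hands the secret to the caller, yet the attacker reconstructs it byte by byte from which probe line was left warm. That gap between the architectural result and the leftover hardware state is the essence of the problem.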

Speculative execution has been used for roughly 20 years to speed up processing and has become deeply ingrained in processor architecture. It's so entrenched that it could take years more before processors don't need to rely on this method to process at lightning speeds.

So if it will take years to fully address the core vulnerabilities in the processor architecture, does it really matter how and when the public was made aware of them? Let's take a look at how vulnerabilities are typically approached and why.

A Timeline in Disarray

For the most part, when major vulnerabilities or security flaws are publicly announced, vendors are already prepared with updates and patches. While these immediate patches represent the work of hundreds or thousands of quick-thinking engineers, they're hardly due to prescience. When a vulnerability is discovered, the modus operandi has so far been to inform vendors discreetly so that they can begin to work on solutions before the news goes public. This practice reduces the chance that the vulnerability will be exploited before patches are developed.

The problem is, this only works if all companies and vendors work together and give each other a chance to properly develop, test, and implement these patches. Leaks can happen as a result of media finding out early, a company accidentally letting the information get out, updates being released too early, or sometimes a combination of these factors.

In the case of Spectre and Meltdown, many vendors weren’t ready with patches. Even worse, one of the vulnerabilities stems from a flaw inherent in the design of Intel’s CPUs, which is definitely not an issue that can be fixed overnight.

“The tier 1 group [of tech companies] are Google, Amazon, and Intel, and they knew about [the problem] since around June of last year, I believe,” says Marty Puranik, CEO of Atlantic.net, who spoke to All About Circuits. “But most others found out about it probably just after Christmas.” As could be expected, Intel was one of the first companies to announce patches, on January 4th.

Puranik, who started his first tech company in his college dorm room in the 90s and has been in the industry for quite a while, explains that the company knew something was coming when employees started noticing in-progress kernel patches of a specific nature. “That’s how we found out about it, but at the time it wasn’t known it would be two separate bugs. We thought it was going to be one.”

When the vulnerabilities were confirmed, it appeared that the industry would follow the typical script of preparing patches before announcing the problem. But then, somewhere, there was a breakdown.

Even the newest processors released last year are affected by these vulnerabilities. Image from Intel.

“What happened was that there was an embargo [for] when everyone was supposed to come out with patches on January 9th,” Puranik explains, “but because this got leaked early, a lot of people who were working on it were still testing and weren’t done with their patches because they thought they had until January 9th to release it.”

This lack of preparedness and the wide-scale media coverage most likely contributed to the conflicting information being released on the scope of the problem, what solutions are actually available, and how the updates are going to impact device performance.

For example, Intel originally stated that patching Meltdown and Spectre would produce only a mild performance reduction, but in reality processor performance is expected to suffer significantly, especially on older processors. In some of the worst cases, early patches have themselves caused systems to reboot unexpectedly and become unstable.

Even the question of which chips are impacted was not clear initially. Currently, it is known that Meltdown largely impacts Intel chips, while Spectre impacts Intel, AMD, and ARM chips. As a result, Intel has been glaringly in the spotlight, particularly on the subject of transparency. Greg Kroah-Hartman, arguably one of the best-known faces of the Linux Foundation's leadership, notably made restrained but irate comments on "how this was all handled by the companies involved," a not-so-veiled indictment of Linux developers being left in the dark while other companies had months longer to develop patches.

Whether the release of information was premature by embargo standards or far too late for the industry to catch up to the tier 1 companies, there's still a mess to clean up. What now?

Intel had 79.3% of the CPU market share as of July 2017. Image courtesy of Daze Info.

The Philosophy of Moving Forward

Given how much processors rely on speculative execution for performance, Spectre and Meltdown are daunting problems. But, given how many unknowns are yet to be addressed, it's difficult to guess what happens next.

History tells us that the immediate next steps are likely to begin with operating system updates and echo through the next iterations of architecture design. We have a blueprint for this solution because this is certainly not the first time Intel has managed a large-scale vulnerability.

In 1997, the “F00F” flaw was discovered in Intel's Pentium processors. An invalid form of a locked instruction would cause the processor to incorrectly perform bus cycles in locked mode, stopping all activity until the machine was rebooted (AKA "halt and catch fire"). The wide-scale deployment of Intel processors, and the possibility of users losing unsaved data, made the problem significant. Operating system workarounds came first, and the flaw was later corrected in revised processors.

But, as Greg Kroah-Hartman will tell you, Intel isn't the only company that needs to respond to Spectre and Meltdown. The industry as a whole is at a watershed moment when it comes to security.

So what is a cloud hosting company like Atlantic.net doing to manage the problem? Puranik says, “We’re in various stages, depending on the operating system. You have to be very methodical and systematic and look at each case and how you are going to handle the problem—you can’t just blanket [a solution] over the entire server farm.” He adds that security is the first priority, followed by solutions to restore performance, echoing the F00F blueprint.

“I don’t think it’ll be a one-and-done patch,” he says. “I think there will be two or three patches after. Right now, people are just trying to get their patches out, and there will probably be a performance hit because they’re going to be focused on security. I think after a few waves, there will be time to come up with ways to patch that doesn’t impact performance as badly as the first patches. My gut feeling is that we’ll be able to recover some of the performance that’s being lost but we just don’t know how much.”

Whatever the next steps are in dealing with Spectre and Meltdown, one result is likely—our perceptions of security are going to change, as will the way we design processors.

What will chip makers do in the future if they can't rely on speculative execution? Which alternative performance boosters will be researched as a result?

As Marty Puranik sees it, the changes will radiate beyond immediate fixes and redesigning architectures. The way companies do business will have to adjust, as well: “Any company that has a compliance officer is going to have more work and more checkboxes that vendors will have to check off—more demand on what your standardized response to these types of things are. So I think it’s going to create a lot more work. There’s definitely going to be more work.”