SSDs Vulnerable to Deliberate, Low-Level Data Corruption Attacks

Posted on May 24, 2017

Over the last decade, solid state drives (SSDs) have moved from extremely rare (and expensive) alternatives to hard drives to being the storage option of choice for enthusiasts and mobile users alike. An SSD is one of the best ways to improve the performance of an older system with a traditional hard drive, and costs have fallen below 50 cents per GB. But a new paper from Carnegie Mellon University, Seagate, and ETH Zurich has shown that the way data is programmed into MLC SSDs makes them vulnerable to data corruption attacks, meaningfully reducing the drive’s lifespan in the process. That’s significant, because MLC drives constitute the vast majority of SSDs on the market.

Before we dive into the findings (PDF), let’s take a moment to define some terms. The first SSDs that were developed stored one bit of data per cell and are called single-level cells (SLCs). These devices are programmed in a single stage and are not vulnerable to the attack methods we’ll be discussing.

The next type of NAND stores two bits of information per cell and is referred to as multi-level cell, or MLC. The “level” refers to the number of voltage states within each cell. MLC devices were originally programmed in a single stage, but switched to a two-stage programming method below the 40nm process node (the overwhelming majority of 2D planar NAND drives are manufactured below 40nm these days).

Finally, there are triple-level cell SSDs, which store three bits of data in each cell. While this paper does not discuss whether TLC drives are vulnerable to these attack strategies, everything I could find on the TLC programming process suggests that they are, since TLC NAND uses a three-stage programming cycle.

NAND programming voltages for SLC, MLC, and TLC.

One bit of good news, however, is that the attack we’re going to discuss does not yet work against current 3D NAND from Samsung and other manufacturers. Current 3D NAND is single-shot programmed and built on older process nodes (40nm, in Samsung’s case). The study authors expect 3D NAND to return to two-stage programming as process node technology moves below 40nm once again in the future. Modern 2D planar NAND is built at a variety of nodes, defined somewhat differently by each manufacturer; 20nm is a reasonable overall approximation.

How the attack works

The reason that MLC NAND is programmed in a two-stage process below the 40nm process node is to reduce the chance that the voltage changes required to write data into one block of NAND will propagate into adjacent wordlines within the same block. This is known as cell-to-cell program interference and occurs because of parasitic capacitance coupling within the memory block. This interference is one of the main reasons why 3D NAND became necessary: the smaller the process node the NAND is built on, the more difficult it is to prevent writes to one cell from corrupting data in other cells.

MLC NAND stores two bits of information per cell, each belonging to a different page: the least significant bit and the most significant bit (LSB and MSB). In the two-step process, the cell is initially set to a temporary state in which its threshold voltage is roughly half of the final value. This temporary LSB voltage is later read into a buffer within the flash chip, and the cell is then programmed again, moving the threshold voltage to the final state expected for a fully programmed NAND cell. The SSD controller interleaves the programming steps of any given cell with those of adjacent cells to minimize the chance of data corruption.
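The two-step sequence described above can be sketched as a toy model. This is purely illustrative (real flash firmware operates on whole pages, and the voltage values here are invented), but it shows the shape of the process: step one sets a temporary half-voltage state from the LSB, and step two reads that temporary state back on-chip to pick the final voltage for the (LSB, MSB) pair.

```python
# Toy model of two-step MLC programming. All voltages are hypothetical
# placeholders, not real NAND threshold values.

# The four final threshold-voltage states, one per (LSB, MSB) pair,
# ordered from erased (lowest) to highest.
FINAL_VOLTAGE = {
    (1, 1): 0.0,   # erased state
    (1, 0): 1.0,
    (0, 0): 2.0,
    (0, 1): 3.0,
}

# Temporary states after step one: roughly half of the final range.
TEMP_VOLTAGE = {1: 0.0, 0: 1.5}

def program_lsb(lsb):
    """Step 1: set the cell to a temporary state based on the LSB."""
    return TEMP_VOLTAGE[lsb]

def read_temp_lsb(voltage):
    """On-chip read of the temporary state. Note: no controller ECC here."""
    return 1 if voltage < 0.75 else 0

def program_msb(cell_voltage, msb):
    """Step 2: read the buffered LSB back from the cell itself, then
    move the cell to the final voltage for the (LSB, MSB) pair."""
    lsb = read_temp_lsb(cell_voltage)
    return FINAL_VOLTAGE[(lsb, msb)]

v = program_lsb(0)       # temporary ~half-voltage state for LSB = 0
v = program_msb(v, 1)    # final state for (LSB=0, MSB=1)
print(v)                 # -> 3.0
```

The key design detail, which the next section turns into a vulnerability, is that `program_msb` trusts whatever it reads back from the cell.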

Here’s the problem. The data loaded into the LSB buffer for final MSB programming comes directly from the flash cell being programmed, not from the flash controller. This reduces latency and improves performance, since routing the work through the SSD controller would require transferring the data. But it also means that if the data loaded into the LSB buffer has errors, those errors by definition can’t be corrected by the SSD controller — it never “sees” the data. The authors report that “the final cell voltage can be incorrectly set during MSB programming, permanently corrupting the LSB data” (emphasis original).

Rowhammer’s flash-based cousin

We’ve discussed Rowhammer before — the exploit that corrupts data in RAM by reading and writing to specific parts of DRAM in order to flip bits in target areas. The two-step programming process makes NAND vulnerable to a similar type of attack. Would-be attackers can either exploit cell-to-cell program interference, using parasitic capacitance coupling to introduce errors in adjacent cells, or use a technique known as read disturb, which repeatedly reads the same set of cells. This can create a weak programming effect capable of flipping the bits of cells that aren’t even being read. Call it Cellhammer, if you like, but the results are the same: corrupted data and damaged SSDs.
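The read-disturb mechanic is simple enough to sketch numerically. The threshold and per-read shift below are assumed round numbers, not measured values; the point is only that a tiny, repeatable nudge per read accumulates until a cell that was never read crosses its threshold and flips.

```python
# Toy simulation of read disturb, using integer microvolts so the
# arithmetic is exact. Both constants are assumptions for illustration.
THRESHOLD_UV = 750_000   # hypothetical threshold separating a 1 from a 0
DISTURB_UV = 100         # assumed tiny voltage shift per neighboring read

def reads_until_flip(start_uv):
    """How many reads of *neighboring* cells it takes before this
    unread cell's threshold voltage crosses over and its bit flips."""
    reads = 0
    v = start_uv
    while v < THRESHOLD_UV:
        v += DISTURB_UV
        reads += 1
    return reads

print(reads_until_flip(200_000))  # -> 5500 repeated reads flip this cell
```

An attacker doesn't need millions of operations per flip under assumptions like these; a tight read loop against one wordline does the work.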

These types of attacks can meaningfully reduce SSD lifespan by introducing additional errors, as shown in Figure 3, above. According to the research team, none of the various management techniques baked into SSDs are currently sufficient to prevent these attacks, though there are several ways that these attacks could be mitigated and one option that would prevent them altogether.

The most direct way to stop Cellhammer is to buffer LSB data directly in the SSD controller. This would prevent the attack from functioning by allowing the SSD controller to correct errors in the LSB data, rather than trusting that this data is accurate and contains no errors.
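In toy form (this is our own framing of the mitigation, not the paper's implementation), the fix amounts to changing who is trusted during step two: the controller keeps its own authoritative copy of the LSB and supplies it for MSB programming, instead of re-reading the possibly disturbed temporary state from the cell.

```python
# Sketch of the proposed mitigation: the controller buffers LSB data
# itself. Class and voltage values are hypothetical illustrations.
TEMP = {1: 0.0, 0: 1.5}
FINAL = {(1, 1): 0.0, (1, 0): 1.0, (0, 0): 2.0, (0, 1): 3.0}

class Controller:
    """Toy SSD controller that holds an ECC-protected LSB copy."""
    def __init__(self):
        self.lsb_buffer = {}               # cell address -> buffered LSB

    def program_lsb(self, addr, lsb):
        self.lsb_buffer[addr] = lsb        # keep an authoritative copy
        return TEMP[lsb]                   # step 1: temporary voltage

    def program_msb(self, addr, temp_voltage, msb):
        # Ignore the (possibly disturbed) temp_voltage and use the
        # controller's buffered LSB instead.
        lsb = self.lsb_buffer.pop(addr)
        return FINAL[(lsb, msb)]

ctrl = Controller()
v = ctrl.program_lsb(0x10, 0)
v -= 0.9                                   # interference corrupts the cell
v_final = ctrl.program_msb(0x10, v, 1)
print(v_final)                             # -> 3.0: correct (0, 1) state
```

The cost is the extra data transfer between controller and flash chip during step two, which is where the latency penalty discussed below comes from.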

This would, however, cause a small increase in latency (the team estimates it at ~5 percent). That seems a modest price to pay in exchange for securing systems against this kind of attack. But it’s not clear if any NAND manufacturers would adopt these policies if they thought it might harm their competitive standing against the rest of the industry. Consumer and enterprise SSDs live and die on their ability to show best-in-class performance.

Then again, this kind of feature could find a home in enterprises, where companies are willing to pay for data security rather than just speed. The recent spate of ransomware attacks and the leak of the NSA’s various hacking tools could put new emphasis on ensuring systems remain secured against a wide variety of attack vectors, including low-level attacks like this one.