Protecting your data: the broken drives edition

Posted by Alexei Rodriguez on 25 Sep 2012

In our blog post “Evernote’s Three Laws of Data Protection”, Phil touches on some of the measures we take to protect your data and our goal of being a trusted place for it. There is much more we do, so I wanted to talk a bit about an important aspect: what happens when hard drives fail?

You have probably read stories of people buying previously owned computers and finding they contain all sorts of information from the previous owner, sometimes including customer data. That is why, at Evernote, we take the decommissioning of these drives very seriously.

At Evernote we use both hard drives and solid state drives (SSDs) as part of the infrastructure that stores user data. Hard drives are mechanical in nature and thus, as with all things containing moving parts, they will eventually fail. SSDs have a different failure mode: there is a finite number of times data can be written to the memory banks in each SSD, after which the drive becomes read-only.

We use hardware RAID controllers and replication to provide redundancy for all our storage devices. This means that when a disk fails, your data is safe. Furthermore, we take proactive steps to identify drives that may be failing by monitoring the media errors and predictive failure statistics the drives provide. If a hard drive reaches a threshold (a number of media errors, or a predictive-failure counter), we will replace it before it fails, usually within the same or next business day. Sometimes drives simply fail without warning; it is our policy to replace those drives as soon as possible.
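As a rough illustration of the proactive-replacement policy, the check reduces to comparing the drive's counters against thresholds. The threshold values below are made up for the sketch; the real numbers (and how the counters are read from the controller) are specific to the hardware:

```python
# Sketch of a proactive-replacement check. The threshold values here are
# hypothetical; real deployments tune them to the drive model and controller.

MEDIA_ERROR_THRESHOLD = 10        # assumed value, for illustration only
PREDICTIVE_FAILURE_THRESHOLD = 1  # any predictive-failure event triggers action

def should_replace(media_errors: int, predictive_failures: int) -> bool:
    """Flag a drive for replacement before it fails outright."""
    return (media_errors >= MEDIA_ERROR_THRESHOLD
            or predictive_failures >= PREDICTIVE_FAILURE_THRESHOLD)

print(should_replace(media_errors=3, predictive_failures=0))   # healthy drive
print(should_replace(media_errors=12, predictive_failures=0))  # flagged
```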

The end result of all this is a set of broken drives which may contain user data. The ATA instruction set has a Secure Erase function which writes over every track on the drive, making it nearly impossible to recover the data. This is great, but it requires a working drive. In our case, most of our failed drives are non-functional, so we cannot make use of this feature.
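For drives that do still respond, ATA Secure Erase is typically issued with hdparm on Linux as a two-step sequence: set a temporary security password, then issue the erase. A sketch that only builds the command strings (the device path and password are placeholders; it deliberately does not execute anything):

```python
# Sketch: build the usual hdparm command sequence for ATA Secure Erase.
# The device path and password are placeholders. This constructs the
# commands for review; it does not run them.

def secure_erase_commands(device: str, password: str = "tmppass") -> list:
    return [
        # Step 1: set a temporary user password to enable the security feature set
        f"hdparm --user-master u --security-set-pass {password} {device}",
        # Step 2: issue Secure Erase, which overwrites the entire drive
        f"hdparm --user-master u --security-erase {password} {device}",
    ]

for cmd in secure_erase_commands("/dev/sdX"):
    print(cmd)
```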

Drives are expensive and generally include a good warranty (three years or more). Manufacturers usually require that the customer return the failed hard drive in order to receive a replacement as part of the warranty program. Since our failed drives may contain user data and we cannot use the secure erase tools, we can’t send a drive in for repair/replacement and risk the data, no matter how damaged the drive may be. Thankfully, most drive manufacturers offer a “Black Hole” replacement program for this very case. The specifics vary across manufacturers, but it generally requires that the customer send in the faceplate of the drive and some form of written statement attesting to the destruction of the drive.

Our overall approach to handling failed drives is to destroy them. However, we take a “belt & suspenders” approach to this process.

The National Institute of Standards and Technology (NIST) is a US government agency whose mission includes developing and publishing guidelines for other US agencies regarding technology issues. These guidelines are available online (free of charge) and are generally accepted as industry standards. The NIST publication 800-88 (“Guidelines for Media Sanitization”) covers both physical and electronic records. Our approach is based on this publication.

We handle failed drives as follows:

Failed drives are kept in a secure location pending destruction.

The face plate of the drives is removed (Figure #1, #2, #3); this requires a few different Torx bits (the iFixit kits are handy for this).

The drive is then wiped using a degausser (Garner Products HD-2). Degaussing basically means “to de-magnetize”; this is intended to blank the drive (Figure #4).

The drive is then physically destroyed using a device which bends/breaks the drive by driving a wedge into the drive (Garner Products PD-4, Figure #5 & #6). This renders the drive inoperable (Figure #7 & #8).

The broken drive parts are then sent off for recycling.

The drive faceplates are then sent to the respective manufacturers and the replacement drives are put into the spares pool (Figure #9).
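The handling steps above amount to a fixed chain of custody for every failed drive. A minimal sketch of an audit log over those stages (the stage names and serial number are illustrative, not Evernote's actual tooling):

```python
# Sketch: record each decommissioning stage for a failed drive, giving an
# audit trail from secure storage through recycling. Names are illustrative.

STAGES = [
    "secured",            # held in a secure location pending destruction
    "faceplate_removed",  # faceplate kept for the warranty return
    "degaussed",          # wiped with the degausser
    "crushed",            # physically destroyed
    "recycled",           # broken parts sent off for recycling
]

def decommission(serial: str) -> list:
    """Return the ordered audit log for one drive."""
    return [f"{serial}: {stage}" for stage in STAGES]

for entry in decommission("WD-1234"):
    print(entry)
```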

The goal of this is to make sure that no user data is EVER at risk due to unsafe handling of the drives. This approach is in line with the guidelines from NIST and other standards bodies, which means these are tried and true methods.

The guidance on SSD disposal is still evolving, but you are correct that published research suggests degaussing is not 100% effective. Thankfully, the nature of failed SSDs is such that the device itself can still be accessed and the secure wipe instructions applied. As for the destruction: the machine does bend discs with platters, but it has a bit more of a destructive effect on an SSD, as there is not the same resistance. We are doing our own testing of the different failure modes and wiping methods in order to get to a foolproof method. Some have suggested using a microwave oven (remains to be tested). Ideally we could have a modified wood chipper, but the data center did not approve it. We have also looked at having on-site shredder visits, but have not found any suitable local providers; they all want to take the drives and shred them at their offices.

SSDs are very easy to destroy, and there are many decent ways to do it.

Method 1: Pull the cover, ground the SSD’s negative power lead and apply multiple ultra-high-voltage sparks to the NVRAM chips using a high-voltage spark generator.

There are lots of commercially available spark generators which will work perfectly for this. Some even run on 120 volts, are trigger operated and include full isolation so no gloves or external power supply are needed.

I can hear the buzzzzz right now!

Allow the spark to go through the center of each NVRAM chip’s case. This will totally destroy the chip.

There will be NO accessible information left on the SSD.

Note: Ventilate the area with a low-velocity fan to dissipate any ozone.

Total time required, in a production environment: well under 5 minutes per drive.

Method 2: Purchase a small electric ceramics oven (kiln). Place multiple SSDs, packed as tightly as you like, into the kiln and run a 30-minute cycle at 1,200 Celsius. All cooked. No more data – ever.

Method 3: Open the SSD and solder #22, 600-volt-rated stranded leads directly to the + and ground rails feeding the NVRAM chips. (This bypasses the over-voltage protection of the 5-volt power supply.)

In an electrically isolated environment, connect the Ground lead to AC Neutral, and the + lead to the AC hot lead that is in series with a conventional 120 volt, 100 watt incandescent light bulb. This circuit should float (neither side connected to earth ground) and the power should be fed through a double-pole switch fed from a GFI outlet or circuit with a GFI circuit breaker installed.

Place the device to be burned out under or in a protective cover (i.e. a plexiglass box) on a concrete floor. This prevents any scintillation from hitting the operator.

Apply the power for 30 seconds. Allow to cool for 60 seconds, and there will be no more data.

A very small AC welder can also be used.

Method 4: Buy an old 20-ton stamp press, place an SSD between the jaws with the lower jaw flat and the upper jaw fitted with a carbide-tipped pin-grid array, put on the appropriate eye-wear, and depress the activator.

Bah-Dah-Boom, no more data.

This simple approach is foolproof, as it will penetrate and crack the case of each NVRAM chip, breaking the ceramic wafer inside.

——————

Any of these methods can be automated. Methods 1 and 3 can be performed in seconds, quite possibly without having to open the SSD case.

Method 4 is also very fast, and does not require case removal or special handling. You will have to sweep the floor afterwards, however.

Write me if you’d like a tested prototype developed for you! I can even have it certified!

Why not simply encrypt the content on the drives? That way you don’t need to destroy anything: the data on the disk is inaccessible when the drive is separated from the servers.

You would have to manage distribution of the encryption key(s), but that’s not a big problem – there are many examples of doing this in a mostly secure way just a short Google search away. And the key’s protection doesn’t need to be elaborate: one only has to ensure it is not published outside the network or stored unencrypted on a drive. (If someone hacks into the network to steal the key, then the data is already at risk anyway.)
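One common pattern for the key-distribution problem is to derive a per-host key from a single master secret, so only one secret has to be managed centrally. A minimal sketch using the standard library (the master secret and hostnames here are made up for illustration; this is one possible scheme, not a recommendation from the post):

```python
import hmac
import hashlib

# Sketch: derive a per-host disk-encryption key from one master secret.
# The secret and hostnames are placeholders for illustration only.
MASTER_SECRET = b"example-master-secret-kept-off-the-servers"

def host_key(hostname: str) -> str:
    """Deterministically derive a 256-bit key (hex) for one host."""
    return hmac.new(MASTER_SECRET, hostname.encode(), hashlib.sha256).hexdigest()

# The same host always derives the same key; different hosts get different keys.
print(host_key("shard-01"))
print(host_key("shard-02"))
```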

As for performance: all modern CPUs contain hardware support for encryption functions, and all modern OSes support encrypted filesystems that leverage the CPU. The net result is that you can trivially encrypt the contents of disks with minimal performance overhead.

Chris, good question. It is my understanding of Linux software-based full disk encryption (FDE) that, once it is in place, you are prompted for the passphrase at boot time. While this may be fine for individual use, it does not scale to hundreds or thousands of systems. It is a solvable problem, but non-trivial.

Encryption at the RAID controller layer is another interesting option, but one we have not thoroughly tested yet.

You will require a password (or some key) when you load the encrypted filesystem. Obviously there is no need for the root filesystem to be encrypted.

And there’s a trivial way to get the key at that point: put it on a cheap, tiny usb key, which is left permanently plugged into the server. At boot, read the password from the USB key, and then use it to mount the encrypted filesystem.

The point is not to protect the key from exposure should someone access the system. Indeed, at that point the data is already compromised, since the filesystem is mounted. The point is to protect the data after the disk is removed, which this does safely. Just don’t dispose of the USB key AND the disk together. 😉

[Of course, we can get more complicated and avoid the USB key disposal problem by having a keyserver. One solution would be a Redis instance, where the key for each machine can be supplied once the new machine has its network interface up.]
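The keyserver idea can be sketched with an in-memory dict standing in for the Redis instance (hostnames and key values are made up; a real deployment would add authentication and transport security):

```python
from typing import Optional

# Sketch of the keyserver idea: a machine asks for its key at boot, once its
# network interface is up. A plain dict stands in for the Redis instance.
keyserver = {
    "shard-01": "a1b2c3d4",  # placeholder per-machine disk-encryption keys
    "shard-02": "e5f6a7b8",
}

def fetch_key(hostname: str) -> Optional[str]:
    """What a machine would do at boot: request its own key by hostname."""
    return keyserver.get(hostname)

print(fetch_key("shard-01"))
print(fetch_key("unknown-host"))  # unknown host: refuse to mount, alert someone
```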

The point is that it’s not difficult to get this working, saves a lot of hassle and expense dealing with disk disposal, and is a lot more foolproof from a security perspective.

Thank you for protecting our data on the hard drives. The Garner equipment is not cheap – when you expand you may want to consider a couple of cheaper (yet perhaps more fun) alternatives:
http://www.harborfreight.com/20-ton-shop-press-32879.html
http://www.harborfreight.com/120-volt-spot-welder-61205.html

Thanks Brian. We actually started with a shop press as v1 (aka “Stampy”). However, our data center providers were not fans of the flying bits of metal. The welder is a fun option but is not practical at the data center.