TechReport torture test piles on more than 700 terabytes before first failures.

How long, exactly, do SSDs last? It's a difficult question to answer, because estimating an SSD's life requires taking a whole lot of factors into consideration—the type and amount of NAND used in the drive, overall write amplification, read/write cycles, and more. When we did our in-depth examination of how SSDs work a couple of years back, we looked a bit at how those factors affect drive life, but TechReport is going even further than that: it has been subjecting six drives to a long-term torture test to actually measure, rather than estimate, the drives' service life.
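As a rough illustration of how those factors combine, here's a common back-of-the-envelope endurance estimate (a sketch with hypothetical numbers, not any manufacturer's actual rating method):

```python
def estimated_lifetime_years(capacity_gb, pe_cycles, write_amplification,
                             gb_written_per_day):
    """Rough SSD endurance estimate. All inputs are assumptions."""
    raw_endurance_gb = capacity_gb * pe_cycles  # what the NAND can physically absorb
    host_endurance_gb = raw_endurance_gb / write_amplification  # what the host can send
    return host_endurance_gb / gb_written_per_day / 365

# Hypothetical 240GB drive: 3,000-cycle MLC, write amplification of 1.5,
# and a heavy 20GB/day workload -> roughly 65 years of estimated life.
print(estimated_lifetime_years(240, 3000, 1.5, 20))
```

The point of the torture test is to bypass exactly this kind of estimate and measure the real number.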

The results are impressive: the consumer-grade SSDs tested all made it to at least 700TB of writes before failing. Three of the drives have written 1PB (that’s a thousand terabytes, by TechReport’s decimal reckoning, not 1024TB). That’s a hell of a lot more writes than the manufacturers’ stated drive lifetimes, and that’s good news for SSD-buying consumers.

Performing that many writes takes time—in fact, TechReport has been torturing the drives to death since last August. The six drives chosen to die for science are Corsair’s 240GB Neutron GTX (with 19nm MLC NAND), Intel’s 240GB 335 (with 20nm MLC NAND), two of Kingston’s 240GB HyperX 3K drives (with older 25nm MLC NAND), and two Samsung drives—one 256GB 840 Pro (with 21nm MLC NAND) and one 250GB 840 (with 21nm TLC NAND). The Intel and Kingston drives use SandForce controllers, the two Samsung drives use Samsung's own controllers, and the Corsair drive uses a controller from Link_A_Media Devices.

The tests to which the drives are being subjected are simple in scope: the drives are blasted with incompressible writes, so that their controllers are limited in the advanced compression and de-duplication techniques they can apply to the incoming data. The two identical Kingston drives are also being used to test the difference in drive life brought about by an incompressible workload versus a workload that does allow for compression and deduplication.
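To see what "incompressible" means in practice, here's a minimal sketch (my own illustration, not TechReport's actual test harness) of why random data defeats a compressing controller:

```python
import os
import zlib

# Incompressible payload: random bytes have near-maximal entropy, so
# compression saves nothing and dedupe finds no repeated chunks.
incompressible = os.urandom(1 << 20)       # 1MiB of random data
# Compressible payload: highly repetitive data shrinks dramatically.
compressible = b"A" * (1 << 20)

print(len(zlib.compress(incompressible)))  # ~1MiB: no savings for the controller
print(len(zlib.compress(compressible)))    # ~1KiB: huge savings when allowed
```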

[Diagram] An SSD with dedupe and compression receives a write. The controller looks at the data, figures out which parts are repeated elsewhere, and only stores the unique parts. Additionally, if the red or blue chunks were already on the drive, they'd be discarded, further reducing the data to be written.
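A toy version of the idea in that diagram (fixed-size chunk deduplication by content hash; a sketch, not any vendor's actual controller logic):

```python
import hashlib

class DedupeStore:
    """Toy block store: identical chunks are stored only once."""
    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}                     # content hash -> chunk bytes

    def write(self, data):
        stored = 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:       # only new content hits the flash
                self.chunks[key] = chunk
                stored += len(chunk)
        return stored                        # bytes actually written

store = DedupeStore()
print(store.write(b"\x00" * 16384))          # 16KiB of zeros -> 4096 bytes stored
```

With incompressible random data, every chunk hash is unique and no write is ever saved, which is exactly why TechReport feeds the drives that workload.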

In an update posted yesterday morning, TechReport listed the first casualties: the Intel 335, the incompressible workload Kingston HyperX 3K, and the triple-level cell Samsung 840. Intel’s 335 died at around 700TB of writes, while the Kingston drive lasted to a bit over 725TB; the TLC NAND-equipped Samsung 840 gamely soldiered on to past 900TB before dying.

The failure modes of each drive were different as well. The Intel drive entered a self-destruct mode, bricking itself once it could no longer guarantee its ability to write reliably; the Kingston HyperX 3K threw SMART errors and vanished after a reboot; and the Samsung 840 simply died without warning. However, it's worth repeating that the failures didn't occur until the drives had passed the manufacturers' stated lifetime write limits many times over—it took almost a year of solid torture-test writes to get them to failure.

The other three SSDs are still working and have made it past 1PB of writes, and TechReport’s torture test remains ongoing. However long the drives do end up lasting, it’s definitely long enough to put to rest a lot of fears about current-generation SSD lifetime limits.

Lee Hutchinson
Lee is the Senior Technology Editor at Ars and oversees gadget, automotive, IT, and culture content. He also knows stuff about enterprise storage, security, and manned space flight. Lee is based in Houston, TX. Email: lee.hutchinson@arstechnica.com // Twitter: @Lee_Ars

It would be interesting to see the differences between the consumer and enterprise SSDs. I wonder what the real lifetime of those really expensive SAS SSDs is these days, and if it's truly worth the big increase in cost.

So far I've had one SSD fail on me in a year's time, running the drives in a four-drive RAID0 for the sake of performance. I use the drives for I/O testing. I've lost three spinning disks in two different arrays in this same time frame. We've only had the one SSD fail around the office.

The results are impressive: the consumer-grade SSDs tested all made it to at least 700TB of writes before failing. Three of the drives have written 1PB (that’s a thousand terabytes, by TechReport’s decimal reckoning, not 1024TB). That’s a hell of a lot more writes than the manufacturers’ stated drive lifetimes, and that’s good news for SSD-buying consumers.

It would be really nice to see what the manufacturers' stated drive lifetimes are, for each drive. Otherwise these are just big numbers floating in space.

I was scared of SSD's for a while; I had two die at work within my first year or so on the job. IT claims it was a defective batch. Regardless, it's definitely made me more of a proponent of remote backups.

The results are impressive: the consumer-grade SSDs tested all made it to at least 700TB of writes before failing. Three of the drives have written 1PB (that’s a thousand terabytes, by TechReport’s decimal reckoning, not 1024TB). That’s a hell of a lot more writes than the manufacturers’ stated drive lifetimes, and that’s good news for SSD-buying consumers.

It would be really nice to see what the manufacturers' stated drive lifetimes are, for each drive. Otherwise these are just big numbers floating in space.

The biggest issue is that the failures seem to be so catastrophic. The drives don't merely drop into read only mode; they tend to become entirely inaccessible. That's scary.

Well, at least they failed rather than trying to soldier on even though their data was corrupted (I'm assuming they were reading back the written data?). As such I'd have no fears of putting them into a RAID 5 or such. Stagger the old and the new so you're not likely to have 2 crash before the loss of one is noticed.

The biggest issue is that the failures seem to be so catastrophic. The drives don't merely drop into read only mode; they tend to become entirely inaccessible. That's scary.

It seems like the onboard diagnostics are capable of predicting failure (and the Intel's did), but I'm not sure how much that really matters. Most devices are going to die in other ways than raw NAND exhaustion since no one bulk copies maximum entropy data for 1+ year at a time onto drives anyway.
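If you want to watch this kind of wear indicator on your own drive, smartmontools exposes the counters; the attribute names, IDs, and units vary by vendor, so treat this as a sketch (many drives report host writes as attribute 241, Total_LBAs_Written, usually in 512-byte sectors):

```python
import re
import subprocess

# Assumes smartmontools is installed; change the device path for your system.
out = subprocess.run(["smartctl", "-A", "/dev/sda"],
                     capture_output=True, text=True).stdout

# Many (not all) SSDs expose lifetime host writes as Total_LBAs_Written.
m = re.search(r"Total_LBAs_Written.*?(\d+)\s*$", out, re.MULTILINE)
if m:
    # Commonly counted in 512-byte sectors, but this is vendor-specific.
    tb_written = int(m.group(1)) * 512 / 1e12
    print(f"~{tb_written:.1f} TB written so far")
else:
    print("This drive doesn't report Total_LBAs_Written under that name")
```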

How do so many months of continuous torture testing equate to only 700TB of data?

They've not been writing constantly. Previous articles have mentioned breaks for unpowered data-retention tests and photographs of the drives stacked in as many different ways as they can think of (hey, articles need pictures).

How do so many months of continuous torture testing equate to only 700TB of data?

It does seem rather low, given that they should have a write speed (per a quick Google search) of around 500MB/s.
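Some rough arithmetic on the gap (the ~300-day duration is my assumption, based on the test starting "last August"):

```python
SECONDS_PER_DAY = 86400
days = 300                     # assumed ~10 months of testing

# If the drives really sustained 500MB/s around the clock:
ideal_tb = 500e6 * SECONDS_PER_DAY * days / 1e12
print(ideal_tb)                # ~12,960 TB: nearly 13PB

# Working backwards from the ~700TB actually written:
avg_mb_per_s = 700e12 / (days * SECONDS_PER_DAY) / 1e6
print(avg_mb_per_s)            # ~27 MB/s effective average
```

So the effective average is a small fraction of the rated speed, which fits with the pauses for retention tests and benchmarking mentioned above.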

SSD performance really starts to get slow once you write a lot of data to it. I'd imagine their drives are going slower than a platter-based HDD by now.

I've got a 64GB SSD that's nearing the end of its life. The usual write speed I see is 15MB/s; it started off around 150MB/s. (Edit: Yes, I have TRIM enabled on a Windows machine, and yes, I've tried a secure erase to completely clear it multiple times.)

2nd Edit: After reading the linked article, it's interesting to see that they didn't experience any performance degradation after so many write cycles. Newer drives must have much more resilient flash memory or much better controller design. The newer SSDs I've purchased haven't really seen any change in performance; it's just the really early models where I've personally experienced, and seen others experience, performance issues as the drives reach the end of their lives. This could be due to poor controller design, poor memory design, or other factors.

The biggest issue is that the failures seem to be so catastrophic. The drives don't merely drop into read only mode; they tend to become entirely inaccessible. That's scary.

To be fair, most consumer HDDs don't even give you the courtesy of dying quickly. They tend to corrupt your data for some time before finally kicking the bucket.

Anecdotally, most of the mechanical failures on the HDDs I've used have been total losses, with no possible way to recover data. So a fallback to a "read only mode" is no substitute for hot (and cold!) backups.

The biggest issue is that the failures seem to be so catastrophic. The drives don't merely drop into read only mode; they tend to become entirely inaccessible. That's scary.

And in the case of the Intel drive, it was designed to do exactly that on purpose.

Yet at the same time, two of the SSDs did give warnings: the Kingston HyperX 3K about 3TB of writes before failure and the Intel about 50TB before failure, while the Samsung, designed for less than 200TB, gave its warning at 300TB and lasted till over 800TB.

The results are impressive: the consumer-grade SSDs tested all made it to at least 700TB of writes before failing. Three of the drives have written 1PB (that’s a thousand terabytes, by TechReport’s decimal reckoning, not 1024TB). That’s a hell of a lot more writes than the manufacturers’ stated drive lifetimes, and that’s good news for SSD-buying consumers.

It would be really nice to see what the manufacturers' stated drive lifetimes are, for each drive. Otherwise these are just big numbers floating in space.

They're all rated by their manufacturers for less than 200TB.

Which of course doesn't necessarily mean much - most of them may be capable of significantly more, but setting the rating at 200TB means a larger number pass.

However, they could also have just been lucky and gotten a bunch of drives in the 99th percentile for lifetime. This doesn't say much when you only use one of each.

For personal, selfish reasons, I wish they had included a Crucial m500 in there. Specifically a 480GB or 1TB option. Regardless, this test makes me feel even better about switching to an SSD.

Still running regular, redundant backups, anyway.

Would have been nice to see how it fares. I have loved mine for the last two years, and I have always wondered how long it would last, considering my usage is primarily as an import location for my photos (about 80GB dumped onto it each weekend, then almost entirely deleted down to 5-10GB of that; rinse, lather, repeat). Good to know that the minimum lifetime of these drives seems to be well above what I will ever be able to achieve before replacing it.

Still, as others have already stated, this provides even more incentive to regularly backup your drives. It looks like when an SSD eats it, it is most likely taking all of the data with it.

How do so many months of continuous torture testing equate to only 700TB of data?

It does seem rather low, given that they should have a write speed (per a quick Google search) of around 500MB/s.

SSD performance really starts to get slow once you write a lot of data to it. I'd imagine their drives are going slower than a platter-based HDD by now.

I've got a 64GB SSD that's nearing the end of its life. The usual write speed I see is 15MB/s; it started off around 150MB/s.

They did performance tests at every 100TB mark and saw no performance decrease, but they also do a seven-day zero-power retention test after every 100TB. They also have to delete all the data written, since the drives are only 240/256GB SSDs.

These tests are entertaining, but they don't really tell us much about SSD reliability; you'd need a sample size in the hundreds to thousands of drives. Right now we just know that a handful of drives from a variety of manufacturers, cherry-picked for (at least) not suffering from early failure, significantly exceeded their rated life spans.

I know Google, Amazon, Microsoft, Facebook, etc. must have massive numbers of SSDs, and they must be tracking their failures. I would love to see those numbers. They'll probably be limited to enterprise-grade SSDs, but I think they'd be quite interesting regardless.
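To put a number on the sample-size point above (my own aside, not anything from TechReport or those companies):

```python
# With n drives and zero observed early failures, the 95% upper confidence
# bound on the true early-failure rate p solves (1 - p)**n = 0.05.
n = 6
p_upper = 1 - 0.05 ** (1 / n)
print(f"{p_upper:.0%}")   # ~39%: the true early-failure rate could still be very high
```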

Out of curiosity, does anyone have a good way to estimate the total amount of memory/swap written on a Linux machine?

With numbers like that, it's actually quite reasonable to use an SSD for swap.
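One rough approach (a sketch; the kernel's counters reset at boot, so this only covers the current uptime):

```python
import resource

# /proc/vmstat's pswpout counts pages swapped out since boot.
page_size = resource.getpagesize()          # typically 4096 bytes

with open("/proc/vmstat") as f:
    stats = dict(line.split() for line in f)

swapped_out = int(stats["pswpout"]) * page_size
print(f"~{swapped_out / 1e9:.2f} GB swapped out since boot")
```

For lifetime totals across reboots you'd have to log this periodically, or read the drive's own SMART write counters instead.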

Memory is so cheap these days that swap is practically a niche application. The kind of person who is penny-pinching on memory so much that they're forced to use swap regularly is not the kind of person who installs an SSD.

The other potential application is people with absolutely freaking huge datasets that can't be streamed, but those people generally can afford to buy mass amounts of RAM anyway because swapping out makes the already slow processing take forever.

Is there an assumption that this scales for larger SSDs? e.g. writing 10/100PB to a 1TB drive?

Within the same family, probably. In most cases all drives in a family will use the same type of flash, just with more of it in the higher-capacity models. There've been some cases where an SSD vendor swapped flash type mid-production-run without changing the branding, though most of those were a few years ago; after sites like Anandtech raked them over the coals for doing so (because of the major performance changes it often caused), they've mostly stopped. I think there've also been a few cases where the lowest-capacity drives in a family used lower-density flash from the start to boost performance by using more dies; in those cases a capacity doubling wouldn't extend the drive lifetime.

A failure of the controller, etc. would be independent of the number of flash dies used, though, so there are some failure modes that wouldn't scale with capacity. There's also a risk that, since many consumer SSDs are factory-rated for the same number of GB of writes regardless of capacity, manufacturers might put better-binned flash on the low-capacity drives, which need more writes per die to hit that number than higher-capacity drives do. The latter is purely speculation on my part, though; I've not seen anything from an authoritative site about it one way or the other.
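A quick illustration of the linear-scaling assumption (hypothetical numbers; it only holds if flash wear, rather than the controller, is the limiting failure mode):

```python
# If a 240GB drive with a given flash type absorbs ~700TB of writes,
# a 960GB sibling built from 4x the same dies would naively scale to:
print(700 * (960 / 240))   # 2800 TB
```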

Out of curiosity, does anyone have a good way to estimate the total amount of memory/swap written on a Linux machine?

With numbers like that, it's actually quite reasonable to use an SSD for swap.

Memory is so cheap these days that swap is practically a niche application. The kind of person who is penny-pinching on memory so much that they're forced to use swap regularly is not the kind of person who installs an SSD.

The other potential application is people with absolutely freaking huge datasets that can't be streamed, but those people generally can afford to buy mass amounts of RAM anyway because swapping out makes the already slow processing take forever.

The second part is false. Academic departments never have enough money for computer memory, and keeping shared resources running in OOM conditions is a real challenge.

-- server admin for academic department

edit to add: In general, the argument "this is no longer a problem because things are cheaper now" never really holds water. There's always someone who has big needs and small resources. Maybe 1TB of SSD is cheaper now than it was 3 years ago, but it's still too expensive for someone somewhere who could use it. Maybe hard drives are faster now than they were a decade ago, but storage is still slow.

One of them has lasted over 2 years of almost continual use. The other one has failed 3 times (technically two drives, which failed twice each, as one of them was a replacement).

I know some colleagues who have gone through over 5 SSDs in a single year (in different machines) and other colleagues who have been running the same drive for over 4 years without any issues.

The one lesson I have learnt .. make sure you have backups, and check that those backups work! Something you should _always_ do, but something I am especially paranoid about now I am using SSDs every day.

So far I've had one SSD fail on me in a year's time, running the drives in a four-drive RAID0 for the sake of performance. I use the drives for I/O testing. I've lost three spinning disks in two different arrays in this same time frame. We've only had the one SSD fail around the office.

The biggest issue is that the failures seem to be so catastrophic. The drives don't merely drop into read only mode; they tend to become entirely inaccessible. That's scary.

This makes me wonder if the test itself might be flawed.

As you point out, all the failures seem to be catastrophic in nature, which is odd.

TechReport is ignoring the SSDs' warnings and running them till they die; that's the point. They're rated for less than 200TB, sent warnings from 300TB on, and were all still running at 700TB, with 3 of 6 still running at 1,000TB.

This is heartening, but it's not the same as longitudinal testing. Crash-writing an SSD until it fails isn't the same as testing durability over time. In some ways it's more intense, but at the same time it doesn't simulate real-world use: the degradation of materials due to prolonged heat exposure, physical shock, and dust buildup.

The drive that writes 700 terabytes inside of one prolonged lab session may fail well short of that number over a couple of years of less intensive use. SSDs are getting more and more reliable, and maybe the ones available today are more reliable than contemporary HDDs, but we won't know for sure until we see the longevity of both in the field.

This is not a failure of this test; like they said in their outline, a more realistic test is too slow to feasibly enact.