Testing Endurance

We've mentioned in the past that NAND endurance is not an issue for client workloads. While Intel's SSD 335 moves to 20nm MLC NAND, the NAND itself is still rated at the same 3,000 P/E cycles as Intel's 25nm MLC NAND. Usually we can't do any long-term endurance testing for the initial review because it simply takes too long to wear out an SSD: even if you're constantly writing to a drive, it can take weeks, possibly even months, for the drive to wear out. Fortunately, Intel reports total NAND writes and the percentage of lifespan remaining as SMART values that can be read using the Intel SSD Toolbox. The values we want to pay attention to are E9 and F9, which represent the Media Wearout Indicator (MWI) and total NAND writes, respectively. Using those values, we can estimate the long-term endurance of an SSD without weeks of testing. Here is what the SMART data looked like before I started our endurance test:

This screenshot was taken after all our regular tests had been run, hence there are already some writes to the drive, although nothing substantial. What surprised me was that the MWI was already at 92, even though I had only written 1.2TB to the NAND. Remember that the MWI begins at 100 and then decreases down to 1 as the drive uses up its program/erase cycles. Even after it has hit 1, it's likely the drive can still withstand additional write/erase cycles thanks to MLC NAND typically behaving better than the worst-case estimates.

We've never received an Intel SSD sample that started with such a low MWI, indicating either a firmware bug or extensive in-house testing before the drive was sent to us.
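To make the two SMART readings concrete, here is a minimal sketch of how the E9 and F9 values described above might be represented. The `SmartSnapshot` class and the example values are hypothetical (chosen to roughly match the MWI of 92 and ~1.2TB of NAND writes mentioned above), and the assumption that the F9 raw value counts writes in 1 GiB units is ours, not something stated in this article.

```python
from dataclasses import dataclass

# SMART attribute IDs as the Intel SSD Toolbox shows them (hexadecimal);
# tools such as smartctl list the same attributes by decimal ID.
MWI_ID = 0xE9          # 233: Media Wearout Indicator
NAND_WRITES_ID = 0xF9  # 249: total NAND writes


@dataclass
class SmartSnapshot:
    """Hypothetical container for the two endurance-related SMART readings."""
    mwi: int               # normalized value: starts at 100, bottoms out at 1
    nand_writes_gib: int   # ASSUMPTION: F9 raw value counts writes in 1 GiB units

    def rated_life_used_pct(self) -> int:
        # Each MWI point below 100 corresponds to 1% of the rated P/E cycles.
        return 100 - self.mwi

    def nand_writes_tib(self) -> float:
        # Convert the GiB counter to TiB for readability.
        return self.nand_writes_gib / 1024


before_test = SmartSnapshot(mwi=92, nand_writes_gib=1229)  # illustrative values
print(MWI_ID, NAND_WRITES_ID)                   # 233 249
print(before_test.rated_life_used_pct())        # 8
print(f"{before_test.nand_writes_tib():.1f}")   # 1.2
```

Keeping the raw counter and the normalized MWI side by side is what makes the later extrapolation possible: the MWI says how much of the rated life is gone, and F9 says how much data it took to get there.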

To write as much as possible to the drive before the NDA lift, I first filled the drive with incompressible data and then proceeded with incompressible 4KB random writes at a queue depth of 32. SandForce controllers perform real-time data compression and deduplication, so using incompressible random data was the best way to write a lot of data to the NAND in a short period of time. I ran the tests in roughly 10-hour blocks; here is the SMART data after 11 hours of writing:

I had written another ~3.8TB to the NAND in just 11 hours, but what's shocking is that the MWI had dropped from 92 to 91. With the SSD 330, Anand wrote 7.6TB to the NAND and the MWI stayed at 100, and that was a 60GB model; our SSD 335 is a 240GB model and should therefore be more durable, since there is more NAND to spread the writes across. It's certainly possible that the MWI was on the edge between 92 and 91 after Intel's in-house testing, so I decided to keep writing to see if that was the case. Let's fast-forward to the end of the 105 hours I spent writing to the drive in total:

Over those few days, I wrote a total of 37.8TB to the NAND, and during that time the MWI dropped from 92 to 79. In other words, I used up 13% of the drive's rated P/E cycles. This is far from good news. Based on the data I gathered, the MWI would hit 0 after around 250TB of NAND writes, which translates to less than 1,000 P/E cycles.
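The estimate above is a linear extrapolation from two SMART readings. A minimal sketch of that arithmetic follows, using the 1.2TB/MWI 92 and 37.8TB/MWI 79 readings from this test. Two inputs are assumptions not stated in the article: the raw NAND capacity (taken here as 256 GiB for a 240GB drive) and the unit convention ("TB" read as 10^12 bytes).

```python
# Linear extrapolation of Intel's Media Wearout Indicator (MWI) against
# total NAND writes, using the two SMART readings reported in this test.

START_WRITES_TB, START_MWI = 1.2, 92   # before the endurance run
END_WRITES_TB, END_MWI = 37.8, 79      # after ~105 hours of writing

# ASSUMPTIONS: 256 GiB of raw NAND, and "TB" means 10**12 bytes.
RAW_NAND_BYTES = 256 * 2**30


def projected_writes_tb(target_mwi: int = 0) -> float:
    """Project the total NAND writes (TB) at which the MWI reaches target_mwi."""
    tb_per_point = (END_WRITES_TB - START_WRITES_TB) / (START_MWI - END_MWI)
    return END_WRITES_TB + (END_MWI - target_mwi) * tb_per_point


def implied_pe_cycles(total_writes_tb: float) -> float:
    """Express total NAND writes as full program/erase passes over the raw NAND."""
    return total_writes_tb * 10**12 / RAW_NAND_BYTES


total = projected_writes_tb()
print(f"MWI 0 after ~{total:.0f} TB -> ~{implied_pe_cycles(total):.0f} P/E cycles")
```

Under these assumptions the sketch projects roughly 260TB of NAND writes, or about 950 P/E cycles, in line with the ~250TB and sub-1,000-cycle figures above. Reading the units as binary instead of decimal shifts the cycle count by about 10%, so treat the output as an order-of-magnitude check rather than a precise rating.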

I showed Intel my findings and they were as shocked as I was. The drive had undergone their validation before shipping and nothing out of the ordinary was found. Intel confirmed that the NAND in the SSD 335 should indeed be rated at 3,000 P/E cycles, so my findings contradicted that data by a fairly significant margin. Intel hadn't seen anything like this and asked me to send the drive back for additional testing. We'll be getting a new SSD 335 sample to see if we can replicate the issue.

It's understandable that the endurance of 20nm NAND may be slightly lower than that of 25nm NAND even though both are rated at 3,000 P/E cycles (Intel also has 25nm NAND rated at 5,000 cycles), because 25nm is now a mature process whereas 20nm is very new. Remember that the P/E cycle rating is the minimum the NAND must withstand; in reality it can be much more durable, as we saw with the SSD 330 (based on our tests, its NAND was good for at least 6,000 P/E cycles). Hence both 20nm and 25nm MLC NAND can be rated at 3,000 cycles even though their real-world endurance may differ (both should still last for at least 3,000 cycles).

It's too early to conclude much based on our sample size of one. There's always the chance that our drive was defective or subject to a firmware bug. We'll be updating this section once we get a new drive in house for additional testing.

I'm disappointed that there hasn't been any more information on that 840 Pro that died.

Anand should really post some more details, like what it was doing just before it died, the symptoms of how it failed, whether the SMART parameters could still be read, etc.

Also, Anand should be hounding Samsung to get back to him about it, if they haven't already. The 840 Pro is apparently shipping on Nov 6. If Samsung has not been able to diagnose the problem and report back by then, it looks bad for Samsung.

Anand was filling the drive with sequential data (preconditioning it for our enterprise tests) when it died in the middle of the run. After that it was no longer recognized in the BIOS, not even when connected through a USB-to-SATA adapter.

As far as I know, Samsung has not gotten back to us about it yet, but let me ask Anand and see if he knows more.

I am highly interested in what Samsung has to say about the failure. It seems to me that AnandTech should be able to put some pressure on Samsung to provide a thorough failure analysis in a timely manner; otherwise AnandTech will report that Samsung was unable to explain the failure, and that looks bad for Samsung.

The flash in SSDs isn't going to get more reliable. ECC basically scales exponentially as the process dimensions keep shrinking. As the lines get closer and closer together, it becomes harder and harder to measure the number of electrons holding the charge. Each MLC cell stores 2 bits, so 4 different amounts of charge need to be distinguished. Errors occur more frequently and have to be corrected, and going smaller isn't going to make the NAND any faster either. SSD speed and reliability improvements have come, and will continue to come, at the controller level. If you truly want a reliable SSD, go with 34nm SLC; it's still being produced.
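The point about 2-bit cells can be illustrated with a toy read routine: the sensed threshold voltage is compared against three reference levels to pick one of four charge states. This is a sketch under stated assumptions only; the reference voltages are hypothetical round numbers, and the Gray-coded bit assignment is one common style of mapping, not any specific vendor's.

```python
# Toy 2-bit (MLC) cell read: compare the sensed threshold voltage against
# three reference levels to select one of four charge states.
# ASSUMPTION: these reference voltages are hypothetical; real NAND uses
# vendor-specific, often adaptive, read references.
READ_REFS_V = (0.0, 1.5, 3.0)

# ASSUMPTION: Gray-coded state -> bits mapping, erased state first. Adjacent
# states differ in exactly one bit, so drifting into a neighbouring state
# flips only one bit.
STATE_BITS = ("11", "10", "00", "01")


def read_mlc_cell(vth: float) -> str:
    """Return the two bits stored in a cell with sensed threshold voltage vth."""
    state = sum(vth >= ref for ref in READ_REFS_V)  # 0..3
    return STATE_BITS[state]


def bit_errors(a: str, b: str) -> int:
    """Count differing bit positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


# A cell programmed just above 1.5 V that leaks a little charge is misread
# as the neighbouring state, costing a single bit.
print(read_mlc_cell(1.6), read_mlc_cell(1.4))              # 00 10
print(bit_errors(read_mlc_cell(1.6), read_mlc_cell(1.4)))  # 1
```

As cells shrink, the voltage gaps between those four states hold fewer electrons' worth of margin, so misreads like the one above become more frequent and the controller's ECC has to correct more of them.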

Endurance will play a larger role in differentiating future SSDs as the industry continues to move to smaller NAND geometries.

I'm interested in seeing OCZ's Vector, which will use 20nm MLC NAND. It'll be a big test for OCZ to see how their endurance technology stacks up against the competition (or lack thereof in the consumer space).

This is false: "Based on the data I gathered, the MWI would hit 0 after around 250TB of NAND writes, which translates to less than 1,000 P/E cycles."

There is a forum that exclusively tests SSD endurance, and Intel drives last far past an MWI of 0. In fact, after it reached zero, the indicator started counting up again. I remember the original X25-M lasting until the MWI, on its second count, was significantly greater than 25 (it could have been 50, I don't remember).

They thought that after the Media Wearout Indicator reached 0, the drive would die. In fact, none of the drives did. NONE.

Even 240TB is a hell of a lot. My X25-M has 7.6TB written to it and I've had it since the year the drive was announced. At this rate, I'll be 30 years older by the time it reaches that point, so it's a needless worry about nothing.

Contrast that with platter HDDs, which die off slowly: more and more data gets corrupted and the drive gets slower and slower until you notice that it is dying. For a lot of people around me, that happened in less than 5 years too.

I did not say the drive will die after the MWI hits 1. In fact, I said the opposite:

"Even after it has hit 1, it's likely the drive can still withstand additional write/erase cycles thanks to MLC NAND typically behaving better than the worst-case estimates."

The problem here isn't that 1,000 P/E cycles isn't enough for a consumer, but that there seems to be a huge difference in endurance between 20nm MLC and 25nm MLC if our data is correct. Intel claimed that there is no difference, as both are rated at 3,000 P/E cycles, but our data contradicts theirs. Given that the SSD 335 doesn't bring any immediate price cuts, you are getting a worse product for the same money compared to the SSD 330.

It's of course possible that there is a simple firmware bug that reports the wrong MWI or NAND write values, but so far Intel has not said anything to suggest that.