
I think it is obvious what happened. Same as happened to the Samsung 470. Once the unpowered data-retention time gets down to a few days, powering off the SSD for several days causes most of the flash to lose its charge. Since the SSD stores critical metadata on the flash, the SSD cannot operate once most of the flash has lost its charge.

And that isn't satisfying. It's more of a lament.

And to think there wasn't a single reallocation before the drive died. It may be a symptom of 25nm NAND: the cells lose the ability to hold a charge before they are ever flagged as failed. Maybe 34/32nm NAND will die outright before the gates get too worn to hold in the voltage.

...This data retention issue does however bring an element of frustration to the testing. Is a year unreasonable? How much of an impact does that requirement have on how the MWI is calculated? At what point should you stop writing and for how long should you then wait?....

Originally Posted by Christopher

Maybe something kept the M4 controller from reinitializing on power on? I find it hard to believe that the M4 just quit working after being powered off for a few days....

Originally Posted by johnw

...Once the unpowered data-retention time gets down to a few days, powering off the SSD for several days causes most of the flash to lose its charge. Since the SSD stores critical metadata on the flash, the SSD cannot operate once most of the flash has lost its charge.

You guys are on the right track here.
The flash manufacturers rate the PE cycle based on flash data retention and correctability, given the amount of spare ECC area on each page and how much that can correct. So you can theoretically run 200%, 300%, etc. of the PE cycle, but you're going to get a higher and higher probability of errors or failed drives. The bit error probability skyrockets above the 10^(-15) error rate that most consumer drives quote at this point. When an IMFT 25nm drive starts out, it is likely well beyond a 10^(-20) unrecoverable bit error rate.
Throw in things like temperature variations, background radiation, etc., and you have a smattering of other error-inducing mechanisms.
The reason it's not booting up is likely that the last remaining firmware copy in your flash got corrupted beyond recoverability.

True enterprise SLC and MLC drives actually push more width and more layers of error-correction mechanisms into the drive, which ultimately gives them longer data retention and lower unrecoverable bit error rates than consumer drives, from infancy to EOL.


Does that mean the M4 would have continued writing had it not been powered off, possibly until the NAND wore out completely? The M4 didn't even have any reallocation events, but somehow on power-off the cell charge apparently dissipated. I think it's an artifact of 25nm NAND, and that 3xnm NAND gets flagged as bad before the drive reaches the point where the MLC trapped voltage "leaks out" of the weakened gates. Maybe I'm phrasing it wrong, but the M4 had 13,330 PE cycles on it.

Originally Posted by B.A.T

I know. It's a sad thing to continue without it

I basically stole my M4 back from the family member I gave it to (giving them another drive in its stead). I'm thinking about OPing it and running it, but I'm not sure how many writes it has already. I would guess it's under 200GB.

I just upgraded it to 0009 (it had the 0001 FW, and yours had 0002, I think). I'm not sure how much value it would be to the test, though. That thing was a beast, but once you get to 500TB you should probably start leaving it off for a couple of days. There's no guarantee that a second drive would die the same death, but I'll put it out there.


A quick check of SMART value 173, multiplied by 64,020,803,584 bytes, will give you your answer. I think the only impact the FW version has is on speed during the test; most of the improvements from v0001/0002 to 0009 were stability fixes. It's a very good SSD, and a little strange to continue without it. It's been in this test since 1st July, and we need to rethink how we do the testing and what purpose we are testing for. Until now we have just been counting P/E cycles.
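The calculation described above can be sketched as follows. Note the unit size of 64,020,803,584 bytes per raw count is an assumption taken from the post itself, not from a Crucial datasheet:

```python
# Hedged sketch of the write-total calculation described above.
# Assumption (from the post, not a vendor spec): SMART attribute 173's
# raw value counts units of 64,020,803,584 bytes.

UNIT_BYTES = 64_020_803_584  # bytes per raw count, per the post above

def total_writes_tib(attr_173_raw: int) -> float:
    """Convert the raw SMART attribute 173 value into TiB written."""
    return attr_173_raw * UNIT_BYTES / 2**40

# Hypothetical example: a raw value of 3000 works out to roughly 174.7 TiB.
print(f"{total_writes_tib(3000):.1f} TiB written")
```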

B1 has stopped decreasing; it is up 2 from the last reading: from 62 to 64.

--

We are not going to change the test; the goal is still to find out how long the drive can endure writing. That is the main goal.

However, as a result of the Samsung and the M4 both stopping work, there is now a second goal: data retention.
It has been there all along, and the M4 just confirms that we need to start looking at the phenomenon.

The question is how do we validate the retention without making the test last for years and thus making the test academic and boring.

I am continuing as planned and will perform a short retention test at 500TiB on both drives; testing every 50 or 100TiB, depending on speed, might be what we need.

Instead of all drives going through rigorous retention testing, we should start by selecting one drive (a fast drive, e.g. the Samsung 470/830 or the M4) and perform more exact retention testing on that drive.
That way we can get to the core without breaking the main goal of the test.
The findings from this test can be used to specify how to perform a proper retention test.

If it's not too much trouble, could you post some benchmarks of the 830 128GB? ASU and CDM or whatever would be excellent. I've only seen the 256 and 512, and since Plextor just released a much faster version of its Marvell/toggle drives, I'm a little torn between the two at 128GB.

I'm planning on doing some retention testing, but not until somewhere north of 600TB raw writes since the Mushkin is using 32nm NAND.

Highly unlikely. Since the firmware does not need to be written frequently, it can be stored on its own private flash that is used for nothing else.

In contrast, the metadata for the index of LBAs to flash pages is constantly changing, so it is likely written to regular flash (probably redundant copies, too) so that it can take advantage of wear leveling. If the index were written to private flash, it might wear out before the rest of the SSD's flash.

Although I can think of another strategy for writing the metadata, so I could be wrong about it being written to regular flash. The other strategy would be to devote a chunk of flash many times larger than the metadata. Then do a simple wear-leveling algorithm where each write of the metadata occurs in the next slot in the reserved flash. I'm not sure how much flash would need to be reserved, since it depends on how often the metadata gets written to flash.
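The rotating-slot strategy described above can be sketched in a few lines. This is purely hypothetical, not Crucial's actual firmware design: each metadata write goes to the next slot in a reserved region, tagged with a sequence number so the newest copy can be found at boot, which spreads erases evenly across the reserved flash.

```python
# Hypothetical sketch of the rotating-slot metadata strategy described
# above (NOT the M4's actual firmware design). Each write of the metadata
# lands in the next slot of a reserved region; a monotonically increasing
# sequence number lets the controller find the newest copy at boot.

class MetadataLog:
    def __init__(self, num_slots: int):
        self.slots = [None] * num_slots  # reserved flash region
        self.seq = 0                     # sequence number of the last write

    def write(self, metadata: bytes) -> None:
        """Write metadata to the next slot, round-robin across the region."""
        self.seq += 1
        self.slots[self.seq % len(self.slots)] = (self.seq, metadata)

    def newest(self) -> bytes:
        """On boot, scan every slot and return the highest-sequence copy."""
        written = [s for s in self.slots if s is not None]
        return max(written)[1]

log = MetadataLog(num_slots=8)
for i in range(20):
    log.write(f"map-v{i}".encode())
print(log.newest())  # the most recent metadata copy: b'map-v19'
```

With N slots, each slot is erased only once per N metadata updates, which is the trade-off johnw describes: the more flash you reserve, the less often any one slot wears.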

It seems that the MWI might not be so overly conservative after all. One year of holding data without power after the MWI has expired seems excessive. I wonder how much the MWI would change if the retention requirement were changed to 3 months, as it is for enterprise drives.

Zads

Originally Posted by Hopalong X

It is rated at 1 year after hitting MWI-1.

How far was the M4 past the MWI-1?

It hit MWI-1 at 178TiB. It was up to around 600TiB of writes if memory serves.

Way past MWI 1. If it had been stopped at that point, it should have held its data as spec'd.
After that, how long it will hold data is questionable.
Same as asking how long it will continue to write data.

You are all making the new specs with the testing.

Do you know where I can find the spec. The M4 spec sheet says nothing.

The only reason I say that is because you have the older 50nm NAND, which should have a 10K PE rating. You have written 13K+ cycles to the whole of the drive, thus you have exceeded the number of PE cycles that would be within the "guaranteed for 1 year retention" spec.

I have to believe that the VT will last much longer while retaining data than the M4. The M4 was over 4x its NAND's PE rating, while the Turbo is only 30% past its rated PE cycles. I'd be shocked if it didn't make it to 1PB.

Incidentally, that is why I picked up a VTurbo 120 a few weeks ago.

My older Agility 60 died today as well (at the most inopportune time), making it my first SSD failure. I was a little surprised, as it's not like it had a lot of writes on it, but I'm thinking it has more to do with the 1.7 FW... I think I'm sticking with 1.6 on the other Indilinxes. I just needed it to not die until Thanksgiving. It will be a few days before I get back to the homestead to check it out, so I'm unable to determine whether it's a d-flash candidate or j.f.d.