"As the modern go-to technology for Tier 0 enterprise storage, SSDs house business’s most critical data and are core to generating revenue. Split-second access to this data is important, but if that data should become corrupted, the most rapid access in the world won’t matter."

The Z68's SSD caching is called Intel Smart Response Technology (SRT). The SSD caches reads (and writes in high-performance mode) for a RAID array. You need at least two hard drives for a RAID array.

Are you sure about that? From what I've heard, a single HDD will do; no need for an array. That said, I don't think much of SRT: too much complexity and reliance on driver support.

I read that you need one HDD and one SSD... but you have to select the configuration option titled RAID for it to work.

Can you share more of your thoughts on your caution, and perhaps what experiences have contributed to it?

I am struggling with the need for speed vs a Seagate whitepaper cautioning that data is more easily subject to corruption over time on an SSD. Given how flash memory works, as well as my own experience with SSDs, that caution sort of makes sense to me. SRT seems to represent an optimal solution to this problem.

Here's a nice review of SRT at AnandTech. Tradeoffs:

- Z68 and small SSD = can overclock, use SRT, and Quick Sync (the latter via Virtu). You may not have any apps that make use of Quick Sync. Z68 boards are pricey.
- P67 and larger SSD = can overclock. Mobo cheaper than Z68. Don't have to muddle with Virtu. Can load all apps on SSD and media on HDD.
- H67 and larger SSD = no overclocking, has Quick Sync (via Virtu). Mobo cheaper than P67. Can load all apps on SSD and media on HDD.

1. It's basically an ad piece written by Seagate. Even if they have some SSDs on the market, they have a vested interest in making mechanical hard drives look better. That alone makes the article very suspect.

2. The article claims to be about "Data Integrity", but it is actually about Data Availability. Not very credible if they don't even get the terminology correct.

The article mentions data integrity in passing, but all the failure modes given are data availability problems.

Yes, flash drives require more error correction as time goes on. So do mechanical hard drives. So does everything else in a modern computer. Computer buses, Ethernet, storage links, etc, they all have more error detection/correction as things get faster/bigger. It's the nature of the beast.

Simple example: PCI Express has much greater error detection than old PCI. Where are the articles about PCIe data corruption? There are none, because it's a non-issue. It was designed with the appropriate level of error detection. Same with SSDs. As long as the error detection/correction is appropriate for the task at hand, there is no issue.
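As a toy illustration of the "appropriate error detection" point (this is a plain CRC-32 check, not the actual PCIe link-layer scheme), a single flipped bit in a payload is reliably caught by comparing checksums, at which point the data can be re-sent:

```python
import zlib

# Original payload and its CRC-32 checksum, computed by the sender.
payload = b"critical business data"
checksum = zlib.crc32(payload)

# Flip a single bit in the payload, simulating a transmission error.
corrupted = bytearray(payload)
corrupted[0] ^= 0x01
corrupted = bytes(corrupted)

# The receiver recomputes the CRC and compares; a mismatch means the
# data was corrupted in transit and should be requested again.
assert zlib.crc32(payload) == checksum
assert zlib.crc32(corrupted) != checksum
print("single-bit corruption detected")
```

A CRC is guaranteed to catch any single-bit flip, which is why it's the standard choice for links where the fix is simply "resend".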

The reason you can't find much about the issue is because again it is a non-issue. There were similar articles in the past about how mechanical drives would reach their limit and be unusable because of errors. They were wrong then and this article about SSDs is wrong now.

You may be right, but somehow I don't think it is so simple.

"I know from the emails I get that many readers think that once they've looked at the single issue of flash endurance - they've covered covered the bases for enterprise SSDs. While endurance remains a challenge for each new flash SSD generation - it's only a single one of many dimensions in the SSD life mix. That's why (in 2008) StorageSearch.com started this directory of definitive technology articles to help guide readers through the reliability maze. "http://www.storagesearch.com/ssd-data-art.html

With Computer buses, Ethernet and similar situations, all you have to do is detect the corruption and then you resend the data to correct it. With silent SSD corruption, there is no good way to correct the corruption even if you can identify it. The original data is now gone. SSD technology has some inherently unique cross-cell data corruption problems.

2. SSD "Silent Data Corruption""CRC can only identify that an error has occurred. It cannot correct it, but it does prevent "silent data corruption."""Currently the only way to validate that the drive does not have this silent corruption problem is with explicit testing. One method would be to run a test where the LBA itself and an incremental value are stored in that LBA. The host test system would ensure that the data returned contains the correct LBA and incremental value. If the wrong value is returned the host would know the drive failed the test.

This problem can also occur if there is a power failure while the LBA table is being updated: the table may then contain information only for data that is now stale. To prevent errors like this, designers can use on-board capacitors (SuperCaps) to provide temporary power during a failure so that the buffers can be flushed and the LBA table properly updated.

Lastly, there is the issue of silent corruption due to firmware bugs. The only way to guard against firmware bugs is through extensive directed testing so that products are known to be bug-free before being put into production."

http://www.storagesearch.com/sandforce-art1.html

3. Enterprise Storage Forum

" During programming, a cell will create a field large enough to perhaps change the properties of a neighboring cell. While designers go to great lengths to make sure that this doesn't happen or that the SSD can compensate for this happening, this phenomenon still occurs and can lead to silent data corruption.

The silent data corruption scenario is fairly simple – a cell has some voltage applied to it, a bias voltage or a program/erase voltage. The resulting field can disturb neighboring cells changing their properties. For example, the number of electrons in the cell can decrease or electrons can be tunneled into the cell. In either case, it's possible to change the value in the cell silently."

"However, it is important to realize that you can get data corruption on the SSD at some point."

"Reducing lithography size brings the cells closer together, reducing the distance between the source and the drain. This allows more cells in a given space, hopefully reducing costs and allowing SSDs to have larger capacities. However, the one fundamental aspect that does not change with lithography size is the voltages that are applied to the cells. The 5V bias voltage has to be applied to bring the cells to a conductive state, you still need 0.5V to read a cell, and 20V to program/erase a cell. However, existing data corruption problems may actually get much worse because the EM fields are the same size and will be stronger in neighboring cells as they are closer together. This only makes the problem of possible data corruption worse"

Spent a bit of time pondering this for my build. The issue *may* be exacerbated for SSDs, but it's by no means absent in HDDs.

There are three mitigation paths (which can be used concurrently):

* Use of ZFS with RAID-Z (admittedly a small audience). There are a few other esoteric file systems that do "live" bit-level correction. This corrects a flipped media bit transparently. Maybe we'll see more of this when Windows 8 Server comes round (new FS coming).

* Incremental backup. Assuming you keep a long enough backup chain, even if a bit is flipped and backed up as flipped, you could go back to a previous unflipped version. Trick is you may not know whether the current version has flipped bits in it...

* Protect high-priority files with a checksum/hash/PAR2. Hashes and the like will detect changes; PAR2 is actually a FEC that can unflip media errors. A pain to maintain, unless the majority of files on the system are static, or you can integrate checksum generation into the application level (i.e. save file = save file to FS, then create new checksum)
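The checksum approach in the last bullet can be sketched as a simple manifest of SHA-256 digests that is re-verified later; this detects silent changes but, unlike PAR2, cannot repair them. A minimal sketch with made-up file and function names:

```python
import hashlib
import json
import os
import tempfile

def sha256_of(path: str) -> str:
    """Hash a file in chunks so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(paths):
    """Record the current digest of each protected file."""
    return {p: sha256_of(p) for p in paths}

def verify_manifest(manifest):
    """Return the files whose current hash no longer matches the manifest."""
    return [p for p, digest in manifest.items() if sha256_of(p) != digest]

# Demo on a temporary file standing in for a "high priority" document.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "report.txt")
    with open(path, "wb") as f:
        f.write(b"quarterly numbers")
    manifest = build_manifest([path])
    clean = verify_manifest(manifest)      # nothing has changed yet
    with open(path, "wb") as f:
        f.write(b"quarterly numbersX")     # simulate a silent change
    changed = verify_manifest(manifest)
    print("changed files:", changed)
```

In practice you would persist the manifest (e.g. with `json.dump`) next to your backups and re-run verification on a schedule, which is exactly the maintenance burden the bullet warns about.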

You can also be careful and prevent most data corruption problems. It's not only bit-flips that are an issue.
a) KISS
b) test your hardware and software extensively
c) use ECC RAM
d) use battery-backed controllers (or enough caps, as Intel put in the SSD controller of the 320) and/or a decent UPS

I have thought about it, but I thought you have to get server motherboards to get that. Are there any popular enthusiast Asus or Gigabyte motherboards that support ECC? Do the consumer Intel CPUs even support ECC?

Note: Only referring to current SB CPUs.

None of the consumer Intel CPUs combined with a consumer chipset support ECC. The low end consumer CPUs (Celeron, Pentium, i3) support ECC when combined with a server (C200 series) chipset. None of the high end (i5, i7) CPUs support ECC in any case.

Basically the low end CPUs that don't have a Xeon equivalent support ECC for cheap servers.

Probably, but even then, the memory isn't sold through typical consumer channels, I would think.

A lot of places don't stock it, but it isn't really hard to find.

ces wrote:

How much benefit is there to the ECC memory?

Google released some statistics on RAM errors a couple of years ago. The conditions their servers run under might make the results quite different than for a desktop machine, but it seems likely that errors are more common than people assume.

This is one of the reasons I stick to AMD: all their CPUs support ECC (unbuffered only, in the case of non-server chips), and a large proportion of motherboards for them do too.

ECC isn't too hard to find if you need it.

For mission critical stuff I would always use it. All our critical servers where I work use ECC and I would never use anything else.

That being said, I think a lot of ECC "studies" are scaremongering. I remember reading a study from the mid-'90s (I think from IBM) that, if it held true, would mean modern computers would have memory errors every 2-3 minutes. If that were true, a modern desktop with 4-16 GB of RAM would crash before your OS was even installed.

I haven't seen these scaremongering studies. What I've seen led me to believe that the number of errors isn't a function of the amount of RAM you've got but more of the number of chips (or something like that), and that the error distribution is far from random, as you seem to assume, with most errors being caused by bad DIMMs. You could therefore have a relatively high average number of errors with most users very rarely experiencing one (and most likely not noticing it when it happens).

That also led me to believe you could avoid the worst pitfalls of non-ECC RAM for non-critical computers by regular and rigorous testing. But the way I see it, it's usually simpler and better to get ECC RAM if you're that worried.
