Only when rebuilding. The chance of getting a read error on two disks at the same time on the same piece of data is extremely remote. Of course, when rebuilding, you only need the error on one disk... which is why I'm running RAIDZ2 :)

This came up on the last article about these drives as well. Currently I'm running six 3TB Reds in a RAID-6. Over the last year they've been scrubbed once a month. I have Linux set to log any errors, and it's never logged a single block error. With the double parity, it seems like it would have logged one by now if the actual URE rate were that high on a per-bit basis. This unit was a replacement for one using nine drives, which did have two drive failures in three years, but both failures were announced by SMART before any actual failure occurred. The SANs at work do scrubs once a week and kick out about 2% of drives a year. It really seems like reality is better than the specifications in this case.

I have it set that way to spot a failing drive before it actually needs to be rebuilt. It's an orderly rebuild (checkarray), like this: http://www.thomas-krenn.com/en/wiki/Mdadm_checkarr... In the past it has been useful, as it has caused drives to throw a SMART error for reallocates during the process (with some 1TB drives), allowing me to replace them proactively.

Ganesh, could you get a response from Synology/Asus/QNAP/etc. as to why they don't have NAS units with better CPU/RAM? Those products are nice, but many people wish to have them with more CPU/RAM for media streaming. I find it very odd that none of the major players serve this market.

Why no 3GHz CPU and 16GB systems? It can't be cost, since the models that do allow RAM upgrades take cheap parts.

Out of the currently popular NAS units, only two support transcoding and multiple 1080p streams, but they have terrible software.

It looks like they switched to this marketing on the RE and SE. The terms are in black and white, but it is a deviation from a measurement scheme and can only be construed as deceptive in my book. I love WD, but this pisses me off.

For my home/home office use, the most important aspect by far is reliability/failure rates, mainly because I don't want to invest in more than 2 drives or go beyond RAID 1. I realize the most robust reliability info is based on several years of statistics in the field, but is there any kind of accelerated life test that AnandTech can do or get access to that has been proven to be a fairly reliable indicator of differences in reliability/failure rates across models? I'm aware of the manufacturer specs, but I don't regard those as objective or measured apples-to-apples across manufacturers.

If historical reliability data is fairly consistent across multiple drives in a manufacturer's line, perhaps at least provide that as a proxy for predicted actual reliability. Thanks for considering any of this!

If you're really concerned about reliability, you need double parity rather than RAID-1. In a double-parity setup the system knows which drive is returning bad data; with RAID-1 it doesn't know which drive is right. Of course, either way you should have a robust backup infrastructure in place if your data matters.

If you're speaking about silent data corruption (i.e., read errors that aren't detected by the built-in ECC), RAIDZ2 and _some_ RAID6 implementations are able to isolate the wrong data. However, this requires that on every read both parity blocks (P and Q) are recalculated, hammering the disk subsystem and the controller's CPU. RAIDZ2, by design, does this precise thing (N disks in a single RAIDZ array give you the same IOPS as a single disk), but many RAID6 implementations simply don't, for performance reasons (Linux MDRAID, for example, doesn't).

If you are referring to UREs, even a single-parity scheme such as RAID5 is sufficient in non-degraded conditions. The true problem is the degraded scenario: in this case, on most hardware RAID implementations, a single URE will kill your entire array (note: MDRAID behaves differently and lets you recover _even_ from this scenario, albeit with some corrupted data and in a different manner depending on its version). In this case, a double-parity scheme is a big advantage.

On the other hand, while mirroring is not 100% free from URE risk, it needs to re-read the contents of a _single_ disk, not the entire array. In other words, it is less exposed to the URE problem than RAID5 simply because it has to read less data to recover from a failure (but with current capacities, RAID6 is even less exposed to this kind of error).
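The relative exposure described above can be made concrete with a quick back-of-envelope calculation. This is only a sketch, assuming the quoted consumer spec of one unrecoverable error per 10^14 bits read and a hypothetical 8-disk array of 4 TB drives; real-world URE rates, as other commenters here observe, are often far better than the spec:

```python
# Rough estimate: probability of hitting at least one URE while
# reading the data needed to recover from a single-disk failure.
# Assumes the quoted consumer spec (1 error per 1e14 bits read)
# and hypothetical 4 TB drives; these numbers are illustrative only.
import math

URE_PER_BIT = 1e-14
DRIVE_BITS = 4e12 * 8          # one 4 TB drive, in bits

def p_ure(bits_read):
    """P(at least one URE) when reading `bits_read` bits."""
    return 1 - math.exp(-URE_PER_BIT * bits_read)

# Mirror rebuild: re-read one surviving 4 TB disk (~27%).
print(f"mirror rebuild : {p_ure(DRIVE_BITS):.1%}")
# Degraded 8-disk RAID5 rebuild: read all 7 surviving disks (~89%).
print(f"8-disk RAID5   : {p_ure(7 * DRIVE_BITS):.1%}")
```

The comparison shows exactly the point made above: a mirror rebuild reads one disk's worth of data, while a wide single-parity rebuild reads the whole surviving array, multiplying the exposure.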

MDRAID will recheck all of the parity during a scrub. That's really enough to catch silent data corruption before your backups go stale. It's not perfect, but it is a good balance between safety and performance. The problem with RAID5 compared to RAID6 is that in a URE situation the RAID5 will still silently produce garbage, as it has no way to know whether the data is correct. RAID6 is at least capable of spotting those read errors.

I think you are confusing UREs with silent data corruption. A URE in a non-degraded RAID 5 will not produce any data corruption; it will simply trigger a stripe reconstruction.

A URE in a degraded RAID 5 will not produce any data corruption either, but it will result in a "dead" or faulty array.

A regular scrub will prevent unexpected UREs, but if a drive suddenly starts returning garbage, even regular scrubs cannot do anything.

In theory, RAID6 can identify which drive is returning garbage because it has two different sets of parity data. However, as stated above, that kind of check is avoided due to the severe performance penalty it implies.

RAIDZ2 takes the safe path and does a complete check on every read, but as a result its performance is quite low.
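The "two different parities" argument can be sketched in a few lines. This toy example uses the standard RAID6 algebra (P as a plain XOR, Q as a Reed-Solomon parity over GF(2^8) with the usual 0x11d polynomial) to show that a single data disk silently returning garbage can be *located*, not merely detected. It is an illustration of the math, not any particular controller's implementation:

```python
# Toy RAID6 syndrome check in GF(2^8): with two independent parities,
# a single data disk that silently returns garbage can be located.

def gf_mul(a, b, poly=0x11d):
    """Multiply two bytes in GF(2^8) with the RAID6 polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def syndromes(data, p, q):
    """Return (Sp, Sq); both are zero iff the stripe is consistent."""
    sp, sq, g = p, q, 1
    for d in data:
        sp ^= d
        sq ^= gf_mul(g, d)
        g = gf_mul(g, 2)          # g = 2**i for disk i
    return sp, sq

def locate_bad_disk(data, p, q):
    """Index of the single corrupted data disk, or None if consistent."""
    sp, sq = syndromes(data, p, q)
    if sp == 0 and sq == 0:
        return None
    g = 1
    for j in range(len(data)):    # Sq = 2**j * Sp identifies disk j
        if gf_mul(g, sp) == sq:
            return j
        g = gf_mul(g, 2)
    return -1                     # double error, or parity corruption

# Build a 4-data-disk stripe, one byte per disk.
data = [0x12, 0x34, 0x56, 0x78]
p, q, g = 0, 0, 1
for d in data:
    p ^= d
    q ^= gf_mul(g, d)
    g = gf_mul(g, 2)

data[2] ^= 0xFF                   # disk 2 silently returns garbage
print(locate_bad_disk(data, p, q))   # -> 2
```

With only the single XOR parity P, the same corruption would be detected (Sp != 0) but could not be attributed to a specific disk, which is exactly the RAID5 limitation discussed above; performing this check on every read is the cost RAIDZ2 pays.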

Most of the storage stuff I work with is bigger than the one Linux box, so I haven't dug deeply into the exact details of that implementation. I do know I was bitten in the past by a RAID5 with a bad disk in it turning the entire thing to trash. Thankfully some experimentation was able to determine which disk was actually returning garbage.

However, I have not seen the stripe reconstruction count going up during monthly scrubs, so my experience is that actual URE counts are lower than the spec. The spec may be a worst case or something like that.

Umm, RAID 5 and 6 require a minimum of 3 disks, so no thanks. And yes, I'm aware that no form of RAID eliminates the need for a backup; all the more reason for finding the best (most reliable at reasonable cost) disk for a 2-disk RAID 1 config. No way will I spend all of my budget on a RAID 5 or 6 and be forced to abandon a backup.

While RAID1 is safe, I wouldn't advise it. The ST4000DM000 wasn't designed for RAID mode. I would just use FreeFileSync to automatically replicate the data from the ST4000DM000 to the NAS drive. This is what I do for my home NAS when using desktop drives.

It will give you peace of mind with your data, and I doubt you will be able to tax the single drive enough with streaming unless you are simultaneously doing large file transfers, but you could do those during non-critical hours to avoid stuttering.

It seems that Western Digital has some internal competition, as the Red Pro pretty much overlaps the WD Se series. Even the WD datasheets show very similar features (URE ratings above all) and, to tell the truth, the WD Red Pro is rated for twice as many load/unload cycles (600K vs. 300K).

I currently have 8-drive arrays of Seagate 4TB drives (5900 RPM, the first that was publicly available): two in RAID6 and one in RAIDZ2 (the same thing without the expensive RAID card). I have no regrets. I had one drive fail and rebuilt the array in just a few hours. Performance is plenty for my home network and can easily saturate a dual 1G network setup. I do wish I had more drives to choose from at the time, but if I were to do it now, 8TB!!! Or maybe 6... I never have enough storage.

On the other end of the spectrum, I have never used more than about 200 GB on any home computer. Unless you're producing huge quantities of your own content (recording HD video constantly or something), it's very hard to fill that much persistent storage, because each byte of stored data typically costs money (movie files, program files, music files, etc. usually all cost money, so filling up large amounts of storage with them must mean spending large amounts of money).

So, actually, nothing important. When I see these NAS/storage articles I just can't help rolling my eyes at what folks write: people spending large amounts of effort and money on data that is of little value to anyone or anything.

I bet most here would actually get by with a 2TB external USB3 HDD if they were honest. Oh, and that includes the businesses they work for.

Well, what you are calling unimportant is a very subjective statement. To me, spending large amounts of time, effort and money is extremely worth it. Also, much of it has paid off over time since I have properly implemented good practices. This has allowed me to share a library of movies, tv shows and music with my family in 5 different rooms via XBMC because it is all centrally located.

Having done all of that work to offer this content would be a total waste if I didn't take proper measures to ensure it is backed up. Surely, it wouldn't be smart to let an array fail and have to re-encode and format all of that data all over again; now that would take a lot of wasted time and effort.

Besides, a 2TB external HDD would have to be USB3 (possibly with UASP) and hooked to a USB3 controller to achieve decent performance for multiple streams, and it wouldn't have redundancy. Also, 2TB is only going to hold so many movies. What if you like 400-480p movies, but I prefer 720-1080p movies? All of this comes down to preference, and that alone is what determines each of our setups.

It's very easy to legitimately consume more than 200GB on any computer, let alone a server:
- I believe in having multiple copies of personal docs and media I created: photos, personal videos, emails, installers, projects, etc.
- if you have multiple desktops and laptops, each should be backed up
- and don't forget to back up the backup devices :-)

Do that for 30 years, and you'll have quite a collection. Only the incriminating evidence should be deleted on a regular basis.

Oh, and you'll only need that one file a few days after you delete it.

When comparing NAS drives, reliability is by far the #1 concern. Power consumption and noise are also important, but by no means the deciding factor.

Testing the WD Red drives (especially given the pretty high failure rates of the plain WD Reds) without saying something about reliability makes the whole article pointless. If the drives are reliable, people will choose them over faster, cheaper, and probably even noisier drives. If you're going to do this, you need to test at least 10-20 drives and come up with some kind of torture test to really push them.

This is exactly why I only buy hard drives with 5-year warranties. The length of a warranty tells you a lot about a manufacturer's confidence in their product. When bad luck does arise (usually in the 3-5 year range), you have a drive that gets replaced with no questions asked. At least that is my experience with WD.

Ganesh - can you put some questions out to the manufacturers and write a short article on the relevance of URE? I find it very hard to take this metric seriously when consumer drives are all marked as 'better than 1 in 10^14,' a nice round number that hasn't changed in a decade (forever?). Has there really been no improvement? Are they all really the same? And are enterprise drives precisely 10x better? Unlikely. And what is really different about them?

One post is not a "discussion". And while this may be relevant for Linux users, it doesn't help anyone else, perhaps running RAID on Windows or using a NAS, for which at least some of these drives are marketed.

In the last discussion on this here, there was mention that ZFS (at least with double parity) can avoid UREs affecting array rebuilds. However, it looks as if Storage Spaces and the new ReFS file system now in Windows 8.1 and Windows Server can achieve the same thing as ZFS with much lower system resources (although it is necessary to bypass the Storage Spaces UI on Windows 8.1 for proper configuration). Can anyone share any experience of using ReFS, since it is quite new and directly challenges the Nas4Free/FreeNAS route, which requires FreeBSD as an OS? In particular, it makes a single media storage server/HTPC combo a feasible proposition, which might be very useful for many...

When used in a single-parity scheme, no RAID implementation or file system is immune to UREs that happen during a rebuild. What ZFS can do is catch when a disk suddenly returns garbage, which with other filesystems normally results in silent data corruption.

But UREs are NOT silent corruption. They happen when the disk cannot read the requested block and gives you a "sorry, I can't read that" message.

They are if you are using WD Red drives, which Ganesh has previously said are using URE masking to play nicer with RAID controllers. They issue dummy data and no error instead of a URE. This, and the serious implications of it especially with single parity RAID (mirror/RAID5), is NOT mentioned in this comparative article, which is shocking.

To reiterate: if a RAID5 array (or a degraded RAID6) has a masked URE, there is no way to know which disk the error came from. And if the controller is NOT continuously checking parity against all reads, for speed, then the dummy data will be passed through without any error being raised at all. Worse, since you don't know there has been a read error, you will assume your data is OK to back up, so you will likely overwrite good old backups with corrupt data; since space for multiple copies is likely to be at a premium, any backup mitigation strategy is screwed.

Given that these are 4TB consumer-class drives with 1 in 10^14 URE numbers, the chance of a URE when rebuilding is very high, which is why these Red drives are extremely unsafe in RAID implementations that do NOT check parity continuously. I already ran the numbers in a previous post, although they haven't been verified - Ganesh said he was seeking clarification from the manufacturers. Bottom line: caveat emptor if you risk your data to these drives, with or without RAID or a backup strategy.

After all, I find it extremely difficult to believe that a hard drive would intentionally return bad data instead of a URE.

The only product range where I can _very remotely_ find a similar thing useful is the WD Purple (DVR) series: being often used as simple "video storage" in single-disk configurations, masking a URE will not lead to big problems. However, the proper solution here is to implement configurable SCT ERC or TLER.

A quick search back through previous WD Red drive reviews reveals nothing immediately. Ganesh ran a large article on Red firmware differences that covered configurable TLER behaviour (which is about having the drive give up on an erroring read quickly, so that the array's parity or other redundancy can take over and provide the data the drive can't immediately retrieve), but nothing like this was mentioned.

However, in http://www.anandtech.com/show/6083/wd-introduces-r... the author Jason Inofuentes wrote: "They've also included error correction optimizations to prevent a drive from dropping out of a RAID array while it chases down a piece of corrupt data. The downside is that you might see an artifact on the screen briefly while streaming a movie, the upside is that you won't have playback pause for a few seconds, or for good depending on your configuration, while the drive drops off the RAID to fix the error."

That sounds like what Ganesh said, although I can't see anything in his articles mentioning it. It may be a complete misunderstanding of the TLER behaviour, though. The problem with the behaviour described above is that it assumes the data is not important, something that will only manifest as a little unnoticed corruption while watching a video file. But what if it happens while you're copying data to your backup array? What if it's not throwaway data but critical data, and you now have no idea that it's corrupt or unrecoverable on the disk, so you NEED that last good backup you took... I don't think ANYONE is (or should be) as casual as that about the intrinsic VALUE of their data - why bother with parity/mirror RAID otherwise? If the statement is correct, it's extremely concerning. If not, it needs correcting urgently.

To me that sounds like a short TLER setting. The description says nothing about whether the drive returns an error or not. It may very well be the playback software receiving the error but continuing playback.

But a short TLER is designed specifically to allow the array's parity/redundancy to kick in immediately and provide the missing data by reconstruction. There wouldn't BE any bad data returned (unless there was no array redundancy). So, as described, this has nothing to do with short TLER. It is about the drive not returning an error when it can't read data successfully (i.e. a URE) and issuing dummy data instead. The fundamental issue is that without an error being raised, neither the array hardware/software nor the user can take any action to remedy the data failure, whether that's restoring the bad data from backup or even flagging the drive to see if this is a pattern indicative of likely failure.

There are some comments about it in that article which try to explain the scope (it seems to be limited to some ATA commands), but not in sufficient detail for me or most average users who don't know which ATA commands are sent by specific applications or the file system, and they certainly didn't answer my questions and misgivings.

Yes, shodanshok is right; the TLER feature in these NAS drives is a shorter timeout rather than URE masking. Ian's quote of my exchange in a private e-mail was later clarified, but the conversation didn't get updated here:

1. When a URE happens, the hard drive returns an error code back to the RAID controller (in the case of devices with software RAID, it sends the error back to the CPU). The error code can be used to gauge what exactly happened. A fairly detailed list can be found here: http://en.wikipedia.org/wiki/Key_Code_Qualifier : a URE corresponds to a medium error with the key code description "Medium Error - unrecovered read error".
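Point 1 can be illustrated with a tiny decoder for the sense-key/ASC/ASCQ triple. The codes below are the standard SCSI values (sense key 0x3 = Medium Error, ASC/ASCQ 0x11/0x00 = unrecovered read error); only a few entries relevant to this discussion are included, so treat it as a sketch of the lookup, not a complete table:

```python
# Minimal sketch of decoding the SCSI Key Code Qualifier triple
# (sense key / ASC / ASCQ) a drive returns with a failed read.
# Codes are from the standard SCSI sense tables; only a handful of
# entries relevant to the URE discussion are listed here.

SENSE_KEYS = {
    0x1: "Recovered Error",
    0x3: "Medium Error",
    0x4: "Hardware Error",
}

ADDITIONAL = {
    (0x11, 0x00): "Unrecovered read error",   # the classic URE
    (0x11, 0x04): "Unrecovered read error - auto reallocate failed",
    (0x17, 0x01): "Recovered data with retries",
}

def describe(sense_key, asc, ascq):
    """Human-readable description of a sense-key/ASC/ASCQ triple."""
    key = SENSE_KEYS.get(sense_key, f"sense key {sense_key:#x}")
    detail = ADDITIONAL.get((asc, ascq), f"ASC/ASCQ {asc:#04x}/{ascq:#04x}")
    return f"{key}: {detail}"

print(describe(0x3, 0x11, 0x00))  # -> Medium Error: Unrecovered read error
```

This is the distinction the drive makes explicit: a URE arrives as a Medium Error code the RAID layer can act on, which is exactly why it is not silent corruption.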

2. Upon recognition of the URE, it is up to the RAID controller to decide what needs to be done. Systems usually mark the sector as bad and try to remap it. It is then populated with data recovered from the other drives in the RAID array. It all depends on the vendor implementation; since most off-the-shelf NAS vendors use mdadm, I think the behaviour will be similar for all of those.

3. TLER just refers to a quicker return of the error code to the controller rather than 'hanging' for a long time. The latter behaviour might cause the RAID controller to mark the whole disk as bad when we have a URE for only one sector.

Hi, are the bandwidths in the graphs (page 5...) really supposed to be in Mbps (megabits per second)? Although that's a valid bandwidth unit, the values seem really low (the fastest tests would be about 30MB/s); I'd expect the values to be in MBps for the numbers to correspond...

Hey, this test setup is wrong. There is one SAS disk, but there is no SAS HBA in the test setup list. According to other benchmarks, the HGST SAS disk is the fastest on this list, but here it suffers because of a poor or very poor controller. This comparison is worth little without a good SAS HBA. And remember, a good HBA also increases SATA disk performance. Embedded Intel controllers are very simple and offer limited performance. A good SAS HBA is about $150, so it is not a big deal. Regards

Old article. I've never seen issues with data reliability unless the drive itself has been problematic (bits of the disk that can't be read and the stupid drive not remapping them). Normally I just change the drive once issues happen, or on random issues, as I had with some Samsung drives not liking my server (fine in other systems).