The Agony and Ecstasy of RAID

Back when I was a novice service tech and barely knew anything about system administration, one of the few topics that we were always expected to know cold was RAID - Redundant Array of Inexpensive Disks.

It was the answer to all of our storage woes. With RAID we could scale our filesystems larger, get better throughput and even add redundancy allowing us to survive the loss of a disk, which, especially in those days, happened pretty regularly.

With the rise of NAS and SAN storage appliances, the skill of getting down to the physical storage level and tweaking it to meet the needs of the system in question is rapidly disappearing. This is not a good thing. Offloading storage to external devices does not change the fact that we need to fundamentally understand our storage and configure it to meet the specific needs of our systems.

A misconception that seems to have entered the field over the last five to ten years is the belief that RAID somehow represents a system backup. It does not. RAID is a form of fault tolerance.

Backup and fault tolerance are very different conceptually. Backup is designed to allow you to recover after a disaster has occurred. Fault tolerance is designed to lessen the chance of disaster. Think of fault tolerance as building a fence at the top of a cliff and backup as building a hospital at the bottom of it. You never really want to be in a situation without both a fence and a hospital, but they are definitely different things.

Once we are implementing RAID for our drives, whether locally attached or on a remote appliance like a SAN, we have four key RAID solutions from which to choose today for business: RAID 1 (mirroring), RAID 5 (striping with parity), RAID 6 (striping with double parity) and RAID 10 (mirroring with striping).

There are others, like RAID 0, that should be used only in rare circumstances when you really understand your drive subsystem needs. RAID 50 and 51 are used as well, but far less commonly, and are not nearly as effective. Ten years ago RAID 1 and RAID 5 were common, but today we have more options.

Looking at RAID Options

Let's step through the options and discuss some basic numbers. In our examples we’ll use “n” to represent the number of drives in our array and we will use “s” to represent the size of any individual drive. Using these we can express the usable storage space of an array, making comparisons easy in terms of storage capacity.

RAID 1
In this RAID type, drives are mirrored. You have two drives and they do everything together at the same time, hence "mirroring." Mirroring is extremely stable as the process is so simple, but it requires you to purchase twice as many drives as you would need if you were not using RAID at all, as your second drive is dedicated to redundancy.

The benefit is the assurance that every bit you write to disk is being written twice for your protection. So with RAID 1 our capacity is calculated to be (n*s/2). RAID 1 suffers from providing minimal performance gains over non-RAID drives. Write speeds are equivalent to a non-RAID system, while read speeds are almost twice as fast in most situations, since during read operations the two drives can be read in parallel to increase throughput. RAID 1 is limited to two-drive sets.

RAID 5
Striping with Single Parity. In this RAID type, data is written in a complex stripe across all drives in the array, with a parity block for each stripe. By doing this RAID 5 is able to use an arbitrarily sized array of three or more disks and loses only the storage capacity equivalent to a single disk to parity. The parity is distributed across all of the drives and does not exist solely on any one physical disk.

RAID 5 is often used because of its cost effectiveness: in large arrays it gives up only a small fraction of the total storage capacity. Unlike mirroring, striping with parity requires that a calculation be performed for each stripe written across the disks, and this creates some overhead. Therefore the throughput is not always an obvious calculation and depends heavily upon the computational power of the system doing the parity calculation.

Calculating RAID 5 capacity is quite easy: it is simply ((n-1)*s). A RAID 5 array can survive the loss of any single disk in the array.
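To make the parity idea concrete, here is a minimal sketch of how a single parity block lets an array rebuild the contents of one lost drive. It assumes the simple XOR parity commonly used for RAID 5 and models only one stripe on a hypothetical four-drive array; the function name and the sample data are made up for illustration, and a real controller also rotates the parity block from drive to drive.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical four-drive array: three data blocks plus parity.
data_blocks = [b"RAID", b"five", b"demo"]
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR of everything that survived yields the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'five'
```

Because XOR is its own inverse, the same routine both creates the parity block and recovers any one missing block. Lose two blocks from the same stripe and there is no longer enough information to recover either.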

RAID 6
Redundant Striping with Double Parity. RAID 6 is practically identical to RAID 5 but uses two parity blocks per stripe rather than one to allow for additional protection against disk failure.

RAID 6 is a newer member of the RAID family. It was added several years after the other levels had become standardized. RAID 6 is special in that it allows for the failure of any two drives within an array without suffering data loss. But to accommodate the additional level of redundancy, a RAID 6 array loses the storage capacity equivalent to two drives in the array and requires a minimum of four drives. We can calculate the capacity of a RAID 6 array with ((n-2)*s).

RAID 10
Mirroring with Striping. In this RAID type, drives are mirrored in pairs and data is striped across those pairs. Many vendors use the term RAID 10 (or RAID 1+0) when speaking of only two drives in an array, but technically that is RAID 1, as striping cannot occur until there are a minimum of four drives in the array. With RAID 10, drives must be added in pairs, so only an even number of drives can exist in an array.

RAID 10 can survive the loss of up to half of the total set of drives, but at most one from each pair. RAID 10 does not involve a parity calculation, giving it a performance advantage over RAID 5 or RAID 6 and requiring less computational power to drive the array. RAID 10 delivers the greatest read performance of any common RAID type, as all drives in the array can be used simultaneously in read operations. Its write performance is much lower than its read performance, since every write must go to both drives of a mirrored pair. RAID 10's capacity calculation is identical to that of RAID 1, (n*s/2).
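The "one from each pair" rule is easy to misread, so here is a small sketch that checks whether a given set of failed drives is survivable and counts how many two-drive failures are fatal. The drive numbering (drives 0 and 1 form the first mirrored pair, 2 and 3 the second, and so on) and the function names are assumptions made purely for illustration.

```python
from itertools import combinations

def raid10_survives(failed, n):
    """Return True if a RAID 10 array of n drives survives the failed set.

    Assumption for illustration: drives 0 and 1 form the first mirrored
    pair, 2 and 3 the second, and so on.
    """
    if n < 4 or n % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least four")
    failed = set(failed)
    # The array is lost only if both members of some pair have failed.
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(n // 2))

def fatal_two_drive_failures(n):
    """Count the two-drive failure combinations that destroy the array."""
    pairs = list(combinations(range(n), 2))
    fatal = sum(1 for pair in pairs if not raid10_survives(pair, n))
    return fatal, len(pairs)

print(raid10_survives({0, 2, 4, 6}, 8))  # True: one drive lost from each pair
print(raid10_survives({0, 1}, 8))        # False: both halves of one mirror
print(fatal_two_drive_failures(8))       # (4, 28)
```

In an eight-drive array only 4 of the 28 possible two-drive failures take out both halves of the same mirror, so the odds that a second failure is fatal shrink as the array grows.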

In today's enterprise it is rare for an IT department to have a serious need to consider any drive configuration outside of the four mentioned here, regardless of whether software or hardware RAID is being implemented. Traditionally the largest concern in a RAID array decision was based around usable capacity. This was because drives were expensive and small.

Today drives are so large that storage capacity is rarely an issue, at least not like it was just a few years ago, and the costs have fallen such that purchasing additional drives necessary for better drive redundancy is generally of minor concern. When capacity is at a premium RAID 5 is a popular choice because it loses the least storage capacity compared to other array types. And in large arrays the storage loss is nominal.
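The capacity formulas given earlier for each level can be put side by side in a short sketch. The function names and the string labels for the levels are hypothetical, and the drive counts and sizes in the usage lines are illustrative numbers, not measurements.

```python
def usable_capacity(level, n, s):
    """Usable capacity for n drives of size s, using the article's formulas."""
    if level == "1":
        if n != 2:
            raise ValueError("RAID 1 is limited to two-drive sets")
        return n * s / 2
    if level == "5":
        if n < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (n - 1) * s
    if level == "6":
        if n < 4:
            raise ValueError("RAID 6 needs at least four drives")
        return (n - 2) * s
    if level == "10":
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        return n * s / 2
    raise ValueError(f"unsupported RAID level: {level}")

def efficiency(level, n):
    """Fraction of raw capacity that remains usable."""
    return usable_capacity(level, n, 1) / n

# Illustrative only: sixteen drives of 2 TB each.
for level in ("5", "6", "10"):
    print(level, usable_capacity(level, 16, 2), efficiency(level, 16))
```

With sixteen 2 TB drives this gives 30 TB usable for RAID 5, 28 TB for RAID 6 and 16 TB for RAID 10. In a three-drive RAID 5 array a third of the raw capacity goes to parity, while at sixteen drives the loss is one sixteenth.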

Today we generally have other concerns, primarily data safety and performance. Spending a little extra to ensure data protection should be an obvious choice. RAID 5 suffers from being able to lose only a single drive. In an array of just three members this is only slightly more dangerous than the protection offered by RAID 1.

We could survive the loss of any one out of three drives. Not too scary compared to losing either of two drives. But what about a large array, say, sixteen drives? Being able to safely lose only one of sixteen drives should make us question our reliability a little more thoroughly.
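A rough way to put numbers on that worry is to ask how likely it is that more drives fail than the array can tolerate. The sketch below assumes every drive fails independently with the same probability p over some period of interest. That is a simplifying assumption, since real failures are often correlated, and the value of p used here is made up purely for illustration.

```python
from math import comb

def prob_more_than_k_failures(n, p, k):
    """Probability that more than k of n drives fail.

    Assumes each drive fails independently with probability p.
    """
    at_most_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
    return 1 - at_most_k

p = 0.03  # hypothetical per-drive failure probability, not a measured value

small_array = prob_more_than_k_failures(3, p, 1)   # three drives, one loss tolerated
large_array = prob_more_than_k_failures(16, p, 1)  # sixteen drives, one loss tolerated
print(small_array, large_array)
```

Under these assumptions the sixteen-drive array is far more likely than the three-drive array to suffer a second failure, even though both are described as surviving "one drive." Passing k=2 models an array that tolerates two losses.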

This is where RAID 6 stepped in to fill the gap. RAID 6, when used in a large array, introduces a very small loss of storage capacity and performance while providing the assurance of being able to lose any two drives. Proponents of the striping with parity camp will often quote these numbers to assure management that RAID 5/6 can provide adequate "bang for the buck" in storage subsystems. But there are other factors at play.

Almost entirely overlooked in discussions of RAID reliability – an all too seldom discussed topic as it is – is the question of parity computation reliability.

With RAID 1 or RAID 10 there is no "calculation" done to create a stripe with parity. Data is simply written in a stable manner. When a drive fails its partner picks up the load, and drive performance is slightly degraded until the failed drive is replaced. There is no rebuilding process that impacts existing drive members. Not so with parity stripes.

RAID arrays with parity have operations that involve calculating what is and what should be on the drives. While this calculation is very simple it provides an opportunity for things to go wrong.

An array controller that fails with RAID 1 or RAID 10 could, in theory, write bad data over the contents of the drives, but there is no process by which the controller makes drive changes on its own. So this is extremely unlikely ever to occur, as there is never a "rebuild" process except in creating a mirror.

When arrays with parity perform a rebuild operation they perform a complex process by which they step through the entire contents of the array and write missing data back to the replaced drive. In and of itself this is relatively simple and should be no cause for worry.

What I and others have seen first hand is a slightly different scenario involving disks that have lost connectivity due to loose connectors to the array. Drives can commonly "shake" loose over time as they sit in a server, especially after several years of service in an always-on system.

What can happen, in extreme scenarios, is that good data on drives can be overwritten by bad parity data when an array controller believes that one or more drives have failed in succession and been brought back online for rebuild. In this case the drives themselves have not failed and there is no data loss. All that is required is that the drives be reseated, in theory.

On hot swap systems the management of drive rebuilding is often automatic, based on the removal and replacement of a failed drive. So this process of losing and replacing a drive may occur without any human intervention, and a rebuilding process can begin. During this process the drive system is at risk. Should the same event occur again, the drive array may, based upon the status of the drives, begin striping bad data across the drives, overwriting the good filesystem.

One of the most depressing sights for a server administrator is a system with no failed drives losing an entire array to an unnecessary rebuild operation.

In theory this type of situation should not occur, and safeguards are in place to protect against it. But a low-level drive controller's determination of a drive's current and previous status, and of the quality of the data residing on that drive, is not as simple as it may seem, and mistakes are possible.

While this situation is unlikely, it does happen, and it adds a risk that is nearly impossible to calculate to RAID 5 and RAID 6 systems. We must consider the risk of parity failure in addition to the traditional risk calculated from the number of drive losses that an array can survive out of a pool. As drives become more reliable, the parity failure risk becomes a larger share of the total risk.

Additionally, RAID 5 and RAID 6 parity introduces system overhead due to parity calculation, which is often handled by way of dedicated RAID hardware. This calculation introduces latency into the drive subsystem that varies dramatically by implementation both in hardware and in software. This makes it impossible to state performance numbers of RAID levels against one another as each implementation will be unique.

Possibly the biggest problem with RAID choices today is that the ease with which metrics for storage efficiency and drive loss survivability can be obtained masks the big picture of reliability and performance, as those statistics are almost entirely unavailable. One of the dangers of metrics is that people will focus upon factors that can be easily measured and ignore those that cannot be easily measured, regardless of their potential for impact.

While all modern RAID levels have their place it is critical that they be considered within context and with an understanding as to the entire scope of the risks. We should work hard to shift our industry from a default of RAID 5 to a default of RAID 10. Drives are cheap and data loss is expensive.