News and my experience working with GNU/Linux and open source software.

Monday, June 6, 2005

RAID - Revisited

I'm still researching RAID. I have set up software RAID before and am now looking for some hints on hardware RAID and its advantages over software-based RAID. Let's start with a basic picture of RAID and go on from there.

What does RAID stand for?

In 1987, Patterson, Gibson and Katz at the University of California, Berkeley published a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)". This paper described various types of disk arrays, referred to by the acronym RAID. The basic idea of RAID was to combine multiple small, inexpensive disk drives into an array of disk drives which yields performance exceeding that of a Single Large Expensive Drive (SLED). Additionally, this array of drives appears to the computer as a single logical storage unit or drive.

The Mean Time Between Failure (MTBF) of the array will be equal to the MTBF of an individual drive, divided by the number of drives in the array. Because of this, the MTBF of an array of drives would be too low for many application requirements. However, disk arrays can be made fault-tolerant by redundantly storing information in various ways. Five types of array architectures, RAID-1 through RAID-5, were defined by the Berkeley paper, each providing disk fault-tolerance and each offering different trade-offs in features and performance. In addition to these five redundant array architectures, it has become popular to refer to a non-redundant array of disk drives as a RAID-0 array.
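The division rule above is easy to check with a few lines of Python. This is only a minimal sketch: the function name is made up for illustration, and the 500,000-hour drive MTBF is an assumed example value, not a figure from the Berkeley paper.

```python
def array_mtbf(drive_mtbf_hours, num_drives):
    """MTBF of a non-redundant array: drive MTBF divided by the drive count."""
    if num_drives < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mtbf_hours / num_drives


# Assumed example value (hypothetical): 500,000 hours per drive.
DRIVE_MTBF_HOURS = 500_000

for n in (1, 2, 5, 10):
    print(n, "drive(s):", array_mtbf(DRIVE_MTBF_HOURS, n), "hours")
```

Under that assumption, going from one drive to ten cuts the expected time between failures by a factor of ten, which is why the redundancy schemes described next matter.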

The different RAID levels

RAID-0

RAID Level 0 is not redundant, hence does not truly fit the "RAID" acronym. In level 0, data is split across drives, resulting in higher data throughput. Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss. This level is commonly referred to as striping.
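To make striping a bit more concrete, here is a rough sketch of how a logical block number could be mapped onto the drives. It assumes the simplest possible scheme (round-robin, one block per stripe unit); real implementations use configurable chunk sizes, and the function name is hypothetical.

```python
from typing import Tuple


def locate_block(logical_block: int, num_drives: int) -> Tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive).

    Assumes round-robin striping with a stripe unit of exactly one block.
    """
    if num_drives < 1:
        raise ValueError("an array needs at least one drive")
    drive = logical_block % num_drives    # which drive holds the block
    offset = logical_block // num_drives  # where on that drive it sits
    return drive, offset


# Consecutive logical blocks land on different drives, so a large
# sequential transfer can keep all drives busy at the same time.
for block in range(6):
    print("logical block", block, "->", locate_block(block, 3))
```

It also shows why a single failed drive is fatal: every drive holds a slice of every large file.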

[Figures: RAID-0 with two disks; RAID-0 with three disks]

RAID-1

RAID Level 1 provides redundancy by writing all data to two or more drives. The performance of a level 1 array tends to be faster on reads and slower on writes compared to a single drive, but if either drive fails, no data is lost. This is a good entry-level redundant system, since only two drives are required; however, since one drive is used to store a duplicate of the data, the cost per megabyte is high. This level is commonly referred to as mirroring.

[Figure: RAID-1 (mirroring)]

RAID-2

RAID Level 2, which uses Hamming error correction codes, is intended for use with drives which do not have built-in error detection. All SCSI drives support built-in error detection, so this level is of little use when using SCSI drives.

RAID-3

RAID Level 3 stripes data at a byte level across several drives, with parity stored on one drive. It is otherwise similar to level 4. Byte-level striping requires hardware support for efficient use.

RAID-4

RAID Level 4 stripes data at a block level across several drives, with parity stored on one drive. The parity information allows recovery from the failure of any single drive. The performance of a level 4 array is very good for reads (the same as level 0). Writes, however, require that parity data be updated each time. This slows small random writes, in particular, though large writes or sequential writes are fairly fast. Because only one drive in the array stores redundant data, the cost per megabyte of a level 4 array can be fairly low.
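Parity in this kind of array is conventionally a bytewise XOR of the data blocks in a stripe. The following Python sketch shows how a lost block can be rebuilt from the survivors, and why a small write has to touch the parity drive too. The block contents and function names are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity, old_data, new_data):
    """Small-write update: new parity = old parity XOR old data XOR new data."""
    return xor_blocks([old_parity, old_data, new_data])


# One stripe spread over three data drives (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on the dedicated parity drive

# Simulate losing drive 1 and rebuilding it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print("rebuilt block matches original:", rebuilt == data[1])

# Overwriting one block means reading the old data and old parity first,
# which is the extra work that slows down small random writes.
new_block = b"ZZZZ"
parity = update_parity(parity, data[1], new_block)
data[1] = new_block
print("parity still consistent:", parity == xor_blocks(data))
```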

RAID-5

RAID Level 5 is similar to level 4, but distributes parity among the drives. This can speed small writes in multiprocessing systems, since the parity disk does not become a bottleneck. Because parity data must be skipped on each drive during reads, however, the performance for reads tends to be considerably lower than for a level 4 array. The cost per megabyte is the same as for level 4.
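The difference between a dedicated parity drive and distributed parity is easiest to see as a layout. The sketch below assumes one particular rotation (parity starts on the last drive and moves one drive to the left with each stripe); actual implementations offer several layouts, and the function names are hypothetical.

```python
def parity_drive(stripe, num_drives):
    """Drive holding the parity block for a stripe (assumed rotation scheme)."""
    return (num_drives - 1 - stripe) % num_drives


def layout(num_stripes, num_drives):
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_drive(stripe, num_drives)
        rows.append(["P" if d == p else "D" for d in range(num_drives)])
    return rows


# Four drives, four stripes: every drive takes one turn holding parity,
# so parity updates are spread out instead of all hitting one drive.
for stripe, row in enumerate(layout(4, 4)):
    print("stripe", stripe, " ".join(row))
```

In a level 4 array the "P" column would simply stay on the same drive for every stripe.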

Summary:

RAID-0 is the fastest and most efficient array type but offers no fault-tolerance.

RAID-1 is the array of choice for performance-critical, fault-tolerant environments. In addition, RAID-1 is the only choice for fault-tolerance if no more than two drives are desired.

RAID-2 is seldom used today since ECC is embedded in almost all modern disk drives.

RAID-3 can be used in data intensive or single-user environments which access long sequential records to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped and requires synchronized-spindle drives in order to avoid performance degradation with short records.

RAID-4 offers no advantages over RAID-5 and does not support multiple simultaneous write operations.

RAID-5 is the best choice in multi-user environments which are not write performance sensitive. However, at least three, and more typically five drives are required for RAID-5 arrays.
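The cost-per-megabyte remarks for the individual levels come down to how much of the raw capacity is left for data. Here is a small sketch of that arithmetic, assuming equal-sized drives. The function name is made up, and applying the three-drive minimum to levels 3 and 4 as well as level 5 is a simplifying assumption of this sketch.

```python
def usable_fraction(level, num_drives):
    """Fraction of raw capacity available for data, assuming equal-sized drives."""
    if level == 0:
        if num_drives < 1:
            raise ValueError("RAID-0 needs at least one drive")
        return 1.0  # no redundancy at all
    if level == 1:
        if num_drives < 2:
            raise ValueError("RAID-1 needs at least two drives")
        return 1.0 / num_drives  # every drive holds a full copy
    if level in (3, 4, 5):
        if num_drives < 3:
            raise ValueError("parity levels are assumed to need three drives")
        return (num_drives - 1) / num_drives  # one drive's worth of parity
    raise ValueError("level not covered by this sketch")


for level, drives in ((0, 2), (1, 2), (4, 5), (5, 3), (5, 5)):
    share = usable_fraction(level, drives)
    print("RAID-%d with %d drives: %.0f%% usable" % (level, drives, share * 100))
```

The more drives a parity array has, the smaller the share spent on redundancy, which is one reason five-drive RAID-5 arrays are more typical than three-drive ones.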

Hardware RAID

The hardware-based system manages the RAID subsystem independently from the host and presents to the host only a single disk per RAID array. This way the host doesn't have to be aware of the RAID subsystem(s).

The controller based hardware solution

DPT's SCSI controllers are a good example of a controller-based RAID solution. The intelligent controller manages the RAID subsystem independently from the host. The advantage over an external SCSI---SCSI RAID subsystem is that the controller is able to span the RAID subsystem over multiple SCSI channels and thereby remove the limiting factor of external RAID solutions: the transfer rate over the SCSI bus.

The external hardware solution (SCSI---SCSI RAID)

An external RAID box moves all RAID handling "intelligence" into a controller that sits in the external disk subsystem. The whole subsystem is connected to the host via a normal SCSI controller and appears to the host as a single disk or as multiple disks.

This solution has drawbacks compared to the controller-based solution: the single SCSI channel it uses creates a bottleneck. Newer technologies like Fibre Channel can ease this problem, especially if they allow multiple channels to be trunked into a Storage Area Network.

Four SCSI drives can already completely flood a parallel SCSI bus, since the average transfer size is around 4KB and the command transfer overhead, which even in Ultra SCSI is still done asynchronously, takes most of the bus time.

Software RAID (aka poor man's redundancy)

The MD driver in the Linux kernel is an example of a RAID solution that is completely hardware independent. The Linux MD driver currently supports RAID levels 0/1/4/5 plus linear mode.

Adaptec's AAA-RAID controllers are another example. They have no RAID functionality whatsoever on the controller and depend on external drivers to provide all RAID functionality. They are basically just multiple AHA2940 controllers integrated on one card. Linux detects them as AHA2940 and treats them accordingly. Every OS needs its own special driver for this type of RAID solution, which is error prone and not very compatible.

Hardware vs. Software RAID

Just like any other application, software-based arrays occupy host system memory, consume CPU cycles and are operating system dependent. By contending with other applications that are running concurrently for host CPU cycles and memory, software-based arrays degrade overall server performance. Also, unlike hardware-based arrays, the performance of a software-based array is directly dependent on server CPU performance and load.

Except for the array functionality, hardware-based RAID schemes have very little in common with software-based implementations. Since the host CPU can execute user applications while the array adapter's processor simultaneously executes the array functions, the result is true hardware multi-tasking. Hardware arrays also do not occupy any host system memory, nor are they operating system dependent.

Hardware arrays are also highly fault tolerant. Since the array logic is based in hardware, software is NOT required to boot. Some software arrays, however, will fail to boot if the boot drive in the array fails. For example, an array implemented in software can only be functional when the array software has been read from the disks and is memory-resident. What happens if the server can't load the array software because the disk that contains the fault tolerant software has failed? Software-based implementations commonly require a separate boot drive, which is NOT included in the array.