The Linux RAID-1, 4, 5 Code

A description of the implementation of the RAID-1, RAID-4 and RAID-5 personalities of the MD device driver in the Linux kernel, providing users with high-performance, reliable secondary storage in software.

Using RAID (Redundant Array of
Inexpensive Disks) is a popular way of improving system I/O
performance and reliability. The various RAID levels cover a whole
range of trade-offs between improved system I/O performance and
increased reliability.

This report describes the current implementation of the RAID
driver in the kernel, as well as the changes we made to the kernel
to support new disk-array configurations that provide higher
reliability.

The Multiple Devices (MD) Driver

The MD driver is used to group a collection of block
devices into a single, larger block device. Usually, a set of SCSI
and IDE devices is configured into a single MD device. As found in
the Linux 2.0 kernel, it is designed to re-map sector/device tuples
into new sector/device tuples in two different modes
(personalities): linear (append mode) and striping (RAID-0
mode).

Linear mode is just a way of concatenating the contents of
two smaller block devices into a larger device. This can be used to
join together several small disks to create a larger disk whose
size is the sum of the smaller ones. For example,
suppose we have two disks with 300 sectors each; after we configure
them as a linear MD device, we have a new MD device with 600
sectors: sectors 0 to 299 of the device are mapped to the first
disk, and sectors 300 to 599 are mapped to the second
disk.

RAID-0 mode (also known as striping) is more interesting.
This mode of operation writes information to the device while
distributing it over the disks that make up the
disk array. Unlike linear mode, this is not just a concatenation of
the disk-array components; striping balances the I/O load among the
disks, resulting in high throughput. This is the personality
chosen by most people who want speed.

Figure 1 shows how four disks are arranged in this mode.
Shadowed regions are those that provide redundant information, and
the stacked-up disks represent a single disk. As you can see,
there are no shadowed regions in the figure. What does this mean?
Well, it means that if there is a hardware problem in any of the
elements of the disk array, you lose all of your
information.

Both the linear and the striping personalities lack any
redundancy or error-recovery mode. If any element of the
disk array fails, the contents of the complete MD device are
useless, and there is little hope that any useful information can
be recovered. This is similar to what happens with regular
secondary-storage devices—if one fails, you lose your information.
However, with RAID-0, you run a higher risk of losing your
information than with a regular disk. The higher failure rate is
due to the fact that you have more disks, and a failure in any of
the disks makes the RAID-0 contents unusable.

If you have a good backup strategy and you don't mind losing
a day of work if any of your disks fail, using RAID-0 may be the
best thing to do. For example, RAID-0 is used for newsgroups like
comp.unix, but a higher reliability RAID level is used for
important newsgroups like alt.binaries.pictures.furniture.

The way these two personalities are supported by the MD
driver in the kernel is quite simple. The low-level
ll_rw_blk routine is responsible
for putting block-driver I/O requests on the system-request queue.
This routine is modified to call a mapping function that is
part of the MD driver and is invoked whenever a request is issued
for a block on an MD device.

Block re-mapping happens just before the input/output request
is put into the system-request queue. This re-mapping function is
quite simple. It is invoked with pointers to the device and to the
block number, and all it does is change the device ID and the block
number. The device ID is changed to point to one of the disks in
the disk array, and the block number is changed to point to the
proper location on that disk. Basically it is a nice hack (but it
uses a couple of “ifdefs”, which we all know our fearless leader
does not like).


Actually, my question is how error handling is done in RAID, and what the different ways are to check how it is done.
If I make two virtual devices, build a RAID-1 from them, write some data, and then corrupt the data on the first disk, how is the error handled, and is there any way to observe what happens? And similarly with RAID-5?

I have been using Linux MD RAID-1 for some time now and have been satisfied with its performance. I've lost two drives in this time and I feel that the simple addition of a software mirror was well worth it!

I am about to try RAID-5 in a few minutes and this article has left me feeling comfortable that I know what my kernel is doing. Thanks guys!

Are you sure the drives are not dead because of mis-mapping by md_map? I don't think this would cause any crashing, but if it is a member of a system drive array, then I would think there might be a possibility of corruption and loss of md status. I don't really know any of this, but it sure looks smart from this angle.