Manage a Linux RAID 10 Storage Server

Part 3: Learn how to monitor, maintain, and make changes in a Linux RAID 10 array.

Today we'll learn how to monitor, maintain, and make changes in our RAID 10 array. We'll make it bigger and smaller, safely test failure recovery, and set up monitoring and failure notifications.

In part 2 of this series we learned how to create a RAID 10 array during a clean, new Kubuntu installation. The same method works with all the *buntus, Debian and CentOS 5.1. However, there is an even easier way — the Fedora 8 Anaconda installer supports RAID 10, and it recognizes existing RAID and LVM volumes without having to resort to the hacks we used last week. In fact the Fedora 8 graphical installer is sleek and fast; in my opinion the best of the batch. The graphical installers in CentOS 5.1 and Debian Lenny tie for second place, and Lenny's installer includes buttons for taking screenshots.

Another way to set up a RAID array is to keep the root filesystem off the internal disks entirely and boot from a USB device, or even netboot. If you stuff enough RAM in the box you can do away with swap and devote 100 percent of your array to data storage. However you set it up, it's best to keep your root filesystem out of both RAID and LVM for easier management and recovery.

Linux RAID and Hardware

I've seen a lot of confusion about Linux RAID, so let's clear that up. Linux software RAID has nothing to do with hardware RAID controllers. You don't need an add-on controller, and you don't need the onboard controllers that come on most motherboards. In fact, the lower-end PCI controllers and virtually all the onboard controllers are not true hardware controllers at all, but software-assisted, or fake RAID. There is no advantage to using these, and many disadvantages. If you have these, make sure they are disabled.

Ordinary PC motherboards support up to six SATA drives, and PCI SATA controllers provide an easy way to add more. Don't forget to scale up your power and cooling as you add drives.

If you're using PATA disks, use only one per IDE controller. If you put both a master and a slave on a single IDE controller, performance will suffer, and any failure risks bringing down both the controller and the second disk.

GRUB Follies

GRUB Legacy (v. 0.9x) does not support RAID, which is why we have to jump through hoops just to boot the darned thing. Beware of your distribution's default boot configuration: GRUB must be installed to the MBRs of at least the first two drives in your RAID1 array if you want the system to boot after a drive failure. Most likely your Linux installer only installs it to the MBR of the drive that is first in the BIOS order, so you'll need to install it manually on a second disk.

First open the GRUB command shell. This example installs it to /dev/sdb, which GRUB sees as hd1 because it is the second disk on the system:
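A minimal sketch of that step, assuming /boot lives on the first partition of the second disk, which GRUB Legacy calls (hd1,0). The partition number and the helper name grub_mbr_commands are illustrative assumptions, so adjust them for your own layout. The function only prints the GRUB shell commands so you can review them before anything is written:

```shell
# Print the GRUB Legacy shell commands that install GRUB to the MBR of one disk.
#   $1 = GRUB disk number (1 means hd1, which is /dev/sdb in this example)
#   $2 = GRUB partition number that holds /boot (0 is an assumption)
grub_mbr_commands() {
    printf 'root (hd%s,%s)\n' "$1" "$2"   # tell GRUB where its stage files live
    printf 'setup (hd%s)\n' "$1"          # write GRUB to that disk's MBR
    printf 'quit\n'
}

# Review the commands first; nothing is installed by this call.
grub_mbr_commands 1 0
```

When the output looks right, you can type the same lines at the grub> prompt, or run `grub_mbr_commands 1 0 | grub --batch` as root to feed them to the GRUB shell non-interactively.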

Creating and Testing New Arrays

You may want to have a hot spare. This is a partitioned, formatted hard disk that is connected but unused until an active drive fails. When that happens, mdadm (if it is running in daemon mode, see the Monitoring section) automatically replaces the failed drive with the hot spare. This example includes one hot spare:
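As a rough sketch, creating a four-disk RAID 10 array with one hot spare might look like the following. The device names, the helper names run and create_raid10_with_spare, and the DRY_RUN switch are all illustrative assumptions. By default the script only echoes the mdadm command, because --create overwrites whatever is on the listed partitions:

```shell
# Echo commands by default; set DRY_RUN=0 to really execute them (as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Create a RAID 10 array from the given member partitions plus one hot spare.
#   $1 = array device, $2 = spare partition, remaining args = active members
create_raid10_with_spare() {
    _md=$1; _spare=$2; shift 2
    # With --spare-devices=1, mdadm treats the last listed device as the spare.
    run mdadm --create "$_md" --level=10 \
        --raid-devices=$# --spare-devices=1 "$@" "$_spare"
}

# Hypothetical layout: four active partitions, /dev/sde2 as the spare.
create_raid10_with_spare /dev/md1 /dev/sde2 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
```

Check the echoed command against your real partition layout before rerunning it with DRY_RUN=0.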

The "Personalities" line in /proc/mdstat tells you which RAID levels the kernel supports. In this example there are two separate arrays, md1 and md0. Both are active, and the output lists the names and order of their member devices, along with the size and RAID type of each array. [2/2] means two of two devices are in use, and [UU] means both devices are up.
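The same status markers are easy to check from a script. This is a small sketch, not part of mdadm: the function name degraded_arrays is made up for illustration, and it assumes the usual /proc/mdstat layout in which each array's status line ends with a marker such as [UU], with an underscore standing in for each missing device:

```shell
# List arrays whose status marker contains "_" (a missing or failed member).
#   $1 = file to read; defaults to /proc/mdstat
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }            # remember the current array name
        match($0, /\[[U_]+\]/) {              # find a marker like [UU] or [U_]
            if (index(substr($0, RSTART, RLENGTH), "_") > 0)
                print name
        }
    ' "${1:-/proc/mdstat}"
}
```

Running `degraded_arrays` with no argument reads the live /proc/mdstat and prints nothing when every array has all of its devices up.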

You can get detailed information on individual arrays:

# mdadm --detail /dev/md0

Is this partition part of a RAID array? To find out, examine it. This displays the contents of the md superblock, which marks the partition as a member of a RAID array:

# mdadm --examine /dev/hda1

You can also use wildcards, like mdadm --examine /dev/hda*.
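If you want a one-line answer per partition instead of the full superblock dump, a loop along these lines may help. It is only a sketch: the function name list_raid_members is invented here, and it assumes that mdadm --examine returns a non-zero exit status when it finds no md superblock (it will also fail if you are not root and cannot read the device):

```shell
# Report, for each device given, whether mdadm finds an md superblock on it.
list_raid_members() {
    for _dev in "$@"; do
        if mdadm --examine "$_dev" >/dev/null 2>&1; then
            echo "$_dev: md member"
        else
            echo "$_dev: no md superblock found (or not readable)"
        fi
    done
}
```

For example, `list_raid_members /dev/hda*` run as root checks every partition on the first PATA disk.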

Monitoring

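mdadm itself can run in daemon mode and send you email when an active disk fails, when a spare fails, or when it detects a degraded array. A degraded array is either a new array that has not yet been populated with all of its disks, or an array with a failed disk. This example polls every 2400 seconds and mails any alerts to me@here.net; add --daemonise to run it in the background: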

# mdadm --monitor --scan --mail=me@here.net --delay=2400 /dev/md0

Your distribution may start the mdadm daemon automatically, so you won't need to run this command. Kubuntu controls it with /etc/init.d/mdadm, /etc/default/mdadm, and /etc/mdadm/mdadm.conf, so all you need to do is add your email address to /etc/mdadm/mdadm.conf.
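One way to set the address without opening an editor is sketched below. It assumes your mdadm.conf uses the standard MAILADDR keyword; the helper name set_mailaddr is hypothetical, and the function rewrites whichever file you pass it, so try it on a copy first:

```shell
# Replace (or add) the MAILADDR line in an mdadm.conf-style file.
#   $1 = path to the config file, $2 = address that should receive alerts
set_mailaddr() {
    _conf=$1; _addr=$2
    _new=$(mktemp) || return 1
    grep -v '^MAILADDR' "$_conf" > "$_new"       # keep every other line
    printf 'MAILADDR %s\n' "$_addr" >> "$_new"   # append one fresh MAILADDR line
    cat "$_new" > "$_conf"                       # overwrite in place, keeping permissions
    rm -f "$_new"
}
```

After running `set_mailaddr /etc/mdadm/mdadm.conf me@here.net` as root, `mdadm --monitor --scan --oneshot --test` should generate a test alert for each array it finds, which is a quick way to confirm that mail actually gets delivered.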

Starting, Stopping, and Deleting RAID

Your Linux distribution should start your arrays automatically at boot, and mdadm starts them at creation. This command starts an array manually:

# mdadm -A /dev/md0

This command stops it:

# mdadm --stop /dev/md0

You'll need to unmount all filesystems on the array before you can stop it.

To remove devices from an array, they must first be marked as failed. This command manually fails a healthy device and then removes it in one step:

# mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2

If you're removing a healthy device and want to use it for something else, or just want to wipe everything out and start over, you have to zero out the superblock on each device. Otherwise mdadm will continue to treat the device as a member of a RAID array:

# mdadm --zero-superblock /dev/sda2
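Put together, retiring a member is a three-step sequence: fail, remove, then wipe the superblock. The sketch below chains the steps so that a failure at any point stops the rest. The helper names run and retire_member and the DRY_RUN switch are assumptions made for illustration, and by default the commands are only echoed:

```shell
# Echo commands by default; set DRY_RUN=0 to really execute them (as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Fail a member, remove it from the array, then erase its md superblock.
#   $1 = array device, $2 = member partition to retire
retire_member() {
    _md=$1; _dev=$2
    run mdadm "$_md" --fail "$_dev" &&
    run mdadm "$_md" --remove "$_dev" &&
    run mdadm --zero-superblock "$_dev"
}
```

Calling `retire_member /dev/md1 /dev/sda2` prints the three commands for review; rerun it with DRY_RUN=0 once you are sure you have named the right partition.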

Adding Devices

You can add disks to a live array with this command:

# mdadm /dev/md1 --add /dev/sdc2

This will take some time to rebuild, just like when you create a new array.
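While a rebuild runs, /proc/mdstat shows a progress line for the affected array. The following sketch pulls out just the percentage. The function name rebuild_progress is made up, and it assumes the progress line has the usual "recovery = 12.6% ..." or "resync = ..." shape:

```shell
# Print "<array> <percent>" for every array that is currently rebuilding.
#   $1 = file to read; defaults to /proc/mdstat
rebuild_progress() {
    awk '
        /^md[^ ]* :/ { name = $1 }            # remember the current array name
        /(recovery|resync) *=/ {              # progress line for that array
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

For a live view, `watch -n 5 cat /proc/mdstat` refreshes the full status every five seconds.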

That wraps up our whirlwind tour of RAID 10 and mdadm. Come back next week to learn how to manage LVM volumes, which you can use anywhere and not just on RAID arrays, and how to use smartctl to monitor hard disk health and warn you of impending failures.