Wednesday, February 18, 2015

How to replace a failed hard disk in Linux software RAID

This guide shows how to remove a failed hard drive from a Linux RAID1
array (software RAID), and how to add a new hard disk to the RAID1
array without losing data. I will use gdisk to copy the partition
scheme, so the procedure also works with large hard disks that use GPT
(GUID Partition Table).

1 Preliminary Note

In this example I have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.

/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0. /dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.

/dev/sda1 + /dev/sdb1 = /dev/md0

/dev/sda2 + /dev/sdb2 = /dev/md1

/dev/sdb has failed, and we want to replace it.

2 How Do I Tell If A Hard Disk Has Failed?

If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
You can also run

cat /proc/mdstat

and instead of the string [UU] you will see [U_] if you have a degraded RAID1 array.
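
The same check can be scripted. The following is a minimal sketch, not part of the original procedure: it scans the status lines for a string such as [U_] and prints the name of each affected array. The function name degraded_arrays is hypothetical, and the optional file argument exists only so the function can be tried against a saved copy of /proc/mdstat.

```shell
# Print the name of every md array whose status string (e.g. [U_])
# contains an underscore, i.e. has a missing or failed member.
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # Status strings consist only of U and _ inside square brackets
        match($0, /\[[U_]+\]/) && index(substr($0, RSTART, RLENGTH), "_") {
            print name
        }
    ' "${1:-/proc/mdstat}"
}

# Only run the check where /proc/mdstat actually exists
if [ -r /proc/mdstat ]; then
    degraded_arrays
fi
```

If the function prints nothing, no array in the file is reporting a missing member.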

3 Removing The Failed Disk

To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed and remove it from /dev/md0, then we do the same with /dev/sdb2 and /dev/md1.
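
A minimal sketch of this sequence for both partitions, assuming mdadm's manage mode with its --fail and --remove options. The helper name fail_and_remove is hypothetical, and the commands are only printed so they can be reviewed before anything is changed:

```shell
# Print the two mdadm commands that mark a partition as failed and then
# remove it from its array. Nothing is executed here.
# $1 = array (e.g. /dev/md0), $2 = partition (e.g. /dev/sdb1)
fail_and_remove() {
    echo "mdadm --manage $1 --fail $2"
    echo "mdadm --manage $1 --remove $2"
}

fail_and_remove /dev/md0 /dev/sdb1
fail_and_remove /dev/md1 /dev/sdb2
```

After checking the printed commands, run them as root. The output of cat /proc/mdstat should then no longer list /dev/sdb1 and /dev/sdb2 as array members.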

Then we power down the system and replace the old /dev/sdb hard drive
with a new one. The new drive must be at least as large as the old one;
if it is even a few MB smaller, rebuilding the arrays will fail.
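
The size requirement can be checked once the new disk is installed and the system is running again. This is a rough sketch under two assumptions: the surviving disk /dev/sda is used as the reference (its partition layout is the one that has to fit on the new disk), and the disk sizes are read from sysfs, where they are given in 512-byte sectors. The function names are hypothetical.

```shell
# Size of a whole disk in bytes; $1 is the kernel name, e.g. "sda".
# /sys/block/<disk>/size is expressed in 512-byte sectors.
disk_bytes() {
    echo $(( $(cat "/sys/block/$1/size") * 512 ))
}

# Succeeds if the new disk ($1, bytes) is at least as large as the
# reference disk ($2, bytes).
is_large_enough() {
    [ "$1" -ge "$2" ]
}

# Only compare when both disks are actually present
if [ -r /sys/block/sda/size ] && [ -r /sys/block/sdb/size ]; then
    if is_large_enough "$(disk_bytes sdb)" "$(disk_bytes sda)"; then
        echo "/dev/sdb is large enough"
    else
        echo "/dev/sdb is smaller than /dev/sda" >&2
    fi
fi
```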

4 Adding The New Hard Disk

After you have changed the hard disk /dev/sdb, boot the system.
The first thing we must do now is to create the exact same partitioning as on /dev/sda.
We can do this with the command sgdisk from the gdisk package. If you
haven't installed gdisk yet, run this command to install it on Debian and
Ubuntu:

apt-get install gdisk

For Red Hat based Linux distributions like CentOS, use:

yum install gdisk

and for openSUSE use:

zypper install gdisk

The next step is optional but recommended. To ensure that you have a
backup of the partition scheme, you can use sgdisk to write the
partition schemes of both disks into a file. I will store the backup in
the /root folder.
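
A minimal sketch of such a backup, assuming sgdisk's --backup option. The file name pattern and the helper name backup_path are assumptions for illustration. The sgdisk commands are only printed so they can be reviewed first:

```shell
# Build the backup file path for a disk,
# e.g. /dev/sda -> /root/sda.partitiontable (naming is an assumption).
backup_path() {
    echo "/root/$(basename "$1").partitiontable"
}

for disk in /dev/sda /dev/sdb; do
    # Printed only; remove the leading "echo" to write the backup as root
    echo sgdisk --backup="$(backup_path "$disk")" "$disk"
done
```

A saved partition table can later be written back to a disk with sgdisk's --load-backup option.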