Introduction

Scenario:
A drive has failed in your Linux RAID1 configuration and you need to replace it.

Solution:
Use mdadm to fail the drive's partition(s) and remove them from the RAID array(s).
Physically replace the drive in the system.
Create the same partition table on the new drive that existed on the old drive.
Add the drive partition(s) back into the RAID array.

In this example I have two drives named /dev/sdi and /dev/sdj. Each drive has three partitions, and each pair of matching partitions is configured as a RAID1 array, denoted md#. We will assume that /dev/sdi has failed and needs to be replaced.
Note that Linux software RAID lets you build arrays by mirroring individual partitions rather than entire disks.

Steps (4 total)

Step 1: Identify the faulty drive and array

Identify which RAID arrays have failed:
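On a system with this layout, checking /proc/mdstat might look something like the following (the block counts and device roles here are illustrative and will differ on a real system):

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdi1[1](F) sdj1[0]
      104320 blocks [2/1] [U_]

md1 : active raid1 sdi2[1](F) sdj2[0]
      2096384 blocks [2/1] [U_]

md2 : active raid1 sdi3[1](F) sdj3[0]
      10482304 blocks [2/1] [U_]

unused devices: <none>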

To identify whether a RAID array has failed, look at the string containing [UU]. Each "U" represents a healthy member of the array. If you see [UU], the array is healthy. If a "U" is missing, as in [U_], the array is degraded or faulty.

From the above output we can see that RAID arrays md0, md1, and md2 are missing a "U" and are degraded or faulty.

Step 2: Remove the failed partition(s) and drive

Before we can physically remove the hard drive from the system, we must first "fail" and remove its partition(s) from every RAID array to which it belongs. In our example /dev/sdi is a member of all three RAID arrays; even if only one array had degraded, we would still need to fail the drive's partitions in all three arrays before removing it.
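With mdadm, failing and then removing each of the failed drive's partitions might look like this (repeat the pair of commands for every array that /dev/sdi belongs to):

# mdadm --manage /dev/md0 --fail /dev/sdi1
# mdadm --manage /dev/md0 --remove /dev/sdi1
# mdadm --manage /dev/md1 --fail /dev/sdi2
# mdadm --manage /dev/md1 --remove /dev/sdi2
# mdadm --manage /dev/md2 --fail /dev/sdi3
# mdadm --manage /dev/md2 --remove /dev/sdi3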

Now you can power off the system and physically replace the defective drive:
# shutdown -h now

Step 3: Add the new disk to the RAID arrays

Now that the new hard drive is installed, we can add it to the RAID arrays. To use the new drive we must recreate the exact partition table that was on the old drive. We can do this by mirroring the partition table of the surviving drive (/dev/sdj) onto the new drive (/dev/sdi) with a single command:

# sfdisk -d /dev/sdj | sfdisk /dev/sdi

Note that after removing and replacing a drive, device names may change. Before copying the partition table, make sure the replacement drive really is /dev/sdi by issuing "fdisk -l /dev/sdi" and verifying that no partitions exist on it.
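For example:

# fdisk -l /dev/sdi

If the listing shows an empty partition table, you have the right drive and can safely run the sfdisk copy above.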

Now that the partitions are configured on the newly installed hard drive, we can add them to the RAID arrays.

# mdadm --manage /dev/md0 --add /dev/sdi1
mdadm: added /dev/sdi1

Repeat this command for each partition, changing /dev/md# and /dev/sdi# accordingly:

# mdadm --manage /dev/md1 --add /dev/sdi2
mdadm: added /dev/sdi2

# mdadm --manage /dev/md2 --add /dev/sdi3
mdadm: added /dev/sdi3

Now we can check that the partitions are being synchronized by examining /proc/mdstat again.
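While a rebuild is in progress, each recovering array shows a progress bar and a recovery percentage, roughly like this (the numbers below are illustrative):

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdi1[2] sdj1[0]
      104320 blocks [2/1] [U_]
      [=====>...............]  recovery = 28.0% (29248/104320) finish=0.1min speed=14624K/sec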

Once all partitions have synchronized, your RAID arrays will be back to normal.

Step 4: Install GRUB to the new drive's MBR

We need to install GRUB (the GNU GRand Unified Bootloader) on the new drive's MBR (Master Boot Record) so that if the other drive fails, the system can still boot from the new drive.
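The exact command depends on your GRUB version; a minimal sketch follows. Note that the (hd0) and (hd0,0) names in the GRUB Legacy example are assumptions that depend on your BIOS drive order, device.map, and which partition holds /boot. With GRUB 2:

# grub-install /dev/sdi

Or, with GRUB Legacy, from the grub shell:

# grub
grub> root (hd0,0)
grub> setup (hd0)
grub> quit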

This is great. I am currently setting up an openSUSE 12.3 file server as a test and want to use software RAID1 for the system disk, so I was looking for a way to test my RAID once it is up. I built a VM, added two VMDKs to it, and set everything up on them. Then I just unplugged one of the disks from the VM and got the missing [U_], but when I added the new disk the status stayed the same. I was missing the part about rebuilding the array. :-)
