
Recovering a RAID5 mdadm array with two failed devices

Update: Before reading this article you should know that it is now quite old and there is a better method, ‘mdadm --assemble --force’ (it may have been there all along). This will try to assemble the array by marking previously failed drives as good. From the man page:

If mdadm cannot find enough working devices to start the array, but can find some devices that are recorded as having failed, then it will mark those devices as working so that the array can be started.

I would however strongly suggest that you first disconnect the drive that failed first. If you need to discover which device failed first, or assemble doesn’t work and you need to manually recreate the array, then read on.
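For reference, the forced assembly looks something like this. This is a sketch, not the command from the original article: the device names match the example array used later in this post, and yours will differ.

```shell
# Stop the array first if it is partially assembled
mdadm --stop /dev/md1

# Assemble, marking members recorded as failed as working again.
# List only the drives you trust; leave out the one that failed first.
mdadm --assemble --force /dev/md1 /dev/sdb1 /dev/sdd1 /dev/sde1
```

If this succeeds, check the data before re-adding the first-failed drive.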

I found myself in an interesting situation with my parents' home server today (Ubuntu 10.04). Hardware-wise it's not the best setup: two of the drives are in an external enclosure connected with eSATA cables. I did encourage Dad to buy a proper enclosure, but was unsuccessful. This is a demonstration of why eSATA is a very bad idea for RAID devices.

What happened was that one of the cables had been bumped, disconnecting one of the drives. Thus the array was running in a degraded state for over a month, which is not good. Anyway, I noticed this when logging in one day to fix something else. The device wasn't visible, so I told Dad to check the cable, but unfortunately when he went to secure the cable, he must have somehow disconnected another one. This caused a second drive to fail, so the array immediately stopped.

Despite having no hardware failure, the situation is similar to someone replacing the wrong drive in a raid array. Recovering it was an interesting experience, so here I’ve documented the process.

YOU CAN PERMANENTLY DAMAGE YOUR DATA BY FOLLOWING THIS GUIDE, DO NOT PERFORM THIS OPERATION ON THE ORIGINAL DISKS UNLESS THE DATA IS BACKED UP ELSEWHERE.

Gathering information

The information you'll need should be contained in the superblocks of the RAID devices. First you need to find out which drive failed first, with the mdadm --examine command. My example was a RAID5 array of 4 devices: sdb1, sdc1, sdd1 and sde1:
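The command is simple to run against each member partition; the interesting part is the device-state table at the end of each superblock dump:

```shell
# Dump the RAID superblock of each member partition.
# The trailing table lists every device and the state this
# superblock last recorded for it (active sync / faulty / removed).
for dev in /dev/sd[b-e]1; do
    mdadm --examine "$dev"
done
```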

Look at the last part of the output. Here we can see that this drive is in sync with /dev/sdd1 but out of sync with the other two (sdc1 and sde1): the data indicates that sdc1 and sde1 have failed. These drives are the two in the external enclosure… but I digress.

Performing an examine on sdc1 shows "active sync" for all the other drives; clearly this disk has no idea what's going on. Also note the update time of February 5 (it is now March!!):

The remaining drive, sde1, tells a different story: when it was last part of the array, sdc1 was already faulty but the other two were fine. This indicates that sde1 was the second drive to be disconnected, and sdc1 the first.

Scary stuff

Despite being marked as faulty, we have to assume that the data on /dev/sde1 is crash-consistent with sdb1 and sdd1 as the array immediately stopped upon failure. The original array won’t start because it only has two active devices. But we can create a new array with 3/4 of the drives as members and one missing.

This sounds scary and it should. If you have critical data that you’re trying to recover from this situation I would honestly be buying a whole new set of drives, cloning the data across to them and working from those. Having said that, the likelihood of permanently erasing the data is low if you’re careful and don’t trigger a rebuild with an incorrectly configured array (like I almost did).

Important information to note is the configuration of the array, in particular device order, layout and chunk size. If you're using defaults (in hindsight probably a good idea to lessen the chance of something going wrong in situations like this), you don't need to specify them. However you'll note that in my example the chunk size is 512K, which differs from the default of 64K.

Update 2012/01/04

When reading the following notes you should note that the default chunk size in more recent versions of mdadm is 512K. In addition, ensure that you are using the same superblock metadata version as the original array by specifying it with -e 0.90 or -e 1.2. If you are using the same distribution of mdadm as the array was created with, and didn't manually specify a different version, you should be safe. However, when dealing with RAID arrays it always pays to double-check. The metadata version information should be in the output of mdadm --examine or in mdadm.conf. Thanks to Neil Walfield for the info!
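Putting that together, the recreation step looks roughly like this. This is a sketch based on the example array in this post (4 devices, 512K chunks, 0.90 metadata, sdc1 left out as the first-failed drive); your device order, chunk size and metadata version will almost certainly differ, and getting any of them wrong can destroy the data:

```shell
# Recreate the array with the first-failed drive replaced by the
# keyword "missing". With a member missing, mdadm cannot start a
# rebuild, so the data on the remaining drives is not rewritten.
# Device order, chunk size and metadata version MUST match the
# original array exactly.
mdadm --create /dev/md1 --level=5 --raid-devices=4 \
    --chunk=512 -e 0.90 \
    /dev/sdb1 missing /dev/sdd1 /dev/sde1
```

After creating it, mount the filesystem read-only and verify your data before going any further.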

So despite creating a bad array I was still able to stop it and create a new array with the correct configuration. I don’t believe there is any corruption as no writes occurred, and the array didn’t rebuild.

Adding the first-disconnected drive back in

The array is of course still in a degraded state at this point and no more secure than RAID0. We still need to add the disk that was disconnected first back into the array. Compared to the rest of the saga this is straightforward:
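In the example here, that means re-adding sdc1 (substitute your own first-failed device):

```shell
# Add the first-disconnected drive back; mdadm treats it as a fresh
# member and rebuilds its contents from the other three drives.
mdadm /dev/md1 --add /dev/sdc1

# Watch the rebuild progress
cat /proc/mdstat
```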


33 thoughts on “Recovering a RAID5 mdadm array with two failed devices”

I’d broaden a bit and say eSATA is a risky choice for any permanent use – RAID or not.

As someone who used much of your prior ubuntu server post as reference, I decided to go with RAID6 instead. Even though I’m only running 4 drives at the moment and RAID6 causes me to sacrifice 2 of 4, the redundancy of RAID5 is not sufficient for me. Given most RAIDs are built with drives of the same model, similar age, often the same Lot #, and experience nearly identical usage, multiple simultaneous failures are not that farfetched. I believe the odds of two drives dying at the exact same moment are low, but a full rebuild will stress-test the remaining drives at a time that I can least afford to have a second drive go.

I do like eSATA for performing back-ups. I’d be interested in a solution that can back up the entire RAID. Is it reasonable to run a tape drive at home?

For me however the security of the daily backup offsets the risk of multiple drives failing, so while one failing might indicate that an additional failure from the same batch is more likely, at most you lose a day’s worth of data.

IMHO, the only reasons to go with tape are portability and durability of the media. You can get more storage on a hard drive these days for much lower cost, and the speed and flexibility of the backups is incomparable (can’t rsync to a tape…). If you need to keep your backups for a long time and your data set isn’t too large, tapes can make sense, but for someone who just wants to ensure their data is safe a couple of 2TB (or 3TB) hard drives on rotation with the aforementioned RAID array is hard to beat.

This guide literally saved my life. (Ok, not literally.) I had a SATA controller with two drives attached die on me. When I replaced the controller, the RAID would not start because both drives were marked as faulty. Using this guide, I thought of recreating the array with both drives since the --examine details were the same for both, but somehow I feared there would be inconsistency between the two, so I chickened out and created the array with just one of them. My data (family photos) was preserved. I added the other "faulty" drive later without issue. THANKS!!!!

The following happened to me: I have a 4 disk RAID-5. A disk failed, which I promptly replaced. I added the new disk to the array, but it was not integrated because the recovery exposed a bad sector in a second disk. Double disk failure! Ouch. To recover, I had to recover one of the failed disks. To do this, I used gddrescue, a dd-like tool with error recovery capabilities. After copying one of the failed disks to the new disk, I was able to recover the array using this guide.
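(For anyone in the same spot: a gddrescue run for that kind of disk-to-disk rescue looks roughly like this. /dev/OLD and /dev/NEW are placeholders, not the commenter's actual devices.)

```shell
# Copy the failing disk to the replacement. -f is required when the
# output is a device; -n skips the slow scraping of bad areas on the
# first pass. The log file lets the copy be resumed or refined later.
ddrescue -f -n /dev/OLD /dev/NEW rescue.log

# Optionally go back and retry the unreadable areas a few times
ddrescue -f -r3 /dev/OLD /dev/NEW rescue.log
```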

This is very helpful, thank you for sharing.
ps. I lost a whole 1TB volume a few years ago due to:
– ext4 still unstable
– raid10 volume corruption
– kernel panics
– 3 x power failure while rebuilding
Fortunately, it was just a test server…
No drive failed; all were fine. Just filesystem corruption.
Then I learned how to recover data from mdadm RAID arrays using TestDisk.

Mdadm has a mailer daemon built in.
You just have to set up Exim or Sendmail as a smarthost (if you already have a mail server you can use) or as a send-only system.
Mdadm sends mail on every event… even testing!
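(The alert address goes in mdadm.conf, and the monitor's test mode makes it easy to confirm delivery; the address below is a placeholder.)

```shell
# In /etc/mdadm/mdadm.conf, point alerts at your mailbox:
#   MAILADDR you@example.com

# Then send a test alert for every array and exit, to verify
# that mail actually gets through:
mdadm --monitor --scan --test --oneshot
```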

This post gave me the courage to pull the trigger on my failed array. Thank you, 16TB family RAID5 array rebuilding now!!! Now if I can only remember to keep the bloody server chassis locked with a two year old around…

A 2nd follow-up. You can use mdadm --assemble --force rather than --create to put the RAID back together. With this method you don't have to specify the missing drive. I also don't think you need to specify the drives in order, though I did anyway. It will also mark failed drives as clean because of --force.

Alex, could you add at the top of your post that "mdadm --assemble --force" should be tried first?
I used the re-creation suggested in this post and damaged my data, because the sector offset differs between mdadm versions. This is very dangerous.

Of course this is dangerous, I probably could have stressed this more but it worked for me, and the difference between versions IS mentioned (although that was a later update thanks to feedback). The assemble force method is clearly better however, so I’ve added a note to the top.

It’s been a while since I’ve worked with mdadm, so I’m not the best person to get advice from, but it looks like you’re trying to create a new array with all the devices of the old one present. The article above describes how to create a new array with only the drive that failed first missing, which should result in a readable array if your data hasn’t been corrupted somehow.

It seems that I lost a drive in December 2012 but did not know that, and lost a second one last weekend. This one I noticed because the data was only partly accessible. I shut down the server, disconnected all the HDDs and connected them again, and all 4 were seen by the BIOS and the md0 worked again because it booted in safe mode. So I have good hope that most of my data is still fine if I do the rebuild correctly.

One thing I just noticed; your /dev/md1 says chunk size is 512K but your drives say 64K. I can only guess as to how this happened, but make sure you use whatever the array was originally created as. Figuring out the default for your version of mdadm might help with this.

I checked if the data was recovered, and it was; then I added the missing drive with

mdadm -a /dev/md1 /dev/sda6

which led to a rebuild of the array, a booting and working server, and all my data back.

Is it possible for mdadm to send a message when something happens to the array, or do I have to move to a hardware array? A RocketRAID 2720SGL costs around €180, so that is not as expensive as it used to be.