Links

Saturday, July 7, 2012

mdadm: device or resource busy

I just spent a few hours tracking an issue with mdadm (Linux utility used to manage software RAID devices) and figured I'd write a quick blog post to share the solution so others don't have to waste time on the same.

As a short background, we use mdadm to create RAID-0 stripped devices for our Sugarcube analytics (OLAP) servers using Amazon EBS volumes.

The issue manifested itself as a random failure during device creation:

I searched and searched the interwebs and tried every trick I found to no avail. We don't have dmraid installed on our Linux images (Ubuntu 12.04 LTS / Alestic cloud image) so there's no possible conflict there. All devices were clean, as they are freshly created EBS volumes and I knew none of them were in use.

And strangely, the creation or assembly would sometime work and sometime not:

$ mdadm --manage /dev/md0 --stop

mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]

mdadm: /dev/md0 has been started with 4 drives.

$ mdadm --manage /dev/md0 --stop

mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]

mdadm: cannot open device /dev/xvdh3: Device or resource busy

$ mdadm --manage /dev/md0 --stop

mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]

mdadm: cannot open device /dev/xvdh1: Device or resource busy

mdadm: /dev/xvdh1 has no superblock - assembly aborted

$ mdadm --manage /dev/md0 --stop

mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]

mdadm: /dev/md0 has been started with 4 drives.

I started suspecting I was facing some kind of underlying race condition where the devices would get assigned/locked during the device creation process. So I started googling for "mdadm create race" and I finally found a post that tipped me off. While it didn't provide the solution, the post put me on the right track by mentioning udev and it took only a few more minutes to narrow down on the solution: disabling udev events during device creation to avoid contention on device handles.

So now our script goes something like:

$ udevadm control --stop-exec-queue

$ mdadm --create /dev/md0 --run --level=0 --raid-devices=4 ...

$ udevadm control --start-exec-queue

And we now have consistent reliable device creation.

Hopefully this blog post will help other passers-by with a similar problem. Good luck!