Linux - HardwareThis forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

Notices

Welcome to LinuxQuestions.org, a friendly and active Linux Community.

You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!

Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.

If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.

Having a problem logging in? Please visit this page to clear all LQ-related cookies.

Introduction to Linux - A Hands on Guide

This guide was created as an overview of the Linux Operating System, geared toward new users as an exploration tour and getting started guide, with exercises at the end of each chapter.
For more advanced trainees it can be a desktop reference, and a collection of the base knowledge needed to proceed with system and network administration. This book contains many real life examples derived from the author's experience as a Linux system and network administrator, trainer and consultant. They hope these examples will help you to get a better understanding of the Linux system and that you feel encouraged to try out things on your own.

I have a new OpenSuSE 10.3 (2.6.22.17-0.1-default) server that I configured a RAID-1 root partition on.

The second hard drive of the pair turned out to be faulty (SMART errors), so I replaced it, and added the new drive to the array. Unfortunately, instead of telling mdadm to add sdb2, I accidentally gave it sdb.

So, I failed the drive, repartitioned it with fdisk, and added sdb2 into the array. The array rebuilt with no errors and I thought all was well... until the next reboot.

The system booted successfully, but my RAID volume started in degraded mode because the second partition (sdb2) did not exist in /dev. I am able to get the partitions into /dev using partprobe, which is confusing. Why would the partitions not already be there?

Another person posted the exact same problem in the newbie forum, but the response didn't make much sense to me, and the OP did not indicate if/how his problem was fixed. I would post the link to that thread, but the forum will not let me until I've posted at least once. I'll provide it in a reply.

Could you please help me? Lots of details are below. If you need any others, please ask.

Creating device nodes with udev
mdadm: failed to add /dev/sdb2 to /dev/md0: No such device or address
mdadm: /dev/md0 has been started with 1 drive (out of 2).

If I use partprobe to refresh the partition table on the live system, I can then add sdb2 to my RAID array and it will resync. This is good until the next reboot, at which point the partition is missing again, and I have to resync all over again.

Okay, I'm in much the same boat as you.
The difference is I'm doing raid 6 on 8 drives.
I built the system with 6, everything seemed good, I added two more and I expanded on to those. All seemed good. I don't *think* I stuffed up the last two and used the disks rather than the partitions.

Ah, to recap. Failure of seeing two of the partitions in /dev
fdisking the drives shows that the partitions are there though, and looking all linux raidlike and just like the other visible six.
/proc/partitions does not list the two partitions.

Here's some interesting new information.
I was able to work around the failure by messing around with the mdadm.conf.
If I listed the devices as just being the partitions, the above failure was occurring - ie, two partitions go AWOL, mdadm can't start.
If I listed the devices as /dev/sda1,/dev/sdb1, etc, then mdadm would take an awfully long time to start, as would Ubuntu (7.10). When I'd log in, mdadm was not starting the array, and it was also consuming memory. Left unattended it would shortly bring the system to its knees by consuming everything. However, I did have all 8 partitions listed (not including the 9th drive with the system partition/swap/whatever). So if I killed the process, and I then changed the config back to use partitions for devices, then mdadm could happily start. Of course, if I rebooted, then I'm back to the starting scenario where it would fail to start because the two partitions were AWOL. So my workaround was to start in configuration 2, then kill mdadm, change config to configuration 1, restart mdadm, change config to configuration 2 for next reboot.

So, that was my workaround, but I wasn't overly happy about it.
So recently (about now), after several months of this, I decided to get more hardcore about finding out what was going on. Looking in the syslog, I had a lot of messages about array md1 already having disks. A hell of a lot. Eventually it would stop and it would start doing some sort of mdadm unbind.
I'm a little unsure as to whether these last two partitions are somehow tagged a little differently than the others, or whether mdadm was getting enthused about starting the array a little too early.
Currently I think I've done *something* (sigh) because it's not quite acting the same and I'm actually having a little trouble starting the array at all.
Just a moment ago I had it starting up degraded (minus the two disks - it is raid 6) but now I might have wandered a little, for it's not even doing that, but complaining about a missing md1 superblock. Oh, and all my devices seem to have shuffled along a bit (mdadm -E /dev/sda1 tells me that it's sdb1, with /dev/sdf1 being blank).

I never thought about partprobing. Thanks.
So by partprobing and getting the sdg1 and sdh1 back into /dev and partitions, I am able to start mdadm fine with 8 disks, no apparent issues...
But of course the issue is still not resolved, I've just got a workaround again for missing partitions on boot!

Long story short, with the sort of spontaneous 'inspiration' that I've had, combined with lack of knowledge, interesting snippets on random forums, and bits of "it's telling me I'm wrong but I'm pretty sure I'm right", I'm actually a little surprised I haven't lost the entire array yet!
It's probably time to stop tempting fate and get a little more methodical.

Incidentally, one time I logged in fast and dropped to a terminal and saw my two partitions there when I wasn't expecting them there ... and shortly after they were removed. Hmmmmm.

I've the same issue with disappearing /dev/sdb1,2 with combination of Raid1 md0. It started with accidentaly adding /dev/sdb to /dev/md0 instead of /dev/sdb1 (mdadm --manage /dev/md0 --add /dev/sdb instead of --add /dev/sdb1).