A friend of mine (chabotc@xs4all.nl) has been having problems booting his
Dell XPS with the factory default raid setup (which he does not want to change
because he doesn't want to remove all his data) ever since the dmraid alignment
checks (bug 186842) were added to the kernel.
Since many people are suffering from the same problem and this gives Linux /
Fedora a bad name, I decided last week to go and try to fix this.
So I've borrowed his PC and now after 8 full hours of debugging I've found the
problem and have a fix for it (actually 2).
Recent mkinitrd versions generate the correct:
rmparts sdb
rmparts sda
dm create nvidia_dabhfihh 0 976562432 striped 2 128 8:0 0 8:16 0
dm partadd nvidia_dabhfihh
lines (or similar lines for other dmraid setups). The problem is that the dm
code in nash > 5.0.46 creates the necessary device nodes under /dev/mapper, but
doesn't create the matching /dev/dm-x device nodes.
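
For readers unfamiliar with the table syntax, the "dm create" line above can be unpacked with a small shell sketch. It assumes the usual device-mapper striped table layout (start sector, length in sectors, target name, number of stripes, chunk size in sectors, then device/offset pairs) and 512-byte sectors. The helper name describe_striped is made up for this illustration and is not part of nash or mkinitrd.

```shell
# Sketch only: decode a device-mapper "striped" table such as
#   0 976562432 striped 2 128 8:0 0 8:16 0
# Assumes 512-byte sectors. describe_striped is a hypothetical helper.
describe_striped() {
    if [ $# -lt 5 ]; then
        echo "usage: describe_striped START LENGTH striped STRIPES CHUNK [DEV OFFSET]..." >&2
        return 1
    fi
    start=$1; length=$2; target=$3; stripes=$4; chunk=$5
    shift 5
    if [ "$target" != "striped" ]; then
        echo "not a striped table" >&2
        return 1
    fi
    echo "start_sector=$start"
    echo "size_bytes=$((length * 512))"
    echo "stripes=$stripes"
    echo "chunk_kib=$((chunk * 512 / 1024))"
    # The remaining arguments are (device, offset) pairs, one per stripe.
    while [ $# -ge 2 ]; do
        echo "member=$1 offset=$2"
        shift 2
    done
}
```

Called with the values from the line above, this reports a size of 499999965184 bytes, a 64 KiB chunk, and two members, 8:0 and 8:16 (the conventional major:minor numbers of sda and sdb).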
With some searching I managed to find SRPMS for most versions of mkinitrd
between 5.0.46 (which my friend found out works for him) and 5.1.9. After
trying most of them it was determined that the problem was introduced between
5.0.46 and 5.0.47. I've done a diff between these 2 versions and the problem is
the removal of a call to smartmknod() in block_sysfs_try_dir(). Re-adding this
call fixes this. Notice that an extra call to smartmknod() was added to
block_show_labels(), but that apparently doesn't work in the dmraid case.
The attached mkinitrd-5.1.9-test.patch re-adds the removed line, fixing this on
my friend's PC.
Another way of fixing this is adding an extra call to mkblkdevs to the init
script in the initrd after the "dm partadd xxxxx" line.
The attached mkinitrd.diff patch does this.
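
To illustrate the effect of this workaround on a generated init script, here is a rough sketch that inserts a mkblkdevs line after every "dm partadd" line. It works on the init script text itself rather than on /sbin/mkinitrd, so it is not the attached mkinitrd.diff, only an approximation of what that patch makes mkinitrd emit. The function name add_mkblkdevs is hypothetical.

```shell
# Sketch only: read an initrd init script on stdin and write a copy to stdout
# with "mkblkdevs" added after each "dm partadd ..." line.
# add_mkblkdevs is a hypothetical helper, not part of mkinitrd.
add_mkblkdevs() {
    awk '{ print } /^dm partadd / { print "mkblkdevs" }'
}
```

For example, "add_mkblkdevs < init > init.new" would leave every other line of the script untouched.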
Please apply one of these 2 patches so that people with a dmraid setup can have
a booting FC-6; a non-booting OS is sort of bad PR.

I am desperate to get past this problem. I confess that I have so far relied
upon RPMs to update so I am hesitant about the patch process. Would someone be
so kind as to provide some directions here (or a pointer to some site that
provides a walkthrough) as to the order and procedure of applying this patch?
Thanks in advance.

No, either one will work. In your case the workaround, rather than the real fix,
has the advantage that it will work without a recompile.
Instructions:
Save the second attachment as mkinitrd.diff
"cd /sbin"
"patch -p1 < [path-to]/mkinitrd.diff"
Where [path-to] should be replaced by the path to mkinitrd.diff
And then rerun mkinitrd:
"mkinitrd -f /boot/initrd-`uname -r`.img `uname -r`"
Notice that this patch and the entire diagnosis are based on FC6-test2 + rawhide
updates and may or may not apply to FC-5.
With this patch, mkinitrd >= 5.1.6 ("rpm -q mkinitrd" to find out) and no
usb-storage lines in /etc/modprobe.conf, dmraid should work.
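
The two preconditions above can be checked with a short shell sketch. The helper names version_ge and has_usb_storage_line are made up for this example. The version comparison only handles purely numeric dotted versions, and the usb-storage check simply looks for any non-comment line mentioning usb-storage, which is an assumption about what counts as a "usb-storage line".

```shell
# Sketch only; both helpers are hypothetical.

# version_ge A B: succeed if dotted numeric version A is >= B.
version_ge() {
    a=$1; b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the field just compared; stop when no dots are left.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# has_usb_storage_line [FILE]: succeed if FILE (default /etc/modprobe.conf)
# contains a non-comment line mentioning usb-storage.
has_usb_storage_line() {
    grep -v '^[[:space:]]*#' "${1:-/etc/modprobe.conf}" | grep -q 'usb-storage'
}

# Possible use on an rpm-based system:
#   ver=$(rpm -q --qf '%{VERSION}\n' mkinitrd)
#   version_ge "$ver" 5.1.6 || echo "mkinitrd $ver is older than 5.1.6"
#   has_usb_storage_line && echo "usb-storage found in /etc/modprobe.conf"
```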

I am in no man's land, since I am still in FC5 and my mkinitrd == 5.0.32-1. I
suppose I will have to wait for a solution for that version.
Anyway, thanks for the directions. I hope your efforts help others who are
stuck in this most annoying predicament.

(In reply to comment #9)
> Good news. Just syncd with rawhide and the new kernel booted with dm raid0
> without any modifications. Looks like nash has been fixed.
>
> I'm using the via_sata driver.
>
Hmm, I just checked my mirror and the nash there isn't fixed. Maybe this bug
only applies to systems using nv_sata, although I have a hard time believing
that. Could you cut and paste or attach the contents of your /etc/fstab here?
Thanks!

(In reply to comment #11)
> Would someone point me in the right direction to learning how I, too, can sync
> FC5 to RawHide? (Assuming that is possible; I also use via_sata.) Thanks.
First of all this may make your system unbootable even with the older kernel!
Now with that said, edit:
/etc/yum.repos.d/fedora-core.repo
This file has 3 sections, of which only the top one is enabled by default, you
can see this because the top section contains the line:
enabled=1
Change this to:
enabled=0
If you've enabled any other sections yourself disable them too.
Do the same for:
/etc/yum.repos.d/fedora-updates.repo
and:
/etc/yum.repos.d/fedora-extras.repo
Now edit /etc/yum.repos.d/fedora-development.repo
and enable the top section, that is modify it so that it contains:
enabled=1
Do the same for:
/etc/yum.repos.d/fedora-extras-development.repo
Now your yum points to the development branch of Fedora. I think in this case it
is wise to do a piecemeal update as you're only interested in mkinitrd, so after
making the above changes type:
yum update mkinitrd
Do not use "yum -y update mkinitrd"!
Now once yum has done all the magic it will give a list of packages that it will
update. This will include mkinitrd, glibc(-xxx) and probably device-mapper and
mdraid. This list may be around 10 packages long; if it's much longer please post
it here and press N to stop yum from doing the actual update.
If you are comfortable with the list press Y to continue, and once yum is done
you've got the new mkinitrd, which is all you need.
After this you may revert the changes to /etc/yum.repos.d/*, if you don't revert
this and do a yum update later you will get updated to a full development system!
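
The repo edits described above could be scripted roughly as follows. This is only a sketch: the function names are made up, it assumes the .repo files use plain "enabled=0" / "enabled=1" lines as described, and it keeps a .bak copy of each file so the change can be reverted afterwards. It must be run as root to touch /etc/yum.repos.d.

```shell
# Sketch only; disable_repo_file and enable_top_section are hypothetical helpers.

# disable_repo_file FILE: turn every "enabled=1" line into "enabled=0".
disable_repo_file() {
    cp -p "$1" "$1.bak" &&
    sed 's/^enabled=1/enabled=0/' "$1.bak" > "$1"
}

# enable_top_section FILE: set only the first "enabled=" line to "enabled=1",
# leaving the later sections as they are.
enable_top_section() {
    cp -p "$1" "$1.bak" &&
    awk '!done && /^enabled=/ { print "enabled=1"; done = 1; next } { print }' \
        "$1.bak" > "$1"
}
```

With these, the procedure would amount to calling disable_repo_file on fedora-core.repo, fedora-updates.repo and fedora-extras.repo, calling enable_top_section on fedora-development.repo and fedora-extras-development.repo, running "yum update mkinitrd", and finally copying each .bak file back over its original to revert.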

pjones,
Can we get some progress on this? Maybe I can inspire some confidence in the
validity of the attached patches / diagnosis of the problem by explaining how I
came to these conclusions:
As said, a friend of mine has a Dell XPS, which by default comes with an nvidia
sata dmraid setup. With a kernel update some time ago this broke for him (and
many others). He had managed to manually fix this by adding the necessary
"dm xxxx" lines to the init scripts in his initrd, using an initrd generated by
mkinitrd 5.0.46 as base.
With initrds generated by later mkinitrd versions (with the magic lines added
manually for the first few newer mkinitrd versions, and added by mkinitrd itself
for later versions), his system broke once again.
So I started by collecting mkinitrd versions 5.0.46 - 5.1.9 and managed to find
most, and by trial and error found out that this new breakage was introduced by
5.0.47, so 5.0.47 and newer do not work on his system even with the necessary
magic "dm xxxxxx" lines in place.
After pinpointing the exact version which broke I wanted to know where exactly
it broke, so I recompiled the Fedora busybox rpm to include the ash applet and I
inserted "busybox ash" lines between all the lines in the initrd init script.
This way I could closely observe the behaviour of nash / the init script during
the initrd stage of the boot.
This way I soon noticed that with 5.0.46 /dev/dm-x nodes showed up in /dev after
the magic "dm xxxxx" lines in the init script, whereas with 5.0.47 these didn't
show up. The absence of these devices in turn caused the "mkrootdev xxxxx" line
from the init script to fail, which in turn caused total boot failure.
I could fix the boot with 5.0.47 (and later) by doing a manual mknod from ash
for either /dev/dm-x or /dev/root.
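
As an illustration of that manual step, the sketch below prints the mknod commands that would create any missing dm-N nodes, without creating anything itself. It assumes sysfs exposes each device-mapper device as /sys/block/dm-N with a "dev" file holding "major:minor" (the device-mapper major is allocated dynamically, so it is read rather than hard-coded). The function name is hypothetical, and this is written for a regular shell, not for nash.

```shell
# Sketch only: print_missing_dm_mknods [SYSBLOCK] [DEVDIR]
# For each dm-N directory under SYSBLOCK (default /sys/block), print the mknod
# command that would create DEVDIR/dm-N (default /dev) if that node is missing.
# This is a dry run: no device nodes are created.
print_missing_dm_mknods() {
    sysblock=${1:-/sys/block}
    devdir=${2:-/dev}
    for d in "$sysblock"/dm-*; do
        [ -r "$d/dev" ] || continue          # also skips an unmatched glob
        name=${d##*/}
        [ -e "$devdir/$name" ] && continue   # node already present
        IFS=: read -r major minor < "$d/dev"
        echo "mknod $devdir/$name b $major $minor"
    done
}
```

Running it with no arguments on an affected system would list the block nodes to create by hand; an empty output means every dm-N device already has a node.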
Then I first tried rerunning mkblkdevs after the "dm xxxxx" lines, which worked
but didn't seem pretty (this is what the second attached patch does). So I did a
"diff -ur" between the sources of 5.0.46 and 5.0.47 (huge diff, many internal
changes) and found the removal of the smartmknod() call which is re-added in the
first attached patch, which fixes this in a less ugly way.
I hope that explains how I came to these patches and why one of these patches
is needed. Now PLEASE apply one of these before FC-6 so that people with a
similar setup can have a working system out of the box.

Yes, updating glibc* and then mkinitrd did the trick. Thanks to Messrs. Degoede
& Garden, and, for that matter, all the other cognoscenti who contributed to Bug
203241 & Bug 186842 for providing the magic recipes and incantations to work
through this problem.
Software engineering may not be the dismal science, but it sure travels some
grim paths at times.
For the record, this, in brief, is my setup:
AMD 64 4200
RAID0 (2 x 250gb Western Digital ATA)
ASUS A8V
I know that this is not a production solution. But if I wanted that I guess I
would be using RHEL 4WS as I do at the office. Again, thanks to all concerned
for seeing me through the darkness.

I first yum updated glibc* (to v2.4.90) and then yum updated mkinitrd (to
v5.1.9-1), both from the development repositories as you suggested above.
After those packages were installed, I re-enabled the standard repositories (and
disabled the development ones), applied the kernel update (to take my machine to
2.6.17-1.2174_FC5), and rebooted whilst holding my breath. And so, here I am.
By the way, if memory serves, after the glibc update, the dependency list for
mkinitrd was that package only. Also, all suggested updates, save the kernel,
were applied before I hybridized my system. And as a further correction, my
RAID0 consists of SATA (not just ATA :-)) drives. Again, thanks to all for
helping me through this.

Hmm,
So you didn't use / apply any of the patches attached here and still have a
working setup. That makes you the second person. Could you attach / include in a
comment your /etc/fstab and the output of the "mount" command? Just the lines
concerning your / (root) filesystem will do. Thanks!
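
One possible way to pull out just those lines is sketched below. The helper names are made up, and the second filter assumes "mount" prints its usual "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" format.

```shell
# Sketch only; both helpers are hypothetical.

# root_fstab_lines [FILE]: print non-comment fstab entries mounted on "/".
root_fstab_lines() {
    awk '$1 !~ /^#/ && $2 == "/"' "${1:-/etc/fstab}"
}

# root_mount_lines: filter "mount" output on stdin for the root filesystem.
root_mount_lines() {
    awk '$2 == "on" && $3 == "/"'
}

# Possible use:
#   root_fstab_lines
#   mount | root_mount_lines
```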

Looks like we are getting somewhere, thanks Jesse Keating
See the transcript from irc / #fedora-devel below:
f13 Horray! dm-raid still bust-o on rawhide (:
f13 pjones: strangely enough, rescue mode is able to mount it just fine.
hansg f13, maybe the patch I submitted here will fix this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203241
* _Zoltan_ has quit (Read error: 113 (No route to host))
hansg f13, the second patch (workaround) can be applied directly to
/sbin/mkinitrd and then recreate the initrd
hansg f13, if you can try this, it helps you and you can then convince pjones to
take a look at #203241 you'll be my hero
* f13 looks
pjones I don't have any argument against fixing it. I just don't have any time,
either.
hansg I've done a lot of digging and I'm willing todo more if needed, but some
response showing that my work (around 12 hours sofar) isn't going to /dev/null
would be appreciated
hansg f13, also make sure you are using the latest mkinitrd and that you do not
have any scsi_adapter usb-storage aliases in /etc/modprobe.conf
f13 hansg: so the problem I'm having is IO error reading sda2 or something like
that.
f13 I'll patch, we'll see.
* behdad has quit ("Leaving.")
hansg f13, then its most likely usb-storage aliases in /etc/modprobe.conf
* jwb grows tired of callion and dnielsen spotting on blogs
f13 hansg: I watched the mkinitrd creation, there were no usb modules added to
initrd.
f13 hansg: for rawhide do I need both the mkinitrd patch _and_ the initscripts
patch?
f13 n/m, I read it now
* f13 tries the /sbin/mkinitrd patch
hansg f13, you say no usb modules at all? Or just not usb-storage? If you've got
no usb-modules at all then you're using a pretty old mkinitrd (or a very new one
with which I'm not familiar yet)
hansg f13, rpm -q mkinitrd ?
f13 hansg: oh fun! I updated to newest mkinitrd and now I get the usb modules
brought in.
f13 uhci-hcd, ohci-hcd, ehci-hcd
hansg f13, good! That might fix the sda2 error
f13 hrm,
f13 I should try this w/out your patch first.
hansg those are a normal erm "feature" of the newest mkinitrd, as long as
usb-storage isn't added things are ok
f13 nod
hansg yes testing without the patch first is a good idea I think
f13 its helpful when you don't get udev but you have a usb keyboard
* somegeek has quit (Read error: 104 (Connection reset by peer))
f13 peter and I kept hitting this on my ppc mini.
f13 udev would barf the box, but w/out udev we couldn't use the usb keyboard (:
f13 hansg: rebooting w/out your patch.
hansg yes they are I had the same problem when I added a static shell to the
initrd to debug this on a friends Pc, no keyboard
f13 hansg: so, with the unpatched new mkinitrd and a recreated initrd, it just works.
hansg thats good news, lots of people tell me that, but it doesn't work on my
friends PC without the patch :|
f13 suck.
hansg what does "mount" say for root?
hansg and /etc/fstab?
f13 /dev/dm-1 on /boot type ext3 (rw)
f13 /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
f13 /dev/VolGroup00/LogVol00 / ext3 defaults 1 1
f13 LABEL=/boot /boot ext3 defaults 1 2
f13 --- Physical volume ---
f13 PV Name /dev/dm-2
hansg Thanks, that's different from what my friend has; he has /dev/dm-X as root
device instead of /dev/mapper/VolGroup00-LogVol00, that's probably why the
missing /dev/dm-x nodes (missing from the /dev in the initrd) bite him
* tibbs_ has quit (Remote closed the connection)
f13 hansg: probably, is he not using LVM?
* stickster (n=pfrields@fedora/stickster) has joined #fedora-devel
hansg I'm pretty sure that if you change your fstab to contain LABEL=/ for root
and then rerun mkinitrd you will need my patch
hansg because when using a label the label gets translated to /dev/dm-X and not
/dev/mapper/XXXXXX
f13 hrm.
f13 all that is crackrock. I hear pjones screaming about that through the
cubewall on a weekly basis.
f13 the stupid naming of crud that is
pjones we shouldn't _ever_ be mounting a dm-N device
pjones If we do, that's a bug.
pjones (but ugh, what a PITA)
f13 pjones: my box has it mounted for /boot/ :/
pjones So I see.
f13 /dev/dm-1 on /boot type ext3 (rw)
f13 ah
* somegeek (i=levin@tor/regular/somegeek) has joined #fedora-devel
f13 so the label translation stuff is getting it wrong again?
hansg then my patch is wrong and the real bug is that LABEL= lines can get
translated to /dev/dm-X stuff?
---
Chris (chabotc) can you try to change the line for your root filesystem in
/etc/fstab to use /dev/mapper/XXXXXp3 as device instead of LABEL=/ and then
recreate your initrd with a pristine (unpatched) mkinitrd?
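
To see at a glance which case a given system falls into, the device field of the root line in /etc/fstab can be classified with a small sketch like the one below. The function name and the category labels are invented for this example; the idea that LABEL= entries are the ones ending up as /dev/dm-X is only the working hypothesis from the discussion above.

```shell
# Sketch only: classify_root_spec SPEC
# SPEC is the first field of the "/" entry in /etc/fstab.
# classify_root_spec and its output labels are hypothetical.
classify_root_spec() {
    case $1 in
        LABEL=*)        echo "label" ;;    # suspected to be translated to /dev/dm-X
        /dev/mapper/*)  echo "mapper" ;;   # device-mapper name
        /dev/dm-[0-9]*) echo "dm-node" ;;  # raw dm-N node
        *)              echo "other" ;;
    esac
}

# Possible use:
#   classify_root_spec "$(awk '$1 !~ /^#/ && $2 == "/" { print $1 }' /etc/fstab)"
```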

some more irc logs:
hansg f13, pjones, If i understand correctly we've pretty much got the dmraid
problem confined / defined to wrong LABEL=xxx translation, right?
pjones I did say I haven't looked at it, right?
pjones but even if we get dm-1 instead of /dev/mapper/pdc_whatever , as long as
they're the same major:minor that shouldn't cause a failure
hansg pjones, it does because mkinitrd > 5.0.46 (nash > 5.0.46 actually) no
longer creates /dev/dm-x in the ramdisk /dev dir and does the mkrootdev line
from the init ramdisk script fail when it gets passed /dev/dm-x as a parameter
* Foolish has quit (Read error: 104 (Connection reset by peer))
hansg s/does/thus/
pjones yeah, but it shouldn't be getting /dev/dm-N as a parameter.
pjones if it is, there's another problem being missed
hansg pjones, agreed which seems to happen in the LABEL -> device translation
pjones taking patches ;)
hansg the strange thing is I did try putting /dev/mapper/XXXX in the
initrd-init script manually and that didn't work either, but maybe that was with
an older mkinitrd when I was debugging this I've tried about 10 different
mkinitrd versions
hansg I've asked my friend to try it with /dev/mapper/XXXX as root in his fstab,
if that fixes things for him I'll take a stab at fixing the LABEL -> device creation

It turns out that, although the problem is /dev/dm-x related, the patches
attached to this bug are completely wrong. The real problem (for normal setups)
is that booting by LABEL= from lvm or dmraid fails. Bug 204768 was created for
this problem and contains a proper patch, so I'm closing this one as a dup of
204768.
*** This bug has been marked as a duplicate of 204768 ***
