
Friday, December 25. 2009

Ever since my storage system was built, one thing has annoyed me. The 2.5” hard disk drive that houses the operating system was lifted from an old notebook and had the annoying property of parking its heads after five seconds of inactivity. Since ZFS writes to the disk often and regularly, this led to a constant cycle of parking and unparking. This was certainly not helping the disk’s life span, it made an annoying noise, and it caused small system hangs whenever the disk had to unpark its heads to read some data.

Under Linux one could use hdparm to instruct the disk not to park its heads, but unfortunately a program offering this functionality seems to be absent under Solaris. Hence the plan to replace the disk with one that has a more sensible approach to head parking.

This turned out to be an interesting endeavour.

The general problem of replacing the disk holding the rpool is common enough that the excellent ZFS troubleshooting guide has a section on doing this. The general plan of action is as follows:

Insert the replacement disk into an available slot

Create a partition spanning the whole disk

Create boot and data slices

Attach the new disk as a mirror to the rpool

Wait for the resilver to finish

Install grub on the new disk

Try to boot from the new disk

Detach the old disk from the rpool

Remove the old disk
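The steps above can be sketched as a dry-run plan. This is only a rough outline under assumptions, not the troubleshooting guide's exact procedure: the disk names c0d0 (old) and c2t0d0 (new) are hypothetical, the helper replacement_plan is made up for illustration, and it only builds command strings without running anything. Copying the slice table with prtvtoc and fmthard assumes the new disk is at least as large as the old one; otherwise the slices have to be created by hand with format.

```python
def replacement_plan(pool, old_disk, new_disk):
    """Return the shell commands for swapping the root pool disk, in order.

    Nothing is executed; the function only builds strings. Disk names such
    as "c0d0" are hypothetical placeholders for the real device names.
    """
    old_slice = old_disk + "s0"
    new_slice = new_disk + "s0"
    return [
        # One Solaris fdisk partition spanning the whole new disk.
        f"fdisk -B /dev/rdsk/{new_disk}p0",
        # Copy the slice layout from the old disk (assumes the new disk
        # is at least as large as the old one).
        f"prtvtoc /dev/rdsk/{old_disk}s2 | fmthard -s - /dev/rdsk/{new_disk}s2",
        # Turn the single-disk root pool into a two-way mirror.
        f"zpool attach {pool} {old_slice} {new_slice}",
        # Check resilver progress; repeat until it reports completion.
        f"zpool status {pool}",
        # Install GRUB on the new disk so the BIOS can boot from it.
        f"installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/{new_slice}",
        # Only after a successful test boot from the new disk:
        f"zpool detach {pool} {old_slice}",
    ]


if __name__ == "__main__":
    for command in replacement_plan("rpool", "c0d0", "c2t0d0"):
        print(command)
```

Printing the plan first and running each command by hand keeps the test boot between installgrub and zpool detach an explicit, manual step.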

This is all very sensible, and it all works as advertised. In my case there is, however, a last step not on the list above:

Put the new disk on the controller the old disk was attached to

The reason is that the case I used has only one internal 2.5” hard disk drive slot. The new disk was prepared using an external USB-IDE converter module. This worked just fine; the BIOS is even able to boot from the USB disk. As long as the new disk remained attached to the USB converter everything was fine, even after the old (internal) disk was removed from the rpool. But putting the new disk into the case caused Solaris to roll over and die early in the boot process because it could not find its rpool disk. The error message indicated that it was trying to read the pool from the external USB device (which no longer existed at this point).

Investigation (and much swearing) turned up that this information was passed by GRUB to the Solaris kernel.

Solaris uses a patched GRUB version which understands ZFS and has some string replacement magic built in. Every (non-failsafe) boot entry contains a line similar to this:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS

$ZFS-BOOTFS is replaced by GRUB with the following information:

The name of the root pool (usually rpool) and the number of the dataset that contains the root file system (there may be several BEs)

The device path of the disk this GRUB instance was read from

The actual command line that is executed by GRUB thus looks something like this:

The interesting part here is the bootpath parameter. This is the device that Solaris will try to mount the rpool from. Even if the rpool consists of several mirror devices, only one is used in the initial boot process. Where does GRUB get the device path from? It is read from the rpool header on the disk GRUB was loaded from. Every ZFS pool disk contains the device path it was last found under. This usually does not matter much: a RAIDZ will still mount if you swap the disks around while the machine is off. The boot process, however, relies on the rpool disks not wandering around. My new disk still had the USB device path embedded, which GRUB read and passed to the kernel, which then failed to find the disk.

Fixing this turns out to be easy: boot into failsafe mode with the new disk on its final connector. Failsafe mode searches for rpools and BEs on the system and offers to mount one of them. Pick the right one, then reboot. This is enough to get the current (and correct) device path embedded into the rpool. The next (non-failsafe) boot will pick up the correct device path and allow the boot to continue.
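The failure and the fix can be modelled with a small toy sketch. Everything in it is an assumption made for illustration: the two device paths are invented, the dataset number 42 is arbitrary, the class and function names are hypothetical, and the exact format of the string GRUB substitutes is a guess based on the description above rather than taken from the GRUB source.

```python
from dataclasses import dataclass

# Invented device paths, for illustration only.
USB_PATH = "/pci@0,0/usb-example/disk@1,0:a"
IDE_PATH = "/pci@0,0/ide-example/cmdk@0,0:a"


@dataclass
class PoolDisk:
    attached_at: str  # device path where the disk is connected right now
    label_path: str   # device path recorded in the pool header on the disk


def expand_boot_args(pool, dataset_id, boot_disk):
    """Mimic the $ZFS-BOOTFS substitution (the string format is assumed)."""
    return f'zfs-bootfs={pool}/{dataset_id},bootpath="{boot_disk.label_path}"'


def kernel_finds_root(boot_disk, present_paths):
    """The kernel only tries the path it was handed; it does not search."""
    return boot_disk.label_path in present_paths


def failsafe_import(boot_disk):
    """Mounting the pool in failsafe mode records the current device path."""
    boot_disk.label_path = boot_disk.attached_at


if __name__ == "__main__":
    # The disk was prepared over USB, then moved to the internal connector.
    disk = PoolDisk(attached_at=IDE_PATH, label_path=USB_PATH)
    present = {IDE_PATH}
    print(expand_boot_args("rpool", 42, disk))  # still names the USB path
    print(kernel_finds_root(disk, present))     # False: boot fails
    failsafe_import(disk)
    print(kernel_finds_root(disk, present))     # True: boot continues
```

The point of the model is that the stale path lives on the disk itself, so moving the disk does not update it; only mounting the pool at the new location does.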

The moral of an afternoon spent in the innards of the Solaris boot process is thus: do not swap your rpool disk around.
