One of the questions I always ask myself when setting up a resilient server, is just how well will it cope with a disk failure? Ultimately you cannot answer that without trying it out.

But as practice (and to determine whether it mostly works), it’s perfectly sensible to try it out on a virtual machine.

Debian Installation

If you are looking for full instructions on installing Debian, this is not the place to look. I configured the virtual machine with 2GBytes of memory, an LsiLogic SAS controller with two attached disks each of 64GBytes.

The installation process was much as per normal (I unselected “Desktop” to save time), but the storage was somewhat different :-

In the partitioning manager select each Logical Volume in turn and specify the file system parameters.

You will notice that no swap was created – this was a mistake that I’m in the unfortunate habit of making! However for a test, it wasn’t a problem and with LVM it is possible to create swap after the installation.

Post Installation

After the server has booted, it is possible to check the second hard disk for the presence of grub in the MBR (dd if=/dev/sdb of=/var/tmp/sdb.boot bs=1M count=1, and then run strings on the result). It turns out that nothing is installed in the MBR of the second disk by default. Which would make booting in a degraded environment an interesting challenge (i.e. you’ll have to find a rescue CD and boot off the relevant hard disk).

However this can be fixed by installing grub onto the second hard disk: grub-install /dev/sdb

Testing Resilience

But what happens when you lose a disk? Now is the time to test. Shut down the virtual machine and remove the second hard disk – leaving the first hard disk in place does not provide a full test.

If your first attempt at booting afterwards results in a failure to acquire a grub menu, then either you have failed to run grub-install as detailed above (guess what mistake I made?), or your BIOS settings don’t permit the computer to boot off anything other than the first hard disk.

However, in my second attempt, the server booted normally with the addition of a few messages that indicate that there is just one disk making up the mirrored pair.

Summary

Yes, you can put /boot onto an LVM file system that sits on mirrored disks. That hasn’t always been the case.

It is still necessary to run grub-install to put Grub onto the MBR of the second hard disk.

Every so often I come across an old Linux box that doesn’t take kindly to being rebooted. Without console access, it is hard to see what is going on, but the Linux kernel gets stuck trying to mount the root file system. There are many possible fixes for this, but they all have one thing in common … a work-around has to be performed to get the box up and running.

The console gets stuck in a “mini-root” environment loaded when the initrd image is loaded and before the real root file system is mounted which means a lot of commands are not available, but lvm should be available. First of all, run lvm lvscan to get a list of the logical volumes that need activating :-

If you have previously used Linux’s volume manager (LVM) to set up disk storage, you may want to know about how to grow a filesystem safely.

Which is probably the big feature of any decent volume manager because accurately predicting the size of filesystems is a black art, and the only alternative – to make the root filesystem contain all of the storage is a dumb idea.

It’s actually really easy and can be done non-disruptively. It is done in two parts – effectively growing the “disk” device and then growing the filesystem itself.

Extending The Volume

First identify the volume you need to extend. You can of course simply run lvscan which will show a list of the volumes, and if you have named them sensibly will allow you to pick out the volume to extend. But the simplest way is simply to run df to look at the filesystem you want to extend :-

/dev/mapper/ssd-opt 7.9G 5.5G 2.1G 73% /opt

The device (in the first column) is what we extend. Now to decide how much to grow the volume by; just for the case of this example, we’ll assume that 2Gbytes is a sensible amount to grow the volume by. The command needed is :-

lvextend --size +2G /dev/mapper/ssd-opt

And that’s it. No need to shut down the server, dismount the filesystem, etc. Of course we haven’t quite finished yet.

Growing The Filesystem

What we have done at this point is the equivalent of making the disk bigger. We also need to tell the filesystem it is sitting on a bigger disk, and to do so we need to know the type of the filesystem. The canonical place for checking that is the file /etc/fstab (actually it’s the filesystem itself but that is going too far) :-

# grep opt /etc/fstab
/dev/mapper/ssd-opt /opt ext4 noatime 0 3

It is probable that you are looking at growing an ext3, ext4, or xfs filesystem. If not you will have to look up the details yourself.

Growing ext3, or ext4 Filesystems

This is done with the resize2fs command :-

resize2fs /dev/mapper/ssd-opt

Several points :-

Yes it can be done “online” whilst the filesystem is mounted (and applications are busy using it).

You need to specify the device containing the filesystem to grow and not the mount point.

There is no need to specify the size … the size will be determined from the size of the device underneath the filesystem.

Growing xfs Filesystems

This is done with the xfs_growfs command :-

# xfs_growfs /opt

Several points :-

Yes, it can be done “online”. In fact you have to do it with the filesystem mounted.

You need to specify the mountpoint of the filesystem and not the device. Irritatingly different from the above!

There is no need to specify the size.

How Reliable Is This?

Very.

There is always the chance that something could go wrong especially if you are operating “at the edge” (say you have a filesystem that is unusually large – several petabytes). But I’ve done online filesystem resizing for years in countless circumstances without an issue.

I’ll quite happily do it on the most important systems during working hours without losing a moment’s thought. However I do work in a place that takes backups seriously!

For some time now, I have been contemplating switching Linux distributions on my main workstation from Ubuntu to something a little less … user friendly ? Or perhaps that should be a little more Unix geek friendly. The distribution I chose was ArchLinux for a variety of reasons. If you come across this blog entry looking for a solution to a problem, it may be worth reading through in case the solution appears later on – this is long, and searches may “hit” on something later on.

First of all, let me point out there is really nothing wrong with Ubuntu for most users. It is a pretty useful distribution that is pretty good for the kind of users who have never compiled their own kernel. Nothing wrong with that, but it seems that Ubuntu is gradually becoming a little trickier to use for those of us who prefer to customise their desktop environment with something like Enlightenment – it seems that Ubuntu is really intended for those who want the Ubuntu way.

Nothing wrong with that, and I’m intending to keep running Ubuntu on my netbook. However I wanted a little more control for my main workstation. And what with an SSD to install as a new boot device, it seemed like a good time to try out ArchLinux especially as I could reboot into Ubuntu if things looked bad. As it happens I haven’t needed to do that! This blog entry is going to get quite long as a place to record my notes on getting ArchLinux to do the things I want, and will grow over time.

The Install

I downloaded the core install image rather than the net install image – not for any good reason as I have done test installs from the net install image and it works well. After installing the SSD into my workstation (stuck to the bottom of the case with duct tape – I should really get a 2.5->3.5″ disk tray), I changed the boot order of the disks in my BIOS to boot from the SSD first. This was perhaps not the best idea as it made things a little trickier later, but it’s workable if you are prepared to juggle disk names (both Linux ones and BIOS/Grub ones).

First for the boring bit :-

Booted off the install CD

Selected CD as source

Set Europe/London as timezone

Set hwclock as UTC

Prep hard drives-

Manually configure hard drives

Partition /dev/sdc (the SSD – identified by the fact it was empty)

Created 256Mb partition /dev/sdc1 (for /boot)

Created partition with the rest of the space /dev/sdc2 as LVM

Manually configure block devices

By device name

Created /boot on /dev/sdc1 as ext2

/dev/sdc2 becomes Volume Group

/ as XFS (16G)

/var as ResierFSS (4G)

swap (4G) – Although I have a tendency to forget this one!

/opt as XFS (4G)

/tmp as ReiserFS (4G) – perhaps a bit too big.

Select Packages

Select Base + Development.

Pick random additions that look like they might be useful (note that it may be necessary to pick all of the various mkinitcpio variations as I did that on the later attempts).

Install Packages

Configure System

Select ‘vi’ as editor

Made the following changes to rc.conf

UseLVM=yes

HOSTNAME=scrofula

eth0=”eth0 10.0.0.18 netmask 255.255.0.0 broadcast 10.0.255.255′

gateway=”default gw 10.0.0.254″

ROUTES=(gateway)

Made the following changes to mkinitcpio.conf

BINARIES=”/sbin/lvm”. This shouldn’t be necessary, but at one point I ended up with a miniroot shell which was unable to mount the root filesystem and with no LVM present, I couldn’t see what was wrong! This error could be related to the raid problems detailed below, but adding this won’t cause any harm.

HOOKS=”base udev autodetect scsi sata lvm2 filesystems”. Note that “raid” is suggested as necessary for software RAID; that turns out to be incorrect as discovered later. Although I needed software RAID to mount my /home, I left that for later after putting raid in here gave errors)

Made the following changes to resolv.conf

search inside.zonky.org

nameserver 10.0.0.12

Made the following changes to mirrorlist

Select something from “Great Britain”.

Set root password.

Done

Install Bootloader

Grub

Installed to /dev/sdc! This is because although the SSD is the third by address, it is also the first boot device in the BIOS.

This didn’t work the first time around. Firstly grub wasn’t setup properly as it wanted to boot the next stage from (hd2,0) which would be one of the hard disks rather than the SSD, as at this point the BIOS is still in charge (more or less). This was easily fixed on a temporary basis by editing the boot setting at the menu, and later on a more permanent basis by editing /boot/grub/menu.lst.

Secondly the first couple of times around, I found myself in what I term the “miniroot shell” which is the shell you get when the Linux install fails to mount the root filesystem. The only hint I had here was that a) it couldn’t mount the root filesystem, and b) the binary /bin/lvm was not present. On the third or fourth attempt (my notes aren’t sufficiently accurate) I managed to get past this stage by excluding the raid “hook” and including the /bin/lvm binary in the mkinitcpio configuration file.

It would seem that at some point ArchLinux has changed the “hook” name from raid to dmraid and some instructions out there still refer to the hook as “raid”. My fault for not checking closely enough with enough sources! But there’s no harm in the ArchLinux people configuring both names – probably just a case of setting up a hard link somewhere!

Post-Installation

With a distribution such as ArchLinux, the easy part is the installation; things get a bit trickier with the post-installation configuration. This is simply because to allow you to do things your way, it needs to leave things alone and let you do your stuff. In other words this lack of default configuration is a feature and not a bug!

The first thing to so after a core install (and probably a net install too) is to perform a full update :-

pacman -Suy

The “pacman” tool is of course the ArchLinux package management tool. This operation sits somewhere between a normal Ubuntu package upgrade and a full Ubuntu distribution upgrade. ArchLinux does not have distribution versions in the same way as Ubuntu – whilst the installation media is undoubtedly upgraded from time to time, once actually installed the command above will perform both upgrades to apply necessary fixes, and upgrade packages when new versions come out.

This can lead to some surprises from time to time of course, but there is also never quite the same level of shock that comes with a distribution upgrade.

In any case, I needed to run the command twice as pacman itself needed an upgrade.

After doing that, I set CONSOLEFONT in /etc/rc.conf to “sun12x22.psfu” to improve the appearance of the console, although there are another couple of fonts based on that font that may well be a better choice. Later I used the “consolefont” hook to set the console font at an earlier stage during the boot process – which is neater; however you should specify the font without the file extension – “sun12x22”, and of course add “consolefont” to the HOOKS variable in /etc/mkinitcpio.conf.

I also edited /boot/grub/menu.lst to change the line that specifies what kernel to load and it’s options :-

kernel /vmlinuz26 root=/dev/mapper/ssd-root ro vga=775

Specifically adding the “vga=775″ to the end of that. This makes the appearance of the console not quite so overwhelming on a 30” monitor!

Also added “dmraid” to the HOOKS variable in /etc/mkinitcpio.conf although reading more documentation hints that the right hook is actually “mdadm”. Run mkinitcpio -p kernel26 to update things.

Rebooted to verify that things are still working. Plus check that the CONSOLEFONT was ok, and that the old volume group:sys was visible.

If you think that you can use a ZFS volume as a Solaris LVM metadevice database, you will be wrong. Whilst it works initially, the LVM subsystem is initialised before ZFS this cannot find the databases. Whilst this may seem to be a perverse configuration at least one administrator has tried it – being me!

This entry is about upgrading a machine running Ubuntu 7.10 to Ubuntu 8.04 which is only just out. But not in the standard way which would be quite boring.

I have at least two computers running Ubuntu, both configured in a fairly complex way and both fairly important (in the sense I really don’t need to try an upgrade and end up with a broken system). Whilst Ubuntu frequently does upgrade without a hitch, it can occasionally choke; this is seemingly more common with more complex installations.

Why not preserve an old copy of the install around to revert to ? Well with LVM it is perfectly possible. Ignoring what happens underneath, I have an LVM volume group called “internal” (actually I don’t, but I would if I were to re-install) which has :-

var – 4Gbytes to be mounted as /var

root – 8Gbytes to be mounted as /

home – “enough” to be mounted as /home

Note I do not believe in allocating all available disk space with a storage management system like LVM available; I do a great deal of storage management work and the biggest mistake anyone can make is assuming that they know the storage requirements of a system throughout it’s whole lifetime. This applies in spades to a desktop machine. Without some free space, the suggested upgrade mechanism won’t work.

Now with modern hard disks, we are likely to have more than enough storage to allocate. For instance on this machine right away I have 138Gbytes of free storage (mirrored). And that it is on a two year old machine; a newer machine would have larger disks. Easily enough storage to have two or more “copies” of different versions of Ubuntu around.

It would be nice if Ubuntu could do much of the work for us, but for now it’s pretty much a manual process. As an aside, the Ubuntu developers should probably think about using LVM in the default installer to assist in the development of this kind of feature.

The first stage is to create new logical volumes and build filesystems on them. I chose to name the logical volumes after the operating system version they would be running …

Now the key here is not to look at the current size of your /var filesystem and decide you need a much smaller filesystem … or the upgrade process will refuse to start. You can always reduce it later if you really want to quibble over 1-2Gbytes.

The next stage is to copy the relevant filesystems across. At this point you should avoid running as much as possible and probably do this from a text terminal after shutting down GDM …

This stage will take some time to complete. You will want to do a quick check of the new / and /var to ensure they look roughly like the originals (I always seem to come up with the equivalent of /var/var when I do something like this). Notice that the new root filesystem is still mounted … you need to edit /mnt/etc/fstab to alter what devices are mounted for / and /var.

The next stage is a bit tricky because I didn’t do it “right”, so I will be suggesting something that I didn’t try myself. The task is to modify /boot/grub/menu.lst in such a way as to result in two separate menu entries that will boot either the old operating environment or the new operating environment.

I would suggest that you :-

Create an entry outside of the “DEBIAN AUTOMAGIC KERNELS LIST” that essentially replicates one of the entries. It should not be modified to boot off the new root filesystem.

Modify all of the entries in the “DEBIAN AUTOMAGIC KERNELS LIST” (it makes sense when you review the menu.lst file) to alter the “root=’ kernel parameter to point to the new root filesystem. This is not the “root (hd0,0)” part, but the kernel parameter “root”. It will specify the old root filesystem logical volume (something like “root=/dev/internal/root”) and you want to change this to “root=/dev/internal/804root”.

At this point you should probably reboot to check that both environments work. Just make sure you have a recent rescue CD knocking around before you do.

After you have done the checking you can boot the new environment and use ‘update-manager’ to upgrade the new environment to Ubuntu 8.04. This will probably work (it worked fine for me).

Undoubtedly the next time I try this, I will figure out how to make it work better, but it is good enough to have a “fallback” option in case an upgrade goes badly. For instance until last week, running Vmware Server under 8.04beta was pretty tricky and if it were still the case I would have to revert back to 7.10.

If you’re hoping to read about Linux finally getting ZFS (except as a FUSE module) then you are going to be disappointed … this is merely a rant about the foolishness shown by the open-source world. It seems that the reason we won’t see ZFS in the Linux kernel is not because of technical issues but because of licensing issues … the two open-source licenses (GPL and CDDL) are allegedly incompatible!

Now some may wonder why ZFS is so great given that most of the features are available in other storage/filesystem solutions. Well as an old Unix systems administrator, I have seen many different storage and filesystem solutions over time … Veritas, Solaris Volume Manager, the AIX logical volume manager, Linux software RAID, Linux LVM, …, and none come as close to perfection as ZFS. In particular ZFS is insanely simple to manage, and those who have never managed a server with hundreds of disks may not appreciate just how desireable this simplicity is.

Lets take a relatively common example from Linux; we have two disks and no RAID controller so it makes sense to use Linux software RAID to create a virtual disk that is a mirror of the two physical disks. Not a difficult task. Now we want to split that disk up into seperate virtual disks to put filesystems on; we don’t know how large the different filesystems will become so we need to have some facility to grow and shrink those virtual disks. So we use LVM and make that software RAID virtual disk into an LVM “physical volume”, add the “physical volume” to a volume group, and finally create “logical volumes” for each filesystem we want. Then of course we need to put a filesystem on each “logical volume”. None of these steps are particularly difficult, but there are 5 seperate steps, and the separate software components are isolated from each other … which imposes some limitations.

Now imagine doing the same thing with ZFS … we create a storage pool consisting of two mirrored physical disks with a single command. This storage pool is automatically mounted as a filesystem ready for immediate use. If we need separate filesystems, we can create each with a single command. Now we come to the advantages … filesystem ‘snapshots’ are almost instantaneous and do not consume additional disk space until changes are made to the original filesystem at which point the increase in size is directly proportional to the changes made. Each ZFS filesystem shares the storage pool with the size being totally dynamic (by default) so that you do not have a set size reserved for each filesystem … essentially the free space on every single filesystem is available to all filesystems.

So what is the reason for not having ZFS under Linux ? It is open-source so it is technically possible to add to the Linux kernel. It has already been added to the FreeBSD kernel (in “-CURRENT”) and will shortly be added to the released version of OSX. Allegedly because the license is incompatible. The ZFS code from Sun is licensed under the CDDL license and the Linux kernel is licensed under the GPL license. I’m not sure how they are incompatible because frankly I have better things to do with my time than read license small-print and try to determine the effects.

But Linux (reluctantly admittedly) allows binary kernel modules to be loaded into the kernel and the license on those certainly isn’t the GPL! So why is not possible to allow GPLed code and CDDLed code to co-exist peacefully ? After all it seems that if ZFS were compiled as a kernel module and released as a binary blob, it could then be used … which is insane!

The suspicion I have is that there is a certain amount of “not invented here” going on.

Controls

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.