Upgrading a server to 64bit Xen

I have access to a server in Germany that was running Debian/Etch i386 but needed to be running Xen with the AMD64 version of Debian/Lenny (well it didn’t really need to be Lenny but we might as well get two upgrades done at the same time). Most people would probably do a complete reinstall, but I knew that I could do the upgrade while the machine is in a server room without any manual intervention. I didn’t achieve all my goals (I wanted to do it without having to boot the recovery system – we ended up having to boot it twice) but no dealings with the ISP staff were required.

The first thing to do is to get a 64bit kernel running. Based on past bad experiences I’m not going to use the Debian Xen kernel on a 64bit system (in all my tests it has had kernel panics in the Dom0 when doing any serious disk IO). So I chose the CentOS 5 kernel.

To get the kernel running I copied the kernel files (/boot/vmlinuz-2.6.18-92.1.13.el5xen /boot/System.map-2.6.18-92.1.13.el5xen /boot/config-2.6.18-92.1.13.el5xen) and the modules (/lib/modules/2.6.18-92.1.13.el5xen) from a CentOS machine. I just copied a .tgz archive as I didn’t want to bother installing alien or doing anything else that took time. Then I ran the Debian mkinitramfs program to create the initrd (the 32bit tools for creating an initrd work well with a 64bit kernel). Then I created the GRUB configuration entry (just copied the one from the CentOS box and changed the root= kernel parameter and the root GRUB parameter), crossed my fingers and rebooted. I tested this on a machine in my own computer room to make sure it worked before deploying it in Germany, but there was still some risk.

After rebooting it the command arch reported x86_64 – so it had a 64bit Xen kernel running correctly.

The next thing was to create a 64bit Lenny image. I got the Lenny Beta 2 image and used debootstrap to create the image (I consulted my blog post about creating Xen images for the syntax [1] – one of the benefits of blogging about how you solve technical problems). Then I used scp to copy a .tgz file of that to the server in Germany. Unfortunately the people who had set up that server had used all the disk space in two partitions, one for root and one for swap. While I can use regular files for Xen images (with performance that will probably suck a bit – Ext3 is not a great filesystem for big files) I can’t use them for a new root filesystem. So I formatted the swap space as ext3.

Then to get it working I merely had to update the /etc/fstab, /etc/network/interfaces, and /etc/resolv.conf files to make it basically functional. Of course ssh access is necessary to do anything with the server once it boots, so I chrooted into the environment and ran “apt-get update ; apt-get install openssh-server udev ; apt-get dist-upgrade“.

I stuffed this up and didn’t allow myself ssh access the first time, so the thing to do is to start sshd in the chroot environment and make sure that you can really login. Without having udev running a ssh login will probably result in the message “stdin: is not a tty“, that is not a problem. Getting that to work by the commands ‘ssh root@server “mkdir /dev/pts”‘ and ‘ssh root@server “mount -t devpts devpts /dev/pts”‘ is not a challenge. But installing udev first is a better idea.

Then after that I added a new grub entry as the default which used the CentOS kernel and /dev/sda1 (the device formerly used for swap space) as root. I initially used the CentOS Xen kernel (all Red Hat based distributions bundle the Xen kernel with the Linux kernel – which makes some sense). But the Debian Xen utilities didn’t like that so I changed to the Debian Xen kernel.

Once I had this basically working I copied the 64bit installation to the original device and put the 32bit files in a subdirectory named “old” (so configuration can be copied). When I changed the configuration and rebooted it worked until I installed SE Linux. It seems that the Debian init scripts will in many situations quietly work when the root device is incorectly specified in /etc/fstab. This however requires creating a device node somewhere else for fsck and the SE Linux policy version 2:0.0.20080702-12 was not permitting this. I have since uploaded policy 2:0.0.20080702-13 to fix this bug and requested that the release team allow it in Lenny – I think that a bug which can make a server fail to boot is worthy of inclusion!

It seems that the Debian Xen kernel has those modules linked in and the Debian Xen utilities expect that.

Currently I’m using Debian kernels 2.6.18 and 2.6.26 for the DomUs. I have considered using the CentOS kernel but they decided that /dev/console is not good enough for the console of a DomU and decided to use something else. Gratuitous differences are annoying (every other machine both real and virtual has /dev/console). If I find problems with the Debian kernels in DomUs I will change to the CentOS kernel. Incidentally one problem I have had with a CentOS kernel for a DomU (when running on a CentOS Dom0) was that the CentOS initrd seems to have some strange expectations of the root filesystem, when they are not met things go wrong – a common symptom is that the nash process will go in a loop and use 100% CPU time.

One of the problems I had was converting the configuration for the primary network device from eth0 to xenbr0. In my first attempt I had not installed the bridge-utils package and the machine booted up without network access. In future I will setup xenbr1 (a device for private networking that is not connected to an Ethernet device) first and test it, if it works then there’s a good chance that the xenbr0 device (which is connected to the main Ethernet port of the machine) will work.

After getting the machine going I found a number of things that needed to be fixed with the Xen SE Linux policy. Hopefully the release team will let me get another version of the policy into Lenny (the current one doesn’t work).

2 comments to Upgrading a server to 64bit Xen

“Based on past bad experiences I’m not going to use the Debian Xen kernel on a 64bit system (in all my tests it has had kernel panics in the Dom0 when doing any serious disk IO). So I chose the CentOS 5 kernel.”