Geeks talking about cool stuff

Feature : Creating your own Linux Live CD from scratch

For the feature this month, GeekDeck’s come over all technical. Well it had to happen sooner or later. It’s not that we’ve shunned the technical articles at all, I think it’s probably that they take a lot longer to prepare and write than the articles about more abstract things. Putting my mindless prattle aside, let’s move on to discussing the real crux of the article.
If you’ve ever used Linux for an appreciable amount of time, you may have wondered just how hard it would be to create your own Linux distro. How great would that be? To have complete control over the system, distribute it over the Internet and become the next Fedora or Ubuntu. Well, this article doesn’t claim to have all the answers, but what we will be showing you is how to create your very own Live CD, based on the Ubuntu repositories. Fans of remastering Ubuntu CDs will probably at this point be saying, “Oh man, we know how to do that, you just use UCK (Ubuntu Customisation Kit), which will take an existing iso and customise it to your liking.” But wait, there’s more.

Whilst customising an existing iso is a useful thing to be able to do (you may just want to change the background, or install an extra package), there comes a time in your career where you just don’t want a full blown Ubuntu Live CD as a starting point. Sometimes you just want to start from scratch to create a nice, slim image, and there are so many applications that this can be of use to. I can hear another crowd now saying, “Oh, ok we understand, but Ubuntu probably isn’t the best distro to base it on if you want a small, tiny system.” I can understand what these people are saying, and the reason for choosing Ubuntu was largely based on two factors: 1) many people are familiar with using it, and 2) there is a wealth of information on how to customise it out there on t’internet.

So why is a live CD so useful? Live CDs have the fantastic property that people can’t break them. Well of course they could always physically snap the media, but one hopes that the people we are distributing this to are not that way inclined. The overall mechanism of a Live CD is that it is based on a read-only medium. In this regard you simply cannot render the system unbootable, short of changing the hardware or suffering a hardware failure. The image we create in this article can also be modified to be netbooted or booted from a USB pendrive, which makes it exceedingly versatile.

The Linux Kernel
The kernel is the root of everything. It is the system that allows the software to communicate with the hardware. The kernel contains all of the drivers, which hold the specific information about how to communicate with different pieces of hardware. The Linux kernel is a monolithic kernel: almost all kernel drivers and extensions run with full access to the hardware. Other applications, the bits that actually do most of the work, run in user space and can only talk to the hardware through an abstraction layer that interfaces with the kernel.

The Linux kernel is written mostly in C and has been ported to a number of different architectures, from large systems to small and even embedded ones. Any discussion about the Linux kernel would be incomplete without mentioning its father, Linus Torvalds, who created the kernel in 1991 as a hobby. 18 years later, and worth over an estimated $1.14 billion, the Linux kernel is one of the most important pieces of software ever developed.

Imagine you want to give a demonstration of some software to a client. You want to ensure that the software will always work, and the only way to do that is to have complete control over the environment in which it is installed. Clients are not likely to want you to take control of their PCs, either remotely or on site, so what better way to demonstrate your product than by making a live CD? I am of course assuming that the product you create is a Linux based piece of software. Imagine creating a firewall or an application service where all of the data can be stored on the CD. The list of possibilities is endless. You may of course just want to be cool and show off to all your friends the latest version of JimmyOS. Whatever the reason, there are simply too many useful applications for creating a Live CD.

As mentioned earlier, the CD we are going to build is based on Ubuntu. That means it will use the Ubuntu repositories, and it will look and feel like Ubuntu. However, the way we are going to build it means that we are not going to fill it with unnecessary bloat. We want this system to be as clean and smooth as possible. In fact, to this end, we are going to build a live CD that will run apache with mysql, php and phpmyadmin. This could be used for a multitude of applications, in fact it could be the base for many web based Live CDs. Our rationale for this Live CD is that it could be used in a classroom, to teach people about how to build web apps. When the machines get shut down at night, everything is wiped and the system is clean again on next boot.

Now it’s time to start thinking about the CD itself. When I dreamed up this feature, it was after having the thought that I was pretty sure I knew everything I needed for a live CD. Let’s walk through my thought process, scary I know but please bear with me. We need a filesystem: in other words, the files and folders that will be used during the running of the live system. This will include all of the applications that we want to run and any supporting programs and software. We need a kernel to interact with the hardware, and an initrd (initial ramdisk) to provide enough of a base to actually boot our filesystem. We also need some way of making this read-only environment writeable on a temporary basis, and to handle things like hardware detection.

To my surprise I knew about every one of those items. I just needed a few questions answered, which thanks to Jacob and Andrew were answered very quickly in comments on my blog. In fact my post makes me sound incredibly like an amateur because I was so sure it wasn’t that simple, that I was missing something big. By the end of this article I hope that you will not only understand why a live CD can be so useful, but also how to build it, and what jobs the various components do. To achieve maximum compatibility I urge you to use a flavor of Ubuntu at least as new as Hardy. The CD we are going to build is going to be based on Hardy, simply because that was the last long term support release.

Let’s begin by creating our base filesystem that we can chroot into. What’s a chroot I hear you ask? Just look at the pop-out titled chroot and all will be revealed. So we begin by creating a project folder, which I’m going to call “work”. Against my better judgement, I’m making this a cut and paste tutorial, so if you have the pleasure of reading this article online, or in an electronic format, you should be able to just copy and paste the code sections, and they should JustWork(tm). So to start we simply create a folder called work, and inside that we are going to create another folder called chroot.

mkdir work
cd work
mkdir chroot

Easy so far huh? Next we go about creating the root filesystem. Now, this could be done manually, but it is in fact one of the only real automated steps in this whole process. We are going to use the debootstrap utility. If you don’t have it installed, I suggest you go and grab it. Unfortunately if you don’t know how to grab and install debootstrap, you’re going to find the rest of this article pretty tough going. Even if you have debootstrap, you may be wondering what it actually does. Lucky for you, we’ve created a nice little popout on debootstrap too. In essence debootstrap sets up a very slim filesystem with pretty much everything you need for a process to run and do things, plus some utilities and libraries which have been deemed essential. So we run the following, which will obtain the files required for this filesystem.

sudo debootstrap --arch i386 hardy chroot

Notice we’ve chosen i386 as the cpu architecture and Hardy as our intended distribution, as mentioned earlier. Once this is finished, the resulting filesystem is basically all you need to run as a standalone linux OS. So why do we need to do more I hear you ask, let’s just bung it on a CD and get it running. Well, there’s a couple more things we need to do before we get that far. Truth be told, the main bulk of the work is in packaging the stuff up for the CD, but we are missing one important feature. The kernel. We may have a root filesystem, but that filesystem is completely useless on its own. We need to have something to interact with the hardware of the system. When running this filesystem as a chroot, we are able to use the running kernel to perform this process, but once our filesystem is on its own on a CD, we need a kernel to do disk accessing, networking and outputting to the display device.

Luckily, adding the kernel is a relatively easy process: we can install a stock kernel from the Ubuntu repositories. However, running something like apt-get install will try to install to our current environment. Enter our handy chroot utility. This will enable us to pretend that we are actually inside our new filesystem, so any new programs that are installed will be installed in here, instead of being installed on our real PC. To do this, we run the following.
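A minimal sketch of those two commands, assuming you are still in the work directory and that your real PC keeps its DNS settings in /etc/resolv.conf. The function name enter_chroot is just a wrapper made up for this example.

```shell
# Sketch only: run from inside the "work" directory created earlier.
# enter_chroot is a hypothetical wrapper name, not part of any tool.
enter_chroot() {
    # Line 1: give the chroot the host's DNS settings so apt can reach the mirrors.
    sudo cp /etc/resolv.conf chroot/etc/resolv.conf
    # Line 2: start a root shell whose / is the new filesystem.
    sudo chroot chroot
}
```

You can of course paste the two sudo lines on their own. Wrapping them in a function simply makes it easy to hop back into the chroot later by typing enter_chroot.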

Initrd
The initial ramdisk, or initrd, is really a temporary staging area to give the system enough information in order to boot. The kernel that is shipped with many distributions is rather generic in nature. The modules that are required for booting and communicating with hardware are generally loaded dynamically. To compile all of these into a single kernel image would make it very cumbersome, large and unmanageable.

The initial ramdisk holds enough information to chain boot the rest of the system: primarily, to mount the root file system. Generally the initrd is a cpio archive compressed with gzip. This format allows the initial ramdisk to be unpacked by the kernel into a special instance of tmpfs. Doing the process this way means that an intermediate file system is not required in the kernel, as it is with the non cpio format.

The first line is used to ensure that the chrooted environment knows the specifics of the current network we reside in. This is important for installing applications as we generally need to grab them from the web. We will remove this file later, otherwise the live CD would only be of use inside our own network environment. The next line is the one that actually enters the chrooted environment. From here on in, everything that we do to this filesystem will not affect our real PC, only our chrooted environment. Ok, great. We are now in our chroot environment, but we can’t actually do anything apart from move files around, or at least not easily. What we do next is mount some specialised filesystems. /proc allows processes to communicate properly with the kernel, /sys allows kernel structures to be exposed to userspace, and the /dev/pts line creates a pseudoterminal, a way to fake a real serial terminal for certain applications.
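The three mounts described here can be sketched as follows. These are meant to be run inside the chroot, where you are already root, and mount_special is only a name invented for this example.

```shell
# Sketch only: these run INSIDE the chroot, as root.
# mount_special is a hypothetical wrapper name.
mount_special() {
    mount -t proc none /proc        # process and kernel information
    mount -t sysfs none /sys        # kernel structures exposed to userspace
    mount -t devpts none /dev/pts   # pseudoterminals
}
```

Call mount_special once after entering the chroot, or type the three mount lines by hand.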

So now we’re in our chrooted environment and we have things like network access working properly. It’s now that we get a chance to actually do some really cool stuff.
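Here is a rough sketch of that next step, to be run inside the chroot. The casper and linux-image-generic packages are named in this article; discover1 and laptop-detect are my assumption for the two hardware detection packages, and the HOME and LC_ALL values are the usual choices rather than anything mandatory. The name install_live_base is made up for the example.

```shell
# Sketch only: run INSIDE the chroot, as root.
# install_live_base is a hypothetical wrapper name.
install_live_base() {
    # Refresh the package lists.
    apt-get update
    # Assumed values: HOME gives GPG somewhere to keep its keyring,
    # LC_ALL=C sidesteps warnings about missing locales.
    export HOME=/root
    export LC_ALL=C
    # casper: live boot support; discover1 and laptop-detect: hardware
    # detection (assumed package names); linux-image-generic: the kernel.
    apt-get install --yes casper discover1 laptop-detect linux-image-generic
}
```

If a package name has changed in your release, apt-get will tell you it cannot find it, and you can substitute the nearest equivalent.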

So in the lines above, we update our package lists, export a few environment variables to avoid problems with locales and to allow the importing of GPG keys, and get onto some real nitty gritty. We install 4 packages. One of the most important of these is casper, which is explained in greater detail in the popout. This is required for making the read-only environment temporarily writeable whilst the Live CD is running. The next two packages are used for detecting hardware in various situations. The last is the most important of all. It’s the linux kernel. Without it, we can’t boot the system at all. More information on the kernel is available in the popout.

So now we’re at the stage where we can install new packages. In our example, we’re going to install apache so that we could use it to demonstrate our new web application. To do this, we’re going to run the following set of commands. These two lines should be fairly self explanatory. The last one cleans up any files left over from the apt processes.

apt-get install --yes apache2 php5
apt-get clean

The filesystem is complete, and we can begin thinking about packaging up our CD. If you want to, you can actually test that the apache installation is working. You’ll have to either modify the ports.conf or ensure that you don’t have any other web server running on your real PC. If you need to, just run something like /etc/init.d/apache2 restart

So we should have a complete filesystem now. The next step is to back out of the chroot and begin the CD building process. Note that you should stop any running processes like apache before continuing. We are going to remove all temporary files, delete the resolv.conf file as stated earlier and unmount the special file systems.
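The clean-up can be sketched like this, again inside the chroot. The function name leave_chroot_clean is hypothetical, and I unmount /proc last purely as a precaution.

```shell
# Sketch only: run INSIDE the chroot, as root, just before leaving it.
# leave_chroot_clean is a hypothetical wrapper name.
leave_chroot_clean() {
    rm -rf /tmp/*            # remove temporary files
    rm -f /etc/resolv.conf   # drop the DNS settings copied in from the host
    umount /dev/pts          # unmount the special filesystems...
    umount /sys
    umount /proc             # ...leaving /proc until last
}
```

Once it has run, type exit to drop back to your real system.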

We need a few more tools, so if you don’t have them already, grab syslinux, squashfs-tools, mkisofs and sbm. Let’s now make a few more directories within which we can start to build the iso image contents.
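As a sketch, the directory layout might look like this. The isolinux and casper folders are the ones this article refers to later on; image/install, as a home for memtest and sbm, is a name I have assumed.

```shell
# Sketch only: run from the "work" directory, outside the chroot.
# If any of the tools are missing, something like this should fetch them:
#   sudo apt-get install syslinux squashfs-tools mkisofs sbm
# image/install is an assumed name; casper and isolinux come from the article.
mkdir -p image/casper image/isolinux image/install
```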

Populating these directories, we first need to copy across the kernel and initrd. These files were generated when we installed the linux-image-generic earlier. Since the booting process takes place outside of the filesystem, these files are required to be in a different place. As you can see, we also copy across the isolinux binary file, which actually handles the initial boot process from the iso image on the CD. For good measure, we also include memtest and sbm.
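The copying might look something like the sketch below. It assumes exactly one generic kernel was installed in the chroot (so each wildcard matches a single file), and the host paths for isolinux.bin, memtest and sbm are the usual Ubuntu locations of the time; check where your own system keeps them. The function name populate_image and the image/install folder are assumptions of this example.

```shell
# Sketch only: run from the "work" directory, outside the chroot.
# populate_image is a hypothetical wrapper name.
populate_image() {
    # Kernel and initrd generated when linux-image-generic was installed.
    cp chroot/boot/vmlinuz-*-generic image/casper/vmlinuz
    cp chroot/boot/initrd.img-*-generic image/casper/initrd.gz
    # Boot loader binary from the host's syslinux package (assumed path).
    cp /usr/lib/syslinux/isolinux.bin image/isolinux/
    # Extras for good measure (assumed paths).
    cp /boot/memtest86+.bin image/install/memtest
    cp /boot/sbm.img image/install/
}
```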

It’s a good idea to create a text file here to give the user a few instructions. In this case, since we are creating a simple bootable CD, the user is required to enter some text, and without these instructions the user will be left clueless. So we’ll create a file in image/isolinux, called isolinux.txt, and fill it with the line “Please type “live” and then press enter.” This should be enough to get people started. You can customise this text file, and even put a splash boot image here. However, for the purpose of this introduction to live CD creation, we’ll skip these bits.
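If you would rather not open an editor, one way to create that file from the work directory is:

```shell
# Create the boot-time instructions shown to the user.
mkdir -p image/isolinux
printf 'Please type "live" and then press enter.\n' > image/isolinux/isolinux.txt
```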

We now need a config file for isolinux, which will be called isolinux.cfg and will be placed in the previous directory. You should fill it with something similar to the following. Notice our reference to isolinux, the initrd.gz and vmlinuz from earlier.
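A minimal sketch of such a config is shown here, written out with a shell here-document. It assumes the kernel and initrd were copied to /casper on the CD under the names vmlinuz and initrd.gz, and the timeout is an arbitrary choice of mine (isolinux counts it in tenths of a second).

```shell
# Sketch only: writes a minimal isolinux.cfg from the "work" directory.
mkdir -p image/isolinux
cat > image/isolinux/isolinux.cfg <<'EOF'
DEFAULT live
LABEL live
  KERNEL /casper/vmlinuz
  APPEND boot=casper initrd=/casper/initrd.gz
DISPLAY isolinux.txt
TIMEOUT 300
PROMPT 1
EOF
```

The boot=casper parameter is what hands control to casper in the initrd, DISPLAY shows our instruction file, and PROMPT 1 makes isolinux wait at the boot: prompt for the user to type the label.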

In order to make the CD as small as possible, we use squashfs. Squashfs is a read-only compressed file system. It’s used in many of the mainstream linux distributions on their live cds. The problem is, it is just that, Read-Only. Booting our live CD with this will be pretty pointless as we won’t be able to make any changes to the filesystem at all. No logging, no config changes, no nothing. It’s a bit of a problem, but one which is solved quite nicely by a union mount filesystem. Two examples of this are unionfs and aufs. Both provide a hidden layer between the user and the read-only file system allowing changes to be made to files. These changes are stored in memory. In essence if a file hasn’t been modified it is read straight from the disk, however if modifications have taken place, the read-only version and the changes from memory are merged together to form a combination of the two. What results is an up to date version of the file, reflecting the most recent changes by the user.

Since this CD will only be a live CD, we are going to create the squashfs file and exclude the boot folder, as it will only take up unnecessary space.
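A sketch of that step, run from the work directory. The file name filesystem.squashfs is the one casper normally looks for, but treat it, and the wrapper name make_squashfs, as assumptions of this example.

```shell
# Sketch only: run from the "work" directory, outside the chroot.
# make_squashfs is a hypothetical wrapper name.
make_squashfs() {
    # Clear out any earlier image before building a fresh one.
    sudo rm -f image/casper/filesystem.squashfs
    # Compress the whole chroot, leaving out its boot folder (-e boot).
    sudo mksquashfs chroot image/casper/filesystem.squashfs -e boot
}
```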

After this we create another folder in image called ubuntu. The last two steps are coming up. First, we create a list of each file on the CD and associate with it its md5sum. This is skippable, but is useful for verification purposes.
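One way to sketch this, assuming the list should live in a file called md5sum.txt at the top of the image (the file name and the function name write_md5sums are my choices):

```shell
# Sketch only: run from the "work" directory.
# write_md5sums is a hypothetical wrapper name.
write_md5sums() {
    mkdir -p image/ubuntu
    # Hash every file under image/, skipping the checksum list itself.
    (cd image && find . -type f ! -name md5sum.txt -exec md5sum {} + > md5sum.txt)
}
```

Anyone holding the CD can then run md5sum -c md5sum.txt from its top directory to check the contents.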

We reach the final step in our CD building process, creating the actual iso. This file will be appropriate to burn to a CD, or to test using something like VirtualBox. Looking at the command below, you may be wondering why the chroot directory doesn’t appear at all. If you remember, we created a squashfs archive of everything in that filesystem and put it into the casper directory. The options in the last line need no deep explanation, as they are quoted in just about any tutorial on creating live CDs. If you feel like delving deeper, please read the mkisofs man page for more information.
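A sketch of the iso build, using the stock El Torito options you will see in most live CD tutorials. The volume label LiveCD, the output name live-cd.iso and the wrapper name build_iso are placeholders of mine.

```shell
# Sketch only: run from the "work" directory.
# build_iso is a hypothetical wrapper name.
build_iso() {
    # Build a bootable ISO from the contents of image/, one level up from it.
    (cd image && mkisofs -r -V "LiveCD" -cache-inodes -J -l \
        -b isolinux/isolinux.bin -c isolinux/boot.cat \
        -no-emul-boot -boot-load-size 4 -boot-info-table \
        -o ../live-cd.iso .)
}
```

The -b option names the boot image, -c says where the boot catalogue should be written, and the three -no-emul-boot, -boot-load-size and -boot-info-table options are what isolinux expects in order to boot.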

So there you have it. A fully working live CD with apache2 and php5 installed. If you need to do any more work on the CD, you can jump back into the chroot environment again. Once finished, you just need to recreate the squashfs, remembering to remove the old one first or else it’ll segfault, and, if you installed any more packages, recreate the manifest files too. Then just rebuild the iso and you’re good to go.

It should be noted that if you do want to test your apache installation in VirtualBox, you will need to do some investigation into the way VirtualBox networking works, and set up some VirtualBox port forwarding. Also, apache doesn’t work very well with unionfs in its default configuration: you will need to disable the “Sendfile” directive, and possibly the MMap one too.
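As a sketch, the two Apache directives in question are EnableSendfile and EnableMMAP. The snippet below drops them into a small config file. The default path assumes an Ubuntu style Apache layout that includes everything under /etc/apache2/conf.d, and both the file name livecd and the function name disable_sendfile are made up for this example.

```shell
# Sketch only: run INSIDE the chroot, as root, before rebuilding the squashfs.
# disable_sendfile is a hypothetical wrapper; the default path is an assumption.
disable_sendfile() {
    conf=${1:-/etc/apache2/conf.d/livecd}
    # Turn off sendfile and memory-mapped file delivery.
    printf 'EnableSendfile Off\nEnableMMAP Off\n' > "$conf"
}
```

Call it with no arguments to use the assumed path, or pass a different file name if your Apache reads its extra config from somewhere else.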

Well it’s been a fairly long haul, but what we’ve achieved is actually pretty impressive. There are many things you can do to customise this further, and although it wasn’t totally from scratch, it wasn’t far off. What it did allow us to do, was to start with a very small footprint, almost nothing in fact, and build it up into something usable.