Booting the Kernel

This article describes the steps required to boot the Linux kernel. While this kind of information is not needed to use the system, it is interesting to see how the different architectures bring up the system.

A computer system is a complex machine,
and the operating system is an elaborate tool that orchestrates
hardware complexities to show a simple and standardized environment
to the end user. When the power is turned on, however, the system
software must boot the kernel and work in a limited operating
environment. I describe here the booting process of three
platforms: the old-fashioned PC and the more fully featured Alpha
and SPARC platforms. The PC is covered in more detail, since it is
still in more widespread use than other platforms, and also because
it's the trickiest platform to bring up. No assembly code is shown,
as assembly language is unintelligible to most readers, and each
platform has its own.

The Computer at Power-On

In order to be able to use the computer when the power is
turned on, the processor begins execution from the system's
firmware. The firmware is “unmovable software” found in ROM; some
manufacturers call it the Basic Input-Output System (BIOS) to
underline its software role, some call it PROM or “flash” to
stress its hardware implementation, while others call it
“console” to focus on user interaction.

The firmware usually checks the hardware's functionality,
retrieves part (or all) of the kernel from a storage medium and
executes it. This first part of the kernel must load the rest of
itself and initialize the whole system. I don't deal with firmware
issues here, but only with the kernel code, which is distributed
with Linux.

The PC

When the x86 processor is turned on, it is a 16-bit processor
that sees only 1MB of RAM. This environment is known as “real
mode” and is dictated by compatibility with older processors of
the same family. Everything that makes up a complete system must
live within the available megabyte of address space, i.e., the
firmware, video buffers, space for expansion boards and a little
RAM (the infamous 640KB) must all be there.
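
The 1MB limit follows from the way real mode forms addresses: a
16-bit segment value is shifted left by four bits and added to a
16-bit offset, which yields a 20-bit physical address. The Python
sketch below only illustrates this arithmetic; the function and
constant names are made up for the example and do not come from the
kernel sources.

```python
# Real-mode address arithmetic (illustrative names, not kernel code).
ADDRESS_SPACE = 1 << 20          # 1MB: 20 address lines
CONVENTIONAL_RAM = 640 * 1024    # the "infamous 640KB"

def real_mode_address(segment, offset):
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must fit in 16 bits")
    # Wrap at 1MB, as the original 8086 did.
    return ((segment << 4) + offset) % ADDRESS_SPACE

print(hex(real_mode_address(0x07C0, 0x0000)))  # 0x7c00
print(hex(real_mode_address(0x9000, 0x0200)))  # 0x90200
print(hex(CONVENTIONAL_RAM))                   # 0xa0000
```

Several segment:offset pairs can name the same physical address,
which is why the same location may appear written in different ways
in real-mode code.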

To make things difficult, the PC firmware loads only half a
kilobyte of code and establishes its own memory layout before
loading this first sector. Whatever the boot media, the first
sector of the boot partition is loaded into memory at the address
0x7c00, where execution begins. What happens at 0x7c00 depends on
the boot loader being used; we examine three situations here: no
boot loader, LILO and Loadlin.

Booting zImage and bzImage

Even though it's rare to boot the system without a boot
loader, it is still possible to do so by copying the raw kernel to
a floppy disk. The command cat zImage >
/dev/fd0 works perfectly on Linux, although some other
Unix systems can do the task reliably only by using the
dd command. Without going into detail, the raw
floppy image created from zImage can then be
configured using the rdev program.
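
For readers curious about what the cat command does here, the
following Python sketch performs the same raw, byte-for-byte copy.
The function name write_raw_image and the 512-byte block size are
choices made for this illustration. Pointing it at a real device
such as /dev/fd0 requires suitable permissions and overwrites
whatever is on the disk.

```python
def write_raw_image(image_path, target_path, block_size=512):
    """Copy image_path byte-for-byte onto target_path; return bytes written."""
    written = 0
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break  # end of the image file
            dst.write(block)
            written += len(block)
    return written
```

A call such as write_raw_image("zImage", "/dev/fd0") would mirror
the cat command shown above. No file system is created on the
floppy, since the kernel image simply starts at the first sector.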

The file called zImage is the compressed
kernel image that resides in arch/i386/boot
after either make zImage or make
boot is executed—the latter invocation is the one I
prefer, as it works unchanged on other platforms. If you built a
“big zImage” instead, the kernel file created
is called bzImage and resides in the same
directory.

Booting an x86 kernel is a tricky task because of the limited
amount of available memory. The Linux kernel tries to maximize
usage of the low 640 kilobytes by moving itself around several
times. Let's look at the steps performed by a
zImage kernel in detail; all of the following
path names are relative to the arch/i386/boot directory.

The first sector (executing at 0x7c00) moves itself
to 0x90000 and loads subsequent sectors after itself, getting them
from the boot device using the firmware's functions to access the
disk. The rest of the kernel is then loaded to address 0x10000,
allowing for a maximum size of half a megabyte of data—remember,
this is the compressed image. The boot sector code lives in
bootsect.S, a real-mode assembly file.

Then code at 0x90200 (defined in setup.S) takes
care of some hardware initialization and allows the default text
mode (video.S) to be changed. Text mode selection is a compile-time
option from 2.1.9 onwards.

Later, the whole kernel is moved from 0x10000 (64K)
to 0x1000 (4K). This move overwrites BIOS data stored in RAM, so
BIOS calls can no longer be performed. The first physical page is
not touched because it is the so-called “zero-page”, used in
handling virtual memory.

At this point, setup.S enters protected mode and
jumps to 0x1000, where the kernel lives. All the available memory
can be accessed now, and the system can begin to run.
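
The data movements in these steps can be followed in a small
simulation. The Python sketch below models the first megabyte as a
byte array and replays the moves described above. It is only an
illustration: the function names, the 0xA0000 ceiling used for the
setup code and the placeholder contents are assumptions of the
example, and nothing here is taken from bootsect.S or setup.S.

```python
# Simulated zImage boot moves in the first megabyte (a sketch, not kernel code).
BOOTSECT_LOAD = 0x7C00    # where the firmware places the first sector
BOOTSECT_FINAL = 0x90000  # where the boot sector relocates itself
SETUP_START = 0x90200     # setup code, right after the 512-byte boot sector
KERNEL_LOAD = 0x10000     # where the compressed kernel is first loaded
KERNEL_FINAL = 0x1000     # where it is moved before entering protected mode
LOW_RAM_END = 0xA0000     # end of the 640KB of low RAM
SECTOR = 512

# Room between 0x10000 and 0x90000: half a megabyte.
MAX_COMPRESSED = BOOTSECT_FINAL - KERNEL_LOAD

def move(memory, src, dst, length):
    """Copy length bytes from src to dst; the slice copy makes overlap safe."""
    memory[dst:dst + length] = memory[src:src + length]

def simulate_zimage_boot(boot_sector, setup, kernel):
    """Replay the moves and return the simulated first megabyte."""
    if len(boot_sector) != SECTOR:
        raise ValueError("the boot sector must be exactly 512 bytes")
    if len(setup) > LOW_RAM_END - SETUP_START:
        raise ValueError("setup code does not fit below 640KB")
    if len(kernel) > MAX_COMPRESSED:
        raise ValueError("compressed kernel does not fit below 0x90000")
    memory = bytearray(1 << 20)
    # The firmware loads the first sector at 0x7c00.
    memory[BOOTSECT_LOAD:BOOTSECT_LOAD + SECTOR] = boot_sector
    # The boot sector moves itself to 0x90000.
    move(memory, BOOTSECT_LOAD, BOOTSECT_FINAL, SECTOR)
    # It loads the following sectors after itself, then the kernel at 0x10000.
    memory[SETUP_START:SETUP_START + len(setup)] = setup
    memory[KERNEL_LOAD:KERNEL_LOAD + len(kernel)] = kernel
    # The kernel is moved down to 0x1000; the first page is left untouched.
    move(memory, KERNEL_LOAD, KERNEL_FINAL, len(kernel))
    return memory

print(hex(MAX_COMPRESSED))  # 0x80000
```

Calling simulate_zimage_boot with a 512-byte boot sector and two
byte strings standing in for the setup code and the kernel returns
the simulated memory, which can then be inspected at the addresses
named in the text.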

The steps just described were once the whole story of booting
when the kernel was small enough to fit in half a megabyte of
memory—the address range between 0x10000 and 0x90000. As features
were added to the system, the kernel became larger than half a
megabyte and could no longer be moved to 0x1000. Thus, code at
0x1000 is no longer the Linux kernel; instead, the “gunzip” part
of the gzip program resides at that address. The
following additional steps are now needed to uncompress the kernel
and execute it:

head.S in the compressed directory is at 0x1000,
and is in charge of “gunzipping” the kernel; it calls the
function decompress_kernel, defined in
compressed/misc.c, which in turn calls
inflate, which writes its output starting at
address 0x100000 (1MB). High memory can now be accessed, because
the processor is definitely out of its limited boot
environment—the “real” mode.

After decompression, head.S jumps to the actual
beginning of the kernel. The relevant code is in ../kernel/head.S,
outside of the boot directory.

The boot process is now over, and head.S (i.e., the code
found at 0x100000 that used to be at 0x1000 before introducing
compressed boots) can complete processor initialization and call
start_kernel(). Code for all functions after
this step is written in C.
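
The decompression step can be sketched in the same simulated
memory. The Python example below uses the standard gzip module in
place of the kernel's own inflate code, and it places the
compressed data directly at 0x1000 for simplicity, whereas in the
real image the decompressor code comes first. The name
decompress_into and the fake kernel contents are assumptions made
for the illustration.

```python
import gzip

COMPRESSED_AT = 0x1000   # where the decompressor and compressed data sit
KERNEL_AT = 0x100000     # 1MB: where the uncompressed kernel is written

def decompress_into(memory, compressed_length):
    """Inflate the gzip data at COMPRESSED_AT and write it at KERNEL_AT.

    Returns the number of uncompressed bytes written.
    """
    compressed = bytes(memory[COMPRESSED_AT:COMPRESSED_AT + compressed_length])
    image = gzip.decompress(compressed)
    if KERNEL_AT + len(image) > len(memory):
        raise MemoryError("uncompressed kernel does not fit in memory")
    memory[KERNEL_AT:KERNEL_AT + len(image)] = image
    return len(image)

# Usage: a fake 307,200-byte "kernel" in 4MB of simulated RAM.
memory = bytearray(4 << 20)
fake_kernel = bytes(range(256)) * 1200
packed = gzip.compress(fake_kernel)
memory[COMPRESSED_AT:COMPRESSED_AT + len(packed)] = packed
size = decompress_into(memory, len(packed))
print(size)  # 307200
```

After the call, the bytes starting at 0x100000 are the uncompressed
image, which is where execution continues in the real boot
sequence.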

The various data movements performed at system boot are
depicted in Figure 1.

Figure 1. System Boot Data Map

The boot steps shown above rely on the assumption that the
compressed kernel can fit in half a megabyte of space. While this
is true most of the time, a system stuffed with device drivers
might not fit into this space. For example, kernels used in
installation disks can easily outgrow the available space. Some new
method is needed to solve the problem—this new method is called
bzImage and was introduced in kernel version
1.3.73.
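
The size constraint can be stated as a one-line check. The helper
below is hypothetical and only restates the half-megabyte figure
given in the text; the exact limit enforced by the kernel build
tools may differ slightly.

```python
# Space between the load address (0x10000) and the boot code (0x90000).
ZIMAGE_LIMIT = 0x90000 - 0x10000   # 512KB

def image_kind_needed(compressed_size):
    """Return "zImage" if the compressed kernel fits in low memory, else "bzImage"."""
    if compressed_size < 0:
        raise ValueError("size cannot be negative")
    return "zImage" if compressed_size <= ZIMAGE_LIMIT else "bzImage"

print(image_kind_needed(400 * 1024))  # zImage
print(image_kind_needed(700 * 1024))  # bzImage
```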

A bzImage is generated by issuing
make bzImage from the top level Linux source
directory. This kind of kernel image boots similarly to
zImage, with a few changes:

When the system is loaded to address 0x10000, a
little helper routine is called after loading each 64K data block.
The helper routine moves the data block to high memory by using a
special BIOS call. Only the newer BIOS versions implement this
functionality, so make boot still builds
the conventional zImage, though this may change
in the near future.

setup.S doesn't move the system back to 0x1000 (4K)
but, after entering protected mode, jumps instead directly to
address 0x100000 (1MB) where data has been moved by the BIOS in the
previous step.

The decompressor found at 1MB writes the
uncompressed kernel image into low memory until it is exhausted,
and then into high memory after the compressed image. The two
pieces are then reassembled to the address 0x100000 (1MB). Several
memory moves are needed to perform the task correctly.
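
The first of these changes, loading through a low-memory buffer
into high memory, can be sketched as follows. The Python example
stands in for the boot code and the BIOS call with plain slice
copies; the name load_big_image and the test image are assumptions
of the example, and the final reassembly performed by the
decompressor is not modeled.

```python
BLOCK = 64 * 1024        # data is loaded 64K at a time
LOW_BUFFER = 0x10000     # each block first lands here, in low memory
HIGH_START = 0x100000    # 1MB: where the blocks are accumulated

def load_big_image(image, memory):
    """Load image through the low buffer into high memory; return block count."""
    if HIGH_START + len(image) > len(memory):
        raise MemoryError("image does not fit in simulated memory")
    blocks = 0
    for start in range(0, len(image), BLOCK):
        chunk = image[start:start + BLOCK]
        # The boot code reads one block into low memory ...
        memory[LOW_BUFFER:LOW_BUFFER + len(chunk)] = chunk
        # ... and the helper routine then moves it above 1MB.
        dest = HIGH_START + start
        memory[dest:dest + len(chunk)] = memory[LOW_BUFFER:LOW_BUFFER + len(chunk)]
        blocks += 1
    return blocks

# Usage: a 700KB image, too large for a zImage, in 4MB of simulated RAM.
memory = bytearray(4 << 20)
image = bytes(i % 251 for i in range(700 * 1024))
print(load_big_image(image, memory))  # 11
```

Because the low buffer is reused for every block, the size of the
image is limited by the amount of high memory and no longer by the
space below 0x90000.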

The rule for building the big compressed image can be read
from Makefile; it affects several files in arch/i386/boot. One good
point of bzImage is that when kernel/head.S is
called, it doesn't notice the extra work, and everything goes
forward as usual.
