Kernel Secrets

Abstract:

This article is a short description of the Linux kernel.


Presentation

Welcome to the first of a series of articles about the secrets
of the Linux kernel. You have probably taken a look at the
kernel sources at some point in the past. If so, you will have
noticed that what started as a couple of 100 KB compressed
files has grown into more than 300 files containing more than
2 million lines of source code, taking up as much as 9 MB of
compressed storage.

This series is intended not for newbies but for advanced
programmers. Obviously you're free to read it anyway, and the
author will do his best to answer any questions or doubts you
send by e-mail.

New bugs are discovered and new patches are published almost
every day. Nowadays it's almost impossible to understand the
source code as a whole. It is written by many different
programmers who try to keep a homogeneous coding style, but in
practice their styles differ from one another.

Linux: The Internet Operating System

Linux is a freely distributable operating system for the PC
architecture and others. It's compatible with the POSIX 1003.1
standard and includes a large number of features from Unix
System V and BSD 4.3. Many substantial parts of the Linux
kernel that this series covers were written by Linus Torvalds,
a Finnish computer science student. The first kernel was
released in September 1991.

Main Features

Linux meets almost all the requirements of a modern Unix-based
operating system:

Multitasking

Linux supports true (preemptive) multitasking. All processes
are independent. None of them has to release the processor
voluntarily for another process to run.

Multiuser accessibility

Linux is not only a multiuser operating system, it also
provides multiuser access: the same system resources can be
shared among users connected through different terminals
attached to the host.

Executables loaded on demand

Only the parts of a program that are actually needed are
loaded into memory for execution.

Memory paging

If the system memory is fully exhausted, Linux searches for
4 KB memory pages that can be released from memory and stored
on the hard disk. If any of these pages is needed again, Linux
reads it back from disk into memory. On old Unix systems and
some current platforms, including Microsoft Windows, memory is
swapped to disk instead: all the memory pages belonging to a
task are saved on disk when there is a memory shortage, which
is less efficient.

Dynamic disk cache

MS-DOS users are used to working with SmartDrive, a program
that reserves a fixed area of the system memory for disk
caching. Linux instead has a much more dynamic disk cache: the
memory reserved for the cache grows while memory is unused,
and shrinks as needed when system or user processes demand
more memory.

Shared libraries

Libraries are sets of routines used by programs to process
data. A number of standard libraries are used by more than one
process at the same time. On old systems these libraries were
included in every executable file, and loaded redundantly into
memory every time a new process using the same library was
executed, wasting memory. In modern systems like Linux, shared
code is loaded just once and shared among all the processes
that use it.

100% POSIX 1003.1 compliant, with some System V and BSD
features supported

POSIX 1003.1 defines a standard interface for Unix operating
systems. This interface is described as a set of C routines,
and it is currently supported by most modern operating
systems; Microsoft Windows NT, for example, supports POSIX
1003.1. Linux 1.2 is 100% compliant with POSIX. Additionally,
some System V and BSD interfaces are supported, or are being
implemented, for further compatibility.

Several executable file formats

Who would not like to run DOS, Windows 95, FreeBSD or OS/2
applications under Linux? For this reason DOS, Windows and
Windows 95 emulators are under development. Linux is also able
to run binaries from other Intel-based Unix platforms that
comply with the iBCS2 (Intel Binary Compatibility Standard).

Several filesystem formats

Linux supports a large number of file system formats. The most
commonly used format nowadays is the Second Extended File
System (Ext2). Another supported format is the File Allocation
Table (FAT) used by DOS-based systems, but FAT is not suitable
for secure or multiuser access because of its design
restrictions: for example, it stores no file owners or Unix
permissions.

Networking

Linux can be integrated into any local area network. All the
usual Unix services are supported, including the Network File
System (NFS), remote login (telnet, rlogin), dial-up SLIP and
PPP, and so on. Linux can also act as a server or client on
other kinds of networks, including file sharing and printing
on Macintosh, NetWare and Windows networks.

Compiling the Kernel

Let's take a look at the kernel source code before studying
the kernel itself.

Source tree structure: Linux kernel sources are
commonly located under the /usr/src/linux directory,
so we'll refer to directories relative to this location. As a
result of porting to non-Intel architectures, the kernel tree
was reorganized after version 1.0. Architecture-dependent code
is located under the arch/ hierarchy. Code for Intel
386, 486, Pentium and Pentium Pro processors is under
arch/i386. The arch/mips directory is for
MIPS-based systems, arch/sparc for Sun SPARC-based
platforms, arch/ppc for
PowerPC/Power Macintosh systems, and so on. We'll
concentrate on the Intel architecture, as it is the most
widely used with Linux.

The Linux kernel is just a standard C program, with only two
important differences. First, the starting point for programs
written in the C language is the main(int argc, char
**argv) routine, while the Linux kernel uses
start_kernel(void). Second, the program environment does not
exist yet when the system is starting up and the kernel is
being loaded. This means that a few things must be done before
the first C routine is called. The assembler code that performs
this task is located mainly under the arch/i386/boot/
directory.

The appropriate assembler routine loads the kernel at the
absolute memory address 0x100000 (1 MB), then installs the
interrupt service routines, the global descriptor table and the
interrupt descriptor table that are used only during the
initialization process. At this point, the processor is
switched into protected mode. The init/ directory
contains everything needed to initialize the kernel. Here you
will find the start_kernel() routine, which initializes
the kernel properly, taking into account all the boot
parameters passed to it. The first process is created without
using system calls (the system itself is not loaded yet). This
is the famous idle process, the one that uses processor time
when no other process needs it.

The kernel/ and arch/i386/kernel/
directories contain, as their names suggest, the main parts of
the kernel. This is where the main system calls are located,
together with other core components, including the timer
handler, the scheduler, the DMA manager, the interrupt handler
and the signal handling code.

Code handling system memory is located in mm/ and
arch/i386/mm/. This area is devoted to allocating and
releasing memory for processes. Memory paging is also
implemented here.

The Virtual File System (VFS) is under the fs/
directory. Each supported file system format is located in its
own subdirectory. The most important file systems are Ext2 and
Proc. We'll take a detailed look at them later.

All operating systems require a set of drivers for hardware
components. In the Linux kernel, these are located under
drivers/.

Under ipc/ you will find the Linux implementation
of System V IPC.

Source code to implement several network protocols, sockets
and internet domains is stored under net/.

Some standard C library routines are implemented in lib/,
because the kernel cannot use the user-space C library; this
lets the kernel itself keep familiar C programming habits.

Loadable modules generated during the kernel compilation are
saved in modules/, but this directory stays empty until the
first kernel compilation is done.

Probably the most important directory for programmers is
include/. Here you will find all the C header files
used specifically by the kernel. Kernel header files specific
to Intel platforms are under include/asm-i386/.

Compiling: A new kernel is basically generated in
just three steps:

First of all, configure the customizable kernel options
with "make config", "make menuconfig" or "make xconfig"
(different interfaces for the same configuration stage).

Then, rebuild all the source code dependencies with
"make depend".

Finally, perform the actual kernel compilation with
"make".

In future articles we will go into the details behind these
scripts, and how to modify them to introduce new configuration
options.

I hope you enjoyed this article. Feel free to e-mail your
comments, suggestions and criticism to
elesende@nextwork.net.