The Linux Process Model

This month, we begin looking at Linux internals. We will tour the
innards of the Linux kernel in the 2.0.x, 2.2.x and new 2.4.x
series. Although many articles are written every week on how best
to use Linux, very few have reviewed the internals of the kernel.
Why is it necessary to know how the kernel works?

For one thing, understanding your kernel better will enable
you to prevent problems before they occur. If you are using Linux
as a server, most problems will start to appear under stress. This
is exactly when it becomes essential to know your way around the
kernel to assess the nature of the problems.

If you ever need to check back with the kernel source, you
can either install the source from your distribution's CD or go to
http://lxr.linux.no/source/
to navigate through all the source code on-line.

The Linux Process Model

UNIX systems have one fundamental building block: the process,
a term that here also covers threads and lightweight processes.
Under Linux, the process model has evolved considerably with each
new version.

The fundamental data structure the kernel uses to control a
process is the process structure. The set of these structures grows
and shrinks dynamically as processes are forked and later finish or
are killed.

The process structure (called task_struct in the kernel
source code) is about 1KB in size. You can get the exact size with
this program:

On Intel 386 machines, it is exactly 960 bytes. Please note,
however, that unlike in other UNIX systems, this process structure
does not take up separately allocated memory of its own.

Since 2.2.x, the task_struct has been allocated at the bottom of
the kernel stack. The task_struct can overlap the kernel stack
because it is a per-task structure, just like the kernel stack
itself.

The kernel stack has a fixed size of 8192 bytes on the Intel x86.
If the kernel ever uses more than 8192-960=7232 bytes of that
stack, for example through deep recursion, the task_struct will be
overwritten and therefore corrupted, causing the kernel to crash.
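
The arithmetic above can be sketched in a few lines of C. This is
only an illustration: the two constants are the figures quoted in
this article for 2.2.x on 32-bit x86, and the function names are
hypothetical, not kernel identifiers.

```c
#include <assert.h>

/* Figures quoted in the text; both are assumptions that change with
 * the kernel version and the architecture. */
#define KERNEL_STACK_SIZE 8192u /* two 4KB pages */
#define TASK_STRUCT_SIZE   960u /* sizeof(struct task_struct) */

/* Bytes left for the real kernel stack once the task_struct
 * occupies the bottom of the allocation. */
unsigned usable_stack_bytes(void)
{
    return KERNEL_STACK_SIZE - TASK_STRUCT_SIZE;
}

/* Returns 1 if a stack that has grown `depth` bytes down from the
 * top of the allocation would reach into the task_struct. */
int would_clobber_task_struct(unsigned depth)
{
    assert(depth <= KERNEL_STACK_SIZE);
    return depth > usable_stack_bytes();
}
```

With these figures, a depth of 7232 bytes is still safe, and one
byte more starts overwriting the task_struct.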

Basically, the kernel reduces the usable kernel stack to around
7232 bytes by placing the task structure at the bottom of the
stack. It is done this way because 7KB is more than enough for the
kernel stack, and the rest is used for the task_struct. These are
the advantages of this arrangement:

The kernel doesn't have to access memory to find the address
of the current task structure.

Memory usage is reduced.

An additional dynamic allocation is avoided at task
creation time.

The task_struct will always start on a PAGE_SIZE
boundary, so it is always cache-line aligned on most hardware on
the market.

Once Linux is in kernel mode, you can get the address of the
task_struct at any time with this very fast pseudo-code:

task_struct = (struct task_struct *) (STACK_POINTER & 0xffffe000);

This is exactly how the above pseudo-code is implemented in C under
Linux:

For example, on a Pentium II, recalculating the task_struct
beginning from the stack pointer is much faster than passing the
task_struct address through the stack across function calls, as is
done in some other operating systems, e.g., Solaris 7. That is, the
kernel can derive the address of the task_struct by checking only
the value of the stack pointer (no memory accesses at all). This is
a big performance booster and shows once again that fine
engineering can be found in free software. The code for this was
written by Ingo Molnar, a Hungarian kernel hacker. The kernel stack
is set up automatically by the CPU when entering kernel mode: the
CPU loads the kernel stack pointer from the Task State Segment
(TSS), which is filled in at fork time.

The layout of the x86 kernel stack looks like this:

----- 0xXXXX0000 (bottom of the stack and address of the task struct)
TASK_STRUCT
----- 0xXXXX03C0 (last byte usable from the kernel as real kernel stack)
KERNEL_STACK
----- 0xXXXX2000 (top of the stack, first byte used as kernel stack)
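
To make the layout concrete, here is a minimal user-space sketch of
the same idea. It is not the kernel's code: toy_task,
toy_task_union, toy_alloc and toy_current are hypothetical names
made up for this illustration, and C11's aligned_alloc stands in
for the kernel's page allocator. One 8192-byte block, aligned on
its own size, holds the task structure at its lowest address, and
masking any address inside the block gives back the structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STACK_SIZE 8192u /* two 4KB pages, as in the text */

/* Hypothetical stand-in for the kernel's task_struct. */
struct toy_task {
    int pid;
};

/* One allocation holds both: the task structure at the lowest
 * address, the stack growing down from the highest. */
union toy_task_union {
    struct toy_task task;
    unsigned char stack[STACK_SIZE];
};

/* Allocate a block aligned on STACK_SIZE so that masking works.
 * Returns NULL if the allocation fails. */
union toy_task_union *toy_alloc(int pid)
{
    union toy_task_union *u = aligned_alloc(STACK_SIZE, sizeof *u);
    if (u != NULL)
        u->task.pid = pid;
    return u;
}

/* Recover the task structure from any address inside the block by
 * clearing the low 13 bits, the same mask as 0xffffe000 on a
 * 32-bit machine. */
struct toy_task *toy_current(const void *stack_pointer)
{
    uintptr_t base = (uintptr_t)stack_pointer
                     & ~(uintptr_t)(STACK_SIZE - 1);
    return (struct toy_task *)base;
}
```

Passing toy_current the address of any byte in the stack array,
from the first byte above the structure to the very top, yields the
same toy_task pointer without consulting any lookup table.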

Note that today, the size of the task_struct is exactly 960
bytes. It will change across kernel revisions, because every
variable added to or removed from the task_struct changes its
size. In turn, the limit of the usable kernel stack will change
with the size of task_struct.

The memory for the process data structure is allocated
dynamically during execution of the Linux kernel. More precisely,
the kernel doesn't allocate the task_struct separately at all; it
allocates only the two-page kernel stack, of which the task_struct
is a part.

In many UNIX systems, there is a maximum processes parameter
for the kernel. In commercial operating systems like Solaris, it is
a self-tuned parameter. In other words, it adjusts according to the
amount of RAM found at boot time. However, in Solaris, you can
still adjust this parameter in /etc/system.