Introducing the 2.6 Kernel

Scheduler and audio improvements are only two of the features you'll notice when the 2.5 development series becomes 2.6. Here's a kernel hacker's view into the near future.

The kernel has come a long way since
Linus branched off 2.4.15 to create 2.5.0 back on November 22,
2001. Since then, rapid development has ensued, resulting in a
drastically different and much-improved kernel. This article
discusses the more interesting, important features and their impact
on the performance and reliability of Linux.

History of 2.5 Thus Far

In Linux kernel parlance, the minor version number of a
kernel denotes whether that kernel belongs to a stable series or a
development series. Even minor version numbers denote stable
kernels, and odd minor version numbers denote development kernels.
When a development kernel is fully mature and deemed stable, its
minor version number is incremented to an even value. For example,
the 2.3 kernel development series gave way to the 2.4 stable
series.

The current development kernel is 2.5. The initial work on a
development series is quite brisk, and many new features and
improvements are incorporated. When Linus and the kernel developers
are satisfied with the new feature set, a feature-freeze is
declared, which has the purpose of slowing development. The last
feature-freeze occurred on October 31, 2002. Ideally, when a
feature-freeze is declared, Linus will not accept new
features—only additions to existing work. When the existing
features are complete and nearly stable, a code-freeze is declared.
During a code-freeze, only bug fixes are accepted, in order to
prepare the kernel for a stable release.

When the development series is complete, Linus releases the
kernel as stable. This time around, the stable kernel most likely
will be version 2.6.0. Although the official release date is “when
it is done”, a good estimate is third or fourth quarter
2003.

In March 2001 and again in June 2002, the core kernel
developers met at Kernel Summits to discuss the kernel. The primary
goal of 2.5 was to bring the aging block layer (the part of the
kernel responsible for block devices, such as hard drives) into the
21st century. Other concerns centered on scalability, system
response and virtual memory (VM). The kernel hackers met all of
these goals, and many more. The list of important new features
includes:

O(1) scheduler

preemptive kernel

latency improvements

redesigned block layer

improved VM subsystem

improved threading support

new sound layer

In this article, I discuss a lot of new technology and
innovation that has gone into the 2.5 kernel and will appear in
2.6. The development is the result of hard work from many
individuals. I am going to refrain from mentioning names, because
if I start giving credit I inevitably will miss some people, and I
would rather give no list than an incomplete or incorrect one. The
Linux Kernel Mailing List archive is a good source of who did
what.

O(1) Scheduler

The process scheduler (or, simply, the scheduler) is the
subsystem of the kernel responsible for allocating processor time.
It decides which process gets to run when. This is not always an
easy job. From a possibly large list of processes, the scheduler
must ensure that the most worthy one is always running. When there
is a large number of runnable processes, selecting the best process
may take some time. Machines with multiple processors only add to
the challenge.

Improvements to the scheduler ranked high on the list of
needed changes. Specifically, developers had three goals:

The scheduler should provide full O(1) scheduling.
Every algorithm in the scheduler should complete in constant time,
regardless of the number of running processes.

The scheduler should have perfect SMP scalability.
Ideally, each processor should have its own locking and individual
runqueue. A runqueue is the list of runnable processes from which
the scheduler chooses.

The scheduler should have improved SMP affinity. It
should naturally attempt to group tasks on a specific CPU and run
them there. It should migrate tasks from one CPU to another only to
resolve imbalances in runqueue length.

The new scheduler accomplishes all of these goals. The first
goal was full O(1) scheduling. O(1) denotes an algorithm that
executes in constant (fixed) time. The number of runnable tasks on
a system—or any other variable, for that matter—has no bearing on
the time to execute any part of the scheduler. Consider the
algorithm for deciding which task should run next. This job
involves finding the runnable task with the highest priority that
still has timeslice remaining. In the previous scheduler, the
algorithm was analogous to:

for (each runnable process on the system) {
        find worthiness of this process
        if (this is the worthiest process yet) {
                remember it
        }
}
run the most worthy process

With this algorithm, the worthiness of each process must be
checked. This implies the algorithm loops n times for
n processes.
Hence, this is an O(n) algorithm—it scales
linearly with the number of processes.

Conversely, the new scheduler is constant with respect to the
number of processes; it does not matter whether there are five or
5,000 runnable processes on the system. It always takes the same
amount of time to select and begin executing a new process:

get the highest priority level that has processes
get first process in the list at that priority level
run this process

In this algorithm, it is possible to simply “get the highest
priority level” and “get first process in the list”, because the
scheduler keeps track of these things. It simply has to look up,
instead of search for, these values. Consequently, the new
scheduler can select the next process to schedule without looping
over all runnable processes.
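
To make this concrete, below is a minimal user-space C sketch of
the technique, under simplifying assumptions: a bitmap records
which priority levels currently hold runnable tasks, and a list
per level holds those tasks. The names and structures here
(prio_array, MAX_PRIO and friends) are illustrative stand-ins,
not the kernel's exact definitions:

#include <stdio.h>
#include <strings.h>            /* ffs() */

#define MAX_PRIO 140            /* number of priority levels */
#define BITMAP_WORDS ((MAX_PRIO + 31) / 32)

struct task {
        int prio;               /* 0 is the highest priority */
        struct task *next;      /* next task at the same level */
};

/*
 * A simplified priority array: one task list per priority level,
 * plus a bitmap marking which levels are non-empty.
 */
struct prio_array {
        unsigned int bitmap[BITMAP_WORDS];
        struct task *queue[MAX_PRIO];
};

static void enqueue_task(struct prio_array *array, struct task *p)
{
        p->next = array->queue[p->prio];
        array->queue[p->prio] = p;
        array->bitmap[p->prio / 32] |= 1u << (p->prio % 32);
}

/*
 * Find the lowest set bit in the bitmap: the highest-priority
 * level with runnable tasks.  This inspects a fixed number of
 * words (five here), no matter how many tasks are queued.
 */
static int find_first_prio(struct prio_array *array)
{
        unsigned int i;

        for (i = 0; i < BITMAP_WORDS; i++)
                if (array->bitmap[i])
                        return i * 32 + ffs(array->bitmap[i]) - 1;
        return -1;              /* no runnable tasks */
}

/* O(1) selection: look up the level, take the first task there. */
static struct task *pick_next_task(struct prio_array *array)
{
        int prio = find_first_prio(array);

        return prio < 0 ? NULL : array->queue[prio];
}

int main(void)
{
        static struct prio_array rq;
        struct task a = { 120, NULL }, b = { 100, NULL };

        enqueue_task(&rq, &a);
        enqueue_task(&rq, &b);
        printf("next task has priority %d\n",
               pick_next_task(&rq)->prio);
        return 0;
}

Because the bitmap has a fixed size, the search touches a fixed
number of words however many tasks are runnable, which is exactly
what makes the selection O(1). The real scheduler uses a similar
priority bitmap with an optimized find-first-bit operation.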

The second goal is perfect SMP scalability. This implies the
performance of the scheduler on a given processor remains the same
as one adds more processors to the system, which was not the case
with the previous scheduler. Instead, the performance of the
scheduler degraded as the number of processors increased, due to
lock contention. The overhead of keeping the scheduler and all of
its data structures consistent is reasonably high, and the largest
source of this contention was the runqueue. To ensure that only one
processor can concurrently manipulate the runqueue, it is protected
by a lock. This means, effectively, only one processor can execute
the scheduler at a time.

To solve this problem, the new scheduler divides the single
global runqueue into a unique runqueue per processor. This design
is often called a multiqueue scheduler. Each processor's runqueue
has a separate selection of the runnable tasks on a system. When a
specific processor executes the scheduler, it selects only from its
runqueue. Consequently, the runqueues receive much less contention,
and performance does not degrade as the number of processors in the
system increases. Figure 1 is an example of a dual-processor
machine with a global runqueue vs. a dual-processor machine with
per-processor runqueues.

Figure 1. Left, the 2.4 Runqueue; Right, the 2.5/2.6 Runqueue
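
As a rough sketch of this design, with invented names and
user-space locks standing in for kernel spinlocks: each processor
owns a runqueue with its own lock, and a CPU scheduling a task
takes only its own lock, so schedulers on different CPUs never
serialize against one another in the common case:

#include <pthread.h>
#include <stddef.h>

#define NR_CPUS 2

struct task {
        struct task *next;
        /* priority, timeslice, ... */
};

/*
 * One runqueue per processor, each with its own lock.  A CPU
 * scheduling a task touches only its own queue, so two CPUs can
 * run the scheduler at the same time without contending.
 */
struct runqueue {
        pthread_mutex_t lock;
        struct task *head;              /* runnable tasks on this CPU */
        unsigned int nr_running;        /* queue length, for balancing */
};

static struct runqueue runqueues[NR_CPUS] = {
        { PTHREAD_MUTEX_INITIALIZER, NULL, 0 },
        { PTHREAD_MUTEX_INITIALIZER, NULL, 0 },
};

/* Pick the next task for this CPU, holding only this CPU's lock. */
static struct task *schedule_on(int cpu)
{
        struct runqueue *rq = &runqueues[cpu];
        struct task *next;

        pthread_mutex_lock(&rq->lock);
        next = rq->head;        /* stand-in for the O(1) lookup above */
        pthread_mutex_unlock(&rq->lock);
        return next;
}

int main(void)
{
        return schedule_on(0) == NULL ? 0 : 1;
}

In the actual kernel the per-runqueue lock is a spinlock and the
queue itself is the priority array shown earlier; this sketch
keeps only the locking structure.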

The third and final goal was improved SMP affinity. The
previous Linux scheduler had the undesirable characteristic of
bouncing processes between multiple processors. Developers call
this behavior the ping-pong effect. Table 1 demonstrates this
effect at its worst.

The new scheduler solves this problem, thanks to the new
per-processor runqueues. Because each processor has a unique list
of runnable processes, processes remain on the same processor.
Table 2 shows an example of this improved behavior. Of course,
sometimes processes do need to move from one processor to another,
such as when an imbalance arises in the number of processes on each
processor. In this case, a special load balancer mechanism migrates
processes to even out the runqueues. This action occurs relatively
infrequently, so SMP affinity is well preserved.
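
The following sketch illustrates the balancing decision itself,
with an invented 25% threshold standing in for the kernel's actual
heuristics; the point is that small differences in queue length
are tolerated so tasks keep their CPU affinity:

#include <stdio.h>

#define NR_CPUS 2

/*
 * Simplified per-CPU queue lengths; in a real scheduler these
 * would live in each processor's runqueue.
 */
static unsigned int nr_running[NR_CPUS];

/*
 * Migrate tasks only when this CPU's queue is noticeably shorter
 * than the busiest one.  The 25% threshold is invented for
 * illustration, not the kernel's exact heuristic.
 */
static void load_balance(unsigned int this_cpu)
{
        unsigned int busiest = this_cpu, max = 0, i, moved;

        for (i = 0; i < NR_CPUS; i++) {
                if (nr_running[i] > max) {
                        max = nr_running[i];
                        busiest = i;
                }
        }

        if (busiest == this_cpu ||
            max <= nr_running[this_cpu] * 5 / 4)
                return;         /* no real imbalance; stay put */

        /* Pull half the difference over to this CPU. */
        moved = (max - nr_running[this_cpu]) / 2;
        nr_running[busiest] -= moved;
        nr_running[this_cpu] += moved;
        printf("cpu %u pulled %u tasks from cpu %u\n",
               this_cpu, moved, busiest);
}

int main(void)
{
        nr_running[0] = 2;
        nr_running[1] = 10;
        load_balance(0);        /* cpu 0 pulled 4 tasks from cpu 1 */
        return 0;
}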
