Lowering Latency in Linux: Introducing a Preemptible Kernel

Whether you seek higher scores in Quake, are an audio enthusiast or want a smoother desktop, lowering latency in the kernel is an important goal.

Performance measurements come in two
flavors: throughput and latency. The former is like the width of an
expressway: the wider the expressway, the more cars that can travel
on it. The latter is like the speed limit: the higher it is, the
sooner cars get from point A to point B. Obviously, both quantities
are important to any task. Many jobs, however, require more of one
quality than of the other. Sticking to our roadway analogy,
long-haul trucking may be more sensitive to throughput, while a
courier service may be more sensitive to latency. Lowering latency
and increasing system response, through good old-fashioned kernel
work, is the topic of this article.

Audio/video processing and playback are two common
beneficiaries of lowered latency. Increasingly important to Linux,
however, is its benefit to interactive performance. With high
latency, user actions, such as mouse clicks, go unnoticed for too
long—not the snappy, responsive desktop users expect. The system
cannot get to the important process fast enough.

The problem, at least as far as the kernel is concerned, is
the nonpreemptibility of the kernel itself. Normally, if something
sufficiently important happens, such as an interactive event, the
receiving application will get a priority boost and subsequently
find itself running. This is how a preemptively multitasked OS
works. Applications run until they use up some default amount of
time (called a timeslice) or until an important event occurs. The
alternative is cooperative multitasking, where applications must
explicitly say, “I'm done!”, before a new process can run. The
problem, when running in the kernel, is that scheduling is
effectively cooperative.

Applications operate in one of two modes: either in user
space, executing their own code, or within the kernel, executing a
system call or otherwise having the kernel work on their behalf.
When operating in the kernel, the process continues running until
it decides to stop, ignoring timeslices and important events. If a
more important process becomes runnable, it cannot run until the
current process leaves the kernel. That wait can last hundreds of
milliseconds.

Latency Solutions

The first and simplest solution to latency problems is to
rewrite kernel algorithms so that they take a minimal, bounded
amount of time. The problem is that this is already the goal;
system calls are written to return quickly to user space, yet we
still have latency problems. Some algorithms simply do not scale
nicely.

The second solution is to insert explicit scheduling points
throughout the kernel. This approach, taken by the low-latency
patches, finds problem areas in the kernel and inserts code to the
effect of “Anyone need to run? If so, run!” (a sketch of such a
scheduling point appears below). The problem with this
solution is that we cannot possibly hope to find and fix all
problem areas. Nonetheless, testing shows that these patches do a
good job. What we need, however, is not a quick fix but a solution
to the problem itself.
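
Such a scheduling point boils down to something like the following
fragment (a simplified sketch in the spirit of the low-latency
patches, not their literal source):

    /* explicit scheduling point: anyone need to run? If so, run! */
    if (current->need_resched)
            schedule();

The patches wrap fragments like this in a small helper macro and drop
it into code paths known to run for a long time.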

The Preemptible Kernel

A proper solution is removing the problem altogether by
making the kernel preemptible. Thus, if something more important
needs to run, it will run, regardless of what the current process
is doing. The obstacle here, and the reason Linux did not do this
from the start, is that the kernel would need to be re-entrant.
Thankfully, the issues of preemption are solved by existing SMP
(symmetric multiprocessing) support. By taking advantage of the SMP
code, in conjunction with some other simple modifications, the
kernel can be made preemptible.

The programmers at MontaVista provided the initial
implementation of kernel preemption. First, the definition of a
spin lock was modified to include marking a “nonpreemptible”
region. Therefore, we do not preempt while holding a spin lock,
just as we do not enter a locked region concurrently under SMP. Of
course, on uniprocessor systems the spin locks become nothing more
than the preemption markers. Second, code was
modified to ensure that we do not preempt inside a bottom half or
inside the scheduler itself. Finally, the return from interrupt
code path was modified to reschedule the current process if
needed.

On UP, spin_lock is defined as preempt_disable, and
spin_unlock is defined as preempt_enable. On SMP, they also perform
the normal locking. On a uniprocessor build, then, the definitions
reduce to something like this (a simplified sketch, not the literal
patch source):
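
    /* UP: the lock itself compiles away; only the markers remain */
    #define spin_lock(lock)         preempt_disable()
    #define spin_unlock(lock)       preempt_enable()

So what do these new routines do?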

The nestable preemption markers preempt_disable and
preempt_enable operate on preempt_count, a new integer stored in
each task_struct. preempt_disable
effectively is:
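
    /* a simplified sketch of the marker, not the literal patch source */
    ++current->preempt_count;   /* enter a nonpreemptible region */
    barrier();                  /* keep the compiler from reordering */

and preempt_enable effectively is (again, a sketch):

    --current->preempt_count;   /* leave the region */
    barrier();
    if (!current->preempt_count && current->need_resched)
            preempt_schedule(); /* a request arrived while we held a lock */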

The result is that we do not preempt when the count is greater than
zero. Because spin locks are already in place to protect critical
regions for SMP machines, the preemptible kernel now has its
protection too.

preempt_schedule implements
the entry to schedule itself. It
sets a flag in the current process to signify it was preempted,
calls schedule and, upon return, unsets the flag:
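
    /*
     * A simplified sketch of preempt_schedule. The flag mentioned
     * above is the large PREEMPT_ACTIVE value added to preempt_count,
     * which tells schedule that this task was preempted rather than
     * having blocked voluntarily.
     */
    asmlinkage void preempt_schedule(void)
    {
            do {
                    current->preempt_count += PREEMPT_ACTIVE;  /* set flag */
                    schedule();
                    current->preempt_count -= PREEMPT_ACTIVE;  /* unset flag */
                    barrier();
            } while (current->need_resched);
    }

Looping on need_resched catches a reschedule request that arrives
while the flag is still set.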

The other entry to preempt_schedule is via the interrupt
return path. When an interrupt handler returns, it checks the
preempt_count and need_resched variables, just as preempt_enable
does (although the interrupt return code in entry.S is in
assembly). The ideal scenario is to cause a preemption here, because
it is typically an interrupt that sets need_resched in response to a
hardware event. It is not always possible, however, to preempt
immediately off the interrupt, as a lock may be held. That is why
we also check for preemption off preempt_enable.
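
In C terms, the interrupt-return check amounts to something like this
(a sketch; the real code in entry.S is assembly):

    /* on the return-from-interrupt path */
    if (!current->preempt_count && current->need_resched)
            preempt_schedule();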

Comments

Hmm, this explains why some distros are slow or fast. Lunar Linux allows a preemptible kernel, and it's just lightning fast without any bloat. Hmm, put this option in other distros...and perhaps...*ZOOM ZOOM ZOOM*

Just a rambling thought...anyway, this article comes in handy while deciding on kernel configuration at the Unix console. :p Keep it around!

One question I always had: when a process is in the kernel execution path (executing a system call), doesn't scheduler_tick decrement its timeslice? And if it does, when the timeslice runs out, doesn't the scheduler preempt the current process, even though it might be in the middle of the kernel execution path?

Hi,
I am studying in the final year of computer engineering at MIT.
I read your article, and you will be glad to know that I have taken the same topic for a technical seminar at our college. I therefore request you to please send me some more technical details and direct me to some important links.
Waiting for your positive reply.
Regards,
Pavan
