Kernel Korner - What's New in the 2.6 Scheduler?

When large SMP systems started spending more time scheduling processes than running them, it was time for a change. The new Linux scheduler works well on systems with many processors.

Process Affinity

Imagine two tasks spending a lot of time communicating with each other
over a pipe or bit of shared memory. Some evidence exists that they
might do better if they were on the same processor, even if that
means leaving another idle. If one sends a message to the other and
then waits for a reply, both tend to alternate being runnable
with small overlaps where they are both runnable. In other
words, they don't collide often. The big savings comes from
having the processor cache pre-warmed with the data in the pipe so
it doesn't need to be fetched and copied again to a different
processor's cache. Although processor speeds have increased,
memory and bus speeds have not kept pace. It's becoming increasingly
painful to have to retrieve data that used to be cached.
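The effect is easy to experiment with from user space. The sketch below (Python on Linux, using os.sched_setaffinity, a wrapper for the same affinity syscall the scheduler honors) pins the producer to one CPU before forking; the child inherits the affinity mask, so both ends of the pipe stay on the same processor and its warm cache:

```python
import os

# Pin this process to CPU 0; a child forked afterward inherits the
# affinity mask, so both ends of the pipe share one processor's cache.
os.sched_setaffinity(0, {0})

r, w = os.pipe()
pid = os.fork()
if pid == 0:                         # child: the consumer
    os.close(w)
    data = os.read(r, 64)
    ok = data == b"ping" and os.sched_getaffinity(0) == {0}
    os._exit(0 if ok else 1)

os.close(r)                          # parent: the producer
os.write(w, b"ping")
os.close(w)
_, status = os.waitpid(pid, 0)
consumer_ok = os.WEXITSTATUS(status) == 0
```

Pinning by hand like this is exactly what the scheduler tries to avoid making you do; the point is that the kernel's own balancing decisions face the same trade-off.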

Process Size

Moving a task that uses little memory affects a processor
differently from moving a task that uses a great deal of memory.
However, either the large or small task may be the correct choice
depending on the circumstances. If you move a task that uses a lot
of memory away from a processor, leaving behind many small tasks that don't
use much memory, each of those tasks may find a higher percentage
of their memory in cache each time they run. On the other hand,
moving that large task to another processor already running large
tasks may cause all of them to start with a cold cache, hurting
both the moved task and its new neighbors. Current
code does not take process size into account at all.

Device Affinity

For much the same reason as process affinity, there might be times
when it pays to overload a processor with tasks if those tasks are
making heavy use of a particular device. Web servers, for instance,
often have multiple network cards but not enough to have one for each
CPU. Congregating those tasks on processors where the network data
is arriving might prove quite advantageous. Determining which tasks
are likely to use which devices is currently neither obvious nor easy.
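One place a policy like this could gather its information is /proc/interrupts, which reports per-CPU interrupt counts for each device. The sketch below parses a hypothetical sample of that file (the IRQ numbers and counts are invented for illustration) to find the CPU servicing a given NIC; on a live system, that result could feed the same affinity call as sched_setaffinity(2) to congregate the server tasks there:

```python
# Hypothetical /proc/interrupts excerpt for a four-CPU machine.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 24:    1983322          0          0          0   IO-APIC   eth0
 25:          0    2411907          0          0   IO-APIC   eth1
"""

def busiest_cpu_for(device, interrupts_text):
    """Return the CPU that has serviced the most interrupts for device,
    or None if the device does not appear."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())        # header has one CPUn label per CPU
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[-1] == device:
            counts = [int(f) for f in fields[1:1 + ncpus]]
            return counts.index(max(counts))
    return None

busiest_cpu_for("eth0", SAMPLE)   # -> 0
busiest_cpu_for("eth1", SAMPLE)   # -> 1
```

Even with this bookkeeping, mapping tasks to devices remains the hard part: interrupt counts tell you where the data arrives, not which task will consume it.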

Heavy Spikes in Short-Lived Tasks

Shell scripts can cause an explosive number of short-lived tasks,
especially if they don't or can't use built-in functions for string or
number manipulations. Although one could argue these are poorly coded
scripts, they nevertheless have demonstrated that the scheduler can
be too slow in balancing queues on SMP machines. Workloads
with similar behavior also would suffer.
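The pattern is easy to reproduce. The sketch below (Python standing in for the shell) forks a burst of children that each do trivial work and exit immediately, the load a script creates when it shells out to an external command for every string or number operation. Each child is runnable for only an instant, which is precisely what makes it hard for the balancer to react in time:

```python
import os

# A burst of short-lived tasks: each child exits almost as soon as
# it is created, so by the time a balancer notices an uneven queue,
# the tasks that caused the imbalance may already be gone.
reaped = 0
for _ in range(200):
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: trivial work, immediate exit
    os.waitpid(pid, 0)
    reaped += 1
```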

Light but Unbalanced Load

Imagine a program that divides the work into six equal pieces, all
of which ideally finish at the same time. Unfortunately, on a
four-processor machine, two processors tend to take two tasks and
two tend to take one task, and things stay that way. Unless
the scheduler makes an effort to spread the pain around, two jobs
finish more quickly than the other four because they have no competition on
their processor. On the other hand, in most job mixes, moving those
odd jobs around still leaves two tasks on each of two processors.
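The arithmetic behind the complaint is worth making explicit. With six equal jobs of length T placed 2-2-1-1 on four CPUs and fair timeslicing, every job on a CPU holding n jobs finishes at n*T; the two solo jobs finish at T while the four doubled-up jobs take 2T, even though spreading the pain perfectly (6 units of work over 4 CPUs) could finish everything by 1.5T. A minimal sketch of that calculation:

```python
def finish_times(jobs_per_cpu, T=1.0):
    """Finish time of each job when every CPU fairly timeslices its
    equal-length jobs: all n jobs on one CPU finish together at n*T."""
    times = []
    for n in jobs_per_cpu:
        times.extend([n * T] * n)
    return times

times = finish_times([2, 2, 1, 1])
# two jobs finish at T, four jobs at 2T; ideal makespan would be 1.5T
```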

NUMA

NUMA (non-uniform memory access) systems present some interesting
characteristics to worry about. In a NUMA system, memory near
another processor can be much more expensive to reach than memory
near this one.
It's not sufficient to have an idle
processor; you need one that is both idle and not too expensive
to migrate to. For instance, it might be bad to migrate a task if
most of the task's resources reside at the current location.
It even might be so bad that it's better to leave it on a busy
processor than to move it to an idle one with a cold cache.
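That last trade-off can be reduced to numbers. The toy cost model below (my illustration, not kernel code) compares staying put, where the task shares the CPU but keeps fast local memory, against migrating, where it gets an idle CPU but pays a penalty on every access to memory left behind on the old node:

```python
def should_migrate(cpu_share, remote_penalty):
    """cpu_share: fraction of the busy CPU the task gets where it is
    now (0.5 = sharing with one equal neighbor).
    remote_penalty: relative run time on the idle node when most of
    the task's memory stays remote (1.4 = 40% slower per unit of work).
    Migrate only if the same work finishes sooner after the move."""
    time_if_staying = 1.0 / cpu_share    # full speed, partial CPU
    time_if_moving = remote_penalty      # whole CPU, slower memory
    return time_if_moving < time_if_staying

should_migrate(0.5, 1.4)   # sharing costs more than remote memory: move
should_migrate(0.9, 1.4)   # nearly a whole CPU already: stay put
```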

Hyperthreading

Hyperthreading introduces yet another complexity. In hyperthreading,
two logical processors share the execution resources of a single
core, overlapping in a hardware-dependent way. Because of this
sharing, a job running on one logical processor can affect the
speed of a job running on the other. Although nobody would expect
a box with four hyperthreading processors to equal a full
eight-processor machine, exactly what to expect varies a great deal
by workload. The only sure thing is that it should not yield less
performance.
