Asymmetric Multiprocessing for Linux

Processor Groups Linux Kernel Patch

THIS IS WAAAY OUTDATED! It’s basically just task and IRQ processor affinity, which can now be handled by schedutils and careful /proc/irq/#/irq_affinity handling. I’m just keeping this here for historical reasons. -- john.c (10/16/2005)

This was one of my first forays into seeing what could be done with an OS
scheduler to take advantage of SMP and NUMA multiprocessor environments to
help speed up numerical computations. It is an ugly, ugly hack that was more
of a proof of concept. It did work, but the benefits gained are very small,
especially on an SMP machine. I made this patch around the 2.4.0-test days of
the Linux kernel. It allowed an administrator to specify, at compile time,
a certain number of processors in a multiprocessor system to be “just”
application processors (i.e., never tied down with OS tasks). As a side
effect, it also allowed a user to tie a process to run only on a particular
CPU. Most of the work for this has (had) already been done by SGI (and
others). In essence, it would allow users of an SMP machine to use asymmetric multiprocessing.

Benefits of assigning a process to a specific processor

If a process is assigned to only one processor, then it will never have to
‘re-prime’ the cache because it’s been switched to running on a new processor
(and a new cache). For CPU/Memory intensive tasks, this can lead to a small
increase in performance. On a multiprocessor system, we also have the luxury
of modifying the scheduler to allow the task to run uninterrupted on that
processor if we want, because normal machine activity (interrupts, other
processes, login shells, etc.) will continue to be handled by the other
processors. This also maximises the application’s use of the cache, since we
guarantee that no other process, and not even the scheduler’s own time
slicing, will interrupt the program.
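
For anyone who wants to try the “tie a process to one CPU” part without this patch, here is a minimal sketch using the stock sched_setaffinity() call that mainline Linux later grew. It is not the code from my patch, and the helper names (first_allowed_cpu, pin_to_cpu, pinned_only_to) are made up for illustration. It only restricts where the calling process may run; it does not keep other processes off that CPU.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Return the lowest-numbered CPU the calling process may run on,
 * or -1 if the affinity mask cannot be read. (Hypothetical helper.) */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling process to a single CPU.
 * Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the caller". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the caller is allowed on exactly one CPU and that CPU
 * is `cpu`, otherwise 0. (Hypothetical helper.) */
int pinned_only_to(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return 0;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

A numerical program would call pin_to_cpu() once, early in main(), before starting its real work. The taskset command from schedutils does the same thing from the shell.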

The Problems

The performance increases for all of these things -only- appear for CPU/Memory bound tasks, which is a very small number of applications. Any I/O waiting will mean that the application processor sits idle while it could be doing other, useful work.

On an SMP machine, you are still forced to share the CPU bus and memory bus bandwidth with the other processors in the system, so while you get the benefits of your own processor and cache, you still only get ‘your share’ of the available bandwidth. On a NUMA system, this is a different story, and this patch could be much more useful there, where you aren’t necessarily limited by such sharing. Unfortunately, I don’t have a NUMA machine to play with to test out this theory :). Anyone wishing to donate one is more than welcome to contact me.

There’s a school of thought, subscribed to by Linus and most other Linux developers out there, that this case would come about naturally if Linux’s scheduler were perfect, and thus we should work on making the scheduler better rather than coming up with ugly hacks like this. In general, I agree. But I also think there’s nothing wrong with performance being your top goal, and in the interim using hacks such as this to help get your project done.

The Patch

I submitted this patch to the SGI LinuxScalability list. It generated a little
discussion, but no one seemed that interested in general, so I lost interest
for now. And then IBM came out with their Linux Scalability Project and no one
ever really posted to the SGI list but me and one guy from SGI. I guess
everything happens on IBM’s list now; I don’t have much time to keep track.

This patch consists of a few parts. First, the kernel patch, which is against
kernel 2.4.0-test6 (I believe). Don’t expect it to work with later kernels;
I’ve never tried, and I don’t even have my old PII dual processor machine to
test on anymore. But there should be enough there to figure out what I was
trying to do. That is available here:
procgroup-2.4.0-test6.diff

Second, once you’ve booted your kernel, you can optionally stop all
interrupts from being routed to your application processors. To do so, go into
/proc/irq/#/irq_affinity and write into each one the bitmask of CPUs that the
interrupt is allowed to be processed on. Set the bit to 1 for each OS CPU, and
to 0 for each application CPU.
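
To make the bitmask concrete, here is a rough sketch of building and writing it. It is not part of the patch, and the function names are hypothetical. It assumes the application CPUs are the highest-numbered ones, which may not match how your kernel was configured. The affinity file path is passed in by the caller: on mainline kernels the per-IRQ file is /proc/irq/#/smp_affinity and takes a hex mask, so adjust the path to whatever your kernel exposes. Writing to it requires root.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Build a mask with one bit set per OS CPU.
 * ASSUMPTION: CPUs 0 .. (ncpus - napp - 1) are OS CPUs and the top
 * `napp` CPUs are application CPUs. Returns 0 if no OS CPU is left,
 * which the caller should treat as an error. (Hypothetical helper.) */
unsigned long os_cpu_mask(unsigned ncpus, unsigned napp)
{
    const unsigned maxbits = (unsigned)(sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 0;
    unsigned cpu;

    if (ncpus > maxbits)
        ncpus = maxbits;          /* this sketch handles one word only */
    if (napp >= ncpus)
        return 0;

    for (cpu = 0; cpu < ncpus - napp; cpu++)
        mask |= 1UL << cpu;
    return mask;
}

/* Write `mask` in hex to the given affinity file.
 * Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int write_irq_mask(const char *path, unsigned long mask)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%lx\n", mask) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On a dual-processor box with one application CPU, os_cpu_mask(2, 1) gives 0x1, so every interrupt stays on CPU 0. You would call write_irq_mask() once per IRQ directory.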

Finally, you need the “assign2proc” program, which is available in source form
here. It’s ugly, it’s a gaping security hole, but it does work. It’s been so
long, I don’t remember the syntax. Use the source, Luke. I’m not going to
touch it again unless there’s some interest generated (by me or someone else).
Here’s the source for assign2proc: assign2proc.c

Conclusions

After getting this hack up and running I ran a few tests and was wholly
unimpressed with the minuscule speedups gained. And since no one else seemed
interested in the hack, I’ve given up on it. Future work should be
concentrated on optimising the smarts of the current Linux scheduler rather
than bastardized hacks such as this. But I’m posting it anyway, even after all
this time, because it was an interesting project, I had fun doing it, and
maybe someone will stumble upon it someday and get inspired.

You can visit my poorly maintained homepage as
well… Maybe even look at my resume?


About

I'm a busy man. I'm a father, a husband, a graduate student, a software developer with many, many years of professional experience, an open source advocate, a Linux kernel hacker, an avid baseball fan, a tinkerer, a builder, a reader, a thinker, and a hat wearer.