The current kernel release is still 2.4.2. The current prepatch is
2.4.3pre6, released early in the morning on March 21. The patch log file is, as of this writing, only updated to
2.4.3pre5, however.

No 2.2.19 prepatches have been released this week.

Changing the memory map semaphore. One of the changes that is now
in the 2.4.3 prepatch is a new memory map locking scheme implemented by Rik
van Riel. The memory map semaphore controls access to the various virtual
memory areas and page tables used by a process; it is intended to keep
concurrent activities, such as page faults, memory map changes, and
informational queries, from stepping on each other. It is a fundamental
part of how the virtual memory system works.

It also, seemingly, is a performance problem. For example, programs that
use the /proc interface to get process information can find
themselves blocked for long periods of time. Page faults, too, can be
slowed down, even when they occur in different places and should not
conflict with each other. Multi-threaded programs, such as the MySQL
server or Apache 2.0, are restricted to handling just one page fault at a
time across the whole set of threads. In some cases, this restriction can
lead to very poor performance.

Rik's change is to turn the memory map semaphore into a variant known as a
reader-writer semaphore (or R/W semaphore). These semaphores allow
multiple threads to access a common data structure simultaneously, as long
as none of them make any changes. Once a thread needs to change things, it
must wait until all of the readers have finished their business, then lock
them out for the duration of the change.

An R/W semaphore suits this situation well, since both the /proc
and page fault cases do not actually need to change the memory map. With
the change applied, the system can do more things simultaneously. Even on
uniprocessor systems, things will work better, since work need not wait for
the resolution of a page fault, which can involve disk activity.

It's also a relatively fundamental and scary change for a stable kernel
release. Even Linus, while accepting the
change, is a little nervous about it:

I'm applying this to my tree - I'm not exactly comfortable with
this during the 2.4.x timeframe, but at the same time I'm even less
comfortable with the current alternative, which is to make the
regular semaphores fairer (we tried it once, and the implementation
had problems, I'm not going to try that again during 2.4.x).

The patch also, as of 2.4.3pre5, "has only been
tested on i386 without PAE, and is known to break other architectures."
There have been some good reports, though, on the performance effects of
this patch. Even so, its inclusion may mean that the real 2.4.3 will not be
out for a while yet, since Linus will want to give it some time to stabilize
and prove that everything works.

Global kernel analysis. Dawson Engler, at Stanford, has put
together an extension to the gcc compiler which allows it to perform
detailed, global analysis of a body of code and point out a number of
possible bugs. Over the last week, he and his students have been posting
the results of this work. They have found some impressive things,
including:

Places where pointers are interpreted as user-space addresses (i.e.
they are passed to a function like copy_to_user), but where
the same pointer is also dereferenced directly (nine cases). Kernel code running in
process context can generally get away with that sort of reference,
but it's risky for a few reasons. The user-space address may not be
valid (or the page could have been swapped out since the kernel last
checked), and there are security implications as well.

Large variables on the kernel stack (22
cases, plus a few more when devfs is
used). The kernel stack is limited in size, and putting large
variables there risks overflowing the allocated space.

Various locking bugs (16 cases). These
include paths that could take out a lock and forget to unlock it, and
potential misuse of the processor state flags.

Inconsistent treatment of interrupts (28
cases). Code that sometimes runs with interrupts enabled and
other times not is likely to be buggy; functions which sometimes
forget to reenable interrupts certainly are.

Places where a pointer returned by a function that can fail is not
checked (120 cases).

Calls to functions that can block while interrupts are disabled or
spinlocks are held (163 cases). Kernel
code, of course, should not block in either case, or serious
performance problems (or deadlocks) can result.
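
Two of these bug classes are easy to show in miniature. The following is a user-space sketch, not kernel code and not anything found by the checker: the toy_lock and toy_table types and the update_buggy, update_fixed, and copy_name functions are all hypothetical. It contrasts an error path that forgets to unlock with one that funnels every exit through a single unlock point, and shows the return-value check that the "unchecked pointer" cases lack.

```c
/* Illustration only: every name in this block is hypothetical. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy lock: just counts how many times it is currently held. */
struct toy_lock { int held; };
struct toy_table { struct toy_lock lock; int updates; };

static void toy_lock_acquire(struct toy_lock *l) { l->held++; }
static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held > 0);
    l->held--;
}

/* Buggy: the early return leaves the lock held. */
int update_buggy(struct toy_table *t, const char *name)
{
    toy_lock_acquire(&t->lock);
    if (name == NULL)
        return -1;              /* BUG: no toy_lock_release() here */
    t->updates++;
    toy_lock_release(&t->lock);
    return 0;
}

/* Fixed: every exit path passes through the single unlock below. */
int update_fixed(struct toy_table *t, const char *name)
{
    int ret = 0;

    toy_lock_acquire(&t->lock);
    if (name == NULL) {
        ret = -1;
        goto out;
    }
    t->updates++;
out:
    toy_lock_release(&t->lock);
    return ret;
}

/* An allocation can fail, so its result is checked before use. */
char *copy_name(const char *name)
{
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);

    if (copy == NULL)           /* the check the flagged code omits */
        return NULL;
    memcpy(copy, name, len);
    return copy;
}
```

The single-exit style costs a goto, but it makes the pairing of lock and unlock something that can be verified at a glance.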

The response from the kernel hackers has been quite positive, for one
simple reason: quite a few new bugs have been found. Many of the things
being tested for are the sort of subtle bug that can be very easy to create
and hard to track down.

The tool that is doing this work is called "MC" ("meta-level compilation");
it was created by a team headed by Mr. Engler and sponsored by DARPA grant
MDA904-98-C-A933. MC defines an extension language for gcc called "metal,"
which can be used to program specific checks to be applied to the code.
Here, for example, is a piece of code which looks for errors in enabling
and disabling interrupts:

Those who are interested in MC should check out Mr. Engler's paper
"Checking system rules using system-specific, programmer-written
compiler extensions," which is available on the net in PostScript
format. The code fragment above was taken from that paper. Please
don't bug Mr. Engler about obtaining the code, however; the system is still
under development and has not yet been generally released. In time,
however, it should become part of the standard kernel hacker's toolkit.
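
To give a feel for the kind of rule such a checker encodes, here is a toy sketch in plain C. It is not metal, it is not taken from Mr. Engler's paper, and it says nothing about how MC is actually implemented; the event names and the toy_check_path function are invented for this illustration. It walks a single code path, represented as a list of events, tracks whether interrupts are on or off, and counts two kinds of mistake: calling something that may block with interrupts off, and returning with interrupts still off.

```c
/* Toy rule checker; all names are hypothetical and this is not MC. */
#include <assert.h>
#include <stddef.h>

/* The only things this toy rule cares about along a code path. */
enum toy_event { EV_IRQ_DISABLE, EV_IRQ_ENABLE, EV_MAY_BLOCK, EV_RETURN };

/* The state carried from one event to the next. */
enum toy_state { IRQS_ON, IRQS_OFF };

/* Walk one path and return the number of warnings raised.
 * The path is assumed to start with interrupts enabled. */
int toy_check_path(const enum toy_event *path, size_t len)
{
    enum toy_state state = IRQS_ON;
    int warnings = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        switch (path[i]) {
        case EV_IRQ_DISABLE:
            state = IRQS_OFF;
            break;
        case EV_IRQ_ENABLE:
            state = IRQS_ON;
            break;
        case EV_MAY_BLOCK:
            if (state == IRQS_OFF)
                warnings++;     /* blocking call with interrupts off */
            break;
        case EV_RETURN:
            if (state == IRQS_OFF)
                warnings++;     /* forgot to reenable interrupts */
            return warnings;
        }
    }
    return warnings;
}
```

A real tool has to derive such paths from the source code and follow every branch; this sketch assumes that work has already been done and shows only the rule itself.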

JFFS2 released. The folks at Red Hat have announced the
release of the JFFS2 filesystem. It's a complete reimplementation of
Axis Communications' Journaling Flash Filesystem, with a number of
improvements. It's available via CVS, and only works with the 2.4
kernel. An iPAQ kernel with JFFS2 built in is available as well.

Help out the kernel manual pages. Andries Brouwer has released
man-pages-1.35. In the announcement, he notes:

David Mosberger expressed his worry that especially man page
Section 2 is out-dated and x86 specific, with no indication that
other architectures even exist. No doubt he is right.

So the request has gone out: please point out the man pages that are wrong,
and, if possible, supply fixes while you're at it. This is a good way for
people to help out without having to actually hack on the kernel code.

FSM's kernel patch. Kernel patches do not normally come with press
releases, or, at least, they didn't. This week, FSMLabs (the RTLinux
company) announced that it had released a
memory management patch. It seems that a memory management change in 2.4
creates some difficulties for RTLinux, so they went and developed a fix.
And announced it to the world.

The patch itself is quite small, especially
considering that the one real chunk of code there is lifted from the MIPS
version of <asm/pgalloc.h>. It adds a couple of big kernel
lock invocations, and a function which propagates page directory changes
across processes and CPUs. That's evidently enough to restore low latency
on a reliable basis for real-time tasks.