
Removing All Syscall Invocations from Kernel Space

There's an effort under way to reduce and ultimately remove all system call
invocations from within kernel space. Dominik Brodowski was leading this
effort, and he posted some patches to remove a lot of instances from the
kernel. Among other things, he said, these patches would make it easier to
clean up and optimize the syscall entry points, and also easier to clean up
the parts of the kernel that still needed to pretend to be in user space,
just so they could keep using syscalls.
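
As a rough illustration of what "pretending to be in user space" can mean,
here is a toy model in plain user-space C. It is a sketch, not kernel code:
every name in it (toy_addr_limit, toy_sys_work, toy_do_work and so on) is
hypothetical, and it assumes the masquerade works by temporarily widening
the range of addresses that a syscall-style entry point will accept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space model. Every name here is hypothetical, not a kernel API. */

#define TOY_EFAULT   14
#define TOY_USER_TOP ((uintptr_t)0x1000) /* pretend "user" addresses lie below this */

/* Highest address (exclusive) that a syscall-style entry point will accept. */
uintptr_t toy_addr_limit = TOY_USER_TOP;

bool toy_access_ok(uintptr_t addr)
{
    return addr < toy_addr_limit;
}

/* Internal helper: does the actual work and trusts its caller. */
long toy_do_work(uintptr_t addr)
{
    (void)addr; /* the real work is irrelevant to this sketch */
    return 0;
}

/* Syscall-style entry point: validates the pointer as if it came from user space. */
long toy_sys_work(uintptr_t addr)
{
    if (!toy_access_ok(addr))
        return -TOY_EFAULT;
    return toy_do_work(addr);
}

/* Masquerade pattern: lift the limit, go through the entry point, restore it. */
long toy_caller_masquerade(uintptr_t kernel_addr)
{
    uintptr_t saved = toy_addr_limit;
    long ret;

    toy_addr_limit = UINTPTR_MAX;
    ret = toy_sys_work(kernel_addr);
    toy_addr_limit = saved;
    return ret;
}

/* Cleaned-up pattern: skip the entry point and call the helper directly. */
long toy_caller_direct(uintptr_t kernel_addr)
{
    return toy_do_work(kernel_addr);
}
```

In this model, toy_sys_work() rejects an address at or above TOY_USER_TOP
unless the caller has first lifted the limit, which is the step a direct
call to the helper makes unnecessary.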

The rationale behind these patches, as expressed by Andy Lutomirski,
ultimately was to prevent user code from ever gaining access to kernel
memory. Sharing syscalls between kernel space and user space made that
guarantee impossible for the time being. Andy hoped the patches would go
into the kernel quickly, without needing to wait for further cleanup.

Linus Torvalds had absolutely no criticism of these patches, and he
indicated that this was a much-desired change. He offered to do a little
extra housekeeping himself with the kernel release schedule to make
Dominik's tasks easier. Linus also agreed with Andy that any cleanup effort
could wait. He didn't mind accepting ugly patches to update the syscall
calling conventions first and accepting the cleanup patches later.

Ingo Molnar predicted that with Dominik's changes, the size of the compiled
kernel would decrease, which is always a good thing. But Dominik said no,
and in fact he ran some quick numbers for Ingo and found that with his
patches, the compiled kernel was actually a few bytes larger. Ingo was
surprised but not mortified, saying the slight size increase would not be a
showstopper.

This project is similar—although maybe smaller in scope—to the effort
to get rid of the big kernel lock (BKL). In the case of the BKL, no one
could figure out for years even how to begin to replace it, until finally
folks decided to convert all BKL instances into identical local
implementations that could be replaced piecemeal with more specialized and
less heavyweight locks. After that, it was just a question of slogging
through each one until finally even the most finicky instances were
replaced with more specialized locking code.

Dominik seems to be using a similar technique now, in which areas of the
kernel that still need syscalls can masquerade as user space, while areas
of the kernel that are easier to fix get cleaned up first.

Note: if you're mentioned above and want to post a response above the comment section, send a message with your response text to ljeditor@linuxjournal.com.