EWONTFIX

Broken by design: systemd

09 Feb 2014 19:56:09 GMT

Recently the topic of systemd has come up quite a bit in various
communities in which I'm involved, including the musl IRC channel and
the BusyBox mailing list.

While the attitude towards systemd in these communities is largely
negative, much of what I've seen has been either dismissible by folks
in different circles as mere conservatism, or tempered by an idea that
despite its flaws, "the design is sound". This latter view comes with
the notion that systemd's flaws are fixable without scrapping it or
otherwise incurring major costs, and therefore not a major obstacle to
adopting systemd.

My view is that this idea is wrong: systemd is broken by design,
and despite offering highly enticing improvements over legacy init
systems, it also brings major regressions in many of the areas where
Linux is expected to excel: security, stability, and not having to
reboot to upgrade your system.

The first big problem: PID 1

On Unix systems, PID 1 is special. Orphaned processes (including a
special case: daemons which orphan themselves) get reparented to PID
1. There are also some special signal semantics with respect to PID 1,
and perhaps most importantly, if PID 1 crashes or exits, the whole
system goes down (kernel panic).

Among the reasons systemd wants/needs to run as PID 1 is getting
parenthood of badly-behaved daemons that orphan themselves, preventing
their immediate parent from knowing their PID to signal or wait on
them.

Unfortunately, it also gets the other properties, including bringing
down the whole system when it crashes. This matters because systemd is
complex. A lot more complex than traditional init systems. When I say
complex, I don't mean in a lines-of-code sense. I mean in terms of the
possible inputs and code paths that may be activated at runtime. While
legacy init systems basically deal with no inputs except SIGCHLD
from orphaned processes exiting and manual runlevel changes performed
by the administrator, systemd deals with all sorts of inputs,
including device insertion and removal, changes to mount points and
watched points in the filesystem, and even a public D-Bus-based API.
These in turn entail resource allocation, file parsing, message
parsing, string handling, and so on. This brings us to:

The second big problem: Attack Surface

On a hardened system without systemd, you have at most one
root-privileged process with any exposed surface: sshd. Everything
else is either running as unprivileged users or does not have any
channel for providing it input except local input from root. Using
systemd then more than doubles the attack surface.

This increased and unreasonable risk is not inherent to systemd's goal
of fixing legacy init. However, it is inherent to the systemd design
philosophy of putting everything into the init process.

The third big problem: Reboot to Upgrade

Fundamentally, upgrading should never require rebooting unless the
component being upgraded is the kernel. Even then, for security
updates, it's ideal to have a "hot-patch" that can be applied as a
loadable kernel module to mitigate the security issue until rebooting
with the new kernel is appropriate.

Unfortunately, by moving large amounts of functionality that's likely
to need to be upgraded into PID 1, systemd makes it impossible to
upgrade without rebooting. This leads to "Linux" becoming the
laughing stock of Windows fans, as happened with Ubuntu a long time
ago.

Possible counter-arguments

With regard to security, one could ask why desktop systems can't
use systemd, leaving server systems to find something else. But I
think this line of reasoning is flawed in at least three ways:

Many of the selling-point features of systemd are server-oriented.
State-of-the-art transaction-style handling of daemon starting and
stopping is not a feature that's useful on desktop systems. The
intended audience for that sort of thing is clearly servers.

The desktop is quickly becoming irrelevant. The future platform is
going to be mobile and is going to be dealing with the reality of
running untrusted applications. While the desktop made the Unix
distinction of local user accounts largely irrelevant, the coming of
mobile app ecosystems full of potentially-malicious apps makes "local
security" more important than ever.

The crowd pushing systemd, possibly including its author, is not
content to have systemd be one choice among many. By providing public
APIs intended to be used by other applications, systemd has set itself
up to be difficult not to use once it achieves a certain adoption
threshold.

With regard to upgrades, systemd's systemctl has a daemon-reexec
command to make systemd serialize its state, re-exec itself, and
continue uninterrupted. This could perhaps be used to switch to a new
version without rebooting. Various programs already use this
technique, such as the IRC client irssi, which lets you /upgrade
without dropping any connections. Unfortunately, this brings us back
to the issue of PID 1 being special. For normal applications, if
re-execing fails, the worst that happens is the process dies and gets
restarted (either manually or by some monitoring process) if
necessary. However, for PID 1, if re-execing itself fails, the whole
system goes down (kernel panic).

When execve fails for one of the common reasons, the syscall returns
failure in the original process image, allowing the program to handle
the error. However, failure of execve is not entirely atomic:

The kernel may fail setting up the VM for the new process image
after the original VM has already been destroyed; the main situation
under which this would happen is resource exhaustion.

Even after the kernel successfully sets up the new VM and transfers
execution to the new process image, it's possible to have failures
prior to the transfer of control to the actual application program.
This could happen in the dynamic linker (resource exhaustion or
other transient failures mapping required libraries or loading
configuration files) or libc startup code. Using musl libc with
static linking or even dynamic linking with no additional libraries
eliminates these failure cases, but systemd is intended to be used
with glibc.

In addition, systemd might fail to restore its serialized state due to
resource allocation failures, or if the old and new versions have
diverged sufficiently that the old state is not usable by the new
version.

So if not systemd, what? Debian's discussion of whether to adopt
systemd or not basically devolved into a false dichotomy between
systemd and upstart. And except among grumpy old luddites, keeping
legacy sysvinit is not an attractive option. So despite all its flaws,
is systemd still the best option?

No.

None of the things systemd "does right" are at all revolutionary.
They've been done many times before. DJB's daemontools, runit, and
Supervisor, among others, have solved the
"legacy init is broken" problem over and over again (though each with
some of their own flaws). Their failure to displace legacy sysvinit in
major distributions had nothing to do with whether they solved the
problem, and everything to do with marketing. Said differently,
there's nothing great and revolutionary about systemd. Its popularity
is purely the result of an aggressive, dictatorial marketing strategy
including elements such as:

Engulfing other "essential" system components like udev and making
them difficult or impossible to use without systemd (but see eudev).

Setting up for API lock-in (having the D-Bus interfaces provided by
systemd become a necessary API that user-level programs depend on).

Dictating policy rather than being scoped such that the user,
administrator, or systems integrator (distribution) has to provide
glue. This eliminates bikesheds and thereby fast-tracks adoption at
the expense of flexibility and diversity.

So how should init be done right?

The Unix way: with simple self-contained programs that do one thing
and do it well.

First, get everything out of PID 1:

The systemd way: Take advantage of special properties of PID 1 to the
maximum extent possible. This leads to ever-expanding scope creep and
exacerbates all of the problems described above (and probably many
more yet to be discovered).

The right way: Do away with everything special about PID 1 by making
PID 1 do nothing but start the real init script and then just reap
zombies:

Yes, that's really all that belongs in PID 1. Then there's no way it
can fail at runtime, and no need to upgrade it once it's successfully
running.

Next, from the init script, run a process supervision system outside
of PID 1 to manage daemons as immediate child processes (no
backgrounding). As mentioned above, there are several existing choices here.
It's not clear to me that any of them are sufficiently polished or
robust to satisfy major distributions at this time. But neither is
systemd; its backers are just better at sweeping that under the rug.

What the existing choices do have, though, is better design, mainly
in the way of having clean, well-defined scope rather than
Katamari Damacy.

If none of them are ready for prime time, then the folks eager to
replace legacy init in their favorite distributions need to step up
and either polish one of the existing solutions or write a better
implementation based on the same principles. Either of these options
would be a lot less work than fixing what's wrong with systemd.

Whatever system is chosen, the most important criterion is that it be
transparent to applications. For 30+ years, the choice of init system
used has been completely irrelevant to everybody but system
integrators and administrators. User applications have had no reason
to know or care whether you use sysvinit with runlevels, upstart, my
minimal init with a hard-coded rc script or a more elaborate
process-supervision system, or even /bin/sh. Ironically, this sort
of modularity and interchangeability is what made systemd possible; if
we were starting from the kind of monolithic, API-lock-in-oriented
product systemd aims to be, swapping out the init system for something
new and innovative would not even be an option.

Update: license on code

Added December 21, 2014.

There has been some interest in having a proper free software license
on the trivial init code included above. I originally considered it
too trivial to even care about copyright or need a license on it, but
I don't want this to keep anyone from using or reusing it, so I'm
explicitly licensing it under the following terms (standard MIT
license):

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.