Making Root Unprivileged

Mitigate the damage of setuid root exploits on your system by removing root's privilege.

There was a time when, to use a computer, you merely turned it on and were
greeted by a command prompt. Nowadays, most
operating systems offer a security model with multiple users. Typically, the
credentials you present at login determine the amount of privilege that
programs acting upon your behalf will have. Everyday tasks can be
accomplished using unprivileged userids, minimizing the risks due to user
error, accidental execution of malware downloaded from the Internet and so
on. Any
program needing to exercise privilege must be executed using a privileged
userid. In UNIX, that is userid 0, the root user. Unfortunately, this means
any software needing even just a bit of privilege can lead to a
complete system compromise should it misbehave or be attacked
successfully.

POSIX capabilities address this problem in two ways. First, they break the
notion of all-or-nothing privilege into a set of semantically distinct
privileges. This can limit the amount of privilege a task may have,
so that, for example, it is able only to create devices or trace another
user's tasks.

Second, the notion of privilege is separated from userids. Instead,
privilege is wielded by the files (programs) a process executes. After
all, users sitting at the keyboard just invoke programs to run on
their behalf. It is the programs that actually do something. And, it is
the programs that an administrator may entrust with privilege, based
on knowledge of what the program does, who wrote it and who installed it.

POSIX capabilities have been implemented in Linux for years, but until
recently, Linux supported only process capabilities. Because files are
supposed to wield privilege, the lack of file capabilities meant that
Linux required workarounds to allow administrators and system services
to run with privilege. The POSIX capability rules were perverted to
emulate a privileged root user.

Although file capabilities are now supported, the privileged root user
remains the norm. In this article, I demonstrate a lazy prototype
of a system with an unprivileged root.

POSIX Capabilities Overview

Each process has three capability sets:

The effective set (pE) contains the capabilities it can use right
now.

The permitted set (pP) contains those that it can add back into its
effective set.

The inheritable set (pI) is used to determine its new sets when it executes
a new file.

Each file also has effective (fE), permitted (fP) and inheritable (fI) sets,
used to calculate the new capability sets of a process executing it.

At any time, a process can use cap_set_proc() to remove capabilities from
any of the three sets.

Capabilities can be added to the effective set only if they are currently
in the process's permitted set. They never can be added to the permitted set. And,
they can be added to the inheritable set only if they are in the permitted
set or if CAP_SETPCAP is in pE. When a process executes a new file,
its new capability sets are calculated as follows:

The inheritable set remains unchanged.

The new permitted set is filled with both the file permitted set (masked
with a bounding set, which for this article I assume is always full) and
any capabilities present in both the file and process inheritable sets.

Capabilities in the file permitted set will be available to the new
process. An example use for this is the ping program. Ping
needs only the capability CAP_NET_RAW in order to craft raw network packets. It is
typically setuid-root, so all users can run it. By
placing CAP_NET_RAW in ping's permitted set instead, an administrator lets
all users receive CAP_NET_RAW while running ping.

Capabilities in the file inheritable set are available to a
process only if they also are in the process inheritable set—an
example of this would be to allow some users to renice other users' tasks.
Simply arrange (as I explain in a bit) for CAP_SYS_NICE to be placed in
their pI on login, as well as in fI for /usr/bin/renice.
Now, ordinary users can run renice without privilege,
and the “special” users can run renice, but no other programs, with
CAP_SYS_NICE.

The effective set is by default empty, or, if the legacy bit is set
(see The File Effective Set, aka the Legacy Bit sidebar), it is set to the new permitted set.
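
The three rules above can be sketched as a small Python model. This is only an illustration of the arithmetic, not the kernel's implementation: the names ProcCaps, FileCaps and caps_after_exec are hypothetical, capabilities are modeled as plain strings, and the bounding set is treated as full unless one is passed in.

```python
from dataclasses import dataclass

EMPTY = frozenset()


@dataclass(frozen=True)
class ProcCaps:
    """Hypothetical model of a process's three capability sets."""
    effective: frozenset = EMPTY    # pE
    permitted: frozenset = EMPTY    # pP
    inheritable: frozenset = EMPTY  # pI


@dataclass(frozen=True)
class FileCaps:
    """Hypothetical model of a file's capabilities."""
    permitted: frozenset = EMPTY    # fP
    inheritable: frozenset = EMPTY  # fI
    legacy: bool = False            # the "legacy bit" (file effective set)


def caps_after_exec(proc, file, bounding=None):
    """Return the process's capability sets after it executes `file`."""
    # fP is masked with the bounding set; None stands for "full".
    f_permitted = file.permitted if bounding is None else file.permitted & bounding
    # pP' = masked fP, plus anything in both fI and pI.
    new_permitted = f_permitted | (file.inheritable & proc.inheritable)
    # pE' = pP' if the legacy bit is set, otherwise empty.
    new_effective = new_permitted if file.legacy else EMPTY
    # pI' = pI (unchanged).
    return ProcCaps(new_effective, new_permitted, proc.inheritable)


# Ping with CAP_NET_RAW in its permitted set, run by an ordinary user.
ordinary_user = ProcCaps()
ping = FileCaps(permitted=frozenset({"CAP_NET_RAW"}))
ping_result = caps_after_exec(ordinary_user, ping)

# The same file with the legacy bit set (an assumption for illustration).
ping_legacy = FileCaps(permitted=frozenset({"CAP_NET_RAW"}), legacy=True)
ping_legacy_result = caps_after_exec(ordinary_user, ping_legacy)

# renice with CAP_SYS_NICE in fI: only users with it in pI gain it.
special_user = ProcCaps(inheritable=frozenset({"CAP_SYS_NICE"}))
renice = FileCaps(inheritable=frozenset({"CAP_SYS_NICE"}))
special_result = caps_after_exec(special_user, renice)
ordinary_result = caps_after_exec(ordinary_user, renice)
```

In this model, ping_result has CAP_NET_RAW in its permitted set for every user, whereas renice yields CAP_SYS_NICE only for special_user. Without the legacy bit, the capability lands in the permitted set alone, and the program would have to raise it into its effective set itself.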

The File Effective Set, aka the Legacy Bit

Linux has an unfortunate discrepancy between the setcap command API
and what it actually does. Although setcap expects the user to define
a file effective set, the kernel simply knows about a “legacy
bit”.
Practically speaking, if the file effective set is empty,
the legacy bit is not set. If all bits in the file permitted and
inheritable sets are in the effective set, the legacy bit is
set. If only a subset of those bits are in the effective set,
setcap will return an error.

The reason for the setcap command's API is that Linux is loath
to change userspace APIs. The reason for using the legacy bit
is that we want to encourage applications to begin with an empty
effective set if they are capability-aware. Hence, the file
effective set should be empty unless the application is not
capability-aware. But if the application is not capability-aware,
all capabilities available to it must be in its effective set
from the start.
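
The mapping from a requested file effective set to the legacy bit, as described in this sidebar, can be sketched in Python as follows. The function name legacy_bit is hypothetical, and the sketch models only the rule stated here, not setcap's actual code. Treating an effective set that contains bits outside the permitted and inheritable sets as an error is an assumption.

```python
def legacy_bit(f_effective, f_permitted, f_inheritable):
    """Derive the legacy bit from a requested file effective set.

    Hypothetical helper: returns False for an empty effective set and
    True when the effective set covers every permitted and inheritable
    bit. Anything in between is rejected.
    """
    available = f_permitted | f_inheritable
    if not f_effective:
        return False  # empty fE: legacy bit not set
    if f_effective == available:
        return True   # fE covers all of fP and fI: legacy bit set
    # Partial effective sets cannot be expressed with a single bit.
    # (Rejecting bits outside fP | fI as well is an assumption.)
    raise ValueError("file effective set must be empty or match fP | fI")


net_raw = frozenset({"CAP_NET_RAW"})
both = frozenset({"CAP_NET_RAW", "CAP_SYS_NICE"})

aware = legacy_bit(frozenset(), net_raw, frozenset())  # capability-aware program
unaware = legacy_bit(net_raw, net_raw, frozenset())    # not capability-aware
try:
    legacy_bit(net_raw, both, frozenset())             # only a subset requested
    partial_rejected = False
except ValueError:
    partial_rejected = True
```

A capability-aware program would be installed with an empty effective set (aware above), whereas one that is not capability-aware needs everything in its effective set from the start (unaware).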