
CAPABILITIES

NAME

capabilities - overview of Linux capabilities

DESCRIPTION

For the purpose of performing permission checks,
traditional Unix implementations distinguish two categories of processes:
privileged
processes (whose effective user ID is 0, referred to as superuser or root),
and
unprivileged
processes (whose effective UID is non-zero).
Privileged processes bypass all kernel permission checks,
while unprivileged processes are subject to full permission
checking based on the process's credentials
(usually: effective UID, effective GID, and supplementary group list).

Starting with kernel 2.2, Linux divides the privileges traditionally
associated with superuser into distinct units, known as
capabilities,
which can be independently enabled and disabled.
Capabilities are a per-thread attribute.

CAP_FOWNER

Bypass permission checks on operations that normally
require the file system UID of the process to match the UID of
the file (e.g.,
chmod(2),
utime(2)),
excluding those operations covered by
CAP_DAC_OVERRIDE
and
CAP_DAC_READ_SEARCH;
set extended file attributes (see
chattr(1))
on arbitrary files;
set Access Control Lists (ACLs) on arbitrary files;
ignore directory sticky bit on file deletion;
specify
O_NOATIME
for arbitrary files in
open(2)
and
fcntl(2).

CAP_FSETID

Don't clear set-user-ID and set-group-ID bits when a file is modified;
permit setting of the set-group-ID bit for a file whose GID does not match
the file system GID or any of the supplementary GIDs of the calling process.

CAP_SYS_ADMIN

Permit a range of system administration operations including:
quotactl(2),
mount(2),
umount(2),
swapon(2),
swapoff(2),
sethostname(2),
setdomainname(2),
IPC_SET
and
IPC_RMID
operations on arbitrary System V IPC objects;
perform operations on
trusted
and
security
Extended Attributes (see
attr(5));
call
lookup_dcookie(2);
use
ioprio_set(2)
to assign
IOPRIO_CLASS_RT
and
IOPRIO_CLASS_IDLE
I/O scheduling classes;
perform
keyctl(2)
KEYCTL_CHOWN
and
KEYCTL_SETPERM
operations;
allow forged UID when passing socket credentials;
exceed
/proc/sys/fs/file-max,
the system-wide limit on the number of open files,
in system calls that open files (e.g.,
accept(2),
execve(2),
open(2),
pipe(2);
without this capability these system calls will fail with the error
ENFILE
if this limit is encountered);
employ the
CLONE_NEWNS
flag with
clone(2)
and
unshare(2).

CAP_SYS_NICE

Allow raising process nice value
(nice(2),
setpriority(2))
and changing of the nice value for arbitrary processes;
allow setting of real-time scheduling policies for the calling process,
and setting scheduling policies and priorities for arbitrary processes
(sched_setscheduler(2),
sched_setparam(2));
set CPU affinity for arbitrary processes
(sched_setaffinity(2));
set I/O scheduling class and priority for arbitrary processes
(ioprio_set(2));
allow
migrate_pages(2)
to be applied to arbitrary processes and allow processes
to be migrated to arbitrary nodes;
allow
move_pages(2)
to be applied to arbitrary processes;
use the
MPOL_MF_MOVE_ALL
flag with
mbind(2)
and
move_pages(2).

Capability Sets

Each thread has three capability sets containing zero or more
of the above capabilities:

Effective:

the capabilities used by the kernel to
perform permission checks for the thread.

Permitted:

the capabilities that the thread may assume
(i.e., a limiting superset for the effective and inheritable sets).
If a thread drops a capability from its permitted set,
it can never re-acquire that capability (unless it
execve(2)s
a set-user-ID-root program).

Inheritable:

the capabilities preserved across an
execve(2).

A child created via
fork(2)
inherits copies of its parent's capability sets.
See below for a discussion of the treatment of capabilities during
execve(2).
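
As a rough illustration of the per-thread sets described above, the following sketch decodes the hexadecimal masks that Linux exposes as the CapInh, CapPrm and CapEff fields of /proc/[pid]/status. Those field names and the capability numbers (taken from include/linux/capability.h) are not given in this page and should be checked against the running kernel. The sample text and the helper names parse_cap_sets and has_cap are made up for the example.

```python
# Capability numbers as found in <linux/capability.h> (only a few are listed).
CAP_NUMBERS = {
    "CAP_CHOWN": 0,
    "CAP_DAC_OVERRIDE": 1,
    "CAP_DAC_READ_SEARCH": 2,
    "CAP_FOWNER": 3,
    "CAP_FSETID": 4,
    "CAP_SETPCAP": 8,
    "CAP_SYS_ADMIN": 21,
    "CAP_SYS_NICE": 23,
}

# Map the /proc status field names to the set names used in this page.
FIELDS = {"CapInh": "inheritable", "CapPrm": "permitted", "CapEff": "effective"}


def parse_cap_sets(status_text):
    """Return a dict of set name -> integer bit mask parsed from status text."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in FIELDS:
            sets[FIELDS[key]] = int(value.strip(), 16)
    return sets


def has_cap(mask, name):
    """True if the named capability is present in the bit mask."""
    return bool(mask & (1 << CAP_NUMBERS[name]))


# Hypothetical excerpt of a status file: CAP_FOWNER and CAP_SYS_NICE are
# permitted, but only CAP_FOWNER is currently effective.
SAMPLE_STATUS = (
    "Name:\tdemo\n"
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000000000800008\n"
    "CapEff:\t0000000000000008\n"
)

sets = parse_cap_sets(SAMPLE_STATUS)
# The effective set must stay within the permitted set.
effective_within_permitted = (sets["effective"] & ~sets["permitted"]) == 0
```

On a Linux system the same function can be given the contents of /proc/self/status instead of the sample string.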

Using
capset(2),
a thread may manipulate its own capability sets, or, if it has the
CAP_SETPCAP
capability, those of a thread in another process.

Capability bounding set

When a program is execed, the permitted and effective capabilities
are ANDed with the current value of the so-called
capability bounding set,
defined in the file
/proc/sys/kernel/cap-bound.
This parameter can be used to place a system-wide limit on the
capabilities granted to all subsequently executed programs.
(Confusingly, this bit mask parameter is expressed as a
signed decimal number in
/proc/sys/kernel/cap-bound.)
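
Because the value is shown as a signed decimal, it has to be converted back to a bit mask before individual capabilities can be tested. The sketch below does this under two assumptions that are not stated in this page: the mask is 32 bits wide, and CAP_SETPCAP is capability number 8 (as in include/linux/capability.h). The input -257 is only an illustrative value, chosen because it corresponds to a mask with that single bit cleared.

```python
CAP_SETPCAP = 8          # assumed capability number, see <linux/capability.h>
MASK_32 = 0xFFFFFFFF     # assumed width of the bounding set


def decode_cap_bound(text):
    """Convert the signed decimal string from cap-bound to an unsigned mask."""
    return int(text.strip()) & MASK_32


def masked_out(mask, cap):
    """True if capability number `cap` is cleared in the bounding set."""
    return ((mask >> cap) & 1) == 0


mask = decode_cap_bound("-257\n")
print(hex(mask))                       # prints 0xfffffeff
print(masked_out(mask, CAP_SETPCAP))   # prints True
```

On a kernel that provides the file, the string would come from reading /proc/sys/kernel/cap-bound.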

Only the
init
process may set bits in the capability bounding set;
other than that, the superuser may only clear bits in this set.

On a standard system the capability bounding set always masks out the
CAP_SETPCAP
capability.
To remove this restriction (dangerous!), modify the definition of
CAP_INIT_EFF_SET
in
include/linux/capability.h
and rebuild the kernel.

The capability bounding set feature was added to Linux starting with
kernel version 2.2.11.

Current and Future Implementation

A full implementation of capabilities requires:

1.

that for all privileged operations,
the kernel check whether the thread has the required
capability in its effective set.

2.

that the kernel provide
system calls allowing a thread's capability sets to
be changed and retrieved.

3.

file system support for attaching capabilities to an executable file,
so that a process gains those capabilities when the file is execed.

As of Linux 2.6.14, only the first two of these requirements are met.

Eventually, it should be possible to associate three
capability sets with an executable file, which,
in conjunction with the capability sets of the thread,
will determine the capabilities of a thread after an
execve(2):

Inheritable (formerly known as allowed):

this set is ANDed with the thread's inheritable set to determine which
inheritable capabilities are permitted to the thread after the
execve(2).

Permitted (formerly known as forced):

the capabilities automatically permitted to the thread,
regardless of the thread's inheritable capabilities.

Effective:

those capabilities in the thread's new permitted set are
also to be set in the new effective set.
(F(effective) would normally be either all zeroes or all ones.)

In the meantime, since the current implementation does not
support file capability sets, during an
execve(2):

1.

All three file capability sets are initially assumed to be cleared.

2.

If a set-user-ID-root program is being execed,
or the real user ID of the process is 0 (root),
then the file inheritable and permitted sets are defined to be all ones
(i.e., all capabilities enabled).

3.

If a set-user-ID-root program is being executed,
then the file effective set is defined to be all ones.
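
The three steps above can be written down as a small function. This is only a sketch that follows the wording of the steps literally; the kernel's own test may be phrased differently (for example, in terms of the effective UID that results from the exec). The 32-bit mask width and the name assumed_file_sets are assumptions made for the example.

```python
ALL_CAPS = 0xFFFFFFFF  # assumption: capability sets are 32-bit masks


def assumed_file_sets(setuid_root, real_uid):
    """Return (inheritable, permitted, effective) file sets assumed at exec.

    setuid_root: True if the program being executed is set-user-ID-root.
    real_uid:    real user ID of the calling process.
    """
    # Step 1: all three file sets start out cleared.
    inheritable = permitted = effective = 0

    # Step 2: set-user-ID-root program, or real UID 0.
    if setuid_root or real_uid == 0:
        inheritable = permitted = ALL_CAPS

    # Step 3: set-user-ID-root program.
    if setuid_root:
        effective = ALL_CAPS

    return inheritable, permitted, effective


ordinary = assumed_file_sets(setuid_root=False, real_uid=1000)
suid_root = assumed_file_sets(setuid_root=True, real_uid=1000)
```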

Transformation of Capabilities During exec()

During an
execve(2),
the kernel calculates the new capabilities of
the process using the following algorithm:

    P'(permitted) = (P(inheritable) & F(inheritable)) |
                    (F(permitted) & cap_bset)

    P'(effective) = P'(permitted) & F(effective)

    P'(inheritable) = P(inheritable)    [i.e., unchanged]

where:

    P         denotes the value of a thread capability set before the execve(2)
    P'        denotes the value of a capability set after the execve(2)
    F         denotes a file capability set
    cap_bset  is the value of the capability bounding set

In the current implementation, the upshot of this algorithm is that
when a process
execve(2)s
a set-user-ID-root program, or when a process with an effective UID of 0
execve(2)s
a program,
it gains all capabilities in its permitted and effective capability sets,
except those masked out by the capability bounding set (i.e.,
CAP_SETPCAP).
This provides semantics that are the same as those provided by
traditional Unix systems.
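
The transformation can be sketched with plain bit masks. The sketch assumes the following reading of the descriptions in this page: the new permitted set is the thread's inheritable set ANDed with the file inheritable set, ORed with the file permitted set masked by the bounding set; the new effective set is the new permitted set ANDed with the file effective set; the inheritable set is unchanged. The 32-bit width, the bit number used for CAP_SETPCAP and the function name exec_transform are assumptions made for the example.

```python
ALL_CAPS = 0xFFFFFFFF          # assumption: 32-bit capability masks
CAP_SETPCAP_BIT = 1 << 8       # assumed bit for CAP_SETPCAP


def exec_transform(p_inheritable, f_inheritable, f_permitted, f_effective,
                   cap_bset):
    """Return the thread's (inheritable, permitted, effective) after exec."""
    new_permitted = (p_inheritable & f_inheritable) | (f_permitted & cap_bset)
    new_effective = new_permitted & f_effective
    new_inheritable = p_inheritable  # unchanged by the exec
    return new_inheritable, new_permitted, new_effective


# Bounding set with only CAP_SETPCAP masked out.
bset = ALL_CAPS & ~CAP_SETPCAP_BIT

# Set-user-ID-root program: all three file sets are taken to be all ones.
root_result = exec_transform(0, ALL_CAPS, ALL_CAPS, ALL_CAPS, bset)

# Ordinary program run by an unprivileged user: all file sets are cleared.
user_result = exec_transform(0, 0, 0, 0, bset)
```

With these inputs the set-user-ID-root case ends up with every capability except CAP_SETPCAP in both the permitted and effective sets, and the unprivileged case ends up with none.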

Effect of User ID Changes on Capabilities

To preserve the traditional semantics for transitions between
0 and non-zero user IDs,
the kernel makes the following changes to a thread's capability
sets on changes to the thread's real, effective, saved set,
and file system user IDs (using
setuid(2),
setresuid(2),
or similar):

1.

If one or more of the real, effective or saved set user IDs
was previously 0, and as a result of the UID changes all of these IDs
have a non-zero value,
then all capabilities are cleared from the permitted and effective
capability sets.

2.

If the effective user ID is changed from 0 to non-zero,
then all capabilities are cleared from the effective set.

3.

If the effective user ID is changed from non-zero to 0,
then the permitted set is copied to the effective set.

4.

If the file system user ID is changed from 0 to non-zero (see
setfsuid(2))
then the following capabilities are cleared from the effective set:
CAP_CHOWN,
CAP_DAC_OVERRIDE,
CAP_DAC_READ_SEARCH,
CAP_FOWNER,
and
CAP_FSETID.
If the file system UID is changed from non-zero to 0,
then any of these capabilities that are enabled in the permitted set
are enabled in the effective set.

If a thread that has a 0 value for one or more of its user IDs wants
to prevent its permitted capability set being cleared when it resets
all of its user IDs to non-zero values, it can do so using the
prctl(2)
PR_SET_KEEPCAPS
operation.
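
The four rules can be modelled as a function over bit masks. This is a sketch, not the kernel's code: it assumes that the capabilities named in rule 4 occupy bits 0 to 4 (their numbers in include/linux/capability.h), and that PR_SET_KEEPCAPS simply causes rule 1 to be skipped while the other rules still apply. The names Ids and apply_uid_change are made up for the example.

```python
from collections import namedtuple

# The four user IDs of a thread.
Ids = namedtuple("Ids", "ruid euid suid fsuid")

# CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID
# (assumed to be capability numbers 0 through 4).
FS_CAPS = 0b11111


def apply_uid_change(old, new, permitted, effective, keep_caps=False):
    """Return (permitted, effective) after the user IDs change from old to new."""
    had_root = 0 in (old.ruid, old.euid, old.suid)
    all_nonzero = 0 not in (new.ruid, new.euid, new.suid)

    # Rule 1: every ID is now non-zero, so drop everything (unless keep_caps).
    if had_root and all_nonzero and not keep_caps:
        permitted = effective = 0

    # Rule 2: effective UID went from 0 to non-zero.
    if old.euid == 0 and new.euid != 0:
        effective = 0

    # Rule 3: effective UID went from non-zero to 0.
    if old.euid != 0 and new.euid == 0:
        effective = permitted

    # Rule 4: file system UID changes affect only the file-related capabilities.
    if old.fsuid == 0 and new.fsuid != 0:
        effective &= ~FS_CAPS
    if old.fsuid != 0 and new.fsuid == 0:
        effective |= permitted & FS_CAPS

    return permitted, effective


ALL_CAPS = 0xFFFFFFFF                 # assumption: 32-bit capability masks
root = Ids(0, 0, 0, 0)
user = Ids(1000, 1000, 1000, 1000)
temp = Ids(0, 1000, 0, 1000)          # effective UID dropped temporarily

dropped = apply_uid_change(root, user, ALL_CAPS, ALL_CAPS)
kept = apply_uid_change(root, user, ALL_CAPS, ALL_CAPS, keep_caps=True)
lowered = apply_uid_change(root, temp, ALL_CAPS, ALL_CAPS)
restored = apply_uid_change(temp, root, lowered[0], lowered[1])
```

In this model a thread that used keep_caps retains its permitted set but still has an empty effective set after the change, so it would have to raise the capabilities it needs again.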

CONFORMING TO

No standards govern capabilities, but the Linux capability implementation
is based on the withdrawn POSIX.1e draft standard.

NOTES

The
libcap
package provides a suite of routines for setting and
getting capabilities that is more comfortable and less likely
to change than the interface provided by
capset(2)
and
capget(2).

BUGS

There is as yet no file system support allowing capabilities to be
associated with executable files.