Systrace Policies

One of the more exciting new features in NetBSD and OpenBSD is
systrace(1), a system call access manager. With
systrace, a system administrator can say which system calls
can be made by which programs and how those calls can be made. Proper use
of systrace can greatly reduce the risks inherent in running
poorly-written or exploitable programs. systrace policies can
confine users in a manner completely independent of Unix permissions. You
can even define the errors that the system calls return when access is
denied, to allow programs to fail in a more proper manner. Proper use of
systrace requires a practical understanding of system calls,
what programs must have to work properly, and how these things interact
with security.

First off, what are system calls? Sysadmins fling that term around a
lot, but many of them don't know exactly what it means. A system call is
a function that lets you talk to the operating system kernel. If you want
to allocate memory, open a TCP/IP port, or perform input/output on the
disk, that's a system call. System calls are documented in section 2 of
the online manual.

Unix also supports a wide variety of C library calls. These are often
confused with system calls but are actually just standardized routines for
things that could be written within a program. You could easily write a
function to compute square roots within a program, for example, but you
could not write a function to allocate memory without using a system call.
If you're in doubt whether a particular function is a system call or a C
library function, check the online manual.
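
To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration; the helper names are hypothetical). The square root is computed entirely inside the process, while the file round trip relies on the os module's thin wrappers around system calls such as write(2), lseek(2), read(2), close(2), and unlink(2).

```python
import math
import os
import tempfile


def library_style_sqrt(x: float) -> float:
    """Pure computation: the kernel is not needed to work out a square root."""
    return math.sqrt(x)


def syscall_round_trip(data: bytes) -> bytes:
    """Write data to a scratch file and read it back.

    Every os.* call below is a thin wrapper around the system call of the
    same name; none of them could be written without the kernel's help.
    """
    fd, path = tempfile.mkstemp()  # creates and opens a scratch file
    try:
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file
        return os.read(fd, len(data))
    finally:
        os.close(fd)
        os.unlink(path)
```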

You may find an occasional system call that is not documented in the
online manual, such as break(). You'll need to dig into
other resources to identify these calls. (break() in
particular is a very old system call used within libc, but
not by programmers, so it seems to have escaped being documented in the
man pages.)

systrace denies all actions that are not explicitly
permitted and logs the rejection to syslog. If a program
running under systrace has a problem, you can find out what
system call the program wants and decide if you want to add it to your
policy, reconfigure the program, or live with the error.
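
The default-deny behavior can be modeled in a few lines. This is only a sketch under simplifying assumptions: the policy is reduced to a set of permitted call names, Python's logging module stands in for syslog, and the function name decide is made up for this example.

```python
import logging

log = logging.getLogger("systrace-sketch")


def decide(permitted: set, syscall: str) -> str:
    """Permit only calls the policy lists; deny and log everything else."""
    if syscall in permitted:
        return "permit"
    # Anything not explicitly permitted is rejected, and the rejection is logged.
    log.warning("deny %s", syscall)
    return "deny"
```

The logged call name is what you would read afterwards when deciding whether to extend the policy, reconfigure the program, or live with the error.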

systrace has several important pieces: policies, the
policy generation tools, the runtime access management tool, and the
sysadmin real-time interface. This article gives a brief overview of
policies. Next time, we'll learn about the systrace tools.

Reading systrace Policies

The systrace(1) manual page includes a full description
of the syntax used for policy descriptions, but I generally find it easier
to look at some examples of a working policy and then go over the syntax
in detail. Since named has been a subject of recent security
discussions, let's look at the policy that OpenBSD 3.2 provides for
named.

Before reviewing the named policy, let's review some
commonly-known facts about the name server daemon's system access
requirements. Zone transfers occur on port 53/TCP, while basic lookup
services are provided on port 53/UDP. OpenBSD chroots named into
/var/named by default and logs everything to
/var/log/messages. We might expect the policy to permit the
system calls that this access requires.

Each systrace policy is stored in a file named after the
full path of the program, with the leading slash dropped and the remaining
slashes replaced by underscores. The policy
file usr_sbin_named contains quite a few entries that allow
access beyond this, however. The file starts with:

# Policy for named that uses named user and chroots to /var/named
# This policy works for the default configuration of named.
Policy: /usr/sbin/named, Emulation: native

The "Policy" statement gives the full path to the program
this policy is for. You can't fool systrace(1) by giving the
same name to a program elsewhere on the system. The "Emulation" entry
shows which ABI this policy is for. Remember, BSD systems expose ABIs for
a variety of operating systems. systrace can theoretically
manage system call access for any ABI, although only native and Linux
binaries are supported at the moment.
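
The file naming convention and the header line are simple enough to sketch in Python. The function names here are hypothetical, and the parser handles only the two header fields shown in this article.

```python
def policy_filename(program_path: str) -> str:
    """Map /usr/sbin/named to usr_sbin_named.

    The leading slash is dropped and the remaining slashes become underscores.
    """
    return program_path.lstrip("/").replace("/", "_")


def parse_header(line: str) -> dict:
    """Split 'Policy: /usr/sbin/named, Emulation: native' into its fields."""
    fields = {}
    for part in line.split(","):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields
```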

The remaining lines define a variety of system calls that the program
may or may not use. The sample policy for named includes 73 lines of
system call rules. The most basic rules look like this:

native-accept: permit

When /usr/sbin/named tries to use the
accept() system call, under the native ABI, it is allowed.
What is accept()? Run man 2 accept and you'll
see that this accepts connections on a socket. A nameserver will
obviously have to accept connections on a network socket!

Other rules are far more restrictive. Here's a rule for
bind(), the system call that lets a program request a TCP/IP
port to attach to.

native-bind: sockaddr match "inet-*:53" then permit

sockaddr is the name of an argument taken by the
bind() system call. The match keyword tells
systrace to compare the given variable with the string
inet-*:53, according to the standard shell pattern-matching
(globbing) rules. So, if the variable sockaddr matches the
string inet-*:53, the call is permitted. This program
can bind to port 53, over both TCP and UDP protocols. If an attacker had
an exploit to make named(8) attach a command prompt on a
high-numbered port, this systrace policy would prevent that
exploit from working -- without changing a single line of
named(8) code!
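
You can try the globbing rule yourself with Python's fnmatch module, which follows the same shell-style conventions for the * wildcard. Note one assumption: the exact way systrace renders a socket address as a string is not spelled out in this article, so the inet-[address]:port strings below are illustrative guesses, and bind_allowed is a made-up helper.

```python
from fnmatch import fnmatchcase

# The pattern from the named policy's bind() rule.
PATTERN = "inet-*:53"


def bind_allowed(sockaddr: str) -> bool:
    """Return True when the address string matches the policy's glob."""
    # fnmatchcase() compares case-sensitively on every platform.
    return fnmatchcase(sockaddr, PATTERN)
```

Any address ending in :53 passes, whatever sits between "inet-" and the port. A high-numbered port such as 31337 does not match, and neither does 5353, because the pattern requires the string to end in exactly ":53".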

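The policy also restricts chdir(), the system call that
changes the working directory:

native-chdir: filename eq "/" then permit
native-chdir: filename eq "/namedb" then permit

At first glance, this seems wrong. The eq keyword
compares one string to another and requires an exact match. If the
program tries to go to the root directory, or to the directory
/namedb, systrace will allow it. Why would you
possibly want to allow named access to the root directory, however?
The next entry explains why.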

native-chroot: filename eq "/var/named" then permit

We can use the native chroot() system call to change our
root directory to /var/named, but to no other directory. At
this point, the /namedb directory is actually
/var/named/namedb, which is a sensible location for a
chrooted named(8) to access. We also know that
named(8) logs to /var/log/messages, however.
How does that work, if the program is chrooted to /var/named?

native-connect: sockaddr eq "/dev/log" then permit

This program can use the native connect(2) system call to
talk to /dev/log and only /dev/log. That socket, which
syslogd(8) listens on, hands the messages off elsewhere. If you didn't know that this was how
the program logged, however, you'd be confused. Although the program is
running in a changed root, /dev/log is opened before the
chroot happens and chroot(2) does not revoke
access to open files outside the chrooted area.
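
The exact-match behavior of eq, and the way a path inside the chroot maps back to the real filesystem, can be sketched as follows. This is a toy model, not systrace's implementation: the EQ_RULES table holds only the chroot() and connect() rules quoted above, and both function names are hypothetical.

```python
import posixpath

# The two eq rules quoted above, keyed by system call.
EQ_RULES = {
    "native-chroot": {"/var/named"},
    "native-connect": {"/dev/log"},
}


def eq_permits(syscall: str, argument: str) -> bool:
    """eq demands an exact string match; there are no wildcards or prefixes."""
    return argument in EQ_RULES.get(syscall, set())


def path_outside_chroot(new_root: str, path_inside: str) -> str:
    """Show where a path seen by the chrooted program really lives."""
    return posixpath.join(new_root, path_inside.lstrip("/"))
```

Even a trailing slash is enough to make an eq comparison fail, which is why these rules are so tight.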

systrace aliases certain system calls with very similar
functions into groups. You can disable this functionality with a
command-line switch and only use the exact system calls you specify, but
in most cases these aliases are quite useful and shrink your policies
considerably. The two aliases are fsread and
fswrite. fsread is an alias for
stat(), lstat(), readlink(), and
access(), under the native and Linux ABIs.
fswrite is an alias for unlink(),
mkdir(), and rmdir(), in both the native and
Linux ABIs. As open() can be used to either read or write a
file, it is aliased by both fsread and fswrite
depending on how it is called. Under the fsread rules in the
sample policy, named(8) can read certain
/etc files, it can list the contents of the root directory,
and it can access the groups file.
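
The alias grouping described above can be written down as a small lookup. Treat this as a sketch: the function name is made up, and the way open() is classified (anything opened with O_WRONLY or O_RDWR counts as a write) is my assumption about what "depending on how it is called" means.

```python
import os

# The groupings listed in the text above.
FSREAD = {"stat", "lstat", "readlink", "access"}
FSWRITE = {"unlink", "mkdir", "rmdir"}


def alias_for(syscall: str, open_flags: int = os.O_RDONLY) -> str:
    """Return the alias a call falls under, or the call's own name."""
    if syscall == "open":
        # Assumption: any write access puts open() under fswrite.
        writing = open_flags & (os.O_WRONLY | os.O_RDWR)
        return "fswrite" if writing else "fsread"
    if syscall in FSREAD:
        return "fsread"
    if syscall in FSWRITE:
        return "fswrite"
    return syscall
```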

systrace supports two optional keywords at the end of a
policy statement, errorcode and log.

The errorcode is the error that is returned when the program is denied
access to this system call. Programs will behave differently depending on
the error that they receive; named will react differently to a "permission
denied" error than it will to an "out of memory" error. You can get a
complete list of error codes from errno(2). Use the error
name, not the error number. For example, here we return an error for
non-existent files. (The sub keyword matches when the given string appears
anywhere in the argument.)

filename sub "<non-existent filename>" then deny[enoent]
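
If you want to check an error name before putting it in a policy, Python's errno module carries the same symbolic names as errno(2). The helper below is hypothetical and only understands the deny[name] form shown above.

```python
import errno
import re


def errno_for_action(action: str):
    """Return the numeric errno for 'deny[name]', or None for other actions.

    Raises AttributeError if the name is not a known errno symbol.
    """
    m = re.fullmatch(r"deny\[(\w+)\]", action)
    if m is None:
        return None
    # Policies use lowercase names; the errno module uses uppercase symbols.
    return getattr(errno, m.group(1).upper())
```

Passing the result to os.strerror() gives the message a program would typically print when it receives that error.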

If you put the word log at the end of your rule,
successful system calls will be logged. For example, if we wanted to log
each time named(8) attached to port 53, we could edit the
policy statement for the bind() call to read:

native-bind: sockaddr match "inet-*:53" then permit log

You can also write rules that match on user ID and group ID arguments, as
the example here demonstrates. This rule permits setgid() only
when the requested group ID is 70.

native-setgid: gid eq "70" then permit
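
To tie the pieces together, here is a rough Python parser for the rule forms covered in this article. It is a sketch, not the real grammar: it understands only a bare action or a single argument filter with eq, match, or sub, plus the optional error code and log keyword. The full syntax in systrace(1) is richer, and the name parse_rule is invented for this example.

```python
import re

RULE = re.compile(
    r'^(?P<syscall>[\w-]+):\s+'
    # Optional filter: <argument> <operator> "<value>" then
    r'(?:(?P<arg>\w+)\s+(?P<op>eq|match|sub)\s+"(?P<value>[^"]*)"\s+then\s+)?'
    # Action, with an optional error name in square brackets
    r'(?P<action>permit|deny)(?:\[(?P<error>\w+)\])?'
    # Optional trailing log keyword
    r'(?P<log>\s+log)?\s*$'
)


def parse_rule(line: str) -> dict:
    """Break one policy line into its parts; raise ValueError if it does not fit."""
    m = RULE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized rule: " + line)
    rule = m.groupdict()
    rule["log"] = rule["log"] is not None  # turn the keyword into a boolean
    return rule
```

Feeding it the bind() rule with log appended yields the system call, the argument name, the operator, the pattern, the action, and a true log flag, which is exactly how you should read such a line by eye.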

This very brief overview covers the vast majority of the rules you
will see. As in so many things in computing, systrace does
90% of its work with 10% of its features. For full details on the
systrace grammar, read systrace(1). Now that
you can recognize a systrace policy when you see one, next
time we'll look at some of the tools you can use to create your own
policies.