Common input sources

The following sections discuss some of the common inputs and what to do
about them. You should consider each of these inputs when you're
writing your program, and if they are untrusted, carefully filter them.

Environment variables

Environment variables can be incredibly dangerous, especially for setuid/setgid
programs and the programs they call. Three factors make them so
dangerous:

Many libraries and programs are controlled by environment variables in
ways that are incredibly obscure -- in fact, many are completely
undocumented. The command shell /bin/sh uses environment variables such
as PATH and IFS; the program loader ld.so (/lib/ld-linux.so.2) uses
environment variables such as LD_LIBRARY_PATH and LD_PRELOAD; and lots
of programs use the environment variables TERM, HOME, and SHELL. All
of these environment variables have been used to exploit programs.
There are a huge number of these environment variables; many of them
are recondite variables intended for debugging, and it's pointless to
try to list them all. In fact, you can't know them all, since some
aren't even documented.

Environment variables are inherited. If
program A calls B, which calls C, which calls D, then D will receive
the environment variables that A received unless some program changes
things along the way. This means that if A is a secure program, and the
developer of D adds an undocumented environment variable that helps in
debugging, that addition to D might create a vulnerability in A! This
inheritance isn't accidental -- it's what makes environment variables
useful -- but it also makes them a serious security problem.

Environment variables can be completely
controlled by locally running attackers, and attackers can exploit this
in surprising ways. As described by the environ(5) man page (see Resources),
environment variables are internally stored as an array of character
pointers (the array is terminated by a NULL pointer), and each
character pointer points to a NUL-terminated string value of the form NAME=value (where NAME
is the name of the environment variable). Why is this detail important?
It's because attackers can do weird things such as create multiple
values for the same environment variable name (like two different LD_LIBRARY_PATH
values). This can easily lead libraries using the environment variables
to do unexpected things, which may be exploitable. The GNU glibc
library has routines that work to counter this, but other libraries and
any routines that walk the environment variable list can get into
trouble in a hurry.
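To make the duplicate-name problem concrete, here is a minimal sketch of a routine that walks a NULL-terminated array of NAME=value strings and counts how many entries define a given name. The function name count_env_entries is hypothetical (it is not part of any library); it simply illustrates the array layout described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many entries in a NULL-terminated
 * array of "NAME=value" strings define the variable `name`.
 * A result greater than 1 means the environment carries conflicting
 * definitions of the same variable. */
size_t count_env_entries(char *const envp[], const char *name)
{
    assert(envp != NULL && name != NULL);

    size_t name_len = strlen(name);
    size_t count = 0;

    for (size_t i = 0; envp[i] != NULL; i++) {
        /* An entry matches only if NAME is followed directly by '='. */
        if (strncmp(envp[i], name, name_len) == 0 &&
            envp[i][name_len] == '=')
            count++;
    }
    return count;
}
```

A program could call this on its real environment by declaring extern char **environ; and passing environ as the first argument. Treating any count above 1 as an error is one possible policy, not the only one.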

In some cases, programs have
been modified to make it harder to exploit them using environment
variables. Historically, many attacks exploited the way the command
shell handled the IFS environment variable, but most of today's shells (including GNU bash) have been modified to make IFS harder to exploit.

What's the IFS problem?

Although it's not as serious a problem today, the IFS environment variable once caused many security problems in older Unix shells. IFS
was used to determine what separated words in commands sent to the
original Unix Bourne shell, and was passed down like any other
environment variable. Normally the IFS variable would
have the value of a space, a tab, and a newline -- any of those
characters would be treated like a space character. But attackers could
set IFS to sneaky values; for example, they might add a "/" to IFS. Then, when the shell tried to run /bin/ls,
the old shell would interpret "/" just like a space character --
meaning that the shell would run the "bin" program (wherever it could
find one) with the "ls" option! The attacker would then provide a "bin"
program that the shell could find.

Thankfully, most of today's shells counter this by at least automatically resetting the IFS variable when they start -- and that includes GNU bash, the usual shell for GNU/Linux systems. GNU bash also limits the use of IFS so it's only used on the results of expansions. This means that IFS is used less often, and, thus, it's much less dangerous (the original sh split all words using IFS, even commands). Unfortunately, not all shells protect themselves (Practical Unix & Internet Security -- see Resources
for a link -- has sample code to test this). And although this
particular problem has been (for the most part) countered, it
exemplifies the subtle problems that can occur from unchecked
environment variables.
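The word-splitting behavior behind the old IFS attack can be sketched in a few lines of C. This is only an illustration of the idea, not the code of any real shell; the function name split_on_ifs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: split `command` in place on any character
 * found in `ifs`, roughly the way the original Bourne shell split
 * words. Stores up to max_words word pointers in words[] and returns
 * the number of words stored. */
size_t split_on_ifs(char *command, const char *ifs,
                    char *words[], size_t max_words)
{
    assert(command != NULL && ifs != NULL && words != NULL);

    size_t n = 0;
    char *p = command;

    while (n < max_words) {
        p += strspn(p, ifs);      /* skip any leading separators */
        if (*p == '\0')
            break;
        words[n++] = p;           /* start of the next word */
        p += strcspn(p, ifs);     /* advance to the end of the word */
        if (*p == '\0')
            break;
        *p++ = '\0';              /* terminate the word in place */
    }
    return n;
}
```

With the default separators (space, tab, newline), the string "/bin/ls -l" splits into "/bin/ls" and "-l". If "/" is added to the separators, the same string splits into "bin", "ls", and "-l", which is exactly the situation where an old shell would go looking for a program named "bin".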

Unfortunately, while this hardening is a good idea, it's not enough --
you still need to deal with environment variables carefully. An
extremely important (though complicated) example involves how programs
are started on Unix-like systems. Unix-like systems (including
GNU/Linux) run dynamically linked programs by first running a system
loader (it's /lib/ld-linux.so.2 on most GNU/Linux systems), which then
locates and loads the necessary shared libraries. The loader is
normally controlled by -- you guessed it -- environment variables.

On most Unix-like systems, the loader's
search for libraries normally begins with any directories listed in the
environment variable LD_LIBRARY_PATH. I should note that LD_LIBRARY_PATH works on many Unix-like systems, but not all; HP-UX uses the environment variable SHLIB_PATH, and AIX uses LIBPATH instead. Also, in GNU-based systems (including GNU/Linux), the list of libraries specified in the environment variable LD_PRELOAD is loaded first and overrides everything else.

The problem is that if an attacker can control the underlying libraries
used by a program, the attacker can completely control the program. For
example, imagine that the attacker could run /usr/bin/passwd (a
privileged program that lets you change your password), but used
environment variables to change the libraries used by the program. An
attacker could write their own version of crypt(3), the password
encryption function, and when the privileged program tries to call the
library, the attacker can make the program do anything -- including
allowing permanent, unlimited control over the system. Today's loaders
counter this problem by detecting if the program is setuid/setgid, and
if it is, they ignore environment variables such as LD_PRELOAD and LD_LIBRARY_PATH.

So, are we safe? No. If that malicious LD_PRELOAD or LD_LIBRARY_PATH
value isn't erased by the setuid/setgid program, it will be passed down
to other programs and cause the very problem the loader is trying to
counter. Thus, the loader makes it possible
to write secure programs, but you still have to protect against
malicious environment variables. And that still doesn't deal with the
problem of undocumented environment variables.
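One way to keep inherited values from reaching a child program is to hand the child an explicit environment through execve(2) instead of letting it inherit yours. The sketch below assumes a POSIX system; the function name run_with_clean_env and the particular values in clean_env are illustrative choices, not requirements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with the given argv
 * and a small fixed environment, so nothing from the caller's
 * environment (LD_PRELOAD, debugging variables, ...) is passed down.
 * Returns the child's exit status, or -1 on failure. */
int run_with_clean_env(const char *path, char *const argv[])
{
    /* Illustrative safe values; adjust to what the child needs. */
    static char *const clean_env[] = {
        "PATH=/bin:/usr/bin",
        "IFS= \t\n",
        NULL
    };

    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the third argument replaces the whole environment. */
        execve(path, argv, clean_env);
        _exit(127);               /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child receives only the entries in clean_env, any variable set by an attacker in the parent's environment never reaches it. Note that system(3) and popen(3) pass the current environment to the shell, so they don't give you this control.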

For secure setuid/setgid programs, the only safe thing to do is to
always "extract and erase" environment variables at the beginning of
the program:

Extract the environment variables that you actually need (if any).

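Erase the entire environment. In C/C++, erasing the environment can be done by including <unistd.h> and then setting the environ variable to NULL (do this very early, in particular before creating any threads). POSIX expects the program to declare the variable itself, as extern char **environ;, so add that declaration if your headers don't provide it.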

Set just the environment variables you need to safe values. One environment value you'll almost certainly re-add is PATH, the list of directories to search for programs. Typically PATH should just be set to /bin:/usr/bin or some similar value. Don't include the current directory in PATH,
which can be written as "." or even as a blank entry (so a colon at the
beginning or end would probably be exploitable). Typically you'll also
set IFS (to its default of " \t\n" -- space, tab, and newline) and TZ (timezone). Others you might set are HOME and SHELL.
Your application might need a few more, but limit them -- don't accept
data from a potential attacker unless it's critically needed.
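The three steps above can be sketched as follows. This is a minimal example under stated assumptions, not a definitive implementation: the function names reset_environment, tz_looks_safe, and path_has_cwd_entry are hypothetical, the TZ filter is just one plausible policy, and the code assumes a C library (such as GNU glibc) whose setenv() works after environ has been set to NULL.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() and strdup() */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical filter: accept only short TZ values made of letters,
 * digits, and "/_+-", with no leading '/' and no ".." sequence. */
static int tz_looks_safe(const char *tz)
{
    size_t len = strlen(tz);
    if (len == 0 || len > 64 || tz[0] == '/' || strstr(tz, "..") != NULL)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)tz[i];
        if (!isalnum(c) && strchr("/_+-", c) == NULL)
            return 0;
    }
    return 1;
}

/* Returns 1 if a PATH-style list contains an empty entry or a "."
 * entry (both mean "current directory"), 0 otherwise. */
int path_has_cwd_entry(const char *path)
{
    assert(path != NULL);
    const char *entry = path;
    for (;;) {
        size_t len = strcspn(entry, ":");
        if (len == 0 || (len == 1 && entry[0] == '.'))
            return 1;
        if (entry[len] == '\0')
            return 0;
        entry += len + 1;         /* move past the ':' */
    }
}

/* Extract, erase, and re-add. Returns 0 on success, -1 on failure. */
int reset_environment(void)
{
    /* Step 1: extract what we need (here only TZ), keeping a private
     * copy so it doesn't depend on the old environment's storage. */
    char *tz_copy = NULL;
    const char *tz = getenv("TZ");
    if (tz != NULL && tz_looks_safe(tz)) {
        tz_copy = strdup(tz);
        if (tz_copy == NULL)
            return -1;
    }

    /* Step 2: erase the entire environment. */
    environ = NULL;

    /* Step 3: set only the variables we need, to safe values. */
    int rc = 0;
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0 ||
        setenv("IFS", " \t\n", 1) != 0)
        rc = -1;
    if (tz_copy != NULL && setenv("TZ", tz_copy, 1) != 0)
        rc = -1;
    free(tz_copy);
    return rc;
}
```

A setuid/setgid program would call reset_environment() as one of the first things in main(), before starting any threads, and treat a nonzero return as fatal. The path_has_cwd_entry() check is only needed if you ever build PATH from something other than a fixed string; it flags values such as ":/bin", "/bin:", and ".:/bin".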