The Compiler as Attack Vector

Can an attacker build a compromised program from good source code? Yes, if he or she controls the tools. Learn how an attack can happen during the build process.

Media exposure of serious security threats has
skyrocketed in the last five years, and this has caused
a strange parallel to develop. As software developers
have become more aware of security problems and have
taken steps to mitigate them during the development
phase, attackers have been forced to become more
insidious in their choice of exploit vectors. One vector that
rarely is explored is attacking the program as it
is built.

I first encountered this idea while reading
the September 1995 ACM classic of the month article
“Reflections on Trusting Trust”, by Ken Thompson. The article
originally appeared in the August 1984 issue of
Communications of the ACM, and it argues
that ultimate security is impossible to achieve
because, in the chain of building an application,
there is no way to trust every link fully.
The particular focus was on the C compiler for UNIX
and how, within the build process, the programmer
can be blind to the compiler's actions.

The same problem still exists today. Because
so many things in the Linux world are downloaded and compiled,
an avenue of attack opens. Binary distributions like RPMs and
Debian packages are becoming increasingly popular; thus, attacking the
build machines for the distributions would yield many unsuspecting
victims.

GCC and Glibc

Before engaging in a discussion of how such attacks could take place, it is
important to become familiar with the target, and how someone
would evaluate it for places to attack. GCC, written and distributed by
the GNU Project, supports many languages and architectures. For the sake
of brevity, we focus on ANSI C and the x86 architecture in this article.

The first task is to become more familiar with GCC: what it
does to code and where. The best way to start is to build a simple
Hello World program, passing GCC the -v option at compile
time. The output should look similar to that shown in Listing 1. Examining
it yields several important details. GCC is not a single program;
it invokes several programs to translate the C source
file into an ELF binary. It also links in
numerous system libraries with virtually no verification that they are
what they appear to be.

Further information can be gained by repeating
the same build with the -save-temps option. This saves the
intermediate files created by GCC during the build. In addition to the
binary and source file, you now have filename.i, filename.s
and filename.o. The .i file contains your source after preprocessing,
the .s contains the translated assembly and the .o is the assembled
file before any linking happens. Using the file command on these files
provides some information as to what they are.

The thing to focus on while looking through the temp files is the type
and amount of code added at each step, as well as where the code comes
from. Attackers look for places where they can add
code, often called payloads, without being noticed. Attackers
also must add statements somewhere in the flow of a program to execute the
payload. For attackers, ideally this would be done with the least
amount of effort, changing only one or two files. The phase that covers
both these requirements is called the linking phase.

The linking phase, which generates the final ELF binary, is the
best place for attackers to exploit to ensure that their changes are
not detected. The linking phase also gives attackers a chance to
modify the flow of the program by changing the files that are linked in
by the compiler. Examining the verbose output of the Hello World
build, you can see several files like ld-linux.so.2 linked in. These are
the files an attacker will pay the most attention to because they contain
the standard functions the program needs to work. These collections are
often the easiest in which to add a malicious payload and the code to
call it, often by replacing only a single file.

Let's take a small aside here and discuss some parts of
ELF binaries, how they work and how attackers can use this to their
advantage. Ask many people who write C code where their programs begin
executing and they will say “main”, of course. This is true only
to a point; main is where the code they wrote begins execution, but in
actuality, the code started executing long before main. You can examine this
with tools like nm, readelf and gdb. Executing the command readelf -l
hello shows the entry point for the program, the address at which the
program begins executing. You then can see what happens there by setting
a breakpoint at the entry point and running the program. You will
find the program actually starts executing at a function called _start,
line 47 of file <glibc-base-directory>/sysdeps/i386/elf/start.S. This
is actually part of glibc.

Attackers can modify the assembly directly, or they can trace
the execution to a point where they are working with C for
easier modifications. In start.S, __libc_start_main is called
with the comment “Call the user's main function”.
Looking through the glibc source tree brings you to
<glibc-base-directory>/sysdeps/generic/libc-start.c. Examining this
file, you see that not only does this call the user's main function,
it also is responsible for setting up command-line and environment
options, like argc, argv and envp, to pass to main. It is also in C,
which makes modifications easier than in assembly. At this point, making
an effective attack is as simple as adding code to execute before main
is called. This is effective for several reasons. First, in order for
the attack to succeed, only one file needs to be changed. Second, because
it is before main(), typical debugging does not discover it. Finally,
because main is about to be called, all the built-ins that C coders
expect already have been set up.
