Findings

We briefly state our findings here; a detailed account of how we
arrived at them follows.

The binary is a remotely controllable program which can be used to
execute arbitrary commands on the compromised system, and provides
capabilities for several types of denial of service (DoS) attacks.
The binary uses a rarely-used IP protocol number (Network Voice Protocol, or
NVP) for control packets, which thus slip by many firewalls and IDS
systems which are oriented toward TCP, UDP, and ICMP attacks. The
binary can generate several DoS attacks including DNS reply floods,
DNS server floods, SYN floods, and IP fragment attacks.

Next, we present a high-level timeline showing the order in
which we discovered what the binary did. The tools and methods used
are described, and examples are given where appropriate. The purpose
of this document is to show how things are discovered, not to give
in-depth documentation of all the discovered functionality. For a
thorough documentation of the workings of the binary, see the
technical advisory.

Timeline of the events

May 10 [ 9:30-11:30 am]

The binary was downloaded on an internal machine (Eliza) in the
CoPS lab. Eliza was connected directly (using a crossover network
cable) to another lab machine, Darwin, which was used as a router and
packet sniffer.

A sniffer was started on Darwin to inspect all the packets that
might be sent to or from Eliza. The binary was run on Eliza under
strace as a normal user to find out what system calls
were made. The output log file for strace showed that it checked the
user id and quit. There was no activity on the sniffer. The binary was
again run on Eliza under 'root'. The first try at strace failed since
the process almost immediately forked. This execution was tried again
using the "-ff" flag to strace. Now the strace log file
showed a socket call and then a recv call. The entire strace is shown
below:

The recv was a blocking call, waiting for a packet with protocol
number 11 (0xb in the socket call above).
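
As a rough illustration of what such a listener looks like, here is a minimal Python sketch of a raw socket that blocks in recv waiting for protocol 11 packets. It is not the binary's code: the function name and buffer size are our own assumptions, and opening a raw socket requires root, which is consistent with the binary quitting when run as a normal user.

```python
import socket

NVP_PROTOCOL = 11  # IP protocol number 11 (0xb), as seen in the socket call


def wait_for_packet(bufsize=2048):
    """Block until one IP packet carrying protocol 11 arrives (needs root).

    The name and bufsize are illustrative assumptions, not values
    recovered from the binary.
    """
    # A raw socket created for one protocol number receives packets whose
    # IP header names that protocol; no TCP or UDP port is involved.
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, NVP_PROTOCOL) as sock:
        return sock.recv(bufsize)
```

Because no TCP or UDP port is involved, a port scan has nothing to find.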

We ran netstat on Eliza and used nmap from Darwin to check for
open ports. Neither yielded anything informative.

The next step was to examine the binary in more detail. The
"file" command was used, and we discovered that the
binary is stripped and statically linked. The binary was disassembled
using "objdump -d". The last system call in the strace
log was a recv(), so this served as a starting point for further
analysis of the binary code. System calls are fairly easy to find,
even in a stripped binary, because they always involve an "int
0x80" instruction, which can be searched for. We looked up the system call
number (loaded into the al register, with the socket sub-function number
in the bl register in this case), and fairly quickly found the recv()
function.
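
The search for system calls can be sketched as a small script over the "objdump -d" output. This is a simplified illustration, not the procedure we actually followed by hand: it only tracks immediate values moved into eax/al and ebx/bl just before an "int $0x80", and a real binary may load those registers in other ways. The function name is hypothetical.

```python
import re

# AT&T-syntax patterns as printed by "objdump -d".
MOV_IMM = re.compile(r"\bmov[bwl]?\s+\$0x([0-9a-f]+),%(eax|al|ebx|bl)\b")
INT_80 = re.compile(r"\bint\s+\$0x80\b")
ADDRESS = re.compile(r"^\s*([0-9a-f]+):")


def find_syscalls(disassembly):
    """Return (address, eax value, ebx value) for each "int $0x80" found.

    A value is None when no immediate load into that register was seen
    since the previous system call.
    """
    last = {"a": None, "b": None}
    hits = []
    for line in disassembly.splitlines():
        mov = MOV_IMM.search(line)
        if mov:
            key = "a" if "a" in mov.group(2) else "b"
            last[key] = int(mov.group(1), 16)
        elif INT_80.search(line):
            addr = ADDRESS.match(line)
            hits.append((int(addr.group(1), 16) if addr else None,
                         last["a"], last["b"]))
            last = {"a": None, "b": None}
    return hits
```

On Linux/i386, socket operations go through system call 0x66 (socketcall) with the sub-function number in ebx; recv is sub-function 10. A hit of the form (address, 0x66, 0xa) is therefore a candidate for the recv() wrapper.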

We looked through the code and verified what we saw in the
strace: the protocol being used was protocol 11 (NVP). Combining this
information with the strace log, we deduced that the binary was
listening on a raw socket for protocol 11 packets. The binary opens a
socket and waits for remote commands. Since protocol 11 is a
rarely-used (if ever used) protocol number, this provides a
non-standard way of communicating with the compromised machine.
In fact, checking our standard RedHat 7.2 installations, we found that
the firewall settings only filtered out unwanted TCP and UDP packets.
Protocol 11 slips right past! This should definitely be fixed in the
firewall -- our recommendation is that the firewall rules have a
default policy of "REJECT", so that any unknown or unexpected packet
formats are not allowed through. Furthermore, it would be easy to set
up an IDS to alert us when a protocol is seen that is not ICMP, UDP,
or TCP.
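
The IDS check suggested above amounts to looking at one byte of the IP header. The following sketch shows the idea on raw IPv4 packets. It is a minimal illustration with names of our own choosing, not a rule for any particular IDS product.

```python
# Protocol numbers we expect to see on this network.
EXPECTED_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def ip_protocol(packet):
    """Return the protocol field (byte 9) of an IPv4 header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]


def should_alert(packet):
    """True when the packet carries a protocol other than ICMP, TCP, or UDP."""
    return ip_protocol(packet) not in EXPECTED_PROTOCOLS
```

A packet carrying protocol 11 would trigger the alert, while ordinary TCP, UDP, and ICMP traffic would not.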

May 10 [ 2:30-4:30 pm]

The compiler signatures in the .comment section of the binary
show compiler version 2.7.2.l.2. A search for this string on Google
showed some hits on binaries compiled for Slackware 3.1, so we downloaded
the libc packages for Slackware versions 3.0-3.4. There was a perfect
match for version 3.1, so we concluded the binary was compiled and linked
with the Slackware 3.1 library. (The recv() function code also matches
up with what we identified earlier).

A few perl scripts were written, one for making "signatures"
of functions extracted from the libc package, and one for making signatures
of functions in the challenge binary. These are extracted to files,
one per function, and then a couple of "for" loops in the shell accompanied
with "diff" allow us to match functions from the binary to functions from
libc. This makes a very simple symbol table, and a third perl script is
used to go through and add labels to the objdump output on the challenge
binary. Now the sequence of events in the code is much easier to
follow!
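
The matching idea can be sketched as follows. Our actual scripts were written in perl and compared per-function files with diff; the signature format below (instruction text with address-like operands masked out, since those change at link time) is an assumption made for illustration, and all names are hypothetical.

```python
import re

# Hex numbers of six or more digits are treated as link-time addresses.
ADDRESS_LIKE = re.compile(r"\b(?:0x)?[0-9a-f]{6,}\b")


def signature(instructions):
    """Normalise a function body so that absolute addresses do not matter."""
    return tuple(ADDRESS_LIKE.sub("ADDR", ins.strip()) for ins in instructions)


def match_functions(libc_funcs, binary_funcs):
    """Map binary function addresses to libc names with equal signatures.

    libc_funcs:   {name: [instruction text, ...]}
    binary_funcs: {address: [instruction text, ...]}
    """
    by_signature = {signature(body): name for name, body in libc_funcs.items()}
    symbols = {}
    for address, body in binary_funcs.items():
        name = by_signature.get(signature(body))
        if name is not None:
            symbols[address] = name
    return symbols
```

The resulting dictionary plays the role of the simple symbol table: each matched address can then be labeled in the disassembly listing.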

May 12 [ 10:30-12:00 am]

We investigated IDA Pro; notes on the web site indicated that
the preview version wouldn't work for the challenge, so we downloaded the
freeware version (along with the supplied patch). The "FLIRT" library
function recognition didn't work, as there were no Linux libc signatures.
There didn't seem to be a way to create new signatures, and in fact there
didn't even seem to be a way to import the symbol names that we had learned
with the perl scripts. We experimented with IDA Pro for a little while to see
what its capabilities were.

May 13 [ 9:30 am - 1:30 pm ]

We began looking at the code after the recv call, using both
the labeled objdump output and IDA Pro. In the end we stuck with
just the objdump output in an emacs window, where we could add comments
directly and use emacs macros to insert new labels as functionality
was discovered. IDA Pro did recognize a switch statement and jump
table in the code following recv, which was very helpful.

Between the recv call and the switch statement is a call to a
non-library function. We guessed that this was a data decoding
routine, and that the switch statement selects one of 12 different
commands that can be sent to the binary. We worked out the data decoding
(and hence encoding) routines, then wrote and tested them in C. We
used these routines as the beginning of a control program with which we
could remotely control the compromised machine. We knew from the switch
statement that there were 12 commands, so we made a skeleton that
supported 12 generic commands, to be filled in later with
functions as they were discovered.
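
The shape of such a skeleton can be sketched as a dispatch table with a placeholder for every command number. Our control program was written in C; this Python version only illustrates the structure, and the handler names and messages are made up. The numbering 1 to 12 follows the command numbers used in this write-up.

```python
def not_analysed(number):
    """Build a placeholder handler for a command we have not decoded yet."""
    def handler(payload):
        return f"command {number}: not yet analysed"
    return handler


# One generic slot per case of the switch statement.
HANDLERS = {number: not_analysed(number) for number in range(1, 13)}


def dispatch(number, payload=b""):
    """Run the handler registered for a command number."""
    if number not in HANDLERS:
        raise ValueError(f"unknown command {number}")
    return HANDLERS[number](payload)
```

As each command is understood, its placeholder is replaced with a real handler, for example by assigning a new function to HANDLERS[3].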

As of now our understanding of the first case of the switch statement is the
following: It sets some values from global variables, calls the data
encoding function, followed by getting a random value which is used as
a random packet length (from 400 to 600). The following function
call indicates that it tries to open a RAW socket. We guess that this is
to send the encoded response back through the raw socket.

May 13 [ 2:30-4:30 pm]

"Command 3" looks interesting because of the call to the
system() function, so we examined this case and learned that it allows
an arbitrary system command (sent in the encoded command packet) to be executed.
In discovering where the IP address for the response is kept, it
became clear that this is set with "Command 2", so that code was also
examined. These two commands were completely figured out. Command
2 is used to set communication parameters for return messages.

May 14 [ 2:00-5:00 pm]

The control program is modified to support these two commands,
and it is tested. After a couple of debugging runs, using sniffers on
both Eliza and Darwin to watch traffic, the remote machine is
successfully controlled. Results of "ps -ef" and
"ls -l /etc" are returned (although the second command
timed out since it took longer than 10 seconds, and since the end of
the command was never reached the controller had to be modified to
time out as well). Interestingly, we found that due to the poor way
in which the data encoding procedure was written, prior results in the
send buffer are pushed back in the buffer and are visible in the
padded packet. In this case, this is the actual plaintext result of
the command, which should not be visible in the packet! Here's a
snort dump showing one of these packets (from a "ps -ef"
command):

It's odd that the author of the binary never noticed this, as it shows
results that they would really not like to be so obvious in the packet
dumps!

Next we began examining the code for "Command 4". It involves a
call to a rather long function, so the code was read and written out
by hand in a C-like pseudo code for easier analysis. UDP packet
assembly code is recognized, with destination addresses taken from a
table of 8000 addresses (presumably DNS servers because the
destination port is 53).

May 15 [ 9:30 am - 2:30 pm ]

With some knowledge of how command 4 works, and where its
parameters are extracted from the command packet, we modified the control
program to create a sample control packet and sent it to the compromised
machine to see what would happen.

Many guesses about the command parameters were verified by
examining the sniffed packets. Each parameter was modified slightly
to see the effect, and between this knowledge and reading the code
we determined that "command 4" is a DNS reply flood
attack. We noticed that command 9 was almost identical;
comparing the parameters showed that the only difference was
a rate-controlling parameter.

A basic structure of recording the PID of a currently-running
attack was discovered, and this allowed us to quickly figure out what
commands 1 and 8 did. Command 1 was discovered to give the status of
the running process. Command 8 turned out to be a kill command. These
were quickly added to the control program.

Command 7 was identified by the overall structure (e.g., using
the system() call) as being very similar to command 3, so was quickly
decoded (it executes a system command, like command 3, but has a
timeout of 20 minutes instead of 10 seconds, and the output is not
kept or returned).

Commands 5, 10, 11, and 12 all looked similar in overall
structure to commands 4 and 9 (the active attack commands), and the only
other remaining command was command 6. Looking at the sequence of
function calls, without paying too much attention to the details, led us
to the guess that this starts a backdoor shell listening on a TCP port.
We started the command, and netstat identified the port used as port 23281.
Connecting didn't seem to do anything, so we looked back at the code and
discovered that the first thing entered after connecting to the port had
to be a password, "SeNiF" (the actual string stored in the binary was
"TfOjG", to prevent this from being seen by the "strings"
command). With this knowledge a shell was successfully
launched on the compromised machine, and interacted with using netcat
from a different machine. Here's a cut-and-paste of doing this, with
our typing shown in bold (this was captured after the complete analysis,
so you can see all of the command descriptions in the control program):

We did a web search for "senif" and "tfojg" (the way the
password was hidden in the binary) to see if there was information on the
web to be learned, but found nothing interesting.
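
Comparing the two strings, each stored character is one code point above the corresponding password character ("T" for "S", "f" for "e", and so on). The sketch below reverses that shift. It is inferred from these two strings only; the function name is ours and the routine in the binary may be written differently.

```python
def unhide(stored, shift=1):
    """Recover a hidden string by shifting every character down by `shift`.

    A shift of 1 is what the pair "TfOjG" / "SeNiF" suggests; it is an
    inference from those strings, not code taken from the binary.
    """
    return "".join(chr(ord(ch) - shift) for ch in stored)
```

Applying the same function to other odd-looking strings in the "strings" output is a cheap way to check whether anything else was hidden the same way.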

We returned to the remaining commands (numbers 5, 10, 11, and 12)
-- looking at the code only to determine the number of parameters to
send, we tried all these commands while varying the parameters,
watching a sniffer to see what happened. This made it easy to quickly
discover the purpose of all the remaining commands (well, quicker than
reading all the disassembled code anyway!).

May 16 [ 5:00-6:00 pm and 10:00-11:30 pm]

With knowledge of what these commands were supposed to do, the
code was examined for new insights. The only additional thing
discovered was that in command 12 the source address (in addition to
the source port) could be randomized by using an address of all
zeroes.

The control program was modified for commands 4, 9, and 12 to get
attack parameters from the user, rather than using the hard-coded
experimental values we had used in exploring their function.

May 17 [9:45 to 11:45 am]

The objdump code was examined in detail to ensure that the
function parameters matched the guesses.

The controller program was further modified to provide full
functionality for all the commands.

At this point we have completely determined all the functionality of
the unknown binary, and have a control program that can exercise all
functionality of the binary. Focus now turned to writing up the
results as required for the reverse engineering challenge.

Tools used

netstat

strace

netcat

nmap

objdump (disassembler)

emacs

Slackware 3.1 library

perl

Ethereal (sniffer)

Various references found through google and in Linux header files
for protocol numbers, header formats, etc.

Methods summary

We had great success with a general technique of experimental
execution interleaved with examining the disassembled code. We think
this went much faster than would have been possible just looking at
assembly code. Furthermore, experimenting alone cannot discover all
functionality in an unknown binary, since you cannot be sure that your
experiments have exercised all possible flows in the code. By setting
parameters to known values (for example, integers starting at 1) and
seeing what happens, it is possible to make almost certain guesses
about the purpose of the parameters (for example, seeing a SYN flood
to address 1.2.3.4 shows that the address comes from the first 4
parameters). Then with this knowledge, the functions in the
disassembled code can be searched for uses of the parameters to see if
the usage is consistent with our guess. A similar pattern of
experimenting and examining code can be seen with the use of strace
and using the output to zero in on particular function calls
(particularly in the beginning).
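
The known-values trick can be sketched as follows: send parameters 1, 2, 3, ... and then look for where a value observed on the wire appears in that sequence. The function names are hypothetical, and the sketch assumes the binary copies the parameter bytes unchanged and in order. A field that is reordered or transformed would not be found this way.

```python
import socket


def probe_parameters(count):
    """Parameter bytes 1, 2, ..., count, so each position is recognisable."""
    return bytes(range(1, count + 1))


def locate(observed, probe):
    """Return the 1-based parameter positions that produced `observed`.

    Returns None when the observed bytes do not appear contiguously
    in the probe.
    """
    start = probe.find(observed)
    if start < 0:
        return None
    return list(range(start + 1, start + len(observed) + 1))


def locate_address(dotted_quad, probe):
    """Same as locate(), for an IPv4 address seen in a sniffed packet."""
    return locate(socket.inet_aton(dotted_quad), probe)
```

For the SYN flood example above, locating the address 1.2.3.4 in a probe of sequential bytes points at parameters 1 through 4.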

On the experimental side, the tools we found most valuable were strace
and a sniffer (we usually used Ethereal, although snort was used in
some cases). When examining code, nothing we tried worked as well as
simply having the code in an editor (emacs) and using the built-in
searching and keyboard macro capabilities to label things as they were
discovered. We had high hopes initially for IDA Pro, but the free
version we used was more frustrating than it was worth. Perhaps the
full commercial version would be easier to use.

Finally, while we discussed attaching to the running binary using
gdb so that we could actively trace through the code, we
never felt the need to do this. Perhaps in a more complex binary this
would be something that would have a higher payoff.