Question 1

Identify and explain the purpose of the binary.

The file "the-binary" is a statically linked (against Libc5.3.12) and stripped ELF binary executable. It is a multi-purpose denial-of-service agent and system backdoor. It allows an attacker to communicate with it via scrambled network packets to either perform automated attacks against specified targets, or to allow access to the host itself to execute commands.

Question 2

Identify and explain the different features of the binary. What are its capabilities?

The binary requires root permissions to run. This is primarily to open raw network sockets. On startup, it tests for euid 0, and if not being executed by a user with root permissions, it quits.

It changes argv[0] to the string "[mingetty]", in order to hide its appearance in the process list.

It then enters a command loop, listening on a raw socket for packets with the IP protocol field set to 11. When a packet arrives, two conditions are checked: byte 0 of the body following the IP header must have the value 2, and the total size of the packet must be greater than 200 bytes. If either condition is not met, the loop begins again.
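
The acceptance test described above can be sketched in C as follows. This is an illustration only, not code recovered from the binary: the function name is_command_packet and the constant names are hypothetical, the buffer is assumed to start at the IPv4 header (as it does when read from a raw socket on Linux), and "total size" is assumed to mean the full received length including that header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_IP_PROTO  11   /* IP protocol number the binary listens on */
#define CMD_MARKER    2    /* required value of payload byte 0 */
#define CMD_MIN_TOTAL 200  /* packet must be larger than this many bytes */

/* Hypothetical helper: returns 1 if the buffer passes the checks
 * described in the text, 0 otherwise. pkt must point at an IPv4 header. */
int is_command_packet(const uint8_t *pkt, size_t len)
{
    size_t ihl;

    assert(pkt != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return 0;                      /* too short or not IPv4 */
    ihl = (size_t)(pkt[0] & 0x0f) * 4; /* header length in bytes */
    if (ihl < 20 || len <= ihl)
        return 0;                      /* malformed or no payload */
    if (pkt[9] != CMD_IP_PROTO)
        return 0;                      /* wrong IP protocol */
    if (pkt[ihl] != CMD_MARKER)
        return 0;                      /* payload byte 0 is not 2 */
    return len > CMD_MIN_TOTAL;
}
```

A packet that passes this test would then be descrambled and its command byte range-checked, as described below.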

It is important to note that the source address of this packet may be (and probably is) forged. This would make it difficult to identify the source of this type of traffic from only the packet itself.

The contents of the packet body, starting at byte 2 of the payload, are descrambled. Details of the descrambling algorithm are included in answer 3. The value of byte 2 of the descrambled output is checked to be between 1 and 12. If it is outside these boundaries, the loop is begun again.

Depending on the value of byte 2, one of the following commands is
executed, and upon exit the loop begins again. Please note that the
packet contents from payload byte 3 (command byte) onwards are
scrambled, and all additional bytes following the last byte listed are
padding, which may be filled with arbitrary content.

Command 1 - Status

Packet Format:

Returns a scrambled packet of the form:

The packet returned is a status packet. The entire response is
scrambled, from the payload onwards.

Counting from byte 0, byte 1 of the payload is set to 1 and byte 2 is
set to 7. The third byte indicates whether a command is in progress,
and the fourth gives the command currently in progress.

Command 2 - New Master

Packet Format:

This sets an IP address to become the recipient
of any master traffic (non denial-of-service traffic). This traffic
may be the output of shell commands executed on the compromised
machine, or status packets from Command 1 above. The IP address is
contained in network byte order in the unscrambled payload, as
indicated by m[0]-m[3] above. If the value of the fourth byte
(decoy flag) is 2, the binary
creates a list of 10 IP addresses, all of which are randomly generated
decoy hosts apart from one randomly selected position to hold the IP
address of the new master. Any outbound master communication from the
binary is sent to all 10 IP addresses to make it difficult to identify
the true master. If the value of this decoy flag is 0, then the binary does
not use decoy addresses.
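
The decoy scheme can be illustrated with a short C sketch. The names build_master_list and DECOY_SLOTS are hypothetical, and rand() stands in for whatever random source the binary actually uses; only the shape of the list (10 entries, one randomly placed real master) comes from the analysis above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DECOY_SLOTS 10  /* total recipients: 9 decoys plus the real master */

/* Hypothetical helper: fills list with random decoy addresses except for
 * one randomly chosen slot, which receives the master address.
 * Returns the index of the slot holding the master. */
int build_master_list(uint32_t master, uint32_t list[DECOY_SLOTS])
{
    int master_slot, i, b;

    assert(list != NULL);
    master_slot = rand() % DECOY_SLOTS;
    for (i = 0; i < DECOY_SLOTS; i++) {
        uint32_t addr = 0;
        if (i == master_slot) {
            list[i] = master;
            continue;
        }
        for (b = 0; b < 4; b++)        /* build a decoy one byte at a time */
            addr = (addr << 8) | (uint32_t)(rand() & 0xff);
        list[i] = addr;
    }
    return master_slot;
}
```

An observer who captures the outbound traffic sees 10 destinations and cannot tell from the packets alone which one is the real master.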

The binary also takes the destination address of
this packet and uses it as the source address for further outbound
communication. This makes it unnecessary to identify a local IP
address as part of the code.

Command 3 - Execute Shell Command and Return Result

Packet Format:

This command executes a null terminated shell
command of maximum size 24 bytes. This is executed using ‘/bin/csh -c
-f’, redirecting the output to the file ‘/tmp/.hj237349’. The output
file is then scrambled and sent back to the master, splitting it
across packets if necessary. As part of the sending process, the
packets are also sent to all 9 other decoy hosts, if decoy hosts are
enabled.

The scrambled reply packets have the entire
payload scrambled, as in command 1.

Command 4 - DNS Flood

Packet Format:

Command 4 causes a DNS flood against the hostname
or IP address (S[0]-S[3]) provided. If byte 10 of the payload is set
to 1, a hostname is used rather than the IP address S. S is in network
byte order. S_HI and S_LOW are the high and low bytes of the UDP port
of the source to be flooded with DNS replies.

The DNS flood consists of a set of forged queries
to be made to a list of DNS servers embedded in the binary. Because
the source address of the query is forged to the target, and the query
returns far more data than is sent to the DNS server, the net effect
is traffic amplification and denial-of-service against the target.

A set of 8000 DNS servers is embedded into the
binary, as well as pre-configured queries to request SOA records for
.com, .net, .de, .edu, .org, .usc.edu, .es, .gr, and .ie. The flood
cycles through all these query types for each of the servers stored
in the binary.

The DNS flood continues indefinitely, unless
killed by a Command 8. The DNS flood also re-resolves the target
hostname, if provided, every 40,000 iterations. This enables the
flooding to continue should the target host switch IP address to avoid
the traffic.
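
The periodic re-resolution can be sketched as a counter inside the flood loop. This is a minimal C illustration under stated assumptions: struct flood_state and flood_tick are hypothetical names, and the real binary may count or trigger the lookup differently.

```c
#include <assert.h>
#include <stddef.h>

#define RESOLVE_INTERVAL 40000UL  /* iterations between hostname lookups */

/* Hypothetical per-flood bookkeeping. */
struct flood_state {
    unsigned long iteration;  /* packets sent so far */
    int have_hostname;        /* non-zero if the target was given by name */
};

/* Advance the flood by one iteration. Returns 1 when the caller should
 * resolve the target hostname again, 0 otherwise. */
int flood_tick(struct flood_state *st)
{
    assert(st != NULL);
    st->iteration++;
    if (!st->have_hostname)
        return 0;             /* fixed IP target: nothing to re-resolve */
    return (st->iteration % RESOLVE_INTERVAL) == 0;
}
```

In a real loop the caller would send one flood packet per call and perform a name lookup whenever flood_tick returns 1.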

Command 5 - Packet Flood

Packet Format:

This is a combination UDP and ICMP flooder. If
payload byte 4 (type) is 0, the flood consists of a stream of ICMP
Ping packets directed at the target from the specified source address.
If the type is 1, the flood sends UDP packets from
random source ports to a specified destination port, found in payload
byte 5. This destination port can have a value between 1 and 255, as
it is contained in a single byte. The Source (S[0]-S[3]) and
Destination (D[0]-D[3]) are in network byte order. If the payload byte
13 (resolve) flag is set, the hostname is used.

The hostname, if used, is re-resolved every 40,000 iterations.
The source address of the flood packets is also spoofed.

Command 6 - Create Shell

Packet Format:

Command 6 creates a password protected shell on
port 23281. The shell can be accessed by telnet, and typing the
password ‘SeNiF’ followed by some carriage returns. The password is
hard coded into the binary; however, the comparison is made after
subtracting 1 from each character of the stored string. This makes it
impossible to immediately identify the password using ‘strings’ or
something similar.
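
The offset comparison can be sketched in C as below. The function name password_matches is hypothetical, and the stored form used in the example ("TfOjG") is simply ‘SeNiF’ with each character incremented by one, derived from the description above rather than quoted from the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if attempt equals the stored string once
 * 1 has been subtracted from each stored character, 0 otherwise. */
int password_matches(const char *attempt, const char *stored)
{
    size_t i, n;

    assert(attempt != NULL && stored != NULL);
    n = strlen(stored);
    if (strlen(attempt) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if ((char)(stored[i] - 1) != attempt[i])
            return 0;
    }
    return 1;
}
```

With this scheme, ‘strings’ would show only the shifted form, which is not itself accepted as the password.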

Command 7 - Execute Shell Command and Kill

Packet Format:

This command runs a shell command, similarly to Command 3; however, the result is not returned. The command is terminated after 20 minutes.

Command 8 - Kill Previous Command

Packet Format:

This command terminates a previously running command. This would be
used to terminate the denial-of-service attacks, as they run for an infinite period of time.

Command 9 - DNS Flood with Resolver Iterations

Packet Format:

Command 9 is identical to Command 4, except that it re-resolves the target hostname every 40,000*'ResTime' iterations.

Command 10 - TCP SYN Flood

Packet Format:

A TCP SYN Flood is launched against the target. The target may be a hostname or an IP address; this is specified similarly to Command 4 by setting the resolve flag. The target hostname, if used, is re-resolved every 40,000 iterations of the flood. The source address of the SYN Flood may be specified, or the
'synrand' flag can be set to 0 to use a random source address per SYN packet.

D_HI and D_LOW are the high and low bytes of the target port to be
SYN Flooded.

Command 11 - TCP SYN Flood with Resolver Iterations

Packet Format:

Command 11 is identical to Command 10, except that it re-resolves the target hostname every 40,000*'ResTime' iterations.

Command 12 - DNS Flood with Specified DNS Server

Packet Format:

Command 12 is identical to Command 4, except that it takes a user specified DNS server
(DNS[0]-DNS[3] in network byte order) rather than using the DNS servers embedded in the binary. Command 12 also re-resolves the target hostname every 40,000*'ResTime' iterations.
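
For reference, the twelve command codes described in this answer can be collected into a lookup table. The C sketch below is purely a summary aid: command_name and the constant names are hypothetical, and the strings are taken from the command headings in this answer rather than from the binary.

```c
#include <assert.h>
#include <stddef.h>

#define CMD_MIN 1
#define CMD_MAX 12

/* Names follow the command headings in this answer. */
static const char *const command_names[] = {
    "Status",
    "New Master",
    "Execute Shell Command and Return Result",
    "DNS Flood",
    "Packet Flood",
    "Create Shell",
    "Execute Shell Command and Kill",
    "Kill Previous Command",
    "DNS Flood with Resolver Iterations",
    "TCP SYN Flood",
    "TCP SYN Flood with Resolver Iterations",
    "DNS Flood with Specified DNS Server"
};

/* Returns the name for a command byte, or NULL if it falls outside the
 * 1..12 range that the binary accepts. */
const char *command_name(int cmd)
{
    assert(sizeof command_names / sizeof command_names[0] == CMD_MAX);
    if (cmd < CMD_MIN || cmd > CMD_MAX)
        return NULL;
    return command_names[cmd - CMD_MIN];
}
```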

Question 3

The binary uses a network data encoding process. Identify the encoding process and develop a decoder for it.

The encoding process is a simple stream cipher using the addition of adjacent characters plus a fixed value (23) to generate an output character.
The encoding algorithm can be expressed in pseudocode as:
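
A minimal C sketch of one reading of that description follows. It assumes that each output byte is the input byte plus the previous output byte plus 23, modulo 256, with the previous byte taken as 0 for the first position. The function names scramble and descramble are hypothetical, and this is not the contents of encode.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCRAMBLE_KEY 23  /* fixed value added to every byte */

/* Encode: out[i] = in[i] + out[i-1] + 23 (mod 256), with out[-1] = 0. */
void scramble(const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t prev = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        out[i] = (uint8_t)(in[i] + prev + SCRAMBLE_KEY);
        prev = out[i];
    }
}

/* Decode: out[i] = in[i] - in[i-1] - 23 (mod 256), with in[-1] = 0. */
void descramble(const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t prev = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        uint8_t cur = in[i];  /* read first so in-place decoding works */
        out[i] = (uint8_t)(cur - prev - SCRAMBLE_KEY);
        prev = cur;
    }
}
```

Under this reading the two routines are exact inverses, so descrambling a scrambled buffer returns the original bytes.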

The encoder is used to encode outbound traffic, both from client and
the-binary. The decoder is used to decode inbound traffic from either of
these.
Both encoder and decoder can be found in the additional files, as
encode.c

Question 4

Identify one method of detecting this network traffic using a method that is not just specific to this situation, but other ones as well.

The network traffic could generally be
identified by logging outbound traffic on egress routers that does not
legitimately originate from internal networks. This would identify a
significant amount of traffic from compromised hosts using fraudulent
source IP addresses. This should be carefully considered, however, as
it may cause significant load on border routers.
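
The egress check amounts to a prefix comparison on the source address of each outbound packet. A minimal C sketch, assuming a single internal network described by a network address and mask (the function name is_foreign_source is hypothetical, and a real border router would apply this through its own filter configuration rather than C code):

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if an outbound packet's source address lies outside the
 * internal network (and should therefore be logged), 0 otherwise.
 * All values are host-order IPv4 addresses. */
int is_foreign_source(uint32_t src, uint32_t net, uint32_t mask)
{
    assert((net & ~mask) == 0);  /* network address must have no host bits */
    return (src & mask) != net;
}
```

For an internal network of 192.168.1.0/24, a packet leaving with source 10.0.0.1 would be flagged, while one with source 192.168.1.5 would not.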

The command network traffic could be identified
simply by searching for packets greater than 200 bytes with IP protocol
set to 11.

Question 5

Identify and explain any techniques in the binary that protect it from being analyzed or reverse engineered.

The binary was stripped, and as such no symbol
information was available. The binary was also statically linked. This
combination meant traditional methods of debugging were hampered by the
fact that none of the library functions could be resolved to a name. The
technique we used to resolve the functions to names was effectively
binary matching against the libc version identified in ‘strings’ output.

In addition to the binary manipulation, the
password for the shell in Command 6 was not stored in plain text, but
rather slightly encoded to prevent using ‘strings’ output as input to a
brute forcer.

The binary also used a significant amount of
forking, which may make it difficult to analyse under a debugger. As
debugging was not the method we used to reverse the software, it was not
encountered as a problem.

Question 6

Identify two tools in the past that have demonstrated similar functionality.

Initial indications would suggest this is a
distributed denial-of-service tool. However, from this binary alone it
was not possible to identify any form of distributed agent/master
software required to command the tool. As such, technically it can only
be compared with such multi denial-of-service tools as ‘rape’ or ‘targa’.
That being said, there is no reason why a distributed
client-master-agent relationship could not exist, and it is highly
probable that it does. In this case, the fairest comparison would be
with tools such as TFN or Stacheldraht.

Bonus Questions

What kind of information can be derived about the person who developed this tool? For example, what is their skill level?

The person or persons responsible for this tool seem to have a slightly better than average knowledge of networking protocols and their uses.
They selected an IP protocol that is virtually unused, and have incorporated
packet scrambling techniques into the tool. They used raw sockets to create a stateless connection between the master and the slave. The tool itself has some features that show a more advanced understanding of the types of attacks
and prevention, such as re-resolving the target hostname every 40,000 or so packets so that the flood
continues if the target changes the IP address behind its DNS hostname. The code was written well,
probably by someone with a good understanding of Unix daemons and network
programming. There were some instances, however, where commands could have
been reused rather than creating new commands to handle different cases.

What advancements in tools with similar purposes can we expect in the future?

Advancements in the scrambling routines - substitution of scrambling for more advanced cryptographic routines to better hide the commands being transmitted between the components of the tool.

Hiding traffic inside higher level protocols - such as sending commands
and responses inside https requests and responses or DNS requests and
responses, which would appear valid to the casual observer but which are
never intended to hit any particular server, rather just the same
address space as the tool.

Additional methods
of hiding the tool - such as sending the traffic to an address on the
same network rather than directly to the tool, or use of kernel
modules to use the kernel's functionality to hide the existence of the
tool completely from any users of the system.

Anti-disassembly techniques could be improved to make discovery of the tool's purpose much more difficult. This could include using custom modified libraries to stop tools such as our disassembler
from resolving the functions, or packing/encrypting the binary.

More advanced attacks directed at specific OSes - Using fewer flood type attacks and making more use of more complicated, less easily detected DoS attacks that use less traffic to achieve the same result. ie;