The first blog post of the series on Upatre showed how to unpack the malware. You can download the unpacked sample from malwr.com if you would like to retrace my reversing steps.

This second part focuses on the configuration of Upatre. The first section presents the format of the data structures used to store configuration details, such as

the executable name

the C2 address, port, and root directory

the URLs that host the (encrypted) malware downloads

the keys to decrypt and validate the downloaded content

The second section then employs a small IDA script to extract and format the configuration information of ten samples (including the unpacked one). All analysed samples use the same configuration format, but they use it in slightly different ways. For example, all Upatre samples decrypt downloaded content with a key from the configuration file, but the key-scheduling algorithm might be different. I list the differences in the overview, but elaborate on the details in the upcoming blog post on Upatre.

Format of Upatre’s Configuration

Introduction

As mentioned in part one, the unpacked code is position independent. All functions and global variables are accessed through a global offset table. This offset table is located at the start of the .text section. Upatre gets the absolute address of the offset table by rounding down the EIP, which it obtains by a fake call, to the page boundary. It then adds the six offsets (stored as WORDs) to the base address to obtain a table of absolute addresses:

The following table shows the six entries of the offset table. The address column shows the resulting absolute address given the .text section starts at 0x401000:

row  offset  address   type        name
0    1380h   0x402380  data        config
2    Ch      0x40100C  subroutine  fetch_and_advance
4    14h     0x401014  subroutine  decrypt_strings_and_get_os_infos
6    138h    0x401138  subroutine  get_decrypted_string_by_index
8    13Fh    0x40113F  subroutine  get_field_of_target
10   14Dh    0x40114D  subroutine  send_user_infos
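The address computation can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Upatre's code: the function name resolve_offset_table, the example EIP value and the 4 KiB page size are assumptions, while the offsets are the ones from the table.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, as is usual on x86 Windows

# Offsets as listed in the table (stored as WORDs in the sample).
OFFSETS = [0x1380, 0x0C, 0x14, 0x138, 0x13F, 0x14D]


def resolve_offset_table(eip, offsets=OFFSETS):
    """Round eip down to the page boundary and add each offset to that base."""
    base = eip & ~(PAGE_SIZE - 1)
    return [base + off for off in offsets]


# Hypothetical EIP somewhere in the first page of .text.
addresses = resolve_offset_table(0x401005)
print([hex(a) for a in addresses])
```

Any EIP within the first page of the .text section yields the same base of 0x401000, so the exact position of the fake call does not matter as long as it stays in that page.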

The first row points to the configuration data. The routines fetch_and_advance, get_decrypted_string_by_index and get_field_of_target are stubs to access the data structures. The decryption half of the decrypt_strings_and_get_os_infos routine is discussed in this post; the remaining half, as well as the routine send_user_infos, follows in the third part.

Upatre uses one large structure as its configuration — referenced by the first entry of the global offset table. The config information is read inside decrypt_strings_and_get_os_infos which reveals most of the format. The following illustration shows an overview of the config data structure with example values from the examined sample:

Port

The first two bytes of the config denote the lowest port number used in C2 communications. Each time the port is referenced, a random number between 0 and 3 is added, which results in a range of four ports:
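A minimal Python sketch of this port selection follows. The function names and the example port are hypothetical, and reading the two bytes as a little-endian WORD is an assumption based on the x86 platform rather than something verified against the sample.

```python
import random
import struct


def base_port(config):
    """Read the lowest C2 port from the first two bytes of the config.

    Assumption: the WORD is stored little-endian, as is usual on x86.
    """
    return struct.unpack_from("<H", config, 0)[0]


def pick_port(config, rng=random):
    """Return the base port plus a random value from 0 to 3."""
    return base_port(config) + rng.randint(0, 3)


def port_range(config):
    """All four ports the sample may contact."""
    low = base_port(config)
    return list(range(low, low + 4))


# Hypothetical config prefix: port 12345 (0x3039) stored little-endian.
example = bytes([0x39, 0x30])
print(port_range(example))
```

For network signatures, the practical consequence is that a rule should cover the whole range returned by port_range, not only the port stored in the config.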

(Encrypted) Strings

Starting at offset 0x2 the config contains a list of strings. Each string is zero-terminated. For example:

36 60 36 60 00 36 60 4F 36 60 00 7C 63 76 7D 00

Some samples store the strings in plaintext; my sample uses the XOR key 0x13 to obfuscate the strings. The strings are first decrypted, if necessary, and then stored as Unicode in a newly allocated section. The above list of encrypted strings, for instance, becomes:
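The XOR step can be reproduced on the example bytes with a short Python sketch. The helper name decode_strings is hypothetical. The sketch assumes the zero terminators are stored unencrypted (as in the bytes above), skips empty strings, and returns ordinary Python strings instead of writing Unicode buffers as the sample does.

```python
def decode_strings(blob, key=0x13):
    """Split blob at zero bytes and XOR every remaining byte with key."""
    strings = []
    for raw in blob.split(b"\x00"):
        if raw:  # skip the empty piece after the final terminator
            strings.append(bytes(b ^ key for b in raw).decode("latin-1"))
    return strings


encrypted = bytes.fromhex("36 60 36 60 00 36 60 4F 36 60 00 7C 63 76 7D 00")
print(decode_strings(encrypted))  # ['%s%s', '%s\\%s', 'open']
```

For samples that store their strings in plaintext, passing key=0 leaves the bytes unchanged.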

The remaining strings are dull (like the format string %s%s), or referenced by the targets which are discussed later.

Decryption Keys

Upatre is mainly a downloader for other malware. The downloaded content, as will be shown in the fourth part of this blog series, is encrypted with a four-byte key. The keys are stored in a variable-sized array, with the number of elements indicated by one byte at the start of the array. For example, my sample stores one key:

.rdata:004025D9 db 1
.rdata:004025DA dd 3CB1A338h

Check Keys

Upatre also validates the decrypted downloads by comparing four bytes of the download to a second key. These keys are again stored in a variable-sized array, which follows the decryption keys, for example:

.rdata:004025DE db 1
.rdata:004025DF dd 6B5519C2h
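Both key arrays share the same layout, a one-byte count followed by that many DWORDs, so a single helper can read them back to back. The following Python sketch uses the bytes from the two listings; the helper name read_dword_array is hypothetical, and little-endian storage is assumed from the x86 dd directive.

```python
import struct


def read_dword_array(blob, offset):
    """Read a one-byte count followed by that many little-endian DWORDs.

    Returns the list of values and the offset just past the array.
    """
    count = blob[offset]
    values = list(struct.unpack_from("<%dI" % count, blob, offset + 1))
    return values, offset + 1 + 4 * count


# Bytes as they appear from .rdata:004025D9 onwards in the examined sample.
blob = bytes.fromhex("01 38 A3 B1 3C 01 C2 19 55 6B")
decrypt_keys, pos = read_dword_array(blob, 0)
check_keys, pos = read_dword_array(blob, pos)
print([hex(k) for k in decrypt_keys], [hex(k) for k in check_keys])
```

Returning the end offset lets the caller continue with whatever follows the check keys without recomputing array sizes.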

Edit June 17, 2015: Some Upatre samples have no check keys. They use a simplified payload format; see the fourth part of this blog series.

Network Targets

After the two key arrays follows another array, which contains information about the network targets. Each network target, except for the C2 server address, is represented by a 7-byte data structure (Edit: older samples use 6 bytes per target). Again, a one-byte value at the start denotes the size of the array, for example:

The get_field_of_target subroutine has two arguments: the target number passed in register al, and the desired field in ah. The subroutine returns four consecutive fields in register eax. Sometimes the caller uses al of this return value, in which case the argument ah corresponds to the index of the retrieved field. In other cases the caller uses ah of the return value, which means that the field ah + 1 is accessed. By looking at all calls to the get_field_of_target subroutine, one can quickly find out the purpose of most fields.
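The calling convention can be modelled roughly as follows. This Python sketch rests on several assumptions that are not confirmed by the text: every field is one byte wide, field indices are zero-based, and the target records are stored contiguously after the count byte. The helper names al and ah and the example data are hypothetical.

```python
TARGET_SIZE = 7  # 6 for the older samples mentioned above


def get_field_of_target(targets, target, field, size=TARGET_SIZE):
    """Model of the call: al = target number, ah = field index.

    targets holds the target records only (count byte already skipped).
    Returns the four bytes starting at the requested field as a
    little-endian value, mirroring what ends up in eax.
    """
    start = target * size + field
    chunk = targets[start:start + 4].ljust(4, b"\x00")  # pad at buffer end
    return int.from_bytes(chunk, "little")


def al(eax):
    """Lowest byte of eax: the field that was asked for."""
    return eax & 0xFF


def ah(eax):
    """Second byte of eax: the field after the one that was asked for."""
    return (eax >> 8) & 0xFF


# Hypothetical data: two targets whose fields are simply numbered.
targets = bytes(range(0x10, 0x17)) + bytes(range(0x20, 0x27))
eax = get_field_of_target(targets, target=1, field=2)
print(hex(al(eax)), hex(ah(eax)))
```

Under these assumptions, reading ah of the result after requesting field 2 yields field 3, which is why the field index passed in a call has to be interpreted together with the register the caller reads afterwards.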

1st Field - C2 root directory

The first field of a target is needed when the target is used as C2 callback, i.e., as a receiver for user information. In those cases the first value of the first field serves as the index into the array of decrypted strings:

6th Field - Another Exe Filename

The sixth field points to a decrypted exe filename. This filename is not used in my samples; maybe it is used by the downloaded payload (which I didn’t analyse).

Purpose of Targets

The first entry in the targets list has a special meaning for all except one sample: it is used to determine the IP address of the victim. All samples with client IP detection used the icanhazip.com website. The remaining targets are used to host the second stage malware.

Observed Config Files

Edit: Added three more samples (June 17th, 2015). This section shows the config files of 10 different samples. The config files were read with the following IDA Pro script:

All samples were obtained from malwr.com and are shared by the uploader. I use the first eight characters of the hex representation of the MD5 sum as the identifier. For each sample I also list the following:

whether or not the strings are encrypted

whether the first target is used to determine the client’s IP

which key-scheduling instruction is used to decrypt downloads. The decryption algorithm is shown in the fourth and last part of this series.

The values after id = denote the root directory used in C2 callbacks. The value after file = is the executable name presumably used by second stage payloads.