Needle in a haystack

2014-02-03

GaborSzappanos

Sophos, Hungary

Editor: HelenMartin

Abstract

Sometimes what looks like a genuine MP3 encoder library, and even works as a functional encoder, actually hides malicious code deep amongst a pile of clean code. Gabor Szappanos reveals the lengths to which one piece of malware goes to hide its tracks.

Malware authors engaged in Advanced Persistent Threat (APT) operations put great effort into making sure their creations live up to their name and achieve persistence over the course of months or years; in order to do so, the threats must remain undetected by security products.

The authors try both to conceal the presence of the threats on infected systems and to hide their code from analysis and detection. Most crimeware authors achieve the latter by applying sophisticated execryptors and protectors to their code.

Over the past year, however, we have spotted a different approach: malicious code is compiled into an open source library, hidden among a large pile of clean library code, with only a single export pointing to the trojan functionality. The deployment and progression of this malware spans about two years now – however its versioning suggests that its development started longer ago than that.

This malware doesn’t take anything for granted: even common system tools like rundll32.exe and wscript.exe, which are present on all Windows systems, are carried with the installer and dropped when needed.

The malware goes to great lengths to cover its tracks. All of the string constants that could reveal the nature of the backdoor are protected with strong encryption. Additionally, the backdoor itself is disguised as a legitimate MP3 encoder library. In fact, it is a legitimate and functional MP3 library – and a bit more besides.

Exploited carrier workbook

In a handful of cases we have been able to identify the original exploited document that leads to the system infection. At the time of finalizing this paper, three exploited workbooks have been found that install this threat.

All of them are protected Excel workbooks with the default password (for more details see [1]). In short: the workbooks are password protected (that is, checked before opening). It is possible to leave the password field blank – in which case Excel encrypts the content using the default password: ‘VelvetSweatshop’. On the other hand, if a workbook is protected with exactly this password, Excel assumes that there is no password, and opens the document transparently. As a result, the document content is encrypted and hidden from normal analysis, but opening it will execute the shellcode without further prompting.

The workbooks exploit the CVE-2012-0158 vulnerability, which triggers the execution of shellcode within the document.

After the workbooks are opened, the intended operation is to open a decoy workbook – a clean file that grabs the attention of the user while malicious activities proceed in the background. The themes of the decoys give us some idea as to the areas of interest of the target audience of this malware distribution.

Workbook 1

Filename: 300.xls (rough translation: ‘300 petitioners cosigned.xls’)

File size: 839756 bytes

SHA1: 066998e20ad44bc5f1ca075a3fb33f1619dd6313

MD5: 5c370923119f66e64a5f9accdd3d5fb

This does not display any decoy document, just closes the Excel window. Nevertheless, the shellcode execution proceeds.

If the file was opened, it would display a workbook with a list of names, gender, region and phone numbers of Chinese individuals.

Figure 1. Decoy content for 066998e20ad44bc5f1ca075a3fb33f1619dd6313.

Workbook 2

Filename: sample.xls

File size: 638912 bytes

SHA1: e5e183e074d26416d7e6adfb14a80fce6d9b15c2

MD5: 2066462274ed6f6a22d8275bd5b1da2b

Figure 2. Decoy content for e5e183e074d26416d7e6adfb14a80fce6d9b15c2.

Workbook 3

Filename: LIST OF KEY OFFICIALS IN THE DND PROPER.xls

File size: 638912 bytes

SHA1: d80b527df018ff46d5d93c44a2a276c03cd43928

MD5: 80857a5541b5804895724c5d42abd48f

This decoy workbook contains information about key officials in the Philippines Department of National Defense (DND).

Figure 3. Decoy content for d80b527df018ff46d5d93c44a2a276c03cd43928.

In the rest of this article, unless specified otherwise, we refer to the operation resulting from infection via Workbook 1 – but the overall operations (dropped filenames, registry keys, backdoor functions) are the same in each case.

When mining our sample collection for related samples we were able to spot other examples – however, in these instances the initial dropper was not available for our analysis, only the temporary dropper executables or the final payloads could be located. In these cases we don’t have complete information about the system infection, but it is safe to assume that similar exploitation schemes were utilized.

Shellcode

The shellcode features an interesting anti-debugging trick that I have come across quite regularly in APT samples lately. Most of the Windows API functions are resolved and called normally, but some of the critical ones (such as WinExec and CreateFile) are not entered at the entry address (as stored in the kernel32.dll export table), but five bytes after it instead. These functions are responsible for the most critical operations of the code (dropping the payload executable and executing it), which would reveal unusual activity in the scope of an ordinary Excel process.

As most tracers and debuggers would place the breakpoint or hijack function right at the entry of the API function, skipping the first few bytes is a good way to avoid API tracing and debugging.

The same happens with WriteFile and GlobalAlloc, but this time, depending on whether or not there is a call right at the entry of the function, the displacement will be either five or seven bytes.

Figure 5. Anti-tracing hook initialization.

As a result of the functions not being entered at their usual entry points, the first few instructions are missed. As these are still essential for the stack management, the code is compensated within the shellcode, where a standard function prologue (stack frame creation push ebp, move ebp,esp) is executed.

For system functions compiled with standard compilers, the first few instructions are fixed on the entry point, but anything after that can’t be taken for granted. The shellcode can’t enter further than five or seven bytes into the API function, otherwise it could end up in the middle of a multi byte instruction, easily crashing the application.

Figure 6. Anti-tracing used in practice.

In order to extract the embedded executable, the shellcode needs to find the carrier workbook. It does this using the fact that, at the time of the exploitation, the workbook must remain open in Excel. The code enumerates all possible handles and tries to call GetFileSize on each of them. If the function fails, because the handle does not belong to an open file (it could belong to many other objects such as directory, thread, event or registry key), or the file size is smaller than the expected size of the workbook (minus the appended encrypted EXE), 1de10h bytes, it skips to the next handle value.

Next, it reads four bytes from offset 0x1de00; the value found there should be equal to the size of the carrier workbook (this time including the appended EXE).

At this position, in the appended content following the OLE2 document structure, a short header is stored that contains the full carrier workbook size, the embedded EXE size and the embedded decoy workbook size. These values are used by the shellcode. The encrypted EXE content follows.

Organizing the code and structure in this manner makes the carrier/dropper workbook component and the dropped payload executable completely independent – it is possible to replace the payload with a new variant without changing a bit in the carrier encrypted workbook.

Figure 7. Appended header and payload.

Once the hosting workbook is found, the code proceeds with decoding the embedded executable (using a one byte XOR algorithm with running key plus an additional one byte XOR with a fixed key), saving it to a file named ‘Winword.exe’ in the %TEMP% directory, then executing it. At this point, the decoy workbook content is dropped (using the same algorithm: one byte XOR with running key plus one byte XOR with fixed key, only this key differs from the one used in decoding the EXE).

Temporary dropper

This file is the dropper and installer for the final payload. It has an initial anti-debug layer.

The address of the GetVersion function is patched in the import table, to contain an internal function virtual address instead of an imported function address, which is normally expected at that position. The code around the entry point uses the stored value to redirect execution:

The execution actually goes to the address stored at dword_41A188, which is the memory location 00402440.

The program has only one export, LoadLibrary, thus when the operating system loads the program and resolves the external dependencies, this value, stored within the import table region, remains intact. The trick completely fools IDA Pro, which can’t be convinced that the location is an internal position and not an external import. This makes static analysis a bit more complicated. The necessary imported function addresses are later resolved dynamically by the initialization code of the dropper.

The major procedures of the dropper program are not called directly; instead, the trojan builds a function pointer table, and calls to procedures are performed via indexing into this table, as shown in Figure 8.

Figure 8. Building the function pointer table.

The key procedures are identified by having the following instruction sequence near the prologue:

push ebp
mov ebp, esp
push eax
mov eax, 12547908h
pop eax

The value stored in the EAX register is a combination of two elements: 1254 is the marker; 7908 is the numeric ID for the function.

The entry is located by searching backwards for the standard prologue:

push ebp
mov ebp, esp

The procedures are later invoked by calling indexes from the function pointer table (see Figure 9).

%PROGRAM FILES%\Common Files\DBEngin.EXE – a copy of rundll32.exe (a clean Windows system file, used for executing the payload)

%PROGRAM FILES%\Common Files\WUAUCTL.EXE – another rundll32.exe (a clean Windows system file, used for executing the payload).

Additionally, two registry export files named jus*.tmp (with a random number added after jus) are dropped into %TEMP%. These are the old and new hives of the HKLM\SYSTEM\CurrentControlSet\Services\AppMgmt registry location – a location at which the trojan registers itself in order to execute automatically upon each system boot. Saving the hives to a file makes it possible to modify the registry in one shot using RegRestoreKey.

Also dropped is a 301,445 byte long jus*.tmp file, which is a CAB archive containing the payload DLL.

The execution flow takes a different route if the presence of running security products is detected. The following process names are checked: KVMonXP.exe, RavMonD.exe, RsTray.exe, ccsvchst.exe, QQPCTray.exe, zhudongfangyu.exe, 360sd.exe, 360Tray.exe, zatray.exe, bdagent.exe, ksafetray.exe, kxetray.exe and avp.exe. However, not all of the security processes are checked at the same time – only a couple of selected ones are checked before each major operation.

As an example, if zatray.exe, RsTray.exe or RavMonD.exe is running, then AppMgmt.dll is not dropped and instead, the 400MB vbstdcomm.nls is created (the large size is due to an enormous amount of junk appended at the end of the file). Finally, a VBScript file is created and executed with the help of a dropped copy of wscript.exe (both files are saved to the %TEMP% folder, as lgt*.tmp.vbs and lgt*.tmp.exe, respectively). An encrypted copy of Winword.exe is created in %CommonProgramFiles%\ODBC\odbc.txt, using a one byte XOR algorithm with key 0xCC. Vbstdcomm.nls, which serves as a backup installer, takes the encrypted copy of Winword.exe, decodes it and simply executes.

The dropper registers AppMgmt.dll as a service. This is not achieved by creating a new service entry, rather by taking over the role of an already installed service, AppMgmt, redirecting the service DLL from the clean library to the dropped malware payload:

In addition, the start up mode is changed from auto to demand in the location:

HKLM\SYSTEM\CurrentControlSet\Services\AppMgmt: Start

Then it changes the error control settings in the registry key HKLM\SYSTEM\CurrentControlSet\Services\AppMgmt:ErrorControl from normal (this would mean that if the driver fails to load, the start up process proceeds, but a warning is displayed) to ignore (in this case if the driver fails to load, start up proceeds, and no warning is displayed). The change is designed to avoid raising suspicion, should start up fail for any reason.

Finally, it executes the dropped DLL by executing net start AppMgmt.

Payload

We have identified five different versions of the final payload. Two of them were replicated from the exploited workbooks detailed earlier; the other three were found when we were digging through our sample collection searching for samples with similar characteristics.

The main characteristics of the five variants are summarized in Table 1 shown below (detailed descriptions of the columns are provided later in this section).

This DLL is built from the LAME MP3 encoder source [2]. The full library has been compiled, and in addition, a couple of malicious exports have been added to the code: lame_set_out_sample and lame_get_out_sample.

Figure 10. Additional malicious imports.

Note that the names of the additional exports are strikingly similar to the legitimate exports, lame_set_out_samplerate and lame_get_out_samplerate, which are present in the LAME source – thus it is not very obvious that the additional exports belong to something completely different.

One of the extra exports, lame_get_out_sample, is missing from newer versions of the malware. However, the function that would invoke this export is still present in the code. Clearly, the code was not cleaned up properly when the export was removed.

The backdoor contains many encrypted strings, one of which serves as an internal version number. In Table 1 we list the version numbers as they appear in the code. Collected information suggests that the most widely distributed was version 2.3(UDP), making its rounds in the wild in early December 2012.

Table 1 also lists the date when we first saw each particular backdoor variant – either arriving in our collection, reported in cloud look ups or seen elsewhere on the Internet. Additionally, the compilation date is listed, as taken from the PE header.

An interesting quirk comes from the usage of the LAME source: one of the original source functions, beVersion(), inserts the compilation date into the data section of the executable.

Figure 11. Compilation date in code.

This provides an independent method of determining the creation date of the variant aside from the PE time stamp. There was no trick, however – the two dates matched in all cases.

It is notable that there is always a large gap between the compilation date and the date of the first observation of each variant. There are several possible reasons for this:

Small-scale targeted attacks don’t provide much telemetry information; the smaller the number of targets, the slimmer our chances of finding out about their infection.

The trojan looks very similar to a real LAME encoder library; infected victims are reluctant to submit it for analysis.

There may be an intentional delay (some sort of testing period) in the release process of the malware.

The backdoor uses different approaches for handling C&C communication. Earlier versions used the standard Windows socket communication functions (send, recv) to exchange data with the C&C server. The newer versions linked the UDT data transfer library (available from udt.sourceforge.net) for communication. The versioning of the variants suggests that some time around March 2012, the code forked into a socket communication branch (TCP) and a UDT powered communication branch (UDP).

The backdoor features all the basic functionality that is expected from a piece of malware of its class. It is able to:

Create screenshots

Get drive type (FAT, FAT32, NTFS, CDFS) and free space

Enumerate files and directories and send the list to the server

Rename files

Create directories

Delete files.

The last character of the ModuleFileName (without extension) is checked on execution: if it is not of one of the expected values – ‘T’, ‘t’ (executed via net.exe), ‘R’, ‘r’, ‘N’, ‘n’ (executed via DBEngin.EXE), ‘2’ (rundll32.exe), ‘L’ or ‘l’ – it builds and injects a simple piece of code to load AppMgmt.dll properly.

For this purpose, it creates a new suspended process (with command line: c:\windows\system32\svchost.exe), calls GetThreadContext on it, and gets EAX from the CONTEXT structure, using the fact that in the case of a suspended process the EAX register always points to the entry point of the process. Then it writes the starter code to this entry point and resumes the thread. The suspended thread is not visible in the process list at that point. This way, the trojan can escape analysis, if not executed in a natural form, and still execute.

Configuration data is stored in a file named DbTrans.db, XOR encrypted with key 0x58.

The string constants (API names, DLL names, process names) are all stored in encrypted form using a strong encryption algorithm. The strings are stored aligned (Unicode strings to 0x90 bytes, ASCII strings to 0x38 bytes boundary), decrypted in eight byte chunks using the DES ECB algorithm, and referenced by IDs that index into this name pool. The encrypted strings contain padding bytes at the end, where zeros are encoded.

The strings are decrypted on the fly before being used and filled with zeros after use. This way there are no visible strings in the memory that would give away more information about the internals of the backdoor.

There are three nearly identical encryption functions (and accompanying encrypted string tables and encryption keys) in all variants: one is for the Unicode strings, one for the ordinary ASCII constants, and a third one for the Windows API function names (also stored as ASCII strings) that are used in the code. We found that only the encryption keys were different for the three cases. The following key seeds remain the same throughout the variants:

For ASCII strings: 82 C5 D3 59 2B 38 00 00

For Unicode strings: 5E 97 CC 42 8E CD 00 00

For API function names: 5B 5F CB 8D E5 F5 00 00

In the last version, the two ASCII functions are merged into a single function.

The C&C addresses are hard coded into the backdoor, and protected with a simple byte wise XOR (key:0x58) encryption. This is an interesting choice, given that all other string constants are protected with a string DES algorithm – perhaps the server addresses are changed more frequently (indeed, there is a minimal overlap between the different versions’ C&C addresses) than the authors are comfortable with re-encrypting the strings – but no evidence was found for it in the few samples we have found.

The string constants of the code are referenced by IDs and decrypted on the fly. However, there are strings that are never used in the code. These could belong to an earlier or internal version, and simply have not been cleaned up from the string pool, as illustrated in this example:

Conclusion

When looking into APT attack scenarios, one has to be extra careful. Often we see that clean programs and libraries are dropped onto systems to hide the operation of malicious applications [3]. But sometimes, what looks to be a genuine MP3 encoder library, and even works as a functional encoder, actually hides malicious additions buried deep in a large pile of clean code. One has to be very thorough when it comes to targeted attacks, and one cannot afford to make any assumptions.

Latest articles:

The number of incidents attributed to the Lazarus Group, a.k.a. Hidden Cobra, has grown rapidly since its estimated establishment in 2009. In this paper, ESET researchers Peter Kalnai and Michal Poslusny look at various cells within the group, that…

As the world grapples with massive disinformation campaigns waged by the intelligence agencies of hostile nations, we should not forget that such activities are not limited to the purview of the Bears or Pandas of the world, and that even relatively…

In an average five-year-old car, there are about 30 different computers on board. In an average new car, there are double that number, and in some cases up to 100. That’s the size of network an average SMB would have, only there’s no CIO/CISO, and…

Malicious Android applications are quite common, and can even be found from time to time in the Google Play Store. Thus, a lot of work has been done in both industry and academia on Android app analysis, and in particular, static code analysis. One…

We have placed cookies on your device in order to improve the functionality of this site, as outlined in our cookies policy. However, you may delete and block all cookies from this site and your use of the site will be unaffected. By continuing to browse this site, you are agreeing to Virus Bulletin's use of data as outlined in our privacy policy.