Wednesday, April 9, 2014

We are happy to announce that our popular Memory Forensics and Malware Analysis Training course is going to be held in Canberra, Australia in August. This is our first offering in Australia, and we are already extremely excited to have a great training session full of inquisitive and enthusiastic students.

This is the only memory forensics course officially designed, sponsored, and taught by the Volatility developers. One of the main reasons we made Volatility open-source is to encourage and facilitate a deeper understanding of how memory analysis works, where the evidence originates, and how to interpret the data collected by the framework's extensive set of plugins. Now you can learn about these benefits first hand from the researchers and developers of the most powerful, flexible, and innovative memory forensics tool.

When
August 25th through 29th
Class runs from 9AM to 5PM

Where
Canberra, AU

Instructors
Michael Ligh (@iMHLv2), Andrew Case (@attrc), and Jamie Levy (@gleeda)
Information on each instructor can be found here.

Registration Process
To request a link to the online registration site or to receive a detailed course agenda/outline, please send an email voltraining [[ at ]] memoryanalysis.net or contact us through our web form.

Past Reviews
Many past reviews of the course can be found on our website here as well as a previous blog post here. We also have some additional feedback from our course in Europe last week:

Tuesday, April 8, 2014

In late February of this year multiple security companies (FireEye, AlientVault, SecPod, Symantec, plus many more) were reporting on a Flash zero-day vulnerability (CVE-2014-0502) being exploited in the wild. Around this time a friend asked me if I could reverse the exploit and its associated files in order to write a decoder for it. The purpose of the requested decoder was to statically determine the URL from where the backdoor executable (shown later) would be downloaded.

As explained in the referenced links, the exploit found in the wild works by:

Having a victim's browser load a Flash file (cc.swf) that exploited the vulnerable Flash player

The exploit (shellcode) then downloads a GIF file (logo.gif) from the web server hosting the SWF file

This GIF file contains encrypted/encoded shellcode embedded within it that eventually downloads a backdoor executable from an encrypted URL within the file. Static decryption of this URL is what my friend was after

Note: For simplicity, in the steps above, I skipped some of the details on how the shellcode utilizes ROP, defeats ASLR, and finds the necessary libraries and functions (LoadLibrary, InternetOpenURL, CreateFile, etc.). If you are interested in these details please see the FireEye writeup linked above. If you are interested in how the exploit achieves arbitrary code execution then you should read the writeup from Spider Labs here. (You should probably just read the Spider Labs writeup anyway as its very well done...)

Unfortunately, many of the previous research writeups were not available at the time of my friend's request. To assist with Flash decompilation, I used SoThink SWF decompiler, which is a tool I cannot recommend enough, and that I have used to successfully analyze numerous Flash files. Since my effort, Zscaler has published a nice writeup on the Flash file and how it constructs its payload, although it misses a key part to writing the decoder -- how to determine where the encrypted shellcode starts within the downloaded GIF file.

Through analysis with SoThink's tool I was able to determine that the last four bytes of the GIF file contained a little endian integer that represented the offset of the encrypted payload from the beginning of where the offset is stored (the end of the file minus 4). The following decompiled function shows this process. The decompilation is from SoThink and the comments are mine:

As can be seen on lines 4-7, which are inside the constructor of the Flash file, the logo.gif file is downloaded. The URLLoader instance used to download the file has its complete listener set to the function shown on lines 11 through 35. This function is triggered once the file is finished downloading. On line 16 the file's contents are read into an array named _loc_3 (the third declared local variable) by the decompiler. On lines 19 and 20 the array's position is moved to four bytes before the end of the file and its disposition is set to little endian. On line 23 the integer at these last four bytes is read and a new byte array, _loc_2 is declared. Line 27 is the key one as _loc_2 is filled with the bytes of _loc_3 (logo.gif) starting at the end of the file minus the integer read minus another four bytes. At this point _loc_2 holds the encrypted shellcode and is stored in the mpsc shared property. This buffer will later be executed by the exploit payload.

Now that I knew how to find the shellcode reliably, I then needed to decrypt the shellcode in order to find the instructions that located and decrypted the embedded backdoor URL. When analyzing the shellcode I was working in a native Linux environment so I used a mix of vim, dd, and ndisasm. To start, I figured out the offset of the encrypted shellcode within my file and then extracted a copy of the shellcode.

For those of you who have never used ndisasm before, it is the disassembler that comes with the nasm assembler. The b option defines the architecture (16, 32, or 64 bit Intel), and ndisasm simply treats the file as raw instructions. This makes it very useful when analyzing shellcode. In the output the first column is the offset of the instruction from the beginning of the file, the second column is the instruction's opcodes, and the third column is the instruction's mnemonic.

As you can see in the ndisasm output, the instructions make sense until line 14 (offset 0x1f) where the out instruction is used. out is used to talk directly with hardware devices, and as such is a privileged operation. Since this shellcode runs in userland, out cannot be used and even the operating system's kernel mode components use it sparingly. Examination of the instructions on lines 2 through 13 reveal a decryptor loop that targets the instructions starting at line 14. To begin, lines 2-4 leverage the floating point unit to determine the runtime address of where the fldz (offset 0) instruction is memory. The floating point internals that enable this are explained in an Symantec paper here and the shellcode trick was first disclosed by noir in 2003 here.

Lines 5-7 then setup the loop. First 0x1f is added to the esi register which moves it to the offset where the out instruction is. ecx is then set to zero using xor and the value 0x900 is moved into the cx (the bottom half of ecx) register. This is the loop counter, so we know that the first layer of decryption will operate on 0x900 (2304) bytes. Lines 8 through 12 then implement the deobfuscation of the bytes beginning at offset 14 with an algorithm that translates to:

As you can see from the converted assembly, the purpose of this loop is to flip each byte with the one preceding it in the obfuscated shellcode. This is done by using the ah and al registers which are 1 byte in size each. After running the decoder above, the instructions starting at our original out instruction (offset 0x1f) now make sense and become the second stage of shellcode decryption:

As can be seen, this decryptor stage is another loop that transforms the code that follows it. Lines 2-4 contain code necessary to place esi at the fldz instruction of the second stage decryptor. Line 5 then adds 0x21 to esi in order to point it to the junk outsb instruction at line 17. The loop counter is initialized to 0x8f0 at line 9 and then lines 11 through 15 perform the transformation. This transformation can be expressed in C as:

You may notice that this is the same algorithm used in stage 1 for decryption, just with a different loop counter since the decrypting process is moving further down the file. By running the algorithm starting at line 14 (offset 0x1f) we get the fourth level of decryptor shellcode:

Remember that the original purpose of my friend's request was a decoder that could decrypt the encrypted URL used to download the backdoor file. The first relevant instruction that I found related to this task is at offset 0x90 in the deobfuscated function:

On line 1 we see a call instruction being made to the next instruction (pop ebx). This has the effect of placing the runtime address of the pop ebx instruction into ebx. 0x50 is then added to this address and a loop begins that is searching for 0x12311231 (0x31123112 on disk due to little endian) towards the end of the GIF file. This is the special marker used to denote where the encrypted URL begins. Once this marker is found, control is transferred to offset 0xbc (this check and jmp occurs on lines 6-7).

Starting at offset 0xbc, we have the following loop. Note that this disassembly is annoyingly long due to no optimizations being used.

On lines 1-3 the pointer to where the marker was found is incremented by 4 to skip the marker and then placed into [ebp-0x18c]. This value is then also placed into [ebp-0x180] on line 5. The buffer holding the URL is then enumerated until an 0xff marker is found. This is accomplished by the byte comparison of -0x1 on line 10 and the bailout if found on line 11. Lines 12 through 29 perform the decryption of the URL. The main part of this decryption is on lines 15 and 22 (shown in red) where each byte is transformed by XOR'ing with 0x12 and then subtracting 0x31.

After this analysis, we now finally know how to find and decrypt the URL:

Read in logo.gif

Search the file for 0x31123112 in little endian

Once found, decrypt each byte by XOR it with 0x12 and subtract 0x31

Stop processing when a byte of 0xff is found

After implementing the above steps I was able to give my friend a Python script that could read the decrypted URL from abitrary logo.gif files that he found through his analysis and hunting:

In the end, a simple decryptor loop is all that is needed to decrypt the URL, but it took reversing multiple stages of shellcode and runtime code modifications in order to find and understand this algorithm.

I hope that you found reading this blog post informative and interesting. I would like to thank the other members of the Volatility Team (@iMHLv2, @gleeda, @4tphi) for proof reading this before I hit 'Publish' and for @justdionysus providing the historical references related to the use of the FPU for finding EIP. If you have any questions or comments on the post please leave a comment below, ping me on Twitter (@attrc), or shoot me an email (andrew @@@ memoryanalysis.net).