Digging in the Spam Folder

Unlike spam that appears in my real-world mailbox, the numerous unwanted parcels that arrive continuously in my Gmail spam folder are a gold mine. Not because I'm being offered $1.5 million USD to help with a foreign currency deposit, but because so many malware samples find their way to me. Gmail is pretty quick to identify malicious code, so there is only a small window available to me to download the malicious attachment before Gmail blocks it.

Play time for me begins when I see a spam email similar to the following:

How do I know this is spam? Let me count the ways: (1) The sender and company (both blurred out) are unfamiliar to me. (2) I had no conversation with the sender. (3) I'm not expecting an invoice. (4) It's in my spam folder (ok, just kidding on this one, I would be suspicious wherever this email showed up).

While those may not be enough reasons to prevent someone else from double-clicking the Zip file attachment (who sends invoices that way?), looking inside the Zip file gives one more reason to be cautious:

Hold on a second. Isn't JavaScript the preferred way to send an invoice? I guess the author did not feel like sending a macro-filled Office document or executable.

Safely extracting the JavaScript and opening it with Notepad++ shows me this:

Oh, how simple, just eight lines of code. This won't take hours to figure out.

However, the first line, where the obfuscated string variable ac00p6NI5Jx is defined, goes on for miles. Ok, not for miles, but I did estimate that my computer monitor would have to be 244 feet wide to see the entire string.

Looking at the for() loop, we see that every other symbol from the original string is pushed onto the new array x (reducing the length of the string to a manageable 122 feet, if you care about things like that). So we will end up with this after completing the for() loop:

You will notice that the string is reversed (a common obfuscation trick) and filled with bizarre variable names and hexadecimal numbers. So let's keep digging.
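For anyone who wants to reproduce those two steps outside the script, here is a minimal sketch. It assumes the loop keeps the even-indexed symbols (the start parameter lets you try the other half) and that un-reversing happens afterwards. The function names and the demo string are made up for illustration; only the array name x comes from the sample.

```javascript
// Keep every other symbol of `source`, beginning at index `start`.
// Assumption: the sample keeps one half of the symbols and discards the other;
// which half is kept (start = 0 or start = 1) is a guess here.
function everyOther(source, start = 0) {
  const x = [];
  for (let i = start; i < source.length; i += 2) {
    x.push(source[i]);
  }
  return x;
}

// Undo the reversal and join the symbols back into one string.
function reverseToString(symbols) {
  return symbols.slice().reverse().join("");
}

// Made-up input; the real string is far longer and is not reproduced here.
const demo = ";x)x1x(xax";
console.log(reverseToString(everyOther(demo, 0))); // a(1);
```

Working on a copy (slice) before reversing keeps the intermediate array intact, which is handy when you want to inspect each stage separately.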

Hold on a Second

"James, are you doing this analysis manually? Isn't it easier to just toss the script into a sandbox and see what it does?"

Yes and maybe yes. Yes, I am doing this analysis manually. For fun. But also for learning. I learn something new every time I decipher obfuscated malware. If I were defending a network, I would use a sandbox to get some quick insight into what this code does. If it is a downloader, I'd want to know the website or websites it uses to download the really nasty stuff so I can quickly block those sites in my firewall or URL filter.

Even though I'm just playing here, I still want to jump into the code and find the website(s) as soon as I can, so I can download whatever malicious code is being served up before the Internet Angels swoop down and block the sites or take down the code.

Back to the Grindstone

It might just be me, but you may agree that this looks like nonsense:

Yup, nonsense. Good old, well organized, structured, functional nonsense. The malware author went beyond obfuscating variable names and using math expressions instead of just simple integers. Look at the way the functions NEh() and NTe() and others are used. They are defined and called from within a statement.

When the gurus at id Software released the source code for Wolfenstein years ago, I spent hours (ok, months actually) examining it. I saw them do things with C statements I'd never seen before, glorious mixtures of C and inline assembly language, magic with pointers, and a way of coding that made no sense upon initial inspection. I experience the same sense of wonder here and now, looking at how the malware author leverages JavaScript to do crazy things in an effort to disguise what is really going on.

So, what do all those wacky variables look like? Here is a sample:

There are 225 of these silly variables, each containing a small substring of something larger and more sinister (cue evil laugh, bats, and dark, cobwebby spaces). You've got to be impressed by the time the malware author put into obfuscating the code so that there are no strings that can be triggered on or easily extracted to gain insight.
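Swapping 225 variables back in by hand gets old quickly, so this is the kind of step worth scripting. The sketch below is one way it could be done, not the tool actually used here. It assumes each fragment is declared as var NAME = "literal"; and that the fragments are glued together with +. The names inlineStringVars, Qa1, Zb2, and Kc3 are hypothetical.

```javascript
// Inline simple string variables and fold adjacent string concatenations.
// Assumption: fragments look like `var NAME = "literal";` (double-quoted only).
// Limitation: identifiers that appear inside string literals are not protected.
function inlineStringVars(code) {
  const table = new Map();
  const declRe = /var\s+([A-Za-z_$][\w$]*)\s*=\s*("(?:[^"\\]|\\.)*")\s*;/g;

  // Pass 1: record every simple string declaration and drop it from the code.
  let body = code.replace(declRe, (_, name, literal) => {
    table.set(name, literal);
    return "";
  });

  // Pass 2: replace each recorded identifier with its string literal.
  body = body.replace(/(?<![\w$])[A-Za-z_$][\w$]*/g, (id) =>
    table.has(id) ? table.get(id) : id
  );

  // Pass 3: fold "a" + "b" into "ab" until no such pair remains.
  const foldRe = /"((?:[^"\\]|\\.)*)"\s*\+\s*"((?:[^"\\]|\\.)*)"/;
  while (foldRe.test(body)) {
    body = body.replace(foldRe, '"$1$2"');
  }
  return body.trim();
}

// Made-up fragments, not taken from the sample.
const sample =
  'var Qa1 = "ht"; var Zb2 = "tp:"; var Kc3 = "//"; var url = Qa1 + Zb2 + Kc3;';
console.log(inlineStringVars(sample)); // var url = "http://";
```

A regex pass like this is only a convenience for well-behaved input; a real parser would be the safer choice if the obfuscator mixes quote styles or hides names inside strings.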

But after replacing the variables with their string equivalents, we have the following:

Ah, now we're getting somewhere. It's always exciting to see "http://" pop up somewhere in the code. With a little editing we can simplify what we are looking at here:

At this point I paused my analysis and entered each of these URLs into my browser to see if any of the three files are still available for download, as it has been an entire day since I received the spam email. I was quite surprised that all three downloads were still working and saved each file (they turned out to be identical copies).

Not So Fast, Mister!

Unfortunately, it is a little early to pat myself on the back and high-five the empty air in my office. When I look inside one of the downloaded files, I see this:

Not very promising. The entire file looks like that too. Either the downloaded file just contains nonsense (kind of a waste of time if you ask me) or it is encrypted, packed, encoded, or some combination of all three. Or even something else.

The only way to figure out what to do with the downloaded file is to keep analyzing the obfuscated downloader.

Forging Ahead

When I look through obfuscated code, I try to see past the variable names and focus on whatever functionality I can locate. What signs of functionality am I looking for? I look for loops that might be used to process elements of a string. Sometimes these loops are used to reverse a string or extract groups of string symbols for customized decompression or decoding. Sometimes you get lucky and spot the XOR operator inside a loop, which reveals the decryption code. Similar things are done in many downloaders, in different ways.

What I found interesting about the sample being analyzed here is that all of these tricks have been used, plus a few more, such as checksum calculation and verification, lookup table symbol substitution, and even a quick check to verify the 4D, 5A bytes that should exist in a Windows EXE header.

Let's look at that last check first:

The TRp5 variable passed to this function is an array that represents the contents of the decrypted and reversed file downloaded from one of the three websites mentioned earlier. I know this from my analysis efforts (which I completed before starting this write-up), but if I had seen this function before knowing other details, the 4D and 5A values at offsets [0] and [1] would have clued me in to the nature of the TRp5 variable.
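Stripped of its obfuscation, a check like this boils down to comparing two bytes. Here is a readable sketch of the idea, with a hypothetical function name (hasMzHeader) and made-up sample arrays; it is not the sample's own function.

```javascript
// Return true when the byte array starts with 4D 5A ("MZ"),
// the signature at the start of a Windows EXE header.
function hasMzHeader(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x4d && bytes[1] === 0x5a;
}

// Illustrative inputs only.
console.log(hasMzHeader([0x4d, 0x5a, 0x90, 0x00])); // true
console.log(hasMzHeader([0x50, 0x4b, 0x03, 0x04])); // false (this is the ZIP "PK" signature)
```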

The really interesting stuff appears in this next function:

To help with readability, I'm renaming the obfuscated variables and providing actual variable values in a few places, as you can see in this next version:

The first thing this function does is extract the stored checksum value from the encrypted MalCode file (note: the checksum is not encrypted). The checksum is a 32-bit integer saved in the last four bytes of the MalCode file. Here's what the end of the MalCode file looks like:

So, ChkSum will equal 013CF1A2 (remember we are dealing with little-endian numbers).
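If little-endian order is unfamiliar, this short sketch shows how four trailing bytes combine into that value. The byte order A2 F1 3C 01 is inferred from the stated result and the little-endian rule, not copied from the hex dump, and readUint32LE is a hypothetical helper name.

```javascript
// Assemble an unsigned 32-bit integer from four bytes stored
// least-significant byte first (little-endian).
function readUint32LE(bytes, offset) {
  return (
    (bytes[offset] |
      (bytes[offset + 1] << 8) |
      (bytes[offset + 2] << 16) |
      (bytes[offset + 3] << 24)) >>> 0 // >>> 0 keeps the result unsigned
  );
}

// Inferred trailing bytes of the file (see the note above).
const lastFour = [0xa2, 0xf1, 0x3c, 0x01];
const chkSum = readUint32LE(lastFour, 0);
console.log(chkSum.toString(16).padStart(8, "0").toUpperCase()); // 013CF1A2
```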

These four bytes are stripped off the end of the file by the splice operation. The NewChkSum variable is seeded with the value 24, and then every remaining byte of the file is added to NewChkSum, keeping only the least significant 32 bits of the running sum. An empty array is returned if the calculated checksum does not match the stored checksum.
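Put together, the strip-and-verify logic might look like the sketch below. It is a reconstruction from the description above, not the sample's code: verifyChecksum is a hypothetical name, and the tiny test payload is invented so the arithmetic is easy to follow.

```javascript
// Strip the 4-byte little-endian checksum from the end of the file,
// recompute it over the remaining bytes, and return those bytes on a match.
// Returns an empty array when the checksums differ.
function verifyChecksum(fileBytes) {
  const data = Array.from(fileBytes); // copy so the caller's array is untouched
  if (data.length < 4) return [];

  // Remove the last four bytes; they hold the stored checksum.
  const tail = data.splice(data.length - 4, 4);
  const stored =
    (tail[0] | (tail[1] << 8) | (tail[2] << 16) | (tail[3] << 24)) >>> 0;

  // Seed of 24 as described in the text; >>> 0 keeps the low 32 bits.
  let newChkSum = 24;
  for (const b of data) {
    newChkSum = (newChkSum + b) >>> 0;
  }
  return newChkSum === stored ? data : [];
}

// Invented payload: 24 + 1 + 2 + 3 = 30, stored as 1E 00 00 00.
console.log(verifyChecksum([1, 2, 3, 30, 0, 0, 0])); // [ 1, 2, 3 ]
console.log(verifyChecksum([1, 2, 3, 31, 0, 0, 0])); // []
```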

The initial XOR pattern is 00011100 (28 decimal). Before XOR decryption begins, the contents of the MalCode file are reversed. So, if you take another look at the last figure, you will see that the first byte in the reversed MalCode file will be 51, the second byte will be 7F, and so on.

The XOR decryption variable is updated after processing each new byte from the MalCode file by adding 9 to it and keeping the least significant 8 bits of the sum. Let's just make two passes through the XOR decryption for() loop to see what we get:

And so on. But already we see the 4D and 5A byte values that make up the beginning of a Windows EXE header, so things are looking pretty good. In fact, here is some additional proof that I'm not just lying through my teeth: the first 256 bytes of the decrypted MalCode file.

Never has that "DOS mode" message looked so good.
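The reverse-then-XOR routine walked through above fits in a few lines. This is a sketch rebuilt from the description (start key 28, add 9 after each byte, keep the low 8 bits), not the downloader's own code, and decryptMalCode is a hypothetical name. The two input bytes are the last two payload bytes mentioned earlier, 7F and 51, with the checksum already stripped.

```javascript
// Reverse the encrypted bytes, then XOR each one with a rolling key.
function decryptMalCode(encrypted) {
  const reversed = Array.from(encrypted).reverse();
  let key = 0x1c; // 00011100 binary, 28 decimal
  const plain = [];
  for (const b of reversed) {
    plain.push(b ^ key);
    key = (key + 9) & 0xff; // advance the key, keep the least significant 8 bits
  }
  return plain;
}

// Last two payload bytes of the file: ... 7F 51
// Pass 1: 0x51 ^ 0x1C = 0x4D, pass 2: 0x7F ^ 0x25 = 0x5A
const firstTwo = decryptMalCode([0x7f, 0x51]);
console.log(firstTwo.map((b) => b.toString(16).toUpperCase())); // [ '4D', '5A' ]
```

Feeding the output of a checksum check straight into a function like this, and then testing for the 4D 5A signature, mirrors the order of operations described in this section.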

And the Winner is?

So what exactly do we have our hands on now that we've decrypted the MalCode file? Taking the SHA1 hash gives us 9d204f98fee15a380aa3f83ef506452a8d530f52, which I pass along to VirusTotal. Johnny, tell James what he's won:

Ahh… lather, rinse, and repeat. Another malware sample to examine. That's the beauty of the spam folder… it's a gift that keeps on giving.
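For completeness, the hashing step mentioned above takes only a couple of lines in Node.js with the built-in crypto module. This is a sketch: sha1Hex is a hypothetical helper name, and the file name in the comment is a placeholder, not the name used in this analysis. The demo hashes the string "abc", whose SHA-1 digest is a well-known test value.

```javascript
const crypto = require("crypto");

// Return the SHA-1 digest of a Buffer (or byte array) as a lowercase hex string.
function sha1Hex(bytes) {
  return crypto.createHash("sha1").update(Buffer.from(bytes)).digest("hex");
}

// With a decrypted file on disk you could do something like:
//   const fs = require("fs");
//   console.log(sha1Hex(fs.readFileSync("decrypted.bin"))); // placeholder file name

// In-memory demo using a standard test input.
console.log(sha1Hex(Buffer.from("abc"))); // a9993e364706816aba3e25717850c26c9cd0d89d
```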

Conclusion

I'm willing to put time and effort into manually analyzing malware because I learn new things every time I do it. I also write new software tools to help me through different parts of the analysis, while also using free tools someone else took the time to develop. Patience and curiosity pay off big when you've cracked the puzzle and revealed what the obfuscated code really does.