
I was recently analyzing a web page that contained some highly obfuscated JavaScript - it's clear that the author had gone to quite a bit of effort to make it as hard to understand as possible. I've seen several variations on this code - there are enough similarities that it's clear they have the same source, but they are different enough that the steps needed to deobfuscate them change each time.

I started with running the URL through VirusTotal, which scored 0/46 - so it was something of interest and not being detected by Anti-Virus software (at least statically). Next I tried running it through jsunpack to see if it could make any sense of it - no luck, it broke the parser.

Looking at the code, there were a few methods that were designed to be confusing, and then several KB of strings like this that would eventually be decoded as JavaScript and executed:

22=";4kqkk;255ie;35bnh;4mehn;2lh3b;7i29n;6m2jb;7jhln;562ik..."

After digging around for a few minutes I was able to determine that the bit of code I really cared about was this:

try{document.body--}catch(dgsdg){e(a);}

In this case e had been aliased to eval and a was a string that had been manipulated by the various functions at the beginning of the file (and passed around via a series of misleading assignments).

To quickly get the value of a I modified the code to Base64 encode it and output the value, and then opened the HTML file in Chrome on a VM (disconnected from the network):

document.write(window.btoa(a))

This got me the value I was looking for, and from it I was able to identify the malware that it was trying to drop. But the process was too slow and too risky: if I had missed another eval, I could have executed what was clearly malicious code.
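For reference, the Base64 text written to the page by the snippet above can be decoded outside the browser. This is a minimal sketch assuming Node.js is available on the analysis machine; `decodeBase64Dump` is a name made up for this example, and the sample input is a harmless stand-in rather than the real payload.

```javascript
// Decode text that was produced in the browser with window.btoa().
// btoa() treats each character of its input as one byte (Latin-1),
// so the bytes are mapped back to characters the same way here.
function decodeBase64Dump(b64) {
  return Buffer.from(b64.trim(), 'base64').toString('latin1');
}

// Harmless stand-in for the copied output; the real value came from the sample.
const dumped = 'YWxlcnQoMSk=';
console.log(decodeBase64Dump(dumped)); // prints "alert(1)"
```

Decoding only turns the text back into a string; nothing is evaluated, so this step is safe to do on the host.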

Are there better ways to run JavaScript like this in a secure sandbox to minimize the risks that go with executing it? I don't see any way a tool could be built to generically deobfuscate this kind of code, so I don't see any way around running it (or building one-off tools, which is also time-consuming).

I'd be interested in hearing about other tools and techniques for dealing with this kind of code.

9 Answers

I am the author of JSDetox, thanks to Jurriaan Bremer for mentioning it!

As already said every obfuscation scheme is different. JSDetox does not try to deobfuscate everything automatically - the main purpose is to support manual analysis.

It has two main features: static analysis tries to optimize code that is "bloated up", e.g. statements like

var x = -~-~'bp'[720094129.0.toString(2 << 4) + ""] * 8 + 2;

can be solved to

var x = 34;

as there are no external dependencies.
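To see why the expression above reduces to 34, it can be evaluated piece by piece. The breakdown below is only a hand-worked illustration in plain JavaScript (the intermediate names `radix`, `prop`, `len`, and `inc` are made up here); it says nothing about how JSDetox performs the simplification internally.

```javascript
// Step-by-step evaluation of the obfuscated expression.
const radix = 2 << 4;                           // 32
const prop = 720094129.0.toString(radix) + "";  // "length" (the number written in base 32)
const len = 'bp'[prop];                         // 'bp'.length, i.e. 2
const inc = -~-~len;                            // each "-~" adds 1, so 4
const x = inc * 8 + 2;                          // 34

console.log(radix, prop, len, inc, x);          // prints: 32 length 2 4 34
```

Every operand is a literal, so the result can be computed without running anything else from the sample.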

The second feature is the ability to execute JavaScript code with HTML DOM emulation: one can load an HTML document (optional) and a JavaScript file, execute the code and see what would happen. Of course this does not always work out of the box and manual corrections might be needed.

JSDetox intercepts calls like "eval()" or "document.write()" (what you did by hand) and displays what would be executed, allowing further analysis.
The HTML DOM emulation allows the execution of code that interacts with an HTML document, e.g.:
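As an illustration of this kind of interception (a minimal sketch, not JSDetox's own implementation), the idea can be reproduced with Node's built-in `vm` module. Everything named here is an assumption for the example: the helper `runWithStubs`, the shape of the fake `document`, and the harmless sample string. The throwing `body` setter is a guess at how a browser reacts to `document.body--`; without it the `catch` branch in a sample like the one in the question would never run, which is the sort of manual correction mentioned above. Note that the Node documentation states that `vm` is not a security mechanism, so real samples should still be run inside an isolated VM.

```javascript
const vm = require('vm');

// Hypothetical helper: run a script in a fresh context where document.write()
// and eval() only record their argument instead of acting on it.
function runWithStubs(source) {
  const captured = [];
  const sandbox = {
    // Minimal stand-in for the DOM; only body and write() are emulated.
    document: {
      get body() { return {}; },
      // Assumption: rejecting non-element values mimics a browser here.
      set body(value) { throw new TypeError('body must be an element'); },
      write: (html) => { captured.push({ via: 'document.write', text: String(html) }); },
    },
    // Shadows the context's built-in eval: record the string, do not run it.
    eval: (code) => { captured.push({ via: 'eval', text: String(code) }); },
  };
  vm.createContext(sandbox);
  try {
    vm.runInContext(source, sandbox, { timeout: 1000 });
  } catch (err) {
    captured.push({ via: 'error', text: String(err) });
  }
  return captured;
}

// Harmless stand-in shaped like the snippet in the question.
const sample = `
  var a = ['al', 'ert', '(1)'].join('');
  var e = eval;
  try { document.body-- } catch (dgsdg) { e(a); }
  document.write('<b>' + a.length + '</b>');
`;

console.log(runWithStubs(sample));
```

With this sample, the captured list holds the string passed to the aliased `eval` followed by the markup passed to `document.write()`, and neither is executed or rendered.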

@svent, sorry to bring up an old thread. First of all, thanks for an awesome tool. I'm having a problem where btoa() is not being recognized. It could be that I'm doing something wrong. I'm trying to do this: data = btoa(junk); document.write(data); Thanks.
– k0ng0 May 30 '14 at 0:53

@k0ng0 The problem is that btoa() is a function of the window object in browsers and not a feature of the JavaScript language. JSDetox currently only emulates selected functions of the window object. Thanks for the hint, I will look further into this with the next release.
– svent Jun 2 '14 at 21:34

First, thanks for the work on your tool, however, every time I put some obfuscated code in it and hit "analyze" it just says "loading..." and never actually loads anything. Left it for like 10 mins.
– MikeSchem Oct 25 '17 at 6:33

This may be an old answer, but to save anyone else total, needless confusion: all of the binaries are of some sort of tool called "InnoBF", even though they're in Malzilla paths on the linked SourceForge. You have to actually build from source, but it's written in Delphi/Kylix/whatever, so yeah, real fun.
– simontemplar Jun 6 at 2:48

Your best bet is to use an environment (e.g. Firefox) in which eval() can be overridden with a proxy function that just prints its argument instead of executing it. That way there is no risk of missing a call, even if the malware aliases eval. Unfortunately, eval() is not designed to be overridden (and I believe this is explicitly forbidden by recent ECMAScript specs), but at worst the sample will fail to run.
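A minimal sketch of such a proxy, under a few assumptions: it replaces the `eval` property on `globalThis` (which works in current browsers and in Node.js), and the helper name `withEvalProxy` and the harmless sample are made up for this example. It only covers `eval`; other routes to code execution, such as the `Function` constructor or string arguments to `setTimeout`, would need the same treatment, and none of this replaces running the sample in an isolated VM.

```javascript
// Hypothetical helper: temporarily swap the global eval for a recording proxy.
function withEvalProxy(run) {
  const seen = [];
  const realEval = globalThis.eval;
  globalThis.eval = function (code) {
    seen.push(String(code)); // record the string instead of executing it
    return undefined;
  };
  try {
    run();
  } finally {
    globalThis.eval = realEval; // always restore the original
  }
  return seen;
}

const seen = withEvalProxy(() => {
  const a = 'al' + 'ert' + '(1)';
  const e = eval; // alias taken while the proxy is installed
  e(a);           // reaches the proxy, not the real eval
  eval('1 + 1');  // a plain eval call reaches the proxy as well
});

console.log(seen); // prints [ 'alert(1)', '1 + 1' ]
```

The proxy has to be installed before any of the sample's code runs; an alias captured earlier would still point at the real `eval`.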

For JSDetox, I think you can give the HTML as input through the web interface, and it will then spit out some tweaked (i.e., deobfuscated) code. My jsunpck, however, simply accepts a JavaScript file on the command line.

As always, success with either of the tools is not guaranteed, but who knows. In the end, you might be more interested in a dynamic approach, such as patching eval as described above.

In addition to the other useful links here, I recommend trying Malware-Jail:

Sandbox for semi-automatic Javascript malware analysis, deobfuscation and payload extraction. Written for Node.js

malware-jail is written for Node's 'vm' sandbox. Currently implements WScript (Windows Scripting Host) context env/wscript.js, at least the part frequently used by malware. Internet browser context is partially implemented env/browser.js.

Check out the GitHub repository to learn more and to see its usage and sample output.

For the last year I've been developing box-js, and I found it to be the most accurate tool when analyzing JScript droppers. Internally, it uses UglifyJS2 to run a static analysis and simplify many obfuscation techniques; the rest is achieved through emulation in a V8 sandbox.