03 September 2013

Malware Analysis: The State of Java Reversing Tools

In the world of incident response and malware analysis, Java has always been a known constant. While many malware analysts are monitoring more complex malware applications in various languages, Java is still the language of love for drive-by attacks on common end-users. In most cases, a home user infection with malware such as Zeus, Citadel, Carberp, or ZeroAccess originated through a Java vulnerability and exploit. In typical crimeware (banking/financial theft malware) incidents, one group specializes in the backend malware (e.g. Zeus) while outsourcing the infection and entrenchment to a second group that creates exploit software like BlackHole, Neosploit, and Fiesta.

In many incident responses, I've seen analysts gloss over the Java infection vector as just an end-note. Once they see the final-stage malware on the system, they write off the Java component as just a downloader without any real analysis. This creates problems when the Java exploit only partially succeeds, leaving malicious Java JAR files on a system but no Trojan or malware.

Why did it fail? Was the system properly patched to prevent a full infection? Was there a permission setting that stopped the downloader in its tracks? These are the questions that typically force an analyst to begin analyzing Java malware.

I've discussed Java quite a bit on this blog in the past. My Java IDX cache file parser was made for the purpose of identifying files downloaded via Java, be they Windows executables or additional Java JAR files. In that same post, I analyzed the Java component of a Fiesta exploit kit that installed a ZeroAccess trojan onto an analyzed system.

Though Java is not my forte, I've had to face it enough to find that there are many weaknesses and gaps in the tools used for analysis. What I found is that most analysts have been using the same outdated tools in every case. If a tool fails, they just move on and don't finish their analysis. All the while, new applications are being released that are worthy of note. I felt it worthwhile to do an annual check-up of the state of analysis tools to show what is available and what weaknesses each holds. Others have made similar efforts in the past, the most recent I've found being a 2010 post on CoffeeBreaks by Jerome.

This post was intended to be much larger and more in-depth, delving into how each analysis tool manages decompilation and why they fail, but due to time and resources it was cut short.

The Setup

For this comparison I will be using code from a Java RAT that is in active development. Due to this active development, I will not name the RAT nor provide any files for download.

The malware used is obfuscated by a well-known Java obfuscation tool named Zelix KlassMaster (ZKM). ZKM has been discussed widely in the industry for years and I gave a presentation on how to identify and reverse its string encryption at a NoVA Hackers! (NoVAH!) meeting in May of 2012.
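To make "reversing its string encryption" concrete, here is a sketch of the style of routine commonly described for older ZKM releases. Each character is XORed with one of five per-class key values, chosen by index modulo 5. The decrypted strings typically land in the static z[] array that appears in the decompiled output later in this post. This is an assumption-laden simplification: the key values below are hypothetical placeholders, every obfuscated class carries its own keys (read them out of the class's decryption routine), and the exact scheme differs between ZKM versions.

```java
/**
 * Sketch of a ZKM-style string decryptor (assumed scheme, simplified):
 * each char is XORed with one of five per-class key values chosen by index % 5.
 */
final class ZkmStyleDecryptor {
    private final char[] keys;

    ZkmStyleDecryptor(char... keys) {
        if (keys.length != 5) {
            throw new IllegalArgumentException("expected exactly five key values");
        }
        this.keys = keys.clone();
    }

    /** XOR is its own inverse, so the same routine both encrypts and decrypts. */
    String apply(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] ^= keys[i % 5];
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Hypothetical key; a real one is copied from the sample's own decrypt routine.
        ZkmStyleDecryptor d = new ZkmStyleDecryptor(
                (char) 0x12, (char) 0x34, (char) 0x56, (char) 0x78, (char) 0x1A);
        String encrypted = d.apply("HOSTANSW");   // simulate the obfuscator's output
        System.out.println(d.apply(encrypted));   // prints HOSTANSW
    }
}
```

Because the keys are per-class, an analyst reimplementing this outside the sample has to extract the five values for each class individually.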

Due to this obfuscation, we will be matrixing the results, pairing each decompiler with two well-known Java deobfuscators: JMD and JDO.

As seems to be common with Java analysis tools, many of those discussed here are no longer in development and have been abandoned. However, in many cases, they still work for a majority of malicious samples.

Deobfuscators:
Deobfuscators work by detecting known obfuscation methods, such as renaming variables, classes, and functions, as well as basic string encoding. While many of these methods are specific to known obfuscators, generic deobfuscation can be performed by searching for a routine that runs against encoded strings, then calling that routine externally against the strings.
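The "call the routine externally" approach can be sketched with plain reflection. The steps are:

1. Load the class.
2. Find a static method shaped like a string decoder. Here that means any static method taking a String and returning a String, which is an assumed heuristic.
3. Invoke that method on each encoded constant.

`ObfuscatedSample` and its single-byte XOR routine are hypothetical stand-ins for a real sample. Invoking a static method initializes its class, so against real malware this runs the sample's own code. Only do it inside an isolated analysis VM.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an obfuscated class with a private string-decoding routine. */
final class ObfuscatedSample {
    private static String a(String s) {
        char[] c = s.toCharArray();
        for (int i = 0; i < c.length; i++) {
            c[i] ^= 0x2A;                      // toy XOR, not any real obfuscator's scheme
        }
        return new String(c);
    }
}

final class GenericStringDecoder {
    /** Finds the first static method with a String(String) signature, a common decoder shape. */
    static Method findDecoder(Class<?> target) {
        for (Method m : target.getDeclaredMethods()) {
            if (Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == String.class
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == String.class) {
                return m;
            }
        }
        throw new IllegalStateException("no static String(String) method in " + target.getName());
    }

    /** Runs the sample's own decoder over each encoded string (executes sample code!). */
    static List<String> decodeAll(Class<?> target, List<String> encoded) throws Exception {
        Method decoder = findDecoder(target);
        decoder.setAccessible(true);           // decoders are usually private
        List<String> out = new ArrayList<>();
        for (String s : encoded) {
            out.add((String) decoder.invoke(null, s));   // null receiver: static method
        }
        return out;
    }
}
```

For example, `GenericStringDecoder.decodeAll(ObfuscatedSample.class, Arrays.asList("bC"))` returns `[Hi]`.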

JMD is an open-source deobfuscator, written in Java but also available as a .NET 2.0 executable (64-bit download). It runs directly against a JAR file and produces a deobfuscated JAR as a result. It provides the following deobfuscation methods:

Allatori

DashO

Generic string encoding

JShrink

SmokeScreen

Zelix KlassMaster (ZKM)

JDO (Java DeObfuscator) is also open-source Java and is provided as a .NET 2.0 executable. Unlike JMD, it will only operate against individual Java class files. This requires you to manually unzip a JAR file, then run JDO against each individual class file. It attempts to automatically detect and deobfuscate data through generic means.
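Since JDO wants individual class files, the unzip step is easy to script. A JAR is a ZIP archive, so the JDK's java.util.zip API can read it directly. The following minimal sketch copies every .class entry out while preserving the package directory layout. It also refuses entry names that would escape the output directory, because malicious archives sometimes contain "../" names. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class JarClassExtractor {
    /** Copies every *.class entry of the JAR under outDir and returns the written paths. */
    static List<Path> extractClasses(Path jar, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Path root = outDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue;                              // skip manifests, resources, folders
                }
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {            // reject path-traversal entry names
                    throw new IOException("suspicious entry name: " + entry.getName());
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                written.add(target);
            }
        }
        return written;
    }
}
```

Each returned path can then be handed to JDO one at a time.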

Decompilers:

JD-GUI is probably the most widely used decompiler. It features a well-thought-out GUI as well as the ability to parse entire JAR files. However, its current hosting site is unavailable, though the site is mirrored elsewhere. For updates, refer to the Twitter page of Emmanuel Dupuy.

The various forms of JD-GUI
For the purpose of this post, the latest version of JD-GUI was used. However, this version may lack functionality found in other versions of "JD". Recently, the JD-GUI developer has put a great deal of development into JD-Core / JD-IntelliJ.

JAD is a free decompiler, though one that was discontinued many years ago, and as such it has many problems with newer iterations of the Java Development Kit (JDK). Its original web domain is gone, and the project is now hosted elsewhere. It's a basic command-line tool that has been used as the backend for multiple other Java decompilers.
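One quick way to tell whether a sample targets a JDK newer than JAD is to read the class file header. The header consists of:

- the four magic bytes 0xCAFEBABE
- a u2 minor version
- a u2 major version

The mapping below covers major versions 45 through 52 (JDK 1.0/1.1 through 8). Which version you treat as "too new for JAD" is left to the caller. The class and method names are illustrative.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassVersion {
    /** Returns the class file's major version after validating the 0xCAFEBABE magic. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException(String.format("not a class file (magic 0x%08X)", magic));
        }
        data.readUnsignedShort();              // minor_version, unused here
        return data.readUnsignedShort();       // major_version
    }

    /** Maps a major version to the JDK release that uses it (45..52 only). */
    static String jdkFor(int major) {
        switch (major) {
            case 45: return "1.0/1.1";
            case 46: return "1.2";
            case 47: return "1.3";
            case 48: return "1.4";
            case 49: return "5";
            case 50: return "6";
            case 51: return "7";
            case 52: return "8";
            default: return major > 52 ? "newer than 8" : "unknown";
        }
    }
}
```

For example, `ClassVersion.majorVersion(Files.newInputStream(Paths.get("ec.class")))` would report the version of an extracted class.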

FernFlower was a free decompiler that appeared around 2009 and was unique for being web-based. In 2011, an offline JAR file was made available, and the website was taken down shortly after. It's currently used as the backend for many commercial decompilers, such as AndroChef and DJ Java Decompiler. Notably, it's currently available bundled with the Minecraft Coder Pack.

Procyon is a recently released, open-source decompiler. It is currently in active development and, while a command-line tool, does have two GUI front-ends available: Luyten and SecureTeam's Decompiler. Procyon is available on Bitbucket.

Other decompilers that were not included in the scope of this post:

CFR

JReversePro

Krakatau - Python-based decompiler

Disassemblers:
As reversers know, decompilation is an immature science. Certain liberties are taken to assume and guess what code is doing in order to make readable source. The most accurate method is to simply view the raw data itself as compiled Java bytecode. For those situations, reJ provides an excellent GUI front end, and the ability to modify code on-the-fly.

Eclipse plugins
Some decompilers have Eclipse IDE plugins available. Eclipse is currently the predominant environment for Java development, and such plugins allow code to be reversed directly into a new project for debugging and analysis.

Test 1: Simple file writing function.

The first test will be against an obfuscated class function that allows the RAT to save network-transmitted data to the local Windows HOSTS file to override DNS resolutions.

From JD-GUI's output for the raw class file, we have little to work with. We see the java.io.FileWriter class in use, so we know that file activity is taking place, but all strings are replaced with array lookups of z[#]. Let's attempt this again after running the class file through a deobfuscator.

Well, that was awkward. JDO did attempt to rename the string array from 'z' to 'var_3a2', but its edits exposed ZKM's string decryption function. JD-GUI was unable to decompile this function and shows it as disassembled code. Oddly, JD-GUI did not show this function at all for the raw class file. Either way, there is nothing usable here. Similar results were found when using JDO with other decompilers, so its further use in this post was stopped.

JD-GUI (JMD Deobfuscated)

import java.io.FileWriter;

public class ec extends u {
    private static final String[] z;

    public void b(String arg0) {
        String str = s.b();
        try {
            if (b.a() != b.f) break label111;
        } catch (Exception localException2) {
            throw localException2;
        }
        try {
            FileWriter localFileWriter = new FileWriter(System.getenv("SystemDrive") + "\\Windows\\System32\\drivers\\etc\\hosts");
            localFileWriter.write(str);
            localFileWriter.close();
            s.b("HOSTANSW");
            s.b("");
        } catch (Exception localException1) {
        }
        try {
            s.b("HOSTANSW");
            s.b("ERR: " + localException1.getMessage());
            return;
            label111:
            s.b("HOSTANSW");
        } catch (Exception localException3) {
            throw localException3;
        }
        s.b("Needs to be windows");
    }
}

Well, our work here is done! Based on this output, we see the Java code resolving the SystemDrive environment variable (typically C:) and appending the hardcoded path to the HOSTS file. It writes a string that's returned from class 's' function 'b' (s.b()), a function responsible for network communications. The "HOSTANSW" strings are simply transmitted back to the C2, along with the "ERR: " message, if encountered.
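For the responder, the natural follow-up is to check what the RAT may have written. This is a hypothetical helper, not part of the sample. It rebuilds the path the same way the decompiled code does: SystemDrive plus a hardcoded \Windows\... path. Unlike %SystemRoot%, that construction would miss a Windows installation in a non-default directory. The helper also parses HOSTS lines into hostname-to-address mappings, skipping comments and blank lines.

```java
import java.util.ArrayList;
import java.util.List;

final class HostsCheck {
    /** Mirrors the sample's path construction: SystemDrive + hardcoded Windows path. */
    static String hostsPathLikeSample(String systemDrive) {
        return systemDrive + "\\Windows\\System32\\drivers\\etc\\hosts";
    }

    /** Returns "hostname -> address" for every non-comment mapping in the HOSTS lines. */
    static List<String> mappings(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');                       // strip trailing comments
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (content.isEmpty()) {
                continue;
            }
            String[] fields = content.split("\\s+");
            for (int i = 1; i < fields.length; i++) {           // one address may map several names
                result.add(fields[i] + " -> " + fields[0]);
            }
        }
        return result;
    }
}
```

On a live system, feed `mappings` the output of `Files.readAllLines` on the real HOSTS file and review any entries that redirect well-known domains.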

In all, JD-GUI combined with JMD was able to give us a "full" analysis of this one class file. Let's try other decompilers.

Interesting results there. Note that Procyon threw an exception at the end for an "Inconsistent stack size". Regardless, the code decompiled fine. It also recognized the ZKM string decryption routine but only provided the disassembled code for it. The decompiled code is almost identical to that provided by JD-GUI but is in a much more structured display. While JD-GUI attempts to group conditions together and compact the function borders ({}), Procyon gives a more formal output, albeit a larger one. Even its disassembled output is more structured, with liberal carriage returns.

Let's now run Procyon with a deobfuscated class file:

Procyon (JMD Deobfuscated)

import java.io.*;

public class ec extends u {
    private static final String[] z;

    public void b(final String arg0) {
        final String b = s.b();
        Label_0111: {
            try {
                if (b.a() != b.f) {
                    break Label_0111;
                }
            } catch (Exception ex) {
                throw ex;
            }
            try {
                final FileWriter fileWriter = new FileWriter(System.getenv("SystemDrive") + "\\Windows\\System32\\drivers\\etc\\hosts");
                fileWriter.write(b);
                fileWriter.close();
                s.b("HOSTANSW");
                s.b("");
                return;
            } catch (Exception fileWriter) {
                s.b("HOSTANSW");
                final FileWriter fileWriter;
                s.b("ERR: " + ((Throwable) fileWriter).getMessage());
                return;
            }
            try {
                s.b("HOSTANSW");
            } catch (Exception ex2) {
                throw ex2;
            }
        }
        s.b("Needs to be windows");
    }
}

Similar to JD-GUI, we're able to get a clean decompilation of the file. The code produced by the two is nearly identical, the main difference being the formal structure of the conditions.

JAD (raw class file)

JAD is commonly the backup to JD-GUI, but it is a far more outdated engine for decompilation and disassembly. One of my favorite features of JAD, though, is that when it does fail to decompile, its output is a good mixture of the two. It disassembles, but attempts to put logic into the disassembly instead of producing a blind dump like JD-GUI and Procyon.

JAD's decompiler does a fairly decent job, but differs in how it handles exceptions within the code. Let's see how it operates on deobfuscated classes:

JAD (JMD Deobfuscated)

import java.io.FileWriter;

public class ec extends u {

    public ec() {
    }

    public void b(String arg0) {
        String s1;
        s1 = s.b();
        try {
            label0: {
                if (b.a() != b.f)
                    break MISSING_BLOCK_LABEL_111;
                break label0;
            }
        } catch (Exception _ex) {
        }
        FileWriter filewriter = new FileWriter((new StringBuilder(String.valueOf(System.getenv("SystemDrive")))).append("\\Windows\\System32\\drivers\\etc\\hosts").toString());
        filewriter.write(s1);
        filewriter.close();
        s.b("HOSTANSW");
        s.b("");
        break MISSING_BLOCK_LABEL_129;
        Exception exception;
        exception;
        s.b("HOSTANSW");
        s.b((new StringBuilder("ERR: ")).append(exception.getMessage()).toString());
        break MISSING_BLOCK_LABEL_129;
        s.b("HOSTANSW");
        break MISSING_BLOCK_LABEL_122;
        throw;
        s.b("Needs to be windows");
    }

    private static final String z[];
}

Here we see results similar to what the other tools found. But, as mentioned earlier, the exception handling is very confusing: there are breaks and exceptions inline with functional code. Later conditional sections, such as the check that the system is running Microsoft Windows, are ignored, and the code is shown as one series of instructions. All in all, it does give us some of the source code in a somewhat reasonable facsimile of the original. While it is excellent as a backup tool if others fail, I wouldn't rely upon it for my analysis.

What about FernFlower?

FernFlower was a well-known and trusted decompiler years ago. Like most decompilers, it faded from the scene silently. The first version was web-based, requiring you to upload your class files for analysis; later versions were distributed as an offline JAR. The FernFlower engine is currently used as the backend for the commercial (shareware) products DJ Java Decompiler and AndroChef. While these are competent tools that have built upon the capabilities of FernFlower, they are generally just commercial GUIs for it.

Additionally, FernFlower alone failed horribly in all of the tests here. Astonishingly, when confronted with the raw class file, it was unable to decompile or disassemble the main HOSTS-writing function. However, it did decompile ZKM's string decryption routine, the exact opposite of what we need.

Test 2

The second test is against a very basic class file that performs one overall function: deleting a passed filename. One notable feature of this class file is that it contains no string table. That is one less layer to work around, but it still caused some issues.

The first run of JD-GUI against this file produced identical results regardless of whether a deobfuscator was used. That leads to the assumption that the core obfuscation used by this malware is simply to rename functions and encode strings. However, JD-GUI failed to decompile the code in any way, providing just basic disassembled code.

Procyon provides ideal output. It shows the various exception catching taking place to ensure that the file exists, and is a file (not a folder or device), before calling for the deletion of it. The only issue is that the file object is copied into three other objects (file2, file3, file4) for exception catching purposes. Realistically, these would likely all be the same object.
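For comparison, here is a hand-written sketch of the logic Procyon's output describes: confirm the path exists and is a regular file, then delete it. It is a hypothetical reconstruction using java.io.File, not the RAT's actual source, and the class and method names are made up. Collapsing Procyon's file2/file3/file4 copies into a single object follows the assumption above that they were all the same object.

```java
import java.io.File;

final class DeleteFileCommand {
    /** Deletes the named path only if it exists and is a regular file (not a folder or device). */
    static boolean deleteIfRegularFile(String fileName) {
        File file = new File(fileName);
        try {
            if (!file.exists() || !file.isFile()) {
                return false;                  // missing, or a directory/device
            }
            return file.delete();
        } catch (SecurityException e) {
            // a security manager may deny access; report failure rather than propagate
            return false;
        }
    }
}
```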

Here, JAD slightly disappoints. It was unable to produce decompiled code from the point of the first exception catch onward, reverting instead to a mix of disassembled and decompiled Java code. However, the class is still simple enough to understand its functionality from this view, though it's nowhere near as usable as Procyon's output.

What about Krakatau?
Krakatau was mentioned earlier, but not shown here. In my experience, Krakatau provides one of the best decompilation outputs, and is able to reverse a larger array of unusual code. In fact, for the first test, it was able to produce valid code for both the obfuscated routine and the string decryption routine. It is definitely worth mentioning and using. However, I also had many issues getting it to work correctly. It would crash on most samples I gave it, though it would still produce decompiled results. Most of this is due to minor issues: a hardcoded check for a ".jar" file extension, a hardcoded Java path of JAVA_HOME\jre\lib\rt.jar instead of the "jre7" directory actually present, etc. It may require small adjustments and an analytic eye to work cleanly in its current state, but it is definitely shaping up to become one of the better decompilers.
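A simple workaround for the rt.jar problem is to locate the runtime library yourself before pointing Krakatau at it. This sketch checks a few candidate locations relative to a Java home directory. The candidate list is an assumption to adjust for your own machine, including the sibling "jre7" layout mentioned above. Note that JDK 9 and later no longer ship an rt.jar at all.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class RtJarLocator {
    /** Candidate rt.jar locations relative to a Java home directory (assumed layouts). */
    static final List<String> CANDIDATES = Arrays.asList(
            "jre/lib/rt.jar",      // JDK layout with a nested JRE
            "../jre7/lib/rt.jar",  // sibling JRE directory, e.g. ...\Java\jre7 (assumption)
            "lib/rt.jar");         // javaHome already points at a JRE

    /** Returns the first candidate that exists as a regular file, if any. */
    static Optional<Path> find(Path javaHome) {
        for (String candidate : CANDIDATES) {
            Path p = javaHome.resolve(candidate).normalize();
            if (Files.isRegularFile(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }
}
```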

reJ (raw class file)

I can't just bring up reJ and not discuss it more in depth. reJ is my favorite Java tool for code manipulation, giving you a great deal of power over the code in its compiled form. It is a Java-based disassembler and hex editor for compiled Java class files. It provides granular inspection of the byte codes, string tables, and hardcoded values. It also allows for the direct editing, deletion, and addition of new byte code. It is only a disassembler, though, so its use requires extra knowledge of Java bytecode.

For a better analysis, I'd recommend toggling/enabling the following:

View -> Reference Translation -> Hybrid

View -> Split Mode -> Hex View

View -> Constant Pool
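reJ's Constant Pool view is where ZKM's encrypted string literals show up. To script the same inspection across many classes, here is a minimal sketch that walks the constant pool as laid out in the JVM specification. It returns the CONSTANT_Utf8 entries, which hold class names, member names, descriptors, and string literal text. It handles the tags defined through Java SE 11 and aborts on anything else rather than guessing. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ConstantPoolStrings {
    /** Returns every CONSTANT_Utf8 value from a class file's constant pool, in pool order. */
    static List<String> utf8Entries(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("bad magic");
        }
        in.readUnsignedShort();                          // minor_version
        in.readUnsignedShort();                          // major_version
        int count = in.readUnsignedShort();              // entries are numbered 1..count-1
        List<String> strings = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                                  // Utf8: u2 length + modified UTF-8 (readUTF's format)
                    strings.add(in.readUTF());
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    in.skipBytes(2);                     // Class, String, MethodType, Module, Package
                    break;
                case 15:
                    in.skipBytes(3);                     // MethodHandle: u1 kind + u2 index
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.skipBytes(4);                     // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                    break;
                case 5: case 6:
                    in.skipBytes(8);                     // Long/Double occupy two pool slots
                    i++;
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag + " at index " + i);
            }
        }
        return strings;
    }
}
```

For example, call `ConstantPoolStrings.utf8Entries(Files.readAllBytes(Paths.get("ec.class")))` on an extracted class.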

With some practice, you can work some magic on obfuscated Java code with reJ. By inserting print statements, you can have the program display all of its decoded/operational values at run time. However, this requires that you manually manage the operand stack, which is not for the faint of reversing.
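Here is a worked example of that bookkeeping. A common insertion prints a String already on top of the operand stack without consuming it, using four instructions:

1. dup
2. getstatic java/lang/System.out
3. swap
4. invokevirtual java/io/PrintStream.println(Ljava/lang/String;)V

The tiny simulator below tracks each instruction's depth change; only these four instructions are modelled. It shows that the sequence leaves the stack as it found it but temporarily needs two extra slots. That is why the method's max_stack value may have to be raised when patching.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StackEffect {
    /** Operand-stack depth change for the few instructions used in the print patch. */
    static final Map<String, Integer> EFFECT = new LinkedHashMap<>();
    static {
        EFFECT.put("dup", +1);                                        // ..., v -> ..., v, v
        EFFECT.put("getstatic System.out", +1);                       // pushes the PrintStream reference
        EFFECT.put("swap", 0);                                        // ..., a, b -> ..., b, a
        EFFECT.put("invokevirtual println(Ljava/lang/String;)V", -2); // pops receiver + argument
    }

    /** Returns {net change, peak growth above the starting depth} for an instruction sequence. */
    static int[] simulate(List<String> instructions) {
        int depth = 0;
        int peak = 0;
        for (String insn : instructions) {
            Integer delta = EFFECT.get(insn);
            if (delta == null) {
                throw new IllegalArgumentException("not modelled: " + insn);
            }
            depth += delta;
            peak = Math.max(peak, depth);
        }
        return new int[] {depth, peak};
    }
}
```

Running it on the four-instruction patch yields a net change of 0 and a peak of 2.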

Closing Statements

The one takeaway from all of this is that there is still no clear-cut best decompiler. Up until this year, it was a losing battle of abandoned products against ever-changing obfuscators and Java implementations. The recent arrival of Procyon, CFR, and Krakatau has brought much-needed new blood into the field. While their results are still not perfect, the hope is that within the next year or two they will surpass JD-GUI and JAD.

For now, though, analysis still requires that a reverser run multiple decompilers against a sample to determine its actual functionality. My workflow has always been to run JD-GUI first, then JAD. However, I've been swayed towards Procyon, accompanied by its new GUIs, to easily churn through hundreds of classes and JAR files, making it my current first-run analysis tool. Krakatau is also in the mix, but it's not yet a tool I would give to a junior analyst.

I'm very excited to see what will come about with these products next year when they've had a chance to mature.

5 comments:

Very cool. Any chance you could post the class files you analyzed here to the Procyon issue tracker so I could look into improving its handling of (de)obfuscated code? (You can create a single issue and attach the relevant classes.)

Oh, and one minor correction: Procyon is written in Java, not Python :).

I discovered (and disclosed; it's no longer an issue) a bug in Procyon that allowed arbitrary bytecode to be concealed. It boiled down to the decompiler not correctly handling wide gotos, such that a backwards wide goto wouldn't appear in the output at all. IIRC, this issue still affects JD-GUI.