Saturday, December 1, 2007

When I was working on mobile applications, obfuscation was a mandatory part of the build. When every byte counts, you cannot afford to have long variable names or carry extra stuff if it's not critical to the application functionality. In fact we didn't really care about the actual obfuscation (it's quite difficult to take an app out of the phone anyway, and even then, the success of a mobile game usually does not depend on some top-secret algorithms). Back then it was all about jar size.

The other day, I got the task to obfuscate an application that we wanted to ship to an external client. The app was an SWT GUI, making use of reflection, runtime generics and runtime annotations. Also, the idea was to merge all libraries into the app jar and wrap everything in a native launcher. First I tried to merge the JARs. There was a small issue with the order of merging, since one of the libraries needed a file in META-INF which existed in more than one jar, but overall no major problems (good that we didn't use OSGi).
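To make the merge-order issue concrete, here is a minimal sketch of a jar merger in which the first jar listed wins whenever an entry name repeats. It is only an illustration under assumptions: the class name JarMergeSketch, the entry name META-INF/shared.txt and the demo helper are all made up, and the post does not say which tool actually did the merging.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper, not part of any real merging tool.
class JarMergeSketch {
    /** Copies every entry of the inputs into output; on a name clash the first jar listed wins. */
    static void merge(Path output, List<Path> inputs) throws IOException {
        Set<String> seen = new HashSet<>();
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(output))) {
            for (Path input : inputs) {
                try (JarFile jar = new JarFile(input.toFile())) {
                    for (JarEntry entry : Collections.list(jar.entries())) {
                        if (!seen.add(entry.getName())) {
                            continue; // duplicate name: keep the copy from the earlier jar
                        }
                        out.putNextEntry(new JarEntry(entry.getName()));
                        if (!entry.isDirectory()) {
                            try (InputStream in = jar.getInputStream(entry)) {
                                copy(in, out);
                            }
                        }
                        out.closeEntry();
                    }
                }
            }
        }
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        for (int n; (n = in.read(buffer)) != -1; ) {
            out.write(buffer, 0, n);
        }
    }

    /** Builds two tiny jars sharing one entry name, merges them, returns the surviving content. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("merge-demo");
            Path first = writeJar(dir.resolve("first.jar"), "from-first");
            Path second = writeJar(dir.resolve("second.jar"), "from-second");
            Path merged = dir.resolve("merged.jar");
            merge(merged, Arrays.asList(first, second));
            try (JarFile jar = new JarFile(merged.toFile());
                 InputStream in = jar.getInputStream(jar.getEntry("META-INF/shared.txt"))) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                copy(in, bytes);
                return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Path writeJar(Path path, String content) throws IOException {
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(path))) {
            out.putNextEntry(new JarEntry("META-INF/shared.txt"));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return path;
    }
}
```

With a "first wins" rule like this, the library that needs its own META-INF file simply has to come first in the input list.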

Next step was the obfuscation. Obfuscating a mobile app is pretty straightforward - you define all the library interfaces as seeds and let the obfuscator do the rest... errr, I guess that wasn't very clear, perhaps I should step back and take a look at ProGuard (my weapon of choice when it comes to free obfuscators), but the principles should apply to most of the products on the market.

The ProGuard obfuscation consists of a couple of stages:

Shrinking

Starting from the specified seed classes or methods, analyze the control flow and remove all the unreachable code. The different obfuscators have different ways of specifying the seeds, the simplest ones being "keep everything which is not part of my source" (this is actually enough for a mobile application) or "keep everything". You also need to include here any class which is accessed by reflection only (think plugins and DI), native methods, classes accessed exclusively from native code, classes used as default values of annotation attributes unless you always specified a proper value, etc.
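A small sketch of why reflection-only classes have to be listed as seeds: the plugin below is never referenced from code, only by a name held in a string, so a control-flow analysis has nothing to follow. The names ReflectionSeedSketch, Plugin and CsvExportPlugin are hypothetical and exist only for this illustration.

```java
// Hypothetical example classes; none of these names come from the application in the post.
class ReflectionSeedSketch {
    interface Plugin {
        String name();
    }

    // Nothing calls this class directly. Only its name, as a string, ever reaches the loader.
    static class CsvExportPlugin implements Plugin {
        public String name() {
            return "csv-export";
        }
    }

    /** Loads a plugin by class name, as a plugin registry or DI container typically would. */
    static Plugin load(String className) {
        try {
            return (Plugin) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // This is the failure you see at run time when the class was shrunk away or renamed.
            throw new IllegalStateException("Plugin not found: " + className, e);
        }
    }
}
```

In a real application the class name would usually come from a configuration file, which is exactly why the shrinker cannot see the dependency on its own.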

The shrinking also removes all the attributes from the classes, fields and methods. If this doesn't mean much to you, you are not alone - one usually doesn't think about what's in a class until things start breaking (and break they did).

The first problem was that the stacktraces no longer contained any line numbers. That was actually easy to fix - just keep the LineNumberTable attribute and replace the SourceFile and SourceDir with a fixed string - both are quite easy with ProGuard.
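How these attributes surface at run time can be seen with a few lines of plain Java. This is a sketch with a made-up class name, LineNumberSketch: the file name in a stack frame comes from SourceFile and the line number from LineNumberTable, and both come back as "unavailable" once the attributes are stripped.

```java
// Hypothetical helper for illustration only.
class LineNumberSketch {
    /** Formats the top frame as "file:line", degrading gracefully when attributes are missing. */
    static String location(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        if (frames.length == 0) {
            return "no frames";
        }
        StackTraceElement top = frames[0];
        // getFileName() is null when the SourceFile attribute is absent.
        String file = top.getFileName() != null ? top.getFileName() : "Unknown Source";
        // getLineNumber() is negative when no line number information is available.
        return top.getLineNumber() >= 0 ? file + ":" + top.getLineNumber() : file;
    }
}
```

Calling location(new RuntimeException()) on a stripped class would give just "Unknown Source", while a class that kept LineNumberTable and had SourceFile replaced by a fixed string would give that string followed by the real line.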

Next problem was that the DI container could not read the generic type arguments of the collections and was putting strings in them instead of URLs. Again, an attribute was to blame - the Signature attribute contains the information used by the runtime generics reflection.
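Here is a minimal sketch of the kind of lookup a DI container does to find the element type of a collection. The Config class, its mirrors field and the SignatureSketch name are assumptions made for the example; the reflection calls are standard java.lang.reflect.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.net.URL;
import java.util.List;

// Hypothetical classes for illustration only.
class SignatureSketch {
    static class Config {
        List<URL> mirrors;
    }

    /** Returns the first type argument of a generic field, or Object.class for a raw type. */
    static Type elementType(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            // getGenericType() is backed by the Signature attribute of the field.
            Type generic = field.getGenericType();
            if (generic instanceof ParameterizedType) {
                return ((ParameterizedType) generic).getActualTypeArguments()[0];
            }
            return Object.class; // no generic information: all we know is "List of something"
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Without the Signature attribute the field looks like a raw List, so a container has no way to know that the configured strings should be converted to URLs.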

Then I found that none of my runtime annotations were kept. After some time spent staring dumbly at the JVM spec (Chapter 4), I learned that the annotations are kept in another set of attributes - namely RuntimeVisibleAnnotations and RuntimeVisibleParameterAnnotations. The annotation default values are kept in an AnnotationDefault attribute of the corresponding method in the annotation class (or interface if you prefer) - you can strip these if you specify explicit values for all annotations.
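A short sketch of what depends on those attributes. The Command annotation, the Handlers class and the AnnotationSketch name are invented for the example: the annotation itself is read from RuntimeVisibleAnnotations, and the priority value that was not written out comes from AnnotationDefault.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation and classes for illustration only.
class AnnotationSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Command {
        String name();

        int priority() default 5; // stored in the AnnotationDefault attribute of priority()
    }

    static class Handlers {
        @Command(name = "open") // priority is not given, so the default applies
        void open() {
        }
    }

    /** Reads the annotation of a method reflectively, the way a framework would. */
    static String describe(Class<?> type, String methodName) {
        try {
            Method method = type.getDeclaredMethod(methodName);
            Command command = method.getAnnotation(Command.class);
            if (command == null) {
                return "no annotation"; // what you get once RuntimeVisibleAnnotations is stripped
            }
            return command.name() + ":" + command.priority();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

On an unprocessed build, describe(Handlers.class, "open") returns "open:5".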

There were also some attributes related to enums, but it looks like they are not used at run time.

Optimization

Not really sure what it does exactly. I have seen it reduce the number of methods, but it has really only two settings - "optimize" (yes/no) and "number-of-passes". I guess that each pass does one level inlining if a method meets certain criteria, but it might also do many other whole-program optimizations.

One thing which might be interesting is a profile-guided optimization like the one in the Intel C++ compiler, where the optimizer would first instrument your classes, adding probes to your bytecode. Then you would run your app a couple of times to generate execution profiles and then optimize your app using them. Of course that's partly what HotSpot already does, but not everybody uses HotSpot and in any case it wouldn't hurt if the code takes the right branch without a jump in the majority of the cases.

Another possible profile-guided optimization would be to identify the order in which classes are loaded and separate them by that - the early loaded ones in one jar, the later loaded ones in another and the barely used ones in a third - it can reduce the classloading time (if you put them on the classpath in the right order) and combined with the Java Modules proposal can help one create slimmer applications where you can start the app with the minimal jar and the rest is streamed as you work.

Obfuscation

The goal of the obfuscation process is making your code more difficult to decompile. Please note that I didn't say "impossible" - although decompiling obfuscated code exposes much less information and usually does not produce runnable Java code, it is perfectly possible for a motivated person to reverse-engineer obfuscated bytecode - it's just going to take longer. In the end it boils down to the cost/benefit perception - if somebody thinks it will be cheaper to hack your product, they will - the obfuscation raises the bar to do it, but if you really care about the bottom line, you might be better off with openly available source and certain legal agreements (NDA, NCA and in some cases even patents might make sense).

Class/Method Renaming

The goal of the renaming is to make the classes and methods illegible. Usually this is achieved by changing the names to short identifiers (usually one or two letters). This also reduces the size on disk and the perm-size by reducing the constant pool. If you specify the option to use lower and upper case letters for different class names, you can make the jar impossible to extract on case-insensitive file systems as half of the classes would overwrite the other half. Another trick is to specify a dictionary of recommended identifiers, which contains all Java keywords. Since the keywords have meaning only in Java, but not in the bytecode, a naive decompiler might produce funny uncompilable code (imagine for (int for=if; for<while.length; for++) else.add(while[for]);) - of course JAD handles this by recognizing and renaming the members, so I really consider this a wasted effort.

Note: again, you will want to preserve the public interfaces, which is quite similar to the specification for the shrinking phase.
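To show where the "one or two letters" come from, here is a sketch of a short-name generator that hands out a, b, ..., z, aa, ab and so on. It is an assumption-laden illustration with a made-up class name, ShortNameSketch, and not ProGuard's actual naming code, which also has to avoid clashes with kept names.

```java
// Hypothetical generator for illustration only; real obfuscators are more careful.
class ShortNameSketch {
    /** Maps 0, 1, 2, ... to a, b, ..., z, aa, ab, ... (bijective base 26). */
    static String nameFor(int index) {
        StringBuilder name = new StringBuilder();
        int n = index;
        do {
            name.append((char) ('a' + n % 26)); // least significant letter first
            n = n / 26 - 1;                     // the -1 makes "aa" follow "z" directly
        } while (n >= 0);
        return name.reverse().toString();
    }
}
```

The first 26 members get one-letter names and the next 676 get two letters, which is why the constant pool shrinks so much.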

Flow Mangling

Since the decompilers recognize certain bytecode patterns as the result of a Java statement, the obfuscator can reorder these, yielding semantically equivalent bytecode, which is impossible to map 1:1 to Java (JAD handles these with labels and goto). Also, I've seen obfuscated code using loops, breaks and exceptions to simulate IFs, but I'm not sure which obfuscator does these. (Zelix?)
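The "loops and breaks instead of IFs" trick looks roughly like this when written back as Java. This is a hand-written source-level illustration with made-up names, not the output of any particular obfuscator, which would do the equivalent on the bytecode.

```java
// Hypothetical example for illustration only.
class FlowSketch {
    /** The straightforward version. */
    static String plain(int x) {
        if (x > 0) {
            return "positive";
        }
        return "non-positive";
    }

    /** Same behaviour, but the condition is disguised as a loop that runs at most once. */
    static String mangled(int x) {
        String result = "non-positive";
        while (x > 0) {
            result = "positive";
            break; // leave after the first iteration, so the loop acts as an if
        }
        return result;
    }
}
```

Both methods return the same value for every input; the second one is just more annoying to read.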

String Encryption

I think it was Zelix KlassMaster that could substitute each string with an encrypted version and insert code to decrypt them at runtime. This is a very effective measure as the strings usually give away a lot about what the code is doing (especially logging statements).
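The idea can be sketched with a trivial XOR scheme: the literal is stored as scrambled bytes and a small helper restores it when the code runs. This is an assumption for illustration only; the class name and the single-byte key are made up, and it is certainly not the scheme Zelix KlassMaster uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; a single-byte XOR is trivially reversible.
class StringHidingSketch {
    private static final byte KEY = 0x5A;

    /** What the obfuscator would do at build time: turn the literal into scrambled bytes. */
    static byte[] hide(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return bytes;
    }

    /** What the inserted code would do at run time: restore the original string. */
    static String reveal(byte[] hidden) {
        byte[] bytes = hidden.clone(); // leave the stored constant untouched
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A log call would then read something like log(StringHidingSketch.reveal(CONSTANT_17)), so the message no longer shows up as readable text in the constant pool.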

Stack Map Generation (Preverification)

J2ME JVMs feature a simplified class-loading mechanism which requires each method to carry a StackMap attribute describing the types of its local variables and operand stack at the branch targets, so the device can verify the bytecode in a single cheap pass. The J2SE JVMs are smart enough to work this out at runtime, but this still slows down the classloading. ProGuard can generate the correct StackMap attributes for the obfuscated code, so for a slightly larger disk footprint you would get faster loading.

So that's about it. I figure that here is the place to throw in a couple of URLs:

yGuard - some people like its XML syntax. I don't think it's much different than ProGuard. The company producing it requires that you use it if you use their core product. It makes sense for them to want to take care of the actual protection of their IP.

JoGa was another tool that I used for J2ME. It focused on bytecode optimization and had a nice GUI with many tweaks and gadgets. Unfortunately, it looks like the site is down.

So all in all it took me about 6 hours to get everything obfuscated and in one jar. I had to disable the optimization and shrinking phases because I couldn't hunt down all the SWT JNI dependencies. Still, the resulting size was 2/3 of the original and the app was starting up noticeably faster.

The final touch was to wrap the single jar in a Launch4J binary launcher, so the user would need a resource editor to even get to the jar. Launch4J provides some small but nice features like JRE detection (from the registry), JRE version checking, a custom icon and Windows metadata.