While we’re at it, we held a small contest on our blog to find a name for this tool. In the end we decided to go with PDF Dissector, a name we came up with ourselves. However, we still want to give away the free license of PDF Dissector we promised for the contest. The runner-up entry was PDF Enspect, suggested by Dirk Loss. He will receive the download link for his free license soon.

We are proud to announce that we have added a new training about PDF malware analysis to the list of trainings we offer. This new training focuses on everything you need to know when you are dealing with PDF malware. Participants will learn about the following topics:

Useful tools for PDF analysis

The physical and logical structure of PDF files

An explanation of the most commonly exploited vulnerabilities of recent years

The many ways malicious code can be executed from PDF files

Common obfuscation techniques used by malware to slow down analysis

Automation of PDF analysis if you are dealing with many samples

Acrobat Reader internals

How to use RTTI, BinDiff, and other means to restore some thousand function names in the Adobe Reader JavaScript engine disassembly

Automated extraction of shellcode using dynamic instrumentation

If your organization or company would like to know more about the training, please contact info@zynamics.com.

Fixed a problem with tracking the R0 register when it was modified by previous calls. Now, if the script is tracking R0 and finds a BX/BLX, it assumes that the call modifies R0 and stops, marking the tracking as failed.
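The stopping rule can be sketched in plain Python. This is a simplified model and not the script itself: instructions are hypothetical (mnemonic, destination, source) tuples instead of IDA operands, and track_register is a name made up for this illustration.

```python
# Simplified model: each instruction is a (mnemonic, dest, src) tuple.
# The real script walks IDA's disassembly; these tuples are an assumption.
CALL_LIKE = {"BX", "BLX"}  # the branches the changelog entry names


def track_register(instructions, start, reg="R0"):
    """Walk backwards from index `start` to find what was loaded into `reg`.

    Returns the source operand that defines `reg`, or None if tracking fails.
    """
    for mnemonic, dest, src in reversed(instructions[:start]):
        if mnemonic in CALL_LIKE:
            # A call returns its result in R0, so an earlier load into R0
            # can no longer be trusted: give up and mark tracking as failed.
            if reg == "R0":
                return None
            continue
        if dest != reg:
            continue
        if mnemonic == "MOV" and src.startswith("R"):
            reg = src  # plain register copy: keep tracking the source register
            continue
        return src
    return None


simple = [
    ("LDR", "R0", "classref_Foo"),
    ("LDR", "R1", "selref_alloc"),
    ("BLX", None, "objc_msgSend"),
]
clobbered = [
    ("LDR", "R0", "classref_Foo"),
    ("BLX", None, "objc_msgSend"),
    ("LDR", "R1", "selref_init"),
    ("BLX", None, "objc_msgSend"),
]
```

For the second call in clobbered, the walk reaches the first BLX before it finds a load into R0, so the result is None rather than a stale class reference.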

Changed the way the script parses data references so that it works with both release and debug binaries. Instead of reading the raw offset we now use recursive calls to idautils.DataRefsFrom(). For the references to work properly we had to add a preprocessing step that converts all dwords in the classrefs and superrefs sections to offsets (similar to the offsetize() used by KennyTM).
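A rough sketch of the recursive lookup, written so that it runs outside IDA: the data_refs_from parameter stands in for idautils.DataRefsFrom(), and resolve_reference, is_target, and the addresses are hypothetical names and values chosen for this example, not taken from the script.

```python
def resolve_reference(ea, data_refs_from, is_target, max_depth=8):
    """Follow data references from `ea` until `is_target` accepts an address.

    `data_refs_from` takes an address and returns the addresses referenced
    from it (inside IDA this role would be played by idautils.DataRefsFrom).
    Returns the first accepted address, or None if nothing is found.
    """
    if is_target(ea):
        return ea
    if max_depth == 0:
        return None  # guard against reference cycles
    for ref in data_refs_from(ea):
        found = resolve_reference(ref, data_refs_from, is_target, max_depth - 1)
        if found is not None:
            return found
    return None


# Fake reference graph: a classrefs entry points to an intermediate
# pointer, which in turn points to the class structure itself.
fake_refs = {0x1000: [0x2000], 0x2000: [0x3000], 0x3000: []}
fake_classes = {0x3000: "Foo"}

target = resolve_reference(
    0x1000,
    lambda ea: fake_refs.get(ea, []),
    lambda ea: ea in fake_classes,
)
```

The lookup does not care how many levels of indirection sit between the reference and the class, which is what lets one code path serve both release and debug layouts in this model.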

In some cases the compiler can decide to use LR as a general-purpose register, so the search for R0..R15 fails. The script now handles this special case.

Added a check for Thumb/non-Thumb code so that calls are patched correctly.

Fixed a bug that retrieved incorrect parameters for other flavours of msgSend(). It should now be easier to add others.

Thanks a lot to everybody who reported bugs, and also to the beta testers!

Soon we will be back with part II of Objective-C reversing, with more improvements and details on static analysis. Stay tuned!

At zynamics we believe that good education is something we have to support. Therefore Sebastian and I decided to support Professor Felix Freiling and his two assistants Carsten Willems and Ralf Hund in their class called Software Reverse Engineering at the University of Mannheim, Germany. Sebastian held a lecture about Windows debugger internals and their use in reverse engineering which you can read about here. This week it was my turn to share some knowledge about architectural diversity in reverse engineering.

While architecture diversity is nothing new, most people still think that only x86 and x64 are interesting to look at because of their desktop computer market share. In my lecture I wanted to show that the range of interesting targets is far broader than generally believed. I started the lecture with a cherry-picked set of architectures which are quite common in different usage scenarios. The interesting differences between these architectures motivate the need for a more general reverse engineering approach. Even though a variety of general reverse engineering approaches exist, I focused on our own approach, the REIL meta language. I gave a short introduction to the features of REIL and a language definition with an emphasis on its simplicity. After presenting small examples which show how REIL translation works, I moved on to REIL use cases. Before presenting and demoing register tracking as a simple use case, I gave a very informal description of the underlying MonoREIL framework. MonoREIL is an abstract interpretation framework which ships with BinNavi to assist an analyst in writing algorithms that answer questions about program states using a formally described method. Demoing register tracking and explaining how it works on top of MonoREIL rounded off the lecture.
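To give a flavour of register tracking as a fixpoint computation, here is a minimal sketch in plain Python. It does not use the MonoREIL API: the worklist loop, the (mnemonic, src1, src2, dst) tuples, and the tiny three-block program are assumptions made for this illustration, with only the three-operand instruction shape borrowed from REIL.

```python
def track_register(blocks, edges, entry, tracked):
    """Compute, per basic block, which registers depend on `tracked` on exit.

    blocks: dict mapping a block name to a list of (mnemonic, src1, src2, dst)
    edges:  list of (from_block, to_block) control-flow edges
    """
    out_state = {name: frozenset() for name in blocks}
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        # Merge step: union of the states leaving all predecessor blocks.
        preds = [src for src, dst in edges if dst == name]
        state = set().union(*(out_state[p] for p in preds))
        if name == entry:
            state.add(tracked)
        # Transfer step: the output depends on the tracked value if any
        # input does; otherwise the output is overwritten with clean data.
        for _mnemonic, src1, src2, dst in blocks[name]:
            if src1 in state or src2 in state:
                state.add(dst)
            else:
                state.discard(dst)
        # Re-queue the successors only if this block's result changed.
        if frozenset(state) != out_state[name]:
            out_state[name] = frozenset(state)
            worklist.extend(dst for src, dst in edges if src == name)
    return out_state


# Hypothetical hand-written program, not real translator output.
blocks = {
    "entry": [("str", "eax", "", "t0")],
    "loop": [("add", "t0", "1", "t1"), ("str", "t1", "", "ebx")],
    "exit": [("str", "0", "", "t0")],
}
edges = [("entry", "loop"), ("loop", "loop"), ("loop", "exit")]
result = track_register(blocks, edges, "entry", "eax")
```

The loop stops once no block's result changes any more. Because states only ever grow and the set of registers is finite, that point is always reached.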

After the lecture I was asked to run an exercise session covering all of its topics, which worked out pretty well. I enjoyed being invited to give a lecture in Mannheim and I greatly admire the work which has been put into the class in general. If more universities offered a reverse engineering class, it would be a great plus for a lot of students.

The slides which I used for lecturing are in German and available here:

On our way back home from Black Hat Europe in Barcelona, Thomas and I were brainstorming about the most important changes to the field of binary code reverse engineering in the last 10 years. What has changed since then? What made the biggest impact? Remember: Back in the dark days of 2000, W32Dasm and Turbo Debugger were considered good reverse engineering tools. If you had a self-written tracer that logged the execution of conditional jumps you were basically a king.

Anyway, we came up with several trends and technologies we believe have changed the job of reverse engineers tremendously since 2000. Here they are:

Visual flow graphs for assembly code

First introduced in IDA Pro 4.17 (June 2001), the ability to view disassembled assembly code in graph form made the job of reverse engineers much easier. In essence, using visual flow graphs during reverse engineering raises the level of abstraction and understanding of code while at the same time lowering the required time and effort one has to invest. Before we had graphs we had to reconstruct control-flow structures like loops and if-else statements from linearly listed assembly instructions. With visual flow graphs we can just look at the graph and understand the control flow pretty much immediately.

In the following years other tools (such as BinNavi) were built around the idea of interacting with flowgraphs. Shortly thereafter, the graph engine of IDA Pro was improved (especially in IDA Pro 5.0, March 2006) to provide interactive graphing out of the box.

Python as a scripting language

Back in 2000, most reverse engineering tools were primitive and barely extensible. For disassemblers your best bet was a clumsy IDC implementation in IDA Pro 4. For debuggers the situation looked even bleaker. This all changed with the growing popularity of the scripting language Python and SWIG, a technology which allows programs to easily add a Python interpreter and expose a Python-based API. The first major step forward I can remember was the creation of the IDAPython plugin for IDA Pro which added a way to access the IDA API from Python (Gergely Erdelyi, 2004). Later we had tools like Pedram Amini’s PyDbg or Ero Carrera’s pefile that helped popularize the Python language in reverse engineering.

Today, Python is the de-facto scripting language of reverse engineering and many tools from IDA Pro to ImmunityDebugger or BinNavi support Python scripting.

Dynamic Instrumentation

Even though the technology is not brand-new (the first publications describing ‘Dynamo’ go back to 2000), the widespread use of dynamic instrumentation tools like DynamoRIO and Pin for reverse engineering certainly is. Using these frameworks you can build very powerful dynamic analysis tools that allow the monitoring and manipulation of instruction streams in a very transparent and highly efficient way. If you have never used either of these tools, you can think of them as a way to efficiently receive a callback to a C/C++ program after every instruction. Using these callbacks, you can directly control every aspect of the targeted program while incurring only a small overhead.

If you are looking for a new reverse engineering tool to do some research with, dynamic instrumentation might be for you: Working on actual program traces removes a lot of complication in comparison to the static case, and the many different productive uses of dynamic instrumentation are still far from exhausted. While relatively fresh and untapped, dynamic instrumentation tools are definitely a topic people talk about at IT security conferences and elsewhere.

BinDiff-ing

Many years ago, some smart people had a brilliant idea: If you compare an unpatched version of a file to a patched version of the same file, you can easily find what code was changed by the patch and use this information to quickly find the vulnerabilities the patch fixed. Soon it became evident that new tools were needed to make the process of comparing two versions of the same file as quick and easy as possible. Our own BinDiff tool is maybe the most popular diffing engine for binary code today. However, the idea of comparing files is so popular that a number of free competitors have sprung up over the years. In general, these tools all work in the same way: Once the two input files are disassembled, the functions in file A are matched to the functions in file B and local changes to the matched functions are found and shown to the user.
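The match-then-compare workflow can be illustrated with a toy example. This is only a sketch under strong assumptions: functions are plain lists of instruction strings, matching relies on identical bodies and on names, and the names match_functions and changed_instructions are made up here. Production tools need far more robust matching, since names are usually missing and compilers move code around.

```python
import difflib


def match_functions(file_a, file_b):
    """Map function names in file A to their counterparts in file B."""
    matches = {}
    # Pass 1: identical instruction sequences match regardless of name.
    by_body = {tuple(body): name for name, body in file_b.items()}
    for name_a, body_a in file_a.items():
        name_b = by_body.get(tuple(body_a))
        if name_b is not None and name_b not in matches.values():
            matches[name_a] = name_b
    # Pass 2: whatever is left is matched by name.
    for name_a in file_a:
        taken = matches.values()
        if name_a not in matches and name_a in file_b and name_a not in taken:
            matches[name_a] = name_a
    return matches


def changed_instructions(body_a, body_b):
    """Return (old, new) instruction runs that differ between two bodies."""
    matcher = difflib.SequenceMatcher(a=body_a, b=body_b, autojunk=False)
    return [
        (body_a[a1:a2], body_b[b1:b2])
        for tag, a1, a2, b1, b2 in matcher.get_opcodes()
        if tag != "equal"
    ]


unpatched = {
    "sub_401000": ["push ebp", "mov ebp, esp", "call strcpy", "ret"],
    "sub_401050": ["xor eax, eax", "ret"],
}
patched = {
    "sub_401000": ["push ebp", "mov ebp, esp", "call strncpy", "ret"],
    "sub_402000": ["xor eax, eax", "ret"],
}

matches = match_functions(unpatched, patched)
changes = changed_instructions(unpatched["sub_401000"], patched["sub_401000"])
```

In this example the relocated but unchanged function is matched by its body, and the only reported change is the strcpy call that the hypothetical patch replaced.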

BinDiff-style tools are now part of the standard toolbox of many reverse engineers, from vulnerability researchers to malware analysts, and there is hardly another technology that has risen as spectacularly as this one since 2000.

The end of SoftICE

Back in the day there was just one debugger everybody used for reverse engineering: SoftICE. SoftICE was a wonderful debugger originally written by a company called NuMega from New Hampshire. It was a debugger that allowed you to debug user-land programs as well as kernel-land programs on your local machine without the need for any complicated setup. Later, NuMega was bought by Compuware and SoftICE was discontinued in April 2006.

Of course, newer debuggers have replaced SoftICE today. Microsoft’s own WinDbg, while not nearly as pretty as SoftICE, is the new powerful and popular debugger on the block.

The arrival of the Hex-Rays decompiler

Back in 2000, decompilers sucked. Today, there is Hex-Rays. Back in 2007 the team behind IDA Pro released the first decompiler I am aware of that is actually useful. Since then they have continued to improve the decompiler and they are already showcasing support for ARM decompilation.

While not many people seem to use Hex-Rays yet, this product is definitely one to keep an eye on.

Collaborative Reverse Engineering

Back in 2000, collaborative reverse engineering was unheard of as it was really difficult to exchange reverse engineered information between two databases created by the same program, let alone between different programs. In recent years the situation changed a bit, probably mostly out of necessity. Software today is much more complex than it was ten years ago and very often teams of reverse engineers have to collaborate on the same project.

While still in their infancy, collaborative reverse engineering tools are here to stay and will probably become even more popular in the future. Reverse engineers will pick tools like Chris Eagle’s CollabREate for IDA Pro or our own BinCrowd to share their results with friends and colleagues.

Academic Approaches

Another trend of the last few years is that major universities are researching topics related to binary code reverse engineering. Among others, the University of California, Berkeley and Carnegie Mellon University have done really impressive work in the last few years. At the same time, reverse engineers in the industry have begun to take note of academic approaches to reverse engineering. While academic approaches to reverse engineering are not yet in common use in the industry, we know many people and companies that are beginning to look into more formalized approaches to reverse engineering. The popularity of the Reverse Engineering Reddit, maybe the primary resource for formalized reverse engineering on the internet, speaks volumes.

So, that’s our opinion. Maybe your opinion is different. Do you disagree with any of those advances or did we miss anything significant? Can you think of any technology that was supposed to be the future but then bombed spectacularly in practice? Let us know. 🙂

The login bug that plagued early testers of our free BinCrowd community server should be fixed now. If you had problems logging in to your account in the past, please try again now. Note that clicking on the confirmation link in the original confirmation email was buggy too. It is possible that your account was deleted automatically because it was not confirmed within 7 days. In that case just make a new account.

We have also improved the speed of file comparisons in the web interface a lot. Even large files like Adobe Reader’s acrord32.dll are now compared to all files in the database in just a few seconds. This is absolutely amazing if you want to compare your file to different versions of the same file, for example to figure out what changed.

Another improvement was made to the BinCrowd IDA Pro plugin which you can get from the zynamics GitHub account. It can now handle the upload of larger files more gracefully. Previous versions tended to crash when giant files (roughly >50K functions) were uploaded.

Our malware PDF analysis tool without a name still has no name. However, we would like to release the first version of it really soon and that’s why we need a name. If you know a name for the tool, please let us know through comments to this post. If we name the tool after your suggestion you will get a free life-long single-user license of the PDF tool.

Thanks to Navtej Singh, Mario Vilas, and others it was possible to improve the IDA Pro plugin that imports MSDN information into IDA Pro. Parsing of the MSDN documentation was improved and function argument names/descriptions are now copied from MSDN into IDB files. That means you now have full documentation about the function arguments of Windows API functions in your IDB files.

… and therefore we are happy to have sponsored Shawn Dean so he could go to the Wajutsu Keishukai Grappling Tournament in Tokyo – which HE WON. We are happy to have had the opportunity to sponsor him and even happier to see him succeed.

This semester, students can take a class called Software Reverse Engineering at the University of Mannheim, Germany. In this class, Professor Felix Freiling and his two assistants Carsten Willems and Ralf Hund teach approximately 20-30 students about topics like x86 assembly, Windows internals, and sandboxing of malicious files. The students then use their new knowledge in hands-on homework where they have to crack simple crackmes or analyze malware files.

Last December I was invited to give a guest lecture there about a topic of my choice. Of the available topics, the one that seemed most relevant to my work at zynamics was debuggers and debugger internals.

Yesterday my big day had come. I travelled to Mannheim to give the second guest lecture of my life (I blogged about the first one at Dortmund University). I gave a brief history of popular reverse engineering debuggers from SoftICE to WinDbg. I talked about common debugger features and how to use them for reverse engineering. I explained in detail what you have to do if you want to implement your own Windows debugger. In the end I spent a few slides talking about anti-debugging measures software uses to protect itself against reverse engineers.

From my point of view the guest lecture went pretty well. Unfortunately the students did not seem to be as interested in reverse engineering as the students at Dortmund University were. Maybe they would have paid more attention if they had known beforehand that implementing their own Win32 debugger would be their next homework assignment. 🙂

Anyway, below you can find the German language slides I used for my guest lecture. If you do not have Flash installed, you can get a direct download here.