Archive for the ‘PDF’ Category

Wow, it’s been a while since we last blogged. Ok, time to kick off 2011 🙂

A lot of excellent stuff has been written about Microsoft’s RTTI format — from the ISS presentations a few years back to igorsk’s excellent OpenRCE articles. In the meantime, RTTI information has “spread” in real-world binaries as most projects are now built on compilers that default-enable RTTI information. This means that for vulnerability development, it is rare to not have RTTI information nowadays; most C++ applications come with full RTTI info.

So what does this mean for the reverse engineer ? Simply speaking, a lot — the above-mentioned articles already describe how a lot of information about the inheritance hierarchy can be parsed out of the binary structures generated by Visual C++ — and there are some pretty generic scripts to do so, too.

This blog article is about a slightly different question:

How can we recover full UML-style inheritance diagrams from executables by parsing the RTTI information ?

To answer the question, let’s review what the Visual C++ RTTI information provides us with:

The ability to locate all vftables for classes in the executable

The mangled names of all classes in the executable

For each class, the list of classes that this class can be legitimately upcast to (e.g. the set of classes “above” this class in the inheritance diagram)

The offsets of the vftables in the relevant classes

This is a good amount of information. Of specific interest is (3) — the list of classes that are “above” the class in question in the inheritance diagram. Coming from a mathy/CSy background, it becomes obvious quickly that (3) gives us a “partial order”: For two given classes A and B, either A ≤ B holds (e.g. A is inherits from B), or the two classes are incomparable (e.g. they are not part of the same inheritance hierarchy). This relationship is transitive (if A inherits from B, and B inherits from C, A also inherits from C) and antisymmetric (if A inherits from B and B inherits from A, A = B). This means that we are talking about a partially ordered set (POSet)

Now, why is this useful ? Aside from the amusing notion that “oh, hey, inheritance relationships are POSets“, it also provides us with a simple and clear path to generate readable and pretty diagrams: We simply calculate the inheritance relation from the binary and then create a Hasse Diagram from it — in essence by removing all transitive edges. The result of this is a pretty graph of all classes in an executable, their names, and their inheritance hierarchy. It’s almost like generating documentation from code 🙂

Anyhow, below are the results of the example run on AcroForm.API, the forms plugin to Acrobat Reader:

The full inheritance diagram of all classes in AcroForm

A more interactive (and fully zoomable) version of this diagram can also be viewed by clicking here.

For those of you that would like to generate their own diagrams, you will need the following tools:

Today I analyzed a malicious PDF file that contained more than 1100 lines of heavily obfuscated malicious JavaScript code. To make it easier for me to deobfuscate the code, I added two new features to our PDF malware analysis tool PDF Dissector: Variable references and snapshot histories.

The variable references feature shows you where variables are used in JavaScript code. Just place the caret over a variable identifier and all lines that use that variable are shown to you. You can see what this feature looks like in the screenshot below.

Showing all uses of the variable tonsSap

The snapshot history feature allows you to take JavaScript source snapshots of known states. Later on, you can then revert to the source code to the recorded snapshots. This is very useful when you accidentally remove JavaScript code that later turns out to be needed after all. The screenshot below shows you a snapshot tree of four named snapshots I made during different states of the deobfuscation process.

Today we are releasing a new version of our PDF malware analysis tool PDF Dissector. This release fixes two PDF parsing bugs reported by our customers. The first bug led to problems when PDF files were using unexpected null-bytes in the PDF file. The second parsing bug led to problems with unexpected PDF comments.

Especially that second parsing bug was very interesting. A customer sent us a PDF malware file that strategically placed PDF comment strings everywhere to confuse PDF parsers. To be able to analyze this file manually, it was also necessary to add a new feature to PDF Dissector. It is now possible to hide PDF comment strings from the PDF browsing tree. Just take a look at the two screenshots below to see why this is really useful.

Apart from a few bug fixes, version 1.5.0 of our PDF malware analysis tool PDF Dissector brings two very cool new features.

The first cool new feature is that PDF Dissector now supports the decryption of RC4-encoded strings and streams. This is very useful because there are a few PDF malware samples in the wild that encrypt their strings and streams using RC4 (a standard PDF format feature). In the past, PDF Dissector was not able to analyze these PDF files. From now on, PDF Dissector can be used on those samples too.

The second cool new feature is an improvement to the plugin API that allows plugins to extend the context menu of PDF file nodes in the PDF browsing tree. This was inspired by a customer who asked for a way to generate reports with PDF Dissector. I implemented a small report generator as a Python plugin to make sure that all customers who want to generate reports can easily modify the content and the layout of the generated report.

PDF Dissector 1.4.0 (Product site / Manual) fixes a few PDF parser bugs, improves the Adobe Reader emulation, and adds a cool new feature you can use for searching through all open PDF files.

Here is the detailed list of changes:

Feature: Added a way to search through the content of all open files.

Feature: Annotation names are now correctly emulated.

Bugfix: Fixed a parser bug that led to missed data streams if there was a comment between an object and its stream.

Bugfix: Fixed a parser bug that led to crashes when parsing invalid octal numbers.

Bugfix: All tabs belonging to a file are now closed when closing the file.

In PDF Dissector 1.4.0 you will now find a text field above the PDF browser where you can enter text strings to search for. The search function searches through dictionary keys, dictionary values, strings, data streams and other elements of PDF files and displays only those elements that match the search string. This is very useful if you want to answer questions like ‘which PDF files of my collection use JavaScript’?

The screenshot below shows how I used the filter to search through about 50 PDF files for exactly those that use the OpenAction command to execute some code when the PDF file is opened.

Along with RECon, the single most important date in the reverse engineering / security research community is the annual Blackhat/DefCon event in Las Vegas. Most of our industry is there in one form or the other, and aside from the conference talks, parties and award ceremonies, there’s also a good amount of technical discussions (in bars or elsewhere) that takes place.

This year, a good number of researchers/developers from the zynamics Team will be present in Las Vegas — alphabetically, the list is:

Ero Carrera

Thomas Dullien/Halvar Flake

Vincenzo Iozzo

Tim Kornau

So, if you wish meet any of the team to discuss reverse engineering, our technologies, our research, or the performance of the Spanish or German football team at the last world cup, do not hesitate to drop an email to info@zynamics.com — Vegas is always chaotic, and scheduling a meeting will minimize stress for everyone that is involved.

Specifically, the following topics are specifically worth meeting over:

Chat with Ero over our unpacking engine (just presented at RECon) — and how it fits into the larger scheme of things (e.g. VxClass)

Meet with Tim or Vincenzo to discuss automated gadget-finding for ROP, or anything involving the ARM/REIL translations

Last Friday I was at ReCon in Montreal to give a talk about obfuscated PDF malware. I got the idea for the talk during my work on PDF Dissector where I saw a lot of obfuscated PDF malware. The obfuscation I saw in the wild was mostly very limited and the malware authors did not seem to think things through to the very end. I took the opportunity to think a bit further about the whole topic of PDF malware obfuscation and a few of the result of these thoughts can be seen in the slides below. If you do not have Flash enabled, click here to download the slides.

The 1.3.0 release of our PDF malware analysis tool PDF Dissector is primarily a bugfix release to undo some of the bugs introduced in 1.2.0. However, I have also added a cool new feature.

I have added a way to quickly browse through the content of all decoded data streams. This is very useful if you want to quickly see what data streams contain potentially malicious content like embedded Flash files or AcroForms code. To account for binary resources and text resources you can switch between text mode and hexadecimal mode.

The screenshow below shows what the new feature looks like. You can clearly see the embedded Flash file on object 12 (note the Flash file header starting with FWS).

Today we are releasing version 1.2.0 of PDF Dissector, our PDF malware analysis tool. This release is primarily a bugfix release for the PDF parser. Several of our customers reported issues with specifically crafted PDF malware files which were not correctly parsed by PDF Dissector. A big thank you to all customers who reported these issues!

Here is the detailed change list:

Feature: JavaScript code beautifier can now beautify selected text only

Feature: Improved the Adobe Reader emulator a bit

Bugfix: Improved handling of UTF-16BE encoded strings

Bugfix: Fixed a parser bug that led to crashes when parsing certain cross-references tables

Bugfix: Fixed a parser bug that led to incorrectly parsed strings that contain escaped parentheses

Bugfix: Fixed a parser bug that led to incorrectly parsed strings that contain balanced unescaped parentheses

Bugfix: Fixed bugs in the Error reporting dialog that made automated reporting of errors fail

If you want to know more about PDF Dissector, please check out the manual.

API: Made it possible to access dictionary entries, array elements, and indirect references

I think the most important change was the API improvement. It is now possible to do really cool things with Python scripts and the PDF Dissector API. Having the ability to dump streams to a file is also really useful when you are analyzing malicious PDF files like the recent 0-day which made use of an embedded Flash file.