A Code Signature Plugin for IDA

When reversing embedded code, completely different devices often turn out to be built around a common code base, either because the vendor re-uses code or because both rely on the same third-party software. This is especially true of devices running the same Real Time Operating System.

For example, I have two different routers, manufactured by two different vendors, and released about four years apart. Both devices run VxWorks, but the firmware for the older device included a symbol table, making it trivial to identify most of the original function names:

VxWorks Symbol Table

The older device with the symbol table is running VxWorks 5.5, while the newer device (with no symbol table) runs VxWorks 5.5.1, so they are pretty close in terms of their OS version. However, even simple functions contain a very different sequence of instructions when compared between the two firmwares:

strcpy from the VxWorks 5.5 firmware

strcpy from the VxWorks 5.5.1 firmware

Of course, binary variations can be the result of any number of things, including differences in the compiler version and changes to the build options.

Despite this, it would still be quite useful to take the known symbol names from the older device, particularly those of standard and common subroutines, and apply them to the newer device in order to facilitate the reversing of higher level functionality.

Existing Solutions

The IDB_2_PAT plugin will generate FLIRT signatures from the IDB with a symbol table; IDA’s FLIRT analysis can then be used to identify functions in the newer, symbol-less IDB:

Functions identified by IDA FLIRT analysis

With the FLIRT signatures, IDA was able to identify 164 functions, some of which, like os_memcpy and udp_cksum, are quite useful.

Of course, FLIRT signatures will only identify functions that start with the same sequence of instructions, and many of the standard POSIX functions, such as printf and strcmp, were not found.

Because FLIRT signatures only examine the first 32 bytes of a function, there are also many signature collisions between similar functions, which can be problematic.
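To make the collision problem concrete, here is a minimal sketch that keys each function on its leading bytes only. It is a deliberate simplification of prefix-based matching, not a reimplementation of FLIRT, and the function names and byte strings are hypothetical placeholders rather than code from either firmware.

```python
from collections import defaultdict

PREFIX_LEN = 32  # number of leading bytes used as the signature


def prefix_collisions(functions, prefix_len=PREFIX_LEN):
    """Group function names by their leading bytes and return every
    group that contains more than one function (i.e. a collision)."""
    groups = defaultdict(list)
    for name, code in functions.items():
        groups[bytes(code[:prefix_len])].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical function bodies: func_a and func_b share a 32-byte
# prologue and only diverge afterwards; func_c starts differently.
prologue = bytes(range(32))
functions = {
    "func_a": prologue + b"\x01\x02\x03\x04",
    "func_b": prologue + b"\x0a\x0b\x0c\x0d",
    "func_c": bytes(reversed(range(32))) + b"\x01\x02\x03\x04",
}

print(prefix_collisions(functions))  # [['func_a', 'func_b']]
```

Any two functions that only differ after the prefix end up with the same key, which is why unrelated subroutines with a common prologue are hard to tell apart this way.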

Alternative Signature Approaches

Comparing the functions in the two VxWorks firmwares shows that a small fraction (about 3%) of unique subroutines are identical in both firmware images:

bcopy from the VxWorks 5.5 firmware

bcopy from the VxWorks 5.5.1 firmware

Signatures can be created over the entirety of these functions in order to generate more accurate fingerprints, without the possibility of collisions due to similar or identical function prologues in unrelated subroutines.
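As a rough sketch of whole-function signatures, the snippet below hashes every byte of each function and only keeps digests that are unique within an image. The dictionaries of function bytes are assumed to have been exported from the two IDBs already; the names, addresses and byte strings are hypothetical placeholders. A real implementation would probably also have to deal with bytes that legitimately differ between images, such as absolute call targets, which this sketch ignores.

```python
import hashlib
from collections import Counter


def exact_signatures(functions):
    """Map digest -> function name, keeping only functions whose full
    byte sequence is unique within this image."""
    digests = {name: hashlib.sha256(bytes(code)).hexdigest()
               for name, code in functions.items()}
    counts = Counter(digests.values())
    return {digest: name for name, digest in digests.items()
            if counts[digest] == 1}


def match_exact(known, unknown):
    """Return {unknown function name: known function name} for every
    digest that is unique in both images."""
    known_sigs = exact_signatures(known)
    unknown_sigs = exact_signatures(unknown)
    shared = known_sigs.keys() & unknown_sigs.keys()
    return {unknown_sigs[d]: known_sigs[d] for d in shared}


# Placeholder bytes, not real firmware code.
body = b"\x10\x20\x30\x40\x50\x60\x70\x80"
stub = b"\x03\xe0\x00\x08\x00\x00\x00\x00"

known = {"bcopy": body, "stub_a": stub, "stub_b": stub}
unknown = {"sub_80001000": body, "sub_80002000": stub,
           "sub_80003000": b"\xaa\xbb\xcc\xdd"}

print(match_exact(known, unknown))  # {'sub_80001000': 'bcopy'}
```

The two identical stubs are dropped on purpose: a signature that matches more than one known function cannot name anything reliably.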

Still other functions are very nearly identical, as exemplified by the following functions which only differ by a couple of instructions:

A function from the VxWorks 5.5 firmware

The same function, from the VxWorks 5.5.1 firmware

A simple way to identify these similar, but not identical, functions in an architecture independent manner is to generate “fuzzy” signatures based only on easily identifiable actions, such as memory accesses, references to constant values, and function calls.

In the above function, for example, there are six code blocks: one that references the immediate value 0xFFFFFFFF, one that contains a single function call, and one that contains two function calls. As long as no other functions match this “fuzzy” signature, we can use these unique metrics to identify this same function in other IDBs. Although this type of matching can catch functions that would otherwise go unidentified, it also has a higher propensity for false positives.
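The fuzzy approach can be sketched roughly as follows. The data model is an assumption made for illustration: each function is a list of code blocks, and each block is a list of `(kind, value)` events, where `kind` is `"call"`, `"mem"` or `"imm"`. Extracting those events from a disassembly is left out, and the function names and addresses are hypothetical.

```python
from collections import Counter


def block_summary(block):
    """Reduce one code block to (call count, memory access count,
    sorted immediates), ignoring the actual instructions."""
    calls = sum(1 for kind, _ in block if kind == "call")
    mems = sum(1 for kind, _ in block if kind == "mem")
    imms = tuple(sorted(value for kind, value in block if kind == "imm"))
    return (calls, mems, imms)


def fuzzy_signature(blocks):
    """Order-independent signature over all blocks of a function."""
    return tuple(sorted(block_summary(b) for b in blocks))


def match_fuzzy(known, unknown):
    """Match functions whose fuzzy signature is unique in both images."""
    def unique(image):
        sigs = {name: fuzzy_signature(blocks) for name, blocks in image.items()}
        counts = Counter(sigs.values())
        return {sig: name for name, sig in sigs.items() if counts[sig] == 1}

    known_sigs, unknown_sigs = unique(known), unique(unknown)
    return {unknown_sigs[s]: known_sigs[s]
            for s in known_sigs.keys() & unknown_sigs.keys()}


# Six blocks: one references 0xFFFFFFFF, one makes a single call,
# one makes two calls, and the rest contain nothing of interest.
old_func = [[], [("imm", 0xFFFFFFFF)], [("call", None)],
            [("call", None), ("call", None)], [], []]
# Same metrics, but the blocks appear in a different order.
new_func = [[("call", None)], [], [("imm", 0xFFFFFFFF)], [],
            [("call", None), ("call", None)], []]

known = {"known_func": old_func}
unknown = {"sub_80012340": new_func, "sub_80015000": [[("call", None)]]}

print(match_fuzzy(known, unknown))  # {'sub_80012340': 'known_func'}
```

Because the signature discards so much detail, restricting matches to signatures that are unique in both images is what keeps the false positive rate tolerable; it does not eliminate it.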

A somewhat more reliable metric is a unique string reference, such as this one in gethostbyname:

gethostbyname string xref

Likewise, unique constants can also be used for function identification, particularly for subroutines related to crypto or hashing:

Constant 0x41C64E6D used by rand
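Matching on unique strings and constants might look something like the sketch below. It assumes each image has already been reduced to a mapping of function name to the set of strings and constants it references. The constant 0x41C64E6D is the one shown above; everything else, including the string attributed to gethostbyname and the sub_ addresses, is a hypothetical placeholder.

```python
from collections import defaultdict


def unique_refs(image):
    """Map each reference (string or constant) to the single function
    that uses it, dropping references shared by several functions."""
    owners = defaultdict(set)
    for func, refs in image.items():
        for ref in refs:
            owners[ref].add(func)
    return {ref: next(iter(funcs))
            for ref, funcs in owners.items() if len(funcs) == 1}


def match_by_refs(known, unknown):
    """Return {unknown function name: known function name} for every
    reference that is unique in both images."""
    known_refs = unique_refs(known)
    unknown_refs = unique_refs(unknown)
    return {unknown_refs[r]: known_refs[r]
            for r in known_refs.keys() & unknown_refs.keys()}


RESOLVER_MSG = "hypothetical resolver error string"  # placeholder text

known = {
    "rand": {0x41C64E6D},
    "gethostbyname": {RESOLVER_MSG},
    "printf": {"%s"},   # "%s" is shared, so it identifies nothing
    "logMsg": {"%s"},
}
unknown = {
    "sub_80020000": {0x41C64E6D},
    "sub_80030000": {RESOLVER_MSG},
    "sub_80040000": {"%s"},
    "sub_80050000": {"%s"},
}

for new_name, old_name in sorted(match_by_refs(known, unknown).items()):
    print(new_name, "->", old_name)
```

This sketch does not handle the case where two different references point the same unknown function at different known names; a more careful version would treat that as a conflict and discard the match.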

Even identifying functions whose names we don’t know can be useful. Consider the following code snippet in sub_801A50E0, from the VxWorks 5.5 firmware:

Function calls from sub_801A50E0

This unidentified function calls memset, strcpy, atoi, and sprintf; hence, if we can find this same function in other VxWorks firmware, we can identify these standard functions by association.
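Identification by association can be sketched as below. It assumes the parent functions have already been paired up by one of the signature types above, and that each function's callees are available as an ordered list. The call order used for sub_801A50E0 and all of the addresses in the second image are hypothetical.

```python
def propagate_names(known_calls, unknown_calls, matches):
    """Given matched parents ({unknown parent: known parent}) and the
    ordered callee lists of each image, pair up callees by position.

    Parents whose call sequences differ in length are skipped, and any
    callee that would receive two different names is dropped."""
    candidates = {}
    conflicts = set()
    for unk_parent, known_parent in matches.items():
        known_callees = known_calls.get(known_parent, [])
        unk_callees = unknown_calls.get(unk_parent, [])
        if len(known_callees) != len(unk_callees):
            continue  # sequences differ; skip rather than guess
        for unk, known in zip(unk_callees, known_callees):
            if candidates.setdefault(unk, known) != known:
                conflicts.add(unk)
    return {u: k for u, k in candidates.items() if u not in conflicts}


known_calls = {"sub_801A50E0": ["memset", "strcpy", "atoi", "sprintf"]}
unknown_calls = {"sub_80034A10": ["sub_80001000", "sub_80001200",
                                  "sub_80001400", "sub_80001600"]}
matches = {"sub_80034A10": "sub_801A50E0"}

for addr, name in propagate_names(known_calls, unknown_calls, matches).items():
    print(addr, "->", name)
```

The appeal of this step is that the parent never needs a meaningful name of its own; it only has to be recognizable in both images.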

Alternative Signatures in Practice

I wrote an IDA plugin to automate these signature techniques and apply them to the VxWorks 5.5.1 firmware:

Output from the Rizzo plugin

This identified nearly 1,300 functions, and although some of those are probably incorrect, it was quite successful in locating many standard POSIX functions:

Functions identified by Rizzo

Like any such automated process, this is sure to produce some false positives/negatives, but having used it successfully against several RTOS firmwares now, I’m quite happy with it (read: “it works for me”!).

I’ve used it against other RTOSes, including eCos and SuperTask!, and it’s worked pretty well. I’ve also used it against some statically compiled Linux binaries. Of course, as with any such tool, its level of effectiveness is determined primarily by the similarity between the two code bases you’re comparing, but I’m working on improving the plugin’s effectiveness.

As for bzero/bcopy, I can’t say that I’ve really seen them used that much in embedded systems myself, but most RTOSes will at least include them for backwards compatibility and to ease the porting of code.

I tried something similar for working with Cisco IOS images. I mostly used string references and function calls. It mostly works, but I’ve only really tried it on Cisco IOS images, so I can’t say much for how useful it is to other firmware images.

We currently use BinDiff to perform a similar task. This plugin looks cool, but outside of saving the $400 license fee for BinDiff, what does this plugin do that existing tools don’t already do?

Don’t get me wrong. I appreciate the work you put into making this available to the community. I am just trying to understand how I might make the most effective use of this tool in our existing process.

I haven’t used BinDiff myself, but I don’t think this plugin would have any advantages over BinDiff. If you look at the BinDiff user manual, it basically does all this and more, so I doubt it would miss things that this plugin would find.