Reverse Engineering VxWorks Firmware: WRT54Gv8

Lately I’ve been working on taking apart some VxWorks firmware images. Unfortunately, I could find precious little information available on the subject, so today we’ll be extracting the VxWorks kernel and application code from the WRT54Gv8 firmware image and analyzing them in IDA Pro.

The WRT54G series infamously switched from Linux to VxWorks with the release of the WRT54Gv5. Because VxWorks is a proprietary RTOS, it is a less familiar environment than a Linux-based system. Even once you identify the different sections of the firmware image, there usually isn’t a standard file system full of standard ELF executables that can be automatically analyzed by a disassembler.

The overall process for reversing this firmware is pretty straightforward:

1. Identify and extract actual executable code from the firmware image
2. Identify the loading address for the executable code
3. Load the executable code into IDA Pro at the appropriate loading address
4. Augment IDA’s auto analysis with manual/scripted analysis

Debugging with JTAG or observing debug messages over a serial port can probably be substituted for steps #1 and #2, but since I don’t have any VxWorks WRT54G routers, this will be a purely firmware based analysis.

The first step is to locate any identifiable data sections in the firmware image:

Binwalk has identified a lot of gzipped Web files, a few LZMA signatures, an ELF header, and even a JFFS file system.
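For readers without binwalk at hand, the idea of a signature scan can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how binwalk is implemented: it only looks for the ELF and gzip magic bytes, and the names `SIGNATURES` and `scan_signatures` are made up for this example.

```python
# Simplified signature scan: report every offset where a known magic value occurs.
# binwalk's real signature database is far richer and validates each match.
SIGNATURES = {
    "ELF": b"\x7fELF",
    "gzip": b"\x1f\x8b\x08",  # gzip magic followed by the deflate method byte
}

def scan_signatures(data):
    """Return a sorted list of (offset, name) for every magic match in data."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

Calling `scan_signatures(open(path, "rb").read())` on a firmware image returns the candidate offsets. Short magic values like these match by accident fairly often, which is why each hit still needs a sanity check.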

The JFFS file system is almost certainly a false positive, so we’ll ignore it.

Looking through a hex dump of the firmware, the gzipped Web files do look to be part of a simple file system, similar to the OW file system discussed previously. However, the Web files are not particularly relevant for the purposes of this discussion, so I will forgo an analysis of the file system here; if desired, these files can simply be extracted from the firmware image and gunzipped.

There were four LZMA signatures found, but all except one have very large uncompressed sizes (several hundred MB each), so these are probably false positives. However, the first LZMA signature at offset 0x194F0 is just over 3.5 MB. This is a reasonable size, so let’s extract and decompress the LZMA data:
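The size check and the extraction can be sketched with Python's standard `lzma` module. This assumes the data uses the legacy 13-byte LZMA header (one properties byte, a 4-byte dictionary size, an 8-byte uncompressed size). The function names and the 64 MB plausibility limit are illustrative choices, not something taken from the firmware.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # header value meaning "uncompressed size not recorded"

def lzma_header(data, offset):
    """Parse the legacy LZMA header: (properties, dictionary size, uncompressed size)."""
    return struct.unpack_from("<BIQ", data, offset)

def extract_lzma(data, offset, max_size=64 * 1024 * 1024):
    """Decompress the LZMA stream starting at offset, rejecting implausible sizes."""
    _props, _dict_size, size = lzma_header(data, offset)
    if size != UNKNOWN_SIZE and size > max_size:
        raise ValueError("uncompressed size 0x%X is implausible" % size)
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    # Bytes that follow the end of the stream are left in decomp.unused_data.
    return decomp.decompress(data[offset:])
```

With the firmware image read into `data`, `extract_lzma(data, 0x194F0)` would return the decompressed blob, while a signature that claims several hundred MB would be rejected before any decompression is attempted.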

There are definitely some interesting strings in this data, including what appears to be application strings for services such as HTTP and HNAP.

There is some binary data in there as well, which may or may not be executable code. However, if it is executable code, there is no distinguishable header or section information; this makes analysis much more difficult. We also don't yet know the CPU architecture or endianness of the target (although this could be found with a Google search).

We do, however, have an ELF header located at offset 0x200 in the firmware image, so let's take a closer look at that:

It's a little endian MIPS ELF file with several string references to 'VxWorks', 'Wind River' and 'Kernel'; this looks like it could be the VxWorks kernel. Let's load it into IDA and see what we can make of it (be sure to select the mipsl CPU):
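The endianness and CPU type can also be read straight out of the ELF header without any tools. The sketch below relies only on the standard ELF layout (EI_CLASS at byte 4, EI_DATA at byte 5, e_machine at byte 18); `describe_elf` is a hypothetical helper name.

```python
import struct

EM_MIPS = 8  # e_machine value assigned to MIPS by the ELF specification

def describe_elf(data, offset=0):
    """Return (word size in bits, byte order, e_machine) for an ELF header at offset."""
    ident = data[offset:offset + 16]
    if ident[:4] != b"\x7fELF":
        raise ValueError("no ELF magic at this offset")
    bits = {1: 32, 2: 64}[ident[4]]             # EI_CLASS
    endian = {1: "little", 2: "big"}[ident[5]]  # EI_DATA
    fmt = "<H" if endian == "little" else ">H"
    (machine,) = struct.unpack_from(fmt, data, offset + 18)  # e_machine
    return bits, endian, machine
```

For the header at offset 0x200 in this image, the expected result would be a little-endian file whose machine value equals `EM_MIPS`.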

Because this image has an ELF header, IDA's auto analysis does a very good job of identifying functions and resolving symbols for us. Let's take a look at the first subroutine, startInflate:

The address 0x80001000 is loaded into $v0 and the decompressImage function is called; execution then jumps to the address stored in $v0 (0x80001000). So presumably, the decompressImage function decompresses some code to address 0x80001000, which then takes over execution.

Looking at the arguments to decompressImage, the first is _binArrayStart and the second is the address 0x80001000. Let's take a look at _binArrayStart:

The first five bytes at the _binArrayStart address are 6C 00 00 80 00, which looks like the beginning of an LZMA image. Comparing the bytes in _binArrayStart to the LZMA data that we extracted earlier, we see that they are identical:
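Those five bytes can be decoded to see why they look like LZMA. In the legacy LZMA header the first byte packs the lc/lp/pb parameters as `(pb * 5 + lp) * 9 + lc`, and the next four bytes are the little-endian dictionary size. The helper name below is hypothetical.

```python
def decode_lzma_props(header):
    """Decode the first five bytes of a legacy LZMA header into (lc, lp, pb, dict_size)."""
    props = header[0]
    if props >= 9 * 5 * 5:
        raise ValueError("properties byte out of range for LZMA")
    lc = props % 9          # literal context bits
    lp = (props // 9) % 5   # literal position bits
    pb = props // 45        # position bits
    dict_size = int.from_bytes(header[1:5], "little")
    return lc, lp, pb, dict_size

print(decode_lzma_props(bytes.fromhex("6C00008000")))
```

For `6C 00 00 80 00` this gives lc=0, lp=2, pb=2 and a dictionary size of 0x800000 (8 MB), which are all plausible values for an LZMA stream.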

Looking further into the decompressImage function, we also see that it calls another function named LzmaDecode:

So it appears that the LZMA data that we extracted earlier does contain executable code, and that it is decompressed and loaded into memory at address 0x80001000. A Google search for 'vxworks lzmadecode' turns up some source code that confirms this conclusion.

Based on the strings we saw earlier, the LZMA data likely contains the OS application code.

We now have enough information to load the extracted LZMA data into IDA for analysis. As with the kernel, we'll set the architecture to mipsl, but since this is a binary file we will also have to supply IDA with the proper loading information.

To do this, we'll set the ROM start address to 0x80001000, and the ROM size to the size of the file, 0x382A60. We'll also set the loading address to 0x80001000:
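Since the binary is loaded as one flat region, translating between file offsets and addresses is simple arithmetic. A small sketch, using the load address and file size given above (the helper names are made up for illustration):

```python
LOAD_ADDR = 0x80001000   # address the decompressed code is loaded at
IMAGE_SIZE = 0x382A60    # size of the decompressed file

def offset_to_addr(offset):
    """Map an offset in the decompressed file to its address in memory."""
    if not 0 <= offset < IMAGE_SIZE:
        raise ValueError("offset outside the image")
    return LOAD_ADDR + offset

def addr_to_offset(addr):
    """Map a memory address back to an offset in the decompressed file."""
    if not LOAD_ADDR <= addr < LOAD_ADDR + IMAGE_SIZE:
        raise ValueError("address outside the image")
    return addr - LOAD_ADDR
```

The image therefore occupies 0x80001000 up to, but not including, 0x80383A60. Any absolute address in the disassembly that falls in that range should point back into the file.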

Once the file is loaded into IDA, go to the first byte and press 'c' to convert it to code. IDA will start converting bytes to code and performing an analysis of the binary file. The very first instruction is a jump that conveniently skips over some strings that are located near the beginning of the file:

Contrast the above disassembly with the same data after performing the same code analysis in IDA, but without setting the proper loading address:

There is also a decent amount of blue (code) in IDA's navigation bar:

Although our efforts have improved IDA's initial analysis, there is still a good deal of code that has been missed. I've written some simple IDA scripts which can be used to get more out of the disassembly.

First, we want to locate unidentified functions by iterating through the code looking for common function prologues. If one is found, we'll tell IDA to create a function there. This is sometimes a bit trickier for MIPS than for the Intel architecture, since function prologues are less standardized in MIPS.

The addiu instruction is often used to manipulate the stack register ($sp) at the beginning of a function. We can see that this is the case for many of the functions identified by IDA:

However, there are some functions that precede the addiu instruction with an lw instruction:

The create_functions.py IDAPython script will search the code (starting at the cursor position) for byte sequences that correspond to these instructions, and instruct IDA to convert them to functions.

Looking at the disassembly, the section of data containing the binary's strings appears to start at 0x802DDAC0, so I've coded the script to stop at that address:
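The byte-pattern idea behind this search can be illustrated outside of IDA. The sketch below is not the create_functions.py script itself, just a standalone approximation that works on the raw decompressed file: it looks for little-endian `addiu $sp, $sp, -N` words (0x27BD in the upper half, with a negative immediate) and does not handle the lw-first variant. The function name and parameters are hypothetical.

```python
import struct

def find_prologues(code, base, end=None):
    """Return addresses of 'addiu $sp, $sp, -N' instructions in little-endian MIPS code.

    code is the raw image, base its load address, end an optional stop address.
    """
    limit = len(code) if end is None else min(len(code), end - base)
    hits = []
    for off in range(0, limit - 3, 4):  # MIPS instructions are 4-byte aligned
        (word,) = struct.unpack_from("<I", code, off)
        # 0x27BDxxxx encodes addiu $sp, $sp, imm; bit 15 set means imm is negative
        if (word & 0xFFFF8000) == 0x27BD8000:
            hits.append(base + off)
    return hits
```

Something like `find_prologues(image, 0x80001000, end=0x802DDAC0)` would list candidate function starts. Inside IDA, each candidate would still need to be turned into a function, and a data word can match the pattern by coincidence.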

After running the script, IDA now has over 9,600 defined functions, and a lot more code has been identified:

However, there are still some sections of data that have not yet been analyzed:

These sections are surrounded by code, and navigating to several of these sections and converting them directly to code results in a valid disassembly:

Since these sections all appear to end with 'jr $ra' (the MIPS return instruction), and since they are not referenced by the surrounding functions, they are likely functions themselves.

The create_code.py script will walk through the code converting these unreferenced bytes to functions (as before, the script will start at the cursor position and end at address 0x802DDAC0):
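The "ends with jr $ra" observation can be turned into a simple splitter for a run of unanalyzed bytes. Again, this is a standalone sketch rather than the create_code.py script: it assumes little-endian MIPS, treats every `jr $ra` (encoded as 0x03E00008) plus its delay slot as the end of a function, and uses made-up names.

```python
import struct

JR_RA = 0x03E00008  # encoding of 'jr $ra'

def split_functions(code, base):
    """Split a block of code into (start, end) address ranges ending at 'jr $ra'.

    Each range ends after the delay-slot instruction that follows the jr $ra.
    """
    funcs = []
    start = 0
    for off in range(0, len(code) - 7, 4):
        (word,) = struct.unpack_from("<I", code, off)
        if word == JR_RA and off >= start:
            end = off + 8  # include the jr $ra and its delay slot
            funcs.append((base + start, base + end))
            start = end
    return funcs
```

Functions with more than one return would be split too early by this approach, so the resulting ranges are only candidates to review.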

We now have a nice solid block of code in IDA's navigation bar:

With the code taken care of, let's turn our attention to the strings. Without symbols like we had in the ELF file, we will have to rely on string references to provide key insights into what is going on in the firmware. However, there are still some ASCII byte arrays that have not been converted to strings by IDA:

Converting these ASCII arrays to strings will make reading the code much easier, so the create_ascii.py script converts all ASCII byte arrays to strings. As we saw before, the section of the image that contains the string data starts at address 0x802DDAC0, so we'll place the cursor at that address in IDA and run the script from there. This results in data that is much easier to read and recognize:
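Outside of IDA, the same kind of search can be approximated with a regular expression over the raw file. This sketch is not the create_ascii.py script; it simply reports NUL-terminated runs of printable ASCII, and the minimum length of 4 is an arbitrary choice.

```python
import re

def find_c_strings(data, base, min_len=4):
    """Return (address, text) for each NUL-terminated printable ASCII run in data."""
    # Printable ASCII plus tab, newline and carriage return, followed by a NUL byte
    pattern = re.compile(rb"[\x09\x0a\x0d\x20-\x7e]{%d,}\x00" % min_len)
    return [
        (base + m.start(), m.group()[:-1].decode("ascii"))
        for m in pattern.finditer(data)
    ]
```

To scan only the string section, pass the slice of the file starting at offset 0x2DCAC0 (0x802DDAC0 minus the 0x80001000 load address) with `base=0x802DDAC0`.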

With our strings now fixed up, let's try to identify some basic functions in the disassembly:

There are two function calls here; the first, sub_802A7F90, takes a single argument: the number 1, shifted left 16 bits (65536). If the return value is zero, a second function, sub_802A06E8 is called.
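The "shifted left 16 bits" part comes from the lui (Load Upper Immediate) instruction, which places its 16-bit immediate in the upper half of the register and zeroes the lower half. A small decoder makes the arithmetic explicit; the function name is hypothetical and the instruction word 0x3C040001 is simply the standard encoding of `lui $a0, 1`.

```python
def decode_lui(word):
    """Decode a MIPS 'lui rt, imm' word; return (rt register number, value loaded)."""
    if word >> 26 != 0x0F:  # the lui opcode is 001111
        raise ValueError("not a lui instruction")
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    return rt, imm << 16

rt, value = decode_lui(0x3C040001)  # lui $a0, 1
print(rt, hex(value), value)        # register 4 is $a0
```

The same instruction with an immediate of 0x8000 produces 0x80000000, which is how the upper half of addresses such as 0x80001000 is typically built.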

The second function takes two arguments: a string that contains common printf formatting, and the number 65536. The pseudo code looks like:

This makes it pretty obvious that sub_802A7F90 is the equivalent of malloc, and sub_802A06E8 is printf. We'll rename these functions appropriately so that references to them elsewhere in the code will be readily apparent:

That's it! We now have working disassemblies of both the kernel and application code that we can use to further analyze VxWorks for additional functions, bugs or vulnerabilities.

1. I’m pretty sure A LOT OF PEOPLE have known bugs from WRT54G already. You’re not the first. Also, meathive didn’t say he won with your bug.

2. This reverse engineering work is not a tricky bug like non-auth ftp login. Non-auth ftp login bug has been probably known for even 11-year old hackers. Not to mention, but, totally, obviously, different. Don’t say “former work.”

I believe there’s a bit of a language barrier here – I don’t think opt9 was trying to say that my work was a copy of his work, but rather that he had done some previous work on the WRT54Gs and wanted to share it.

I found one small mistake 🙂
lui $a0, 1
definition of lui is “Load Upper Immediate”:
“The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. …”
so the call would look like this: malloc(1<<16) and would allocate 64k of memory.

I've done a fair bit of MIPS reverse engineering, so maybe I can help people out by telling them about the 2 things which take some time getting used to.

Argument passing (the only way I encountered this IRL):
$a0-$a3 (arg_0 to arg_3)
$t0-$t3 (arg_4 to arg_7)
rest on stack
$v0 contains the return value

Delay slot instructions:
jump/branch instructions (excluding "branch likely instructions" eg. BEQL):
"All branches have an architectural delay of one instruction. When a branch is taken, the instruction immediately following the branch instruction, in the branch delay slot, is executed before the branch to the target instruction takes place."

same for load instructions:
"The time between the load instruction and the time the data is available is the “load delay slot”. If no useful instruction can be put into the load delay slot, then a null operation (assembler mnemonic NOP) must be inserted."

so if you have something like this …
la $a0, aCanTOpenDirect # "Can't open directory %s\n"
jal printf
move $a1, $s3

the move $a1, $s3 instruction would be executed before the call to printf.
printf("Can't open directory %s\n", $a1);
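As a reading aid, a listing can be rewritten so that each delay-slot instruction appears before its jump or branch. The sketch below is a simplification: it works on plain text mnemonics, the set of branch mnemonics is an illustrative subset, and it ignores the fact that a conditional branch evaluates its condition before the delay slot runs.

```python
BRANCHES = {"j", "jal", "jr", "jalr", "b", "bal", "beq", "bne",
            "beqz", "bnez", "bgez", "bgtz", "blez", "bltz"}

def execution_order(instructions):
    """Reorder a listing so each delay-slot instruction precedes its jump/branch."""
    out = []
    i = 0
    while i < len(instructions):
        mnemonic = instructions[i].split()[0]
        if mnemonic in BRANCHES and i + 1 < len(instructions):
            out.append(instructions[i + 1])  # delay slot takes effect first
            out.append(instructions[i])
            i += 2
        else:
            out.append(instructions[i])
            i += 1
    return out
```

Applied to the three instructions above, this yields the la, then the move into $a1, then the jal, which matches how the call should be read.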

When scanning a binary file for matching file signatures, you are bound to get false positive matches – probably a lot of them, as you encountered. Binwalk attempts to filter out false positive results, but the -a option disables this filtering.

Since this is your product, it should be pretty easy to figure out what is and isn’t a false positive. For example, do you use zlib? If not, then that’s probably a false positive. Is your device a big-endian MIPS system? If not, that’s probably a false positive too. Unfortunately, binwalk doesn’t have signatures for everything and sometimes you have to do a little leg work. 🙂

Hi
We have a VxWorks bin file. I extracted the zlib portion, changed the HTML file content, and then loaded it into the device. Sadly, it failed, stating chksum failure. Even changing a single value in the original hexdump of the file throws chksum failed when loading it to the device. Suggestions please.

I assume the file has a chksum embedded in it.
How can I find it and modify it?

As I said, I’m not familiar with VxWorks header formats. You’d probably be better off asking Jeremy about this, as he is the one who reversed the VxWorks image format.

Have you tried the wrt_vx_imgtool that Jeremy put in the Firmware Mod Kit? This will supposedly re-build WRT54G VxWorks images.

Also bear in mind that just because your device is running VxWorks doesn’t mean that it necessarily uses the same checksums. I’d use Jeremy’s work as a guide to get you started, but it very well may have changed since he did that work.

hi Craig and compliment for the blog!!
my router has this firmware http://www.sitecom.com/download/4017/WL601_V0.18.bin i extract the second part coded in lzma and in pfs/0.9 but all the .exe files are of 0 bytes.
How can i get the source of the .exe files??
analyzing the first part it seems to be the boot loader, but i don’t know what to do with it.
My goal is to try to emulate the firmware and possibly insert a telnet console that is not present by default.
anyone knows how to do it??
thanks,ciao!

The address 0x80001000 is loaded into $v0 and the decompressImage function is called; execution then jumps to the address stored in $v0 (0x80001000). So presumably, the decompressImage function decompresses some code to address 0x80001000, which then takes over execution.

why do you mention $v0? I thought $v0 and $v1 were registers for the return value of a function. and $aX is for passing arguments to the function. Isn’t this the right sentence…?

The address 0x80001000 is loaded into $a1 and the decompressImage function is called; execution then jumps to the address stored in $a1 (0x80001000). So presumably, the decompressImage function decompresses some code to address 0x80001000, which then takes over execution.

You are correct about the use of the $aX and $v0/$v1 registers. You are also correct that the address 0x80001000 is passed to decompressImage via the $a1 register.

However, notice that 0x80001000 is also placed in the $v0 register, which is subsequently stored to the var_10 stack variable in the delay slot of the call to decompressImage. After decompressImage returns, the stack value from var_10 is loaded back into $v0 just before the jalr $v0 instruction. This is the functionality I was particularly concerned with in this case.