Mystery File System

Last week Jim posted a comment asking about reverse engineering the firmware for some Chinese routers with the intention of extracting the Web files and translating them to English.

I usually work with Linux based firmware, but this sounded interesting so I thought I’d investigate. Although I wasn’t able to completely recover the Web files, the process of reversing a file system format seemed like a good subject for discussion.

The firmware image contains none of the normal file systems found in Linux firmware, no identifiable compression formats, and the only intelligible strings in the image are the names of the Web files themselves:

It looks like the ‘owowow’ string is followed by a list of all the Web files in the firmware. This isn’t just an array of strings, however; there is additional data included. The pseudo-structure for the data layout appears to be:

This looks like it could be a simple file system, but searching Google for the ‘owowow’ string didn’t turn up anything interesting. So it is either custom, undocumented, or we are completely off track. The latter seems unlikely, so let’s try to identify the file system structure.

The last entry appears to be for the WzdWlanRpm.htm file. As with the other strings we saw, this null padded string is followed by 8 bytes of binary data, which ends at offset 0x0F75DF. Immediately after this are the bytes 5A 00 00 80:

Since there are no other obvious strings in the firmware, we can assume that the files themselves are probably compressed. This makes the bytes 5A 00 00 80 very interesting, because they are very similar to the magic bytes for LZMA compression which are 5D 00 00 80.

Let’s assume that the bytes 5A 00 00 80 are the magic bytes for a compression algorithm. If so, they will probably be at the beginning of all the Web files. Let’s see how many instances of this byte string we can find in the firmware:
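A search like this can be sketched in a few lines of Python. This is only an illustration of the idea, not the command used for the article; the function name `find_all` and the constant `CANDIDATE_MAGIC` are made up for the example, and the demo runs on synthetic bytes rather than the real firmware.

```python
CANDIDATE_MAGIC = bytes.fromhex("5A000080")  # the suspected compression magic


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return the offset of every occurrence of pattern in data."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so overlapping matches are not skipped.
        pos = data.find(pattern, pos + 1)
    return offsets


# Synthetic stand-in for a firmware image: two "files" with the magic in front.
demo_image = b"\x00" * 3 + CANDIDATE_MAGIC + b"abc" + CANDIDATE_MAGIC
hits = find_all(demo_image, CANDIDATE_MAGIC)
print(len(hits), [hex(offset) for offset in hits])
```

On a real image, read the file in binary mode and pass its bytes to `find_all`. A raw byte search can also match inside compressed data by chance, so the count is an upper bound rather than an exact file count.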

Although doing this type of simple search isn’t always completely accurate, there are at least 140 instances of these bytes in the firmware image. Let’s compare that to how many Web files are listed in the firmware:

This looks encouraging! The bytes 5A 00 00 80 are likely present at the beginning of each file.

If the structures we found earlier are part of a file system, there are two pieces of information that will need to accompany each file name:

Where is the file located?

How big is the file?

First, let’s look at the 12 bytes immediately following the ‘owowow’ string. We’ll assume that these are integer (4 byte) fields, and we’ll cast each 4 byte field as various data types to see if we can make sense of any of them:
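One way to try several interpretations at once is to unpack each 4 byte field under a handful of candidate types. The sketch below assumes nothing about the real header; `interpret_fields` is a hypothetical helper, and the demo bytes are placeholders, not values taken from the firmware.

```python
import struct


def interpret_fields(raw: bytes) -> list[dict]:
    """Decode consecutive 4-byte fields under several candidate types."""
    if len(raw) % 4:
        raise ValueError("expected a multiple of 4 bytes")
    rows = []
    for i in range(0, len(raw), 4):
        chunk = raw[i:i + 4]
        rows.append({
            "hex": chunk.hex(),
            "u32_be": struct.unpack(">I", chunk)[0],       # big endian unsigned
            "u32_le": struct.unpack("<I", chunk)[0],       # little endian unsigned
            "i32_be": struct.unpack(">i", chunk)[0],       # big endian signed
            "u16_be_pair": struct.unpack(">HH", chunk),    # two big endian shorts
            "ascii": chunk.decode("ascii", errors="replace"),
        })
    return rows


# Placeholder bytes only; substitute the 12 bytes that follow 'owowow'.
for row in interpret_fields(bytes.fromhex("000000010000000200000003")):
    print(row)
```

Values that come out small and plausible under one byte order and enormous under the other are usually a good hint about endianness.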

Next we need to determine what the 8 bytes that follow the file names represent. Since we still need to know the size and location of each file, it stands to reason that these bytes represent those values. Let’s look at the first file entry:

The two values 00 00 05 BB and 00 00 1A 6C likely represent the file size and an offset to the file’s location in the firmware image. Recall that the ‘owowow’ magic bytes indicating the beginning of the file system are located at offset 0x0F5B74. Adding 0x01A6C to 0x0F5B74 gives us 0x0F75E0, the exact location of the first occurrence of the bytes 5A 00 00 80 that we identified earlier:

If the second 4 byte value is the file offset, then the first 4 byte value is likely the file size. Adding the 0x05BB value to the file offset 0x0F75E0 gives us 0x0F7B9B. At the very next byte (0x0F7B9C) we find the second occurrence of the bytes 5A 00 00 80:
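The arithmetic for the first entry can be checked with a short script. The byte values and the 0x0F5B74 base come from the text above; the function name `locate` and the big endian `(size, offset)` field order are the interpretation being tested, not something confirmed by documentation.

```python
import struct

FS_BASE = 0x0F5B74                                     # offset of the 'owowow' string
FIRST_ENTRY_TAIL = bytes.fromhex("000005BB00001A6C")   # 8 bytes after the first name


def locate(entry_tail: bytes, fs_base: int) -> tuple[int, int]:
    """Return (start, end) image offsets for an entry; end is exclusive."""
    size, rel_offset = struct.unpack(">II", entry_tail)
    start = fs_base + rel_offset   # offset is relative to the file system base
    return start, start + size


start, end = locate(FIRST_ENTRY_TAIL, FS_BASE)
print(hex(start), hex(end))  # 0xf75e0 0xf7b9b
```

The computed end, 0x0F7B9B, is one byte short of the next 5A 00 00 80 at 0x0F7B9C. That address is a multiple of 4, which would be consistent with files being padded to a 4 byte boundary, though a single entry is not enough to confirm it.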

Now that we know how the file system is constructed, we can write a utility to extract the files. I’ve written an unowfs tool to do just that; you can download it here, and use it to extract the files from the OWFS image:
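For readers who want to see the general shape of such an extractor, here is a minimal sketch. It is not the unowfs tool. The width of the name field (`name_len`), the position of the first entry (`table_offset`), and the number of entries (`count`) are not stated above, so they are left as parameters you would have to determine from a hex dump. The only things taken from the text are the layout of each entry (null padded name, then big endian size and offset) and the fact that offsets are relative to the start of the ‘owowow’ string.

```python
import os
import struct


def parse_entries(image: bytes, fs_base: int, table_offset: int,
                  name_len: int, count: int) -> list[tuple[str, int, int]]:
    """Return (name, absolute_start, size) for each file table entry."""
    entries = []
    entry_size = name_len + 8          # name field + 4-byte size + 4-byte offset
    for i in range(count):
        pos = table_offset + i * entry_size
        raw_name = image[pos:pos + name_len]
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
        size, rel_offset = struct.unpack_from(">II", image, pos + name_len)
        entries.append((name, fs_base + rel_offset, size))
    return entries


def carve(image: bytes, entries: list[tuple[str, int, int]]) -> dict[str, bytes]:
    """Slice each file's (still compressed) bytes out of the image."""
    files = {}
    for name, start, size in entries:
        if start + size > len(image):
            raise ValueError(f"{name!r} extends past the end of the image")
        files[name] = image[start:start + size]
    return files


def write_out(files: dict[str, bytes], out_dir: str) -> None:
    """Write carved files to out_dir, flattening any path separators."""
    os.makedirs(out_dir, exist_ok=True)
    for index, (name, data) in enumerate(files.items()):
        # Flatten separators so a name from the image cannot escape out_dir.
        safe = name.replace("\\", "/").strip("/").replace("/", "_")
        if safe in ("", ".", ".."):
            safe = f"file_{index}"
        with open(os.path.join(out_dir, safe), "wb") as handle:
            handle.write(data)
```

Keeping parsing, carving, and writing separate makes it easy to inspect the parsed table before anything touches the disk, which is useful while the layout guesses are still unverified.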

Although we now have the files extracted from the firmware image, they are still compressed. Unfortunately, the file utility doesn’t recognize the extracted files, and Googling for ‘5A 00 00 80’ didn’t turn up anything useful. Given the lack of strings in the firmware image, it is likely that the remainder of the firmware is also compressed, making code analysis impossible without first decompressing the data (there appear to be no standard compression or archive headers elsewhere in the firmware).

Since I don’t have one of these routers myself, this is where I’ve stopped. Looking at some internal images of these devices, it appears that they do have a serial port. Simply observing the debug output from the boot loader and OS during start up may reveal some hints on where to go from here, and if nothing else it should be possible to dump the SPI flash chip in order to get the boot loader code.

So if anyone can shed some further light on how these files are compressed, let me know!

UPDATE:

ghjm and insn left comments regarding the compression used for these files. They are LZMA:23 and can be decompressed with p7zip/lzmadec:
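This also explains why the bytes look so close to the usual LZMA magic. In the legacy LZMA header, the first byte encodes three parameters as (pb * 5 + lp) * 9 + lc, and the next four bytes are the dictionary size in little endian. The sketch below decodes that header and decompresses a blob with Python’s lzma module. The helper names are made up for the example, the fifth byte (00) is assumed from the “LZMA:23” description rather than read from the firmware, and whether the extracted files carry the full standard 13 byte header is likewise an assumption.

```python
import lzma
import struct


def decode_lzma_header(header: bytes) -> dict:
    """Decode the LZMA properties byte and the 4-byte dictionary size."""
    if len(header) < 5:
        raise ValueError("need at least 5 bytes")
    props = header[0]
    if props >= 9 * 5 * 5:
        raise ValueError("not a valid LZMA properties byte")
    lc = props % 9             # literal context bits
    lp = (props // 9) % 5      # literal position bits
    pb = props // 45           # position bits
    (dict_size,) = struct.unpack_from("<I", header, 1)
    return {"lc": lc, "lp": lp, "pb": pb, "dict_size": dict_size}


def decompress_alone(blob: bytes) -> bytes:
    """Decompress data stored in the legacy .lzma ("alone") container."""
    return lzma.decompress(blob, format=lzma.FORMAT_ALONE)


print(decode_lzma_header(bytes.fromhex("5A00008000")))  # bytes seen in this firmware
print(decode_lzma_header(bytes.fromhex("5D00008000")))  # the more common header

# Round trip on synthetic data using the same parameters (lc=0, 8 MB dictionary).
lc0_filter = [{"id": lzma.FILTER_LZMA1, "lc": 0, "lp": 0, "pb": 2,
               "dict_size": 1 << 23}]
sample = lzma.compress(b"<html></html>", format=lzma.FORMAT_ALONE,
                       filters=lc0_filter)
print(sample[:5].hex(), decompress_alone(sample))
```

Decoded this way, 0x5A is lc=0, lp=0, pb=2, while the familiar 0x5D is lc=3, lp=0, pb=2. The dictionary size in both cases is 2^23 bytes, which matches the “23” in LZMA:23.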

UPDATE #2:

After some further probing, I’ve concluded that this firmware is definitely VxWorks based. Whether this file system is exclusive to VxWorks I can’t say for sure, but I wouldn’t be surprised if it was.

UPDATE #3:

It looks like Ruben from IOActive has a little more insight into this file system. As suspected it is a VxWorks pseudo file system, called MemFS (aka, Wind River management file system). Thanks to Ruben and Sergio!

It reminds me of reverse engineering various formats used by Empire Total War. They were often a lot more complicated, but basic workflow was similar, and at least they never bothered with serious nonstandard compression.

Maybe the microcontroller’s name could shed some light on the filesystem used in the device? For example, Microchip has its own MPFS2 file system which was designed primarily for storing static web files.

Very interesting article. I had to do the same with a custom file system used in Digimon Digital Card Battle (US and J)(PSX).
It had some .drv files that luckily were uncompressed, and inside were all the .tim and .str files, a DEBUG folder ^^ with map files inside, and more stuff.
The Japanese version even had a demo of some kind of game. I don’t even know what it was about because it was 100% in Japanese, so I couldn’t even figure out what it was, and pressing the short/long text didn’t help much.

Love this kind of article, please do more like this. If it can be a little more advanced / H4Xx0r level, much better.

My suggestion of hardware I would like to see is one of those cheap Chinese MP4 players that use a RockChip SoC. Not Linux based, but interesting in fact, as the hardware has a lot of potential to explore.

Unfortunately I don’t know of any big RE communities that are focused on embedded systems. Although focused specifically on routers, OpenWRT’s site is a good resource. Also OpenRCE is a great reverse engineering site in general, and there are some embedded-specific discussions there.

Do a Google video search for ‘reverse engineering embedded systems’; there are some good conference videos that you might find interesting.

This reminds me a lot of my work with reversing Nintendo 64 games. The “file table” containing offsets and sizes, an LZ-based file compression format with a magic identifier not seen elsewhere, all very similar. The only big difference is that N64 games’ files tend not to have names, and are referenced only by their offset.

Hi
We have a VxWorks bin file. I extracted the zlib portion, changed the HTML file content, and then loaded it into the device. Sadly it failed, stating chksum failure. Even in the original hexdump of the file, changing a single value throws “chksum failed” when loading to the device. Suggestions please.

I assume the file has a checksum embedded in it.
How can I find it and modify it?

The strings utility found a strange string: “SkOsMo5 fIrMwArE”. Binwalk found a lot of ZLIB files. Going forward, there are some ASCII files, and this is the point where I stopped. What should I do at this point?