Sunday, 27 January 2013

At Aperture Labs we have Code Monkeys and Chip Monkeys. Chip Monkeys do the dangerous stuff, while we Code Monkeys sit in our nice safe offices playing with bits and bytes and other things that can't hurt us...

Now, you're probably wondering what could possibly be 'dangerous' in an IT Security business, and that's a perfectly fair question. How about boiling nitric acid for a start? Or fiddling with circuits that are directly connected to mains electricity? Exactly. Dangerous!

You might ask why you would want to dissolve chips in the first place? Usually, it's because you are trying to do something like reset a fuse to allow reading/writing of protected areas or probe a data track to observe data being processed by the chip, or even trying to figure out the actual logic of a proprietary chip by viewing and reverse engineering its construction.

In our case it's a combination of things, but the primary target at this stage is the program code that is stored in Masked ROM. The chip itself is using a known architecture and a published assembly language, so the only reverse engineering required is to recover the actual instructions stored in the ROM. As we can see from the picture below, this should be relatively easy as 'data' is clearly discernible:

If we zoom in we can see what looks like blobs of solder connecting vias, and we can guess that the presence of a blob represents a '0' or a '1' and the absence vice-versa:

Should be pretty easy to read then. All I need is some pattern recognition software and we're done! Well, that's the theory anyway. The obvious first candidate was degate. This is a very cool piece of software written by Martin Schobert for doing exactly this kind of thing - reverse engineering chips from images. However, after playing around with it I couldn’t find any obvious way to get it to read masked ROM. I thought I was being thick, so I contacted Martin and he confirmed:

"I know from a professional chip hacker that he uses Fiji to detect
inter-layer connections in the mask ROM. These vias are circle-shaped
and can be detected with the Fiji GUI by just clicking some menu items,
which results in a list of coordinates. I am interested to integrate
this functionality in degate, but never had a practical use case for
this. Anyway, vias can already be detected by Degate. Therefore, the
user draws some vias into the image and thus marks their position.
Remaing vias are detected by using "via matching", which is based on a
cross-correlation with the marked templates. After via matching is
finished, the coordinates are written into the logic model XML file.
From this, the coordinates can be extracted easily and have to be
processed into bytes with some kind of script. But it has to be written."

Hmmm... OK, so whatever the approach I'm going to end up writing code to extract the data so I figured that I may as well try to do the whole thing in one go - i.e. detect the data bits directly from the image and dump them to binary file for later analysis. Well, at least we now have a plan! :)

As anyone who knows me will already know, I am a big Python fan, and if you're a regular at DC4420 you'll also remember that Python has some excellent image processing tools which I previously used in our Smoke Detector Random Number Generator project, in the form of OpenCV.

This really shouldn't be too hard - read in an image, detect patterns of rows and columns and extract the points. True in principle, but if you go back and look at the original image it quickly becomes obvious that there is a lot of extraneous data:

So whatever we do, we're going to end up having to either do a lot of manual intervention to prevent our program from detecting data that we're simply not interested in, or make it so specific in its pattern recognition that it's likely to get a lot of read errors due to missing data that was 'corrupted' by a dark spot, poor lighting etc., etc.

I should qualify this a bit: in an ideal world, we would be looking at a perfectly clean image with absolutely uniform lighting and every feature would be easily recognisable. This is achievable if you have something like a university lab at your disposal (e.g. visual6502.org), or you are a legend like Chris Tarnovsky who has spent years building up a multi-million dollar lab, but for us mere mortals we are working with much less sophisticated kit! (I'll leave Zac to expand on that subject). We therefore need to make some compromises and make the most of what we've got.

On that basis, I decided that instead of trying to automate the whole process, I would semi-automate it. In other words, I'd put the effort into making a tool that enables selection of 'interesting' data rather than trying to recognise it automatically.

The basic idea was to create a grid that layers over the image, and where the lines intersect we have a point of interest which will be easier to 'detect':

The full tutorial on rompar is on its documentation page, but for this process, the steps were as follows:

First we must decide what we think the layout is. The full chip has 10 columns, but looking at the image available via the microscope, it would appear that we can see 5 and a half columns of 16 bits in a column, each separated by a column of vias:

The horizontal lines are basically single rows of bits separated by lines of vias. If we look a little closer we can see that the gap between the first and second row is a lot larger than the gap between the third and fourth, so as far as a repeating pattern goes, at the very least we need to include two rows of bits and two rows of vias. As will become clear later on, the bigger the pattern group we create, the less 'work' we will need to do to make our grid, so we can increase this figure to something bigger that still divides evenly into the image. In this case I have about 48 rows of bits so I chose 16, which will give me 3 'chunks' per image.

So now we load the image into rompar, tell it we're processing a 16x16 grid and apply a threshold filter:

$ rompar.py chip.png 16 16

Adjust the threshold until we've got very clear single points:

Left-Click on the first bit of the first column of data. This will create a vertical gridline:

Left-Click on the last bit of the first column of data. This will create the remaining gridlines for that column:

In the same way we can Right-Click on the first and last bits of the first group of rows and this will create our first set of intersections:

This is easier to see if we blank out the ROM image:

And now this is where the semi-automation starts to kick in. Having created our 16x16 intersection group, we can repeat it by simply Left-Clicking at the start of a new column group:

Or Right-Clicking at the start of a new row group:

Four mouse clicks later and we've 'gridded' the whole thing:
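The click-to-grid idea can be sketched roughly as follows. This is only an illustration of the principle and not rompar's actual code: the function names `group_lines` and `repeat_group` are hypothetical, the pixel coordinates are made up, and it assumes the bits within a group are evenly spaced between the first and last click.

```python
def group_lines(first, last, count):
    """Return `count` evenly spaced gridline positions from `first` to `last`.

    `first` and `last` stand for the pixel coordinates of the two mouse
    clicks (first and last bit of a group); `count` is the group size.
    """
    if count < 2:
        return [float(first)]
    step = (last - first) / (count - 1)
    return [first + i * step for i in range(count)]


def repeat_group(lines, new_start):
    """Copy an existing group of gridlines so that it begins at `new_start`."""
    offset = new_start - lines[0]
    return [pos + offset for pos in lines]


# One 16-bit column group from two clicks, then repeated with a single click.
columns = group_lines(100, 250, 16)
columns += repeat_group(columns[:16], 300)

# One group of 16 rows from two (right) clicks.
rows = group_lines(40, 190, 16)

# Every (x, y) pair where a vertical and a horizontal line cross.
intersections = [(x, y) for y in rows for x in columns]
print(len(columns), len(rows), len(intersections))
```

Because a repeated group is just the first group shifted by an offset, a larger group means fewer clicks to cover the whole image.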

Now we simply hit 'r' for 'read' and the presence or absence of a bright spot within an intersection will be flagged:
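As a rough sketch of what such a 'read' involves (again, not rompar's real implementation), the code below thresholds a greyscale image and checks a small window around each intersection for a bright pixel. A nested list stands in for the OpenCV image so the example runs on its own; the function names, the `radius` parameter and the choice of bright meaning '1' are all assumptions made for illustration.

```python
def threshold(image, level):
    """Binarise a greyscale image (list of rows of 0-255 ints)."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def read_bits(binary, columns, rows, radius=1):
    """Return one string of '0'/'1' characters per gridline row.

    A bit is flagged as '1' if any pixel within `radius` of the
    intersection is set, which tolerates a slightly misplaced gridline.
    """
    height, width = len(binary), len(binary[0])
    result = []
    for y in rows:
        line = ""
        for x in columns:
            cx, cy = int(round(x)), int(round(y))
            hit = any(
                binary[py][px]
                for py in range(max(0, cy - radius), min(height, cy + radius + 1))
                for px in range(max(0, cx - radius), min(width, cx + radius + 1))
            )
            line += "1" if hit else "0"
        result.append(line)
    return result


# Tiny synthetic 'ROM': bright blobs at (x=2, y=2) and (x=10, y=6).
image = [[20] * 12 for _ in range(8)]
image[2][2] = 230
image[6][10] = 240

bits = read_bits(threshold(image, 128), columns=[2, 6, 10], rows=[2, 6])
print(bits)
```

A larger `radius` forgives more grid misalignment, at the cost of picking up stray bright pixels from neighbouring features.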

Here we can see that we're getting quite a few bit errors as the data in the 'empty' columns three, four and five is varying quite a lot, whereas it is clear from the original image that these columns should mostly contain the same values. This is due to small misalignments of the grid, and we can adjust those by selecting rows or columns and nudging them left/right or up/down. In the image above the eighth row in the first column is highlighted in white to show it's selected. After fine tuning, our read looks like this:

Much better! There are a few other features that make it easy to check/adjust the data, like being able to toggle the view between the thresholded image and the original, so individual bits can be inspected and/or set:

Magic. So after only a few minutes we have some 'real' data to play with. But now we have to answer some fundamental questions:

What order are the bits in - i.e. LSB/MSB?

What order are the bytes in - read all the rows and then the columns, or vice-versa, or something else?

What is a '0' and a '1' - the presence or the absence of a blob?

etc.

We really have no way of telling, and in fact, if you look around, you'll find examples of pretty much every variation you can think of. Here, Travis Goodspeed describes a Masked ROM which has 16 columns of 8 bits each, but each 16 bit word is made up by taking a single bit from each of the 16 columns. In our case we have a total of 10 columns, so this is an unlikely scenario. There are other examples on siliconpr0n.org.
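To make that kind of arrangement concrete, here is a small sketch of how words could be rebuilt from a layout of 16 column groups of 8 bits, taking one bit from each group. It is based only on the one-sentence description above: the function name is hypothetical, and both the bit order within a group and the choice of the leftmost group as the most significant bit are assumptions.

```python
COLUMNS = 16         # column groups per physical row (from the description)
BITS_PER_COLUMN = 8  # bits in each column group


def words_from_row(row_bits):
    """Rebuild 16-bit words from one physical row of 16 x 8 bits.

    Word i is made of bit i of every column group. Treating the leftmost
    column group as the most significant bit is an assumption.
    """
    if len(row_bits) != COLUMNS * BITS_PER_COLUMN:
        raise ValueError("expected 128 bits per row")
    words = []
    for i in range(BITS_PER_COLUMN):
        word = "".join(row_bits[c * BITS_PER_COLUMN + i] for c in range(COLUMNS))
        words.append(int(word, 2))
    return words


# Only the first column group is set, so every word has just its top bit set.
row = "11111111" + "00000000" * 15
print([f"{w:04X}" for w in words_from_row(row)])
```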

However, in this case we are lucky: it's a known architecture with a published instruction set, so we should be able to simply compare the bits/bytes and juggle them around until they make sense.

Step one was to figure out what is a '0' and what is a '1'. This should be fairly straightforward as again we are lucky: there is a large section of 'empty' ROM which should have a known/fixed value. The chip is an Atmel MARC4, and we can simply download the programming manual which includes the instruction set. An example project also shows that 'empty' space in a program is likely to contain the HEX value C1, which, according to the manual, is the instruction: 'SCALL $RESET' (Unconditional short CALL to address $008). This seems eminently sensible as any wayward code will end up hitting one of these and resetting the chip.

So let's look at our 'empty' area:

But that doesn't look right. C1 is the eight bit pattern: 11000001 in binary, so 16 bits of repeating C1s would be 1100000111000001, whereas we have a pattern that is either 1111000000000011 or 0000111111111100 depending on how we interpret the bits.

As I said though, anything goes in bit placement, and in this case the clue is in the top of the image:

Here we can see that dropping into our group of 16 columns are 8 traces. Could this mean that we are actually looking at two sets of 8 bits interleaved?

If we interpret our 16 bit pattern on that basis, it looks like this:

    1111000000000011
    ----------------
    1 1 0 0 0 0 0 1
     1 1 0 0 0 0 0 1

Bingo! We have our 11000001 == C1. Sweet! Now I have the option to read as 16 bits and de-interleave in code, or create pairs of 8 bit grids that read every other bit. I opted for the latter just to keep things tidy.
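For comparison, the 'read as 16 bits and de-interleave in code' option would look something like the sketch below. The helper names are hypothetical; the only input taken from the chip is the 16 bit pattern quoted above, and both polarities are tried so the one that yields C1 stands out.

```python
def deinterleave(bits):
    """Split an interleaved bit string into its even- and odd-position halves."""
    return bits[0::2], bits[1::2]


def invert(bits):
    """Flip every bit, i.e. swap the meaning of 'blob' and 'no blob'."""
    return "".join("1" if b == "0" else "0" for b in bits)


observed = "1111000000000011"

for label, candidate in (("as read", observed), ("inverted", invert(observed))):
    first, second = deinterleave(candidate)
    print(label, first, second, f"{int(first, 2):02X} {int(second, 2):02X}")
```

Only the 'as read' polarity gives two C1 bytes, which settles the '0' versus '1' question at the same time as the interleaving.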

So far, so good: we now know the overall bit arrangement and order within a byte (rompar reads MSB from the left by default). Now we need to answer the question of what order we read the bytes in. Again, the empty areas give us a big clue. If we read them in anything other than column order they will no longer be one large group of C1s, but small bursts of C1s interspersed with other data, which wouldn't make sense, so column order it is. The only remaining question then is where does the program start: top left, bottom left, top right or bottom right?
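The 'one large group of C1s' argument can be checked mechanically. The sketch below uses a made-up 4x3 grid of bytes (not data from the chip) in which the middle column is 'empty' ROM, and compares the longest run of C1 when the grid is read in column order versus row order; all function names are hypothetical.

```python
def longest_run(data, value=0xC1):
    """Length of the longest unbroken run of `value` in a byte sequence."""
    best = current = 0
    for byte in data:
        current = current + 1 if byte == value else 0
        best = max(best, current)
    return best


def column_order(grid):
    """Read a 2D grid of bytes down each column, left to right."""
    return [grid[r][c] for c in range(len(grid[0])) for r in range(len(grid))]


def row_order(grid):
    """Read a 2D grid of bytes along each row, top to bottom."""
    return [byte for row in grid for byte in row]


# Made-up layout: the middle column is 'empty' ROM, the others hold code.
grid = [
    [0x12, 0xC1, 0x34],
    [0x56, 0xC1, 0x78],
    [0x9A, 0xC1, 0xBC],
    [0xDE, 0xC1, 0xF0],
]

print("column order:", longest_run(column_order(grid)))
print("row order:   ", longest_run(row_order(grid)))
```

Whichever reading order keeps the filler bytes together in one long run is the more plausible candidate.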

In order to make figuring this out easier, it would be helpful to be able to read program code rather than HEX values, and to achieve this we need to convert the HEX instructions into their human readable form as per the manual. I was unable to find a MARC4 disassembler on the net, so I ended up writing that too. This is not as tricky as it sounds - there is only a fairly small set of commands, and they convert rigidly into a stream of HEX, so reversing the process and converting the HEX back into commands and corresponding arguments is actually pretty trivial. The hardest/most tedious part is cutting & pasting the descriptions and creating the layout so it's nice and readable.
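The general shape of such a table-driven disassembler is sketched below. This is not marc4dasm's code: the only opcode taken from the text is C1 ('SCALL $RESET'), the `DB` notation for unknown bytes is my own convention, and the 0x7F entry is an invented placeholder (not a real MARC4 encoding) that exists purely to show how an instruction with an argument byte would be handled. A real table would be filled in from the programming manual.

```python
# Opcode -> (mnemonic, number of operand bytes).
# Only 0xC1 comes from the text above; a real table would be filled in
# from the MARC4 programming manual.
OPCODES = {
    0xC1: ("SCALL $RESET", 0),
}


def disassemble(code, opcodes=OPCODES):
    """Return a list of (address, text) tuples for a sequence of bytes."""
    listing = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        mnemonic, operand_len = opcodes.get(opcode, (None, 0))
        if mnemonic is None:
            text = f"DB ${opcode:02X}"  # unknown byte, emit it as raw data
            size = 1
        else:
            operands = code[pc + 1 : pc + 1 + operand_len]
            args = " ".join(f"${b:02X}" for b in operands)
            text = f"{mnemonic} {args}".strip()
            size = 1 + operand_len
        listing.append((pc, text))
        pc += size
    return listing


# 0x7F below is NOT a real MARC4 encoding; it is a placeholder to show
# how an instruction with a one-byte argument would be handled.
demo_table = dict(OPCODES)
demo_table[0x7F] = ("HYPOTHETICAL_OP", 1)

for address, text in disassemble(bytes([0x7F, 0x2A, 0xC1, 0xC1]), demo_table):
    print(f"${address:03X}  {text}")
```

Keeping the opcode table as plain data means most of the work is transcribing the manual, which matches the 'tedious rather than tricky' experience described above.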

There is of course another advantage to doing it this way: at some point I'm going to have to wade through the extracted code and figure out what it's doing, and there's nothing quite like writing a disassembler to ensure you really understand how a language works! And so marc4dasm was born...

which is the start of Atmel's example program from their development kit (no longer available to purchase, but easy enough to find on the net). This is handy as I could cross check my output against their example listing, so I'm reasonably confident that my disassembler is working correctly.

The output can then be edited to give the variables and addresses meaningful names (if they can be figured out). You'll notice there are some 'meaningful' names already, namely $AUTOSLEEP, $RESET and $INTERRUPT_0 to $INTERRUPT_7, and this is because they are standard routines that must exist at certain locations in ROM. This will come in very handy later!

Reading the first set of interleaved bits from the chip's top left gives us:

Although as the last two bytes are treated as CRC they won't be included in the listing, so we would only expect to see the first two of the above lines, which indeed we do - the last two lines of our listing are:

We now have a correct $AUTOSLEEP routine as well as a reasonable $RESET, so we can be confident that we've identified our start of program. The orphans are probably only there because I haven't yet stitched all the data together so we're missing most of the code.

Now all I need to do is figure out their CRC algorithm and we can be 100% sure we've got all the data as per the original.

Update: CRC was reverse engineered from a qForth implementation, so marc4dasm now includes CRC checking.

BTW, rompar & marc4dasm can be found at the Aperture Labs Tools page.