Topic: How do you deal with stray code? (Read 1651 times)

What do I mean by stray code? When you buy used components with nonvolatile memory online, the previous owner or the seller might not have wiped the chip.

Whenever I purchase old MCUs and EEPROMs, I always test the chips with my universal programmer before marking the order as accepted on Taobao, and from time to time this process reveals stray code on a chip. Usually I save it before wiping the chip, but how should I deal with those saved ROM images? I have no idea what hardware the code is for, and often don't even know what architecture it is for. (Once, an old AT28C256 turned out to contain BIOS code for some SCSI card...)

I usually call up Assembly Control, and have the stray images taken to the local pound for either adoption or euthanasia.

If it's just random code on some random PROM, save it, make sure it's not important, then toss it, unless you want it. It's like buying a used wallet and finding a library card in it: it really doesn't affect you much.

I think the OP wants to know how to determine CPU architecture, given only machine code.

I don't think there is an automated way to do this. But what would be possible is to build a list of possible architectures, disassemble for each, and then see if the generated assembly code is sensible. For example, subroutines should have a proper prologue/epilogue at entry/exit, some registers should be loaded upon entering a subroutine, and the result should be stored in a register before returning.

A human programmer can easily tell whether a given disassembly is REALLY code or just a random bunch of bytes. This would be considerably harder to do automatically in software.
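As a much cruder stand-in for "disassemble for each architecture and see which looks sensible", one can count how often each CPU's return-from-subroutine opcode appears in the image. This is only a sketch: the opcode values are the standard ones for these CPUs, but the function name, the table, and the per-KiB scoring are assumptions made for illustration, and data regions or text will produce false hits (0x22 and 0x60 are also printable ASCII).

```python
# Return-from-subroutine opcodes for a few CPUs commonly found in old ROMs.
RETURN_OPCODES = {
    "x86 (RET)": b"\xc3",
    "Z80/8080 (RET)": b"\xc9",
    "6502 (RTS)": b"\x60",
    "8051 (RET)": b"\x22",
    "68k (RTS)": b"\x4e\x75",
}

def return_opcode_density(image):
    """Return occurrences of each return opcode per 1024 bytes of image."""
    if not image:
        return {name: 0.0 for name in RETURN_OPCODES}
    kib = len(image) / 1024
    return {name: image.count(pattern) / kib
            for name, pattern in RETURN_OPCODES.items()}

def rank_candidates(image):
    """Sort candidate architectures from most to least return opcodes."""
    density = return_opcode_density(image)
    return sorted(density.items(), key=lambda item: item[1], reverse=True)
```

A candidate that ranks well here is only a hint about where to point a real disassembler first; it does not confirm the architecture.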

What you are trying to do is a time consuming process, and you are not likely to find somebody that will do it for free.

But if I were you, I would get the IDA Pro disassembler, which supports many different CPU architectures. It can also detect the libraries of many toolchains automatically, using its FLIRT technology. I have no idea how much IDA costs; if you are a hobbyist, it might be too expensive.

I don't have IDA Pro and I am not really interested in finding out what the processor is - that might as well be configuration data for all I know. What I am asking is what is the polite and safe way of handling it.

Well... being an open-ended task, do it the same way you'd do anything else.

Stare at it a while.

Is it repeating? Might be tables of some sort.

Any ASCII (or EBCDIC, or other encoding for that matter) apparent?
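Looking for readable text is easy to automate. The sketch below does roughly what the Unix `strings` tool does; the function name `find_strings` and the 4-character minimum are assumptions made for illustration. It relies on single-byte codecs, so the character offsets equal the byte offsets; `cp037` is one of Python's built-in EBCDIC codecs.

```python
import re

def find_strings(image, min_len=4, encoding="ascii"):
    """Return (offset, text) for each printable run of at least min_len chars.

    The encoding must be a single-byte codec (e.g. "ascii" or the EBCDIC
    codec "cp037") so that character offsets match byte offsets.
    """
    # Undecodable bytes become U+FFFD, which is never counted as printable.
    text = image.decode(encoding, errors="replace")
    pattern = re.compile(r"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(text)]
```

For example, `find_strings(open("dump.bin", "rb").read())` lists the ASCII runs in a saved image (the file name is hypothetical), and passing `encoding="cp037"` repeats the search for EBCDIC.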

As mentioned, if the image is one byte-wide slice of a 16-bit (or wider) bus, you'll see that more obviously in human-readable data like ASCII (every other character is missing), but less so in other kinds of data...
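If two chips from the same board each hold one byte lane of a 16-bit bus, the halves can be merged back into a single image before looking for text. A minimal sketch, assuming exactly two equally sized dumps; which chip holds the even bytes is not known in advance, so it is worth trying both orders. The function name is hypothetical.

```python
def interleave(even, odd):
    """Merge two byte-sliced ROM dumps into one image.

    `even` supplies bytes 0, 2, 4, ... and `odd` supplies bytes 1, 3, 5, ...
    """
    if len(even) != len(odd):
        raise ValueError("both halves must be the same size")
    merged = bytearray(len(even) * 2)
    merged[0::2] = even
    merged[1::2] = odd
    return bytes(merged)
```

With only one half available, readable strings show up with every second character missing, which is itself a useful clue.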

Basic stat and crypto checks: do a frequency analysis. Do an entropy analysis. Look for sentinel codes, or magic numbers. Look for checksums or hashes (often at the top or bottom of the image).
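A frequency count and a magic-number check take only a few lines. The sketch below shows one of each. The PC option ROM layout it tests for (a 55 AA signature, a length byte in 512-byte units, and all bytes summing to zero modulo 256) is the standard one for expansion card BIOSes; the function names are assumptions made for illustration.

```python
from collections import Counter

def byte_frequencies(image, top=5):
    """Return the `top` most common byte values as (value, fraction) pairs."""
    if not image:
        return []
    counts = Counter(image)
    return [(value, n / len(image)) for value, n in counts.most_common(top)]

def looks_like_pc_option_rom(image):
    """Check for the 55 AA signature and zero 8-bit checksum of a PC option ROM."""
    if len(image) < 3 or image[:2] != b"\x55\xaa":
        return False
    size = image[2] * 512  # third byte is the ROM length in 512-byte blocks
    if size == 0 or size > len(image):
        return False
    return sum(image[:size]) % 256 == 0
```

A large share of 0xFF usually just means erased, unused space; a dominant 0x00 or a handful of repeating values points toward padding or tables.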

If it's very high entropy, it's possible it is compressed or encrypted. Would be unusual for a ROM I would think, but who knows. Back in the ROM days, that sort of thing was rather expensive to do, except when absolutely needed (and that, mainly for tape and demo purposes?).
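Entropy is easy to measure. The sketch below computes Shannon entropy in bits per byte, both for a whole image and per block; the function names and the 256-byte window are assumptions made for illustration. Values close to 8 are what compressed or encrypted data looks like, while code and text generally score noticeably lower and blank regions score near 0.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

def windowed_entropy(image, window=256):
    """Return the entropy of each consecutive `window`-byte block of image."""
    return [shannon_entropy(image[i:i + window])
            for i in range(0, len(image), window)]
```

The windowed version helps separate regions: a ROM that is mostly code with one high-entropy stretch probably contains a packed blob rather than being encrypted as a whole.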

If you aren't familiar with many (or any) instruction sets, binary will just look like gibberish to you. You'll have to learn a few first -- x86, Z80, 6502 and 68k might be good starting points. This will take the better part of a year; queue up some projects to use each one, so you have motivation.

Offhand, I know that:

- Z80 machine code is pretty simple, and makes frequent use of middling-range (ASCII readable) codes. This can make it difficult to eyeball if a passage is code or data!
- x86 tends to be noisier, with occasional patches of recognizable offsets (e.g., load-immediate, absolute address, index indirect..). That is, groupings of similar numbers keep appearing, often either small addresses or offsets or values (say "01 00" = 0x0001), or addresses to the same region of memory (0x70a4, 0x70a8, ..) suggesting a data segment there. The most recognizable, and usefully so, opcodes are push reg: you regularly see a "PQR" in the hex dump, for something like PUSH BP / MOV BP, SP / PUSH SI ..., or whatever they actually are. In other words, normal (Intel ABI) function preamble!
- I haven't worked with other instruction sets in raw format very much, but the other ones that I'm familiar with are unlikely to be found in EPROMs anyway (e.g. AVR).

So, thus ends my flavor text.
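The x86 function preamble mentioned above can be searched for directly. In 16-bit code, PUSH BP is the single byte 0x55 (ASCII "U"), and MOV BP, SP assembles to either 8B EC or 89 E5 depending on the assembler; the "PQR" run corresponds to PUSH AX / PUSH CX / PUSH DX (0x50 to 0x52). A minimal sketch, with a hypothetical function name:

```python
# PUSH BP followed by MOV BP, SP, in both possible encodings of the MOV.
X86_PROLOGUES = (b"\x55\x8b\xec", b"\x55\x89\xe5")

def find_x86_prologues(image):
    """Return the sorted offsets of every PUSH BP / MOV BP, SP sequence."""
    offsets = []
    for pattern in X86_PROLOGUES:
        start = image.find(pattern)
        while start != -1:
            offsets.append(start)
            start = image.find(pattern, start + 1)
    return sorted(offsets)
```

A few dozen hits spread through an image are a reasonable hint that it holds compiled x86 code; a single hit can easily be coincidence.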

If nothing else, feel free to send them to someone who loves archiving stray code, like Jason Scott.

Polite and safe: just erase the chips and move on. If you're extra polite, you may even warn the seller about existing data. But I wouldn't bother. It's the responsibility of someone selling storage devices to take care of privacy issues and erase them before selling. If they didn't, why would you even care?

Now if you are curious and just want to take a peek at the data - which doesn't seem to be the case - have at it. Guess that would be some rather concrete case of "data mining".

I usually dump any random ROMs I find and browse through them for recognizable strings. Occasionally I find something potentially useful or interesting, like the BIOS from some vintage PC or peripheral; other times it's just interesting to see what's there.