Looks like I have a new hobby. It all started four days ago with an innocent little thought:

"Hey, I want to rip the sprites from Epyx's 1994 DOS classic Battle Bugs." (and I should have left it at wanting instead of proceeding to actually do it)

So far I have only managed to rip the CARDS file, which contains unit description cards and some other stuff that is used in-game in the little Fony-TV from the UI. Took me a whole day to figure out they were stored in an ancient 4-bit planar mode (was trying to open that file as RAW data in GIMP and kept looking at oddly shaped nonsensical patterns until it 'clicked' in the head), but once I figured that out, reading the bits and rearranging them with a little custom throwaway C code was straightforward.
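
For anyone wondering what "rearranging the bits" means for 4-bit planar data, here is a minimal Python sketch. It is not the throwaway C code mentioned above, and the layout is an assumption: four full-image bit planes stored one after another, rows padded to whole bytes, leftmost pixel in the most significant bit, plane 0 holding the lowest bit of the color index. The real CARDS layout may interleave planes differently. The function name is made up for illustration.

```python
def planar4_to_indices(data: bytes, width: int, height: int) -> list[list[int]]:
    """Combine four 1-bit planes into 4-bit palette indices.

    Assumed layout (not confirmed for the CARDS file): plane 0, plane 1,
    plane 2, plane 3 stored back to back, MSB = leftmost pixel.
    """
    row_bytes = (width + 7) // 8          # bytes per row in one plane
    plane_size = row_bytes * height       # bytes per plane
    if len(data) < 4 * plane_size:
        raise ValueError("not enough data for four planes")

    pixels = [[0] * width for _ in range(height)]
    for plane in range(4):
        base = plane * plane_size
        for y in range(height):
            for x in range(width):
                byte = data[base + y * row_bytes + x // 8]
                bit = (byte >> (7 - x % 8)) & 1
                # Each plane contributes one bit of the palette index.
                pixels[y][x] |= bit << plane
    return pixels
```

Each resulting value is an index from 0 to 15 into a 16-color palette, so the palette still has to come from somewhere else.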

Thrilled by that success I tried the same with the BUGS file to rip the sprite animations and I can even see some ant shape bits in there when I recode the bytes as 1bpp into a black and white image...:
(this one contains the converted bits from the player-controlled ant and from the enemy ant, I think)
https://abload.de/img/onebug_bits1zvy1o.png
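
A quick way to get such a black-and-white view without GIMP is to wrap the raw bytes in a binary PBM (P4) header, since P4 is already 1 bit per pixel, MSB first, with a set bit meaning black. A minimal sketch, assuming a guessed width that is a multiple of 8; the function name and the width you try are up to you, nothing here is taken from the game:

```python
def raw_to_pbm(data: bytes, width: int) -> bytes:
    """Wrap raw bytes in a binary PBM (P4) header, treating them as 1bpp rows.

    `width` is a guess in pixels and must be a multiple of 8 so that
    every row is a whole number of bytes. Trailing bytes that do not
    fill a complete row are dropped.
    """
    if width <= 0 or width % 8:
        raise ValueError("width must be a positive multiple of 8")
    row_bytes = width // 8
    height = len(data) // row_bytes
    body = data[: height * row_bytes]
    return b"P4\n%d %d\n" % (width, height) + body
```

Writing the result to a `.pbm` file and trying a handful of widths is usually enough to see whether recognizable shapes line up.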

...but no matter how hard I tried, tried packed pixel decoding and planar decoding with all kinds of guesswork offsets, skipping bytes and whatnot and looking at scrambled pixel garbage for two full days without success in that glorious shotgun-stab-in-the-dark-walking-through-fog-mode...

...I finally gave up and started looking into x86 assembly coding and the world of DOS API interrupts and VGA/SVGA programming craptitude. Luckily DOSBox can be built with a good enough built-in debugger, which allows breaking depending on which interrupt gets triggered and with what parameter in the AH register...

...so on the fourth day now trying to rip those BUGS, I have ventured deep into the world of x86 opcodes and reading ancient docs to figure out how the game reads/decodes those sprites. So far, I figured out that the first 178 bytes of the file are just a header containing number of sprites and offsets for their bytes in the file. Next I need to figure out how the individual bytes per bug are actually used because they still don't make sense when trying to create readable pixels from them directly. I feel like I'm getting close (but then again, I felt like that periodically every couple of hours over the past few days :P ).

I figure this stuff will take some time, need to learn more about this bygone era of infinite pain but look (down there)... it sets some mode of which I don't know what it does yet to 'write unmodified' ... WRITE THE FUCK UNMODIFIED ... I feel this is important and it has a super optimistic feel to it, UNMODIFIED... unmodified must be good.

For the moment I'm not debugging in DOSBox but just searching for accesses to the VGA registers in the disassembled file and then looking up and decoding, line by line, what it does there (or what I think it does depending on the bits I find in the VGA and x86 docs).

Meanwhile, I have taken another step back and started reading Abrash's other book "Zen of Assembly Language: Volume I, Knowledge". That, and the "Graphics Programming Black Book Special Edition" appear to be freely available these days, as markdown versions that can be converted to fancy single-page html files with pandoc.

Neat! Just found the older 5.0 Freeware Version of IDA Pro. It runs a bit of analysis on the code and unlike ndisasm which just blindly decodes the opcodes into assembly language, it actually understands the header of the DOS/MZ executable format and uses that to prepare a better navigable version of the code. It even automatically adds very helpful comments on interrupt calls and port writes if it knows what they're being used for:

Had already identified this BUGS byte reading stuff via the DOSBox debugger, but it is just so much nicer to have it in IDA, which allows assigning names to memory locations (variables and functions) and immediately makes the code much more readable and easier to follow.
Starting to feel like this little reverse engineering adventure is getting a bit more structured now.

Doh! A day later after much confusion and long hours of following code manually, trying to find out what it does with those BUG bytes, I am sure those last two routines are much more generic in nature and not specifically related to reading BUG sprites only. At one point, the code was using the read bytes to set a palette, at another it initialized a MIDI playback routine with them... :P

So those routines should be called: READ_SOME_DATA_ITEM_BY_INDEX and READ_SOME_DATA_ITEM_BYTES instead.

So with these routines correctly identified, suddenly a lot of stuff that previously made no sense at all now does (also thanks to continued reading of Abrash's assembly book and cross-referencing with Ralf Brown's Interrupt List).

The game seems to organize most of its data files with a header like this:
2 bytes: number N of data items inside the file
N * 4 bytes : offsets of items
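
As a sanity check, this layout fits the 178-byte BUGS header mentioned earlier: (178 - 2) / 4 = 44 items. A minimal Python sketch of reading such a header follows. The little-endian byte order and unsigned 32-bit offsets are assumptions (reasonable for x86 DOS but not confirmed from the disassembly), and the function name is made up.

```python
import struct

def parse_item_header(blob: bytes) -> list[int]:
    """Return the list of item offsets from a data file header.

    Assumed encoding: 16-bit little-endian item count, followed by
    that many 32-bit little-endian offsets.
    """
    if len(blob) < 2:
        raise ValueError("file too short for item count")
    (count,) = struct.unpack_from("<H", blob, 0)
    header_size = 2 + 4 * count
    if len(blob) < header_size:
        raise ValueError("file too short for offset table")
    return list(struct.unpack_from("<%dI" % count, blob, 2))
```

If the first offset of a real file equals `2 + 4 * count`, that would be a good hint that the offsets are absolute file positions and the items start right after the header.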

When the game reads something, it does so by its index I in the file.
First it checks if there are even that many items in there, then reads the offsets for I and I+1 (or, if I is the last item in the file, invents a 'next' offset at the end of the file).
It computes how much RAM it needs to allocate for the thing by subtracting the first offset from the next offset.
Then it proceeds to read the bytes into that newly allocated RAM.
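
The lookup described above can be sketched in a few lines of Python. This mirrors the steps as described, not the game's actual code: little-endian fields and absolute file offsets are assumptions, and `read_item` is a made-up name standing in for READ_SOME_DATA_ITEM_BY_INDEX plus READ_SOME_DATA_ITEM_BYTES.

```python
import struct

def read_item(blob: bytes, index: int) -> bytes:
    """Return the bytes of item `index` from a data file image.

    Assumed header: 16-bit little-endian count, then `count` 32-bit
    little-endian offsets measured from the start of the file.
    """
    (count,) = struct.unpack_from("<H", blob, 0)
    if not 0 <= index < count:
        raise IndexError("no such item in file")

    (start,) = struct.unpack_from("<I", blob, 2 + 4 * index)
    if index + 1 < count:
        (end,) = struct.unpack_from("<I", blob, 2 + 4 * (index + 1))
    else:
        end = len(blob)               # last item: 'next' offset is end of file

    if not start <= end <= len(blob):
        raise ValueError("offsets out of range")
    # end - start is the size the game would allocate before reading.
    return blob[start:end]
```

Dumping every item of BUGS into its own file this way would at least make it easy to compare item sizes and look for repeating structure.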

With that newfound knowledge, I could finally find a place in the code, where it accesses the BUGS file:

...don't know what the bug_info_chunk pieces are used for yet, except the first two bytes which contain the data item index for the BUGS file.

As I explore the code, I keep a textfile open where I keep notes for whatever piece I'm currently trying to make sense of, so I can track and trace where those bytes are put and hopefully later see what they're being used for to give them meaningful names which then in turn make code elsewhere more readable. It's like the fog of war in a strategy game, each little bit of exploration reveals another tiny piece of the big picture.

I've been staring at this piece for quite a while now after finally tracking down the first actual access to the bytes for a BUG read from BUGS...but it isn't 'doing' anything interesting with them yet. :P

Just looking at all those mystery calculations almost made me want to give up but after some confusion about all those 'push' statements... (there are many more)... I started putting down a fragmented representation of the stack in my notes, hoping to learn what it's all used for later down.

So yeah with that in the stack it calls yet another mystery subroutine and I need to figure out what it does with the stack.

Already figured it uses BP+arg_x to access the parameters after initializing BP with the unmodified SP from the callee (edit: No wait, since it first pushes BP itself again, and then loads it with SP, the first two bytes in the BP+x addressing inside the routine just see the saved base pointer and then the previously pushed parameters follow).

So... will have to track back some more to find all the (***).

Lost count of how many times the mind has been thinking "I must be close" over the past couple of days.

Was wondering why the first argument arg_0 is at +6 there and not +2... reading up on the behavior of the 'call' instruction solved that. A far call pushes CS and IP implicitly before jumping (4 bytes of return address on top of the 2 bytes of saved BP; a near call would push only IP, and Battle Bugs runs in 8086 real mode), so at seg0003:7EDB the relevant portion of the stack should look like this and I can see exactly where the parameters came from now (now to investigate what they do/mean):
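
The +6 can also be checked with a bit of arithmetic. A minimal Python sketch, assuming 16-bit code with the usual `push bp` / `mov bp, sp` prologue and word-sized arguments; the function and its parameters are made up for illustration:

```python
def arg_offset(arg_index: int, far_call: bool = True, word_size: int = 2) -> int:
    """Offset from BP of the n-th word-sized argument (0 = pushed last).

    Stack above BP after the prologue, lowest address first:
      [BP+0]  saved BP
      [BP+2]  return IP
      [BP+4]  return CS (far calls only)
      then the arguments pushed by the caller
    """
    saved_bp = word_size
    return_address = 2 * word_size if far_call else word_size
    return saved_bp + return_address + arg_index * word_size
```

With a far call the first argument lands at BP+6 and the next word at BP+8; with a near call it would be BP+4. Note that the index here counts words, which is not the same as IDA's arg_N naming.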

And with renaming the locals in that procedure according to the stack locations used in this particular call, shit does not make much more sense than before.

The fact that this procedure gets called from all over the place in the rest of the code, makes me think it is another one of the generic routines, not specifically related to bug graphic data but perhaps to all sorts of graphic data.

It scans, word by word, some area in the current data segment for a magic value xx80h and then somehow uses the found location as a base for various address calculations in the same data segment where it then writes down some of the passed parameters, sometimes the same parameter twice in different locations.

The hardcoded offsets seem to suggest that all it does there is fill some structs... likely some bookkeeping information about the loaded stuff (about bugs in this instance)... so it still is not 'using' the bytes read from BUGS.

I figured it was pointless for the moment to try to follow all those dynamic RAM allocations and track where they would be used again without first gathering more knowledge, so I went back and analyzed some bits and pieces around where the code calls the video interrupts to set screen modes and, together with some screenshot analysis, tried to make sense of it, identifying those variables in the hope of seeing them pop up later in some drawing routines.

Aye, this stuff can be brain-melting. I'm currently wading through the code at the entry address, evaluating line by line, following it down into every subroutine, naming more locations (often like 'something_about_data_item_bytes0and1' or 'couldbe_this_and_that') just so that I can try to make sense of it later instead of just seeing numbers everywhere without knowing whether I've met that address before.

Sirocco (Moderator)
Joined: 19 Aug 2005
Posts: 9470
Location: Not Finland

Posted: Tue May 02, 2017 4:49 am

I also find it pretty wild that I totally missed out on this game. It came out in 1994 and got a US release, but somehow flew under my radar :/

PoV (Moderator)
Joined: 21 Aug 2005
Posts: 10971
Location: Canadia

Posted: Tue May 02, 2017 8:19 am

I also completely missed this game, but to be fair I didn't have access to much PC stuff back then (I was 14). I had a single PC Zone magazine with Cannon Fodder on a Coverdisk though. I thought that was cool.

Battle Bugs, Wolfenstein3D and Stunts(aka 4D Sports Driving) were the games that made me want to upgrade from C64/Amiga500 to PC Gaming. Don't recall exactly but probably did not get one until mid 95 or something and it still had Windows 3.x on it. A friend of mine had one first and we'd often meet after school to break our heads over some of the more challenging Battle Bugs missions and back in school, instead of paying attention we drew tons of Battle Bugs scenes in our school books... sadly all that art is lost. :D

Still trying to make sense of it but I feel like I will have to end up analyzing almost all of it to even get close to making sense of the graphics loading stuff.

Today I spent a couple of hours identifying and then naming/documenting where it initializes EMM stuff, which was extra fun because apparently segments can be resized at runtime, so at first I could not find ds:SOMETHINGs... all the SOMETHINGs for the EMM stuff were located beyond the segment as it must have appeared at compile time.

Five blue stray dogs times salted paintbrush divided by slightly less than a handful of unusually noisy snails.

That's to illustrate how much sense most of the code makes before analysis. Yesterday, I stared at a blotch of soap in my hand for a solid minute while the mind tried to decipher its value and meaning. I think it even started to make sense before it snapped out of it and realized there was nothing to decode there.

Diablo (Contributor)
Joined: 19 Nov 2015
Posts: 388
Location: :( :(

Posted: Wed May 03, 2017 5:47 am

Have you thought about going back to trying to parse the files like you did with the cards? Could you find a few open-source games from the same time period and see how they might have approached loading sprite sheets? Maybe you could find a common theme to the logic of the time that would help in unpacking the data in the bugs game.

Yeah, spent countless hours staring at glorious pixel garbage with that approach. Looking at open source DOS games from the era is an interesting idea, would not know where to start looking though and I think the chance to accidentally stumble upon a game that uses the exact same encoding is very low.

As long as I keep identifying a couple of routines a day and naming more and more locations, I should figure this out eventually. Today I found some timer related stuff and sound/midi loading and conversion to output hardware routines and I feel like I have a good grasp of how the game utilizes the EMM pages it allocates and where it stores control information about that. The more variables I identify for other stuff, the closer I will get to finding the graphics/sprites related things.

...and a (much longer) excerpt of what's left to identify (some names pop up here and there already) so a vague image of the top-level structure of the program without all the routines deep down already starts to form.