This blog chronicles my progress porting various retro games to other retro platforms. The goal in each project - at least when targeting a new CPU - is to effectively replicate the original graphics and the original code line-by-line, to produce a 100% accurate port of the original game.

Wednesday, 21 September 2016

Tonight, at the suggestion of the author of the Atari 800 Pentagram port, I thought I'd look at the masking logic. It certainly seemed like a good candidate, given the amount of time spent in the rendering routines and the fact that the masking requires two table look-ups per byte rendered.

Rather than try to optimise it, I thought I'd disable it altogether and see what effect it had on the performance. I was surprised, and a little disappointed, that it seemed to have very little in fact. Well, perhaps not completely insignificant at 4% in the 'busy' room, taking the frame rate from 8-9fps (see screenshot below) to a solid 9fps.

The infamous 'busy' screen, with fps counter

And because I now had a metric, I disabled the Z-ordering again. I clearly underestimated the effect last time, because the performance jumped to 13fps, and a reduction in the corresponding routine (calc_display_order_and_render) from 25% all the way down to 2%.

I was now a little bummed that disabling masking and Z-order completely still resulted in 13fps, a few frames short of my target 15fps. Curious as to how this compared to the original, I fired up the ZX Spectrum and Coco3 emulators side-by-side and entered the 'busy' room on each.

To my surprise, the Coco3 (with no masking or Z-order) was significantly faster. So I re-enabled Z-order. It was still faster. So I re-enabled masking - back to the complete code - and it was still faster at 8-9fps!!! For this screen at least, I had been trying to optimise something that was already too fast!

So where does that leave the project? What I need to do now is visit a significant number of screens and compare them, side-by-side, with the ZX Spectrum original. If the Coco3 is running faster in every case, then my work here is done! In fact, I'll need to (re-enable and) tweak the throttling function so that the screens are all roughly the same speed.

EDIT: I think I'm going to back-port my fps ISR from the Coco3 to the ZX Spectrum!

Then there's the matter of beefing up the Z80 R register emulation, fixing a graphical glitch that is simply a result of not being able to emulate the Spectrum's attribute bytes, and we're done!

Saturday, 17 September 2016

My profiler appears to be working for the most part, although any delusions I had about writing a generic 6809 profiler are pretty much dashed. I'll go into more details next post, but the nature of assembler makes it difficult - nay impossible - to identify the context (subroutine) of the executing code without some comprehensive code analysis (smells like a halting problem to me).

Regardless, with some inside knowledge of Knight Lore, I've got a pretty good handle on what's taking most of the CPU time now.

Thursday, 15 September 2016

Whilst I haven't been working on Knight Lore code per se, I have done some work that will ultimately assist in the optimisation.

I need a profiler, and since none of the Coco3 emulators have that functionality, I have to roll my own. My first preference was to hack MAME/MESS, but the barrier-to-entry is rather high. So the next candidate is Vcc.

Unfortunately Vcc currently compiles under Microsoft Visual C, which I actively try to avoid in my own projects, preferring GNU or otherwise open source toolchains. So I decided to try my hand at porting Vcc to GCC/MINGW. After hitting a few roadblocks that had me stumped for a day or two, I managed to finally get it all building and running! Mostly.

There wasn't a lot of code to modify, in fact about 10 lines in a handful of files. One problem I couldn't figure out how to overcome was a call to AfxInitRichEdit() which is part of MFC and therefore not available under GCC. Interestingly there is a RICHED20 library in the GCC/MINGW distribution which supposedly implements AfxInitRichEdit2(), but I had no luck. Somewhat encouragingly though, the documentation suggests that it's not always necessary to call this function.

So the solution for now was - Cut 'em out!

There are a couple of issues with the GCC build - for example the tape configuration dialog is mysteriously blank - but the real stick-in-the-mud for me is the fact that Knight Lore doesn't actually run. Galactic Attack ran just fine, so perhaps it's an issue with 32KB cartridges?

It's never easy, is it?

UPDATE: There was/is an issue with 32KB cartridge images; Vcc expects the two 16KB banks to be swapped - for some unknown reason - unlike MAME/MESS and also unlike you'd burn to FLASH or EEPROM for that matter. I've done a quick hack for myself so that Knight Lore will run as-is.

The tape configuration dialog was blank because it contained rich edit controls; after adding a call to explicitly load riched20.dll that's now sorted too.