Archive for September, 2007

Worked on some minor translator-related optimizations this week, for some decent gains. If I can knock off another 10-20% I should be just about in the “real-time” ballpark at long last. The big candidates are memory accesses (sh4_read_long is nearly 20% of the total runtime at the moment by itself), and PVR2 status updates (close to 20% under at least some loads). It’s not that either is particularly complex or expensive, they’re just called a lot. I can probably knock another 1-2% off the main translate-and-execute loop too.

Changes:

Fixed heap smash in the translation cache

Added initial GLSL shader support

Rewrote translation exit block (gained ~10% performance out of it and freed up EDI in case it’s worth using elsewhere). The system also seems to build correctly with -O2 now, which gives another 10% improvement.

M3 is now out as promised. Essentially the same as M2 but with better performance courtesy of the translator. I’d appreciate feedback from anyone who’d like to try it out and let me know how it runs.

M4 work plan (Dec 07):

As of this morning the system successfully runs all the way through the boot sequence using the translator core. Currently performance is around 66% of full speed on the dev machine – a huge improvement over the emu core. In fact, it’s almost playable…

I’ll do some testing tonight, and as long as I don’t find anything critical there’ll be an M3 tomorrow (pretty much just M2 + SH4 translator).

Changes:

Many bug fixes later (mostly dumb errors of either the cut-n-paste or pure braino variety), the translator is almost running correctly. At least it now runs well enough to start collecting timing information. At the moment, the BIOS startup runs at around twice the overall speed with the translator as it does with the pure emu core. Which is a nice start, but not really nice enough unfortunately. On some tight test loops it actually executes at 10x emu speed, which is closer to what I’m aiming for.
The next step (other than clearing out the remaining bugs) is to start collecting some statistics, and see if there are some simple peephole optimizations we can apply.
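As a rough illustration of the kind of statistics that could point at peephole candidates, here is a minimal sketch that counts how often one class of instruction directly follows another. It is not lxdream code: pair_stats_t, opcode_class and the other names are hypothetical, and grouping by the top 4 bits of each 16-bit instruction word is just a convenient assumption for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPCODE_CLASSES 16

/* Hypothetical statistics table: pair_count[a][b] is the number of times an
 * instruction of class b directly followed an instruction of class a. */
typedef struct {
    uint32_t pair_count[OPCODE_CLASSES][OPCODE_CLASSES];
} pair_stats_t;

/* Assumption for this sketch: classify an instruction by its top 4 bits. */
static unsigned opcode_class(uint16_t insn)
{
    return (unsigned)(insn >> 12);
}

static void pair_stats_init(pair_stats_t *stats)
{
    memset(stats, 0, sizeof *stats);
}

/* Record every adjacent pair in a straight-line run of n instructions. */
static void pair_stats_record(pair_stats_t *stats, const uint16_t *code, size_t n)
{
    assert(stats != NULL && (code != NULL || n == 0));
    for (size_t i = 1; i < n; i++) {
        stats->pair_count[opcode_class(code[i - 1])][opcode_class(code[i])]++;
    }
}

/* Find the most frequent pair; returns its count (0 if nothing was recorded). */
static uint32_t pair_stats_hottest(const pair_stats_t *stats,
                                   unsigned *first, unsigned *second)
{
    uint32_t best = 0;
    *first = 0;
    *second = 0;
    for (unsigned a = 0; a < OPCODE_CLASSES; a++) {
        for (unsigned b = 0; b < OPCODE_CLASSES; b++) {
            if (stats->pair_count[a][b] > best) {
                best = stats->pair_count[a][b];
                *first = a;
                *second = b;
            }
        }
    }
    return best;
}
```

Calling pair_stats_record on each block as it is translated, then dumping the hottest pairs at exit, would give a first guess at which adjacent sequences are worth special-casing. A real tool would presumably classify instructions more finely than 16 buckets.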

After much umming and ahing, I’ve scrapped the translator generator for the time being. It’s become far too complex for its own good, and just wasn’t going to be finished in a reasonable time. I will be keeping the original (much simpler) decoder generator though, as part of the lxdream source tree. Instead, I’ve been working on finishing the instruction-at-a-time translator (i.e., the simplest thing that could possibly work), with a view to getting it working and seeing what the performance is like before getting into anything more complex.
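To make “instruction-at-a-time” concrete, here is a minimal sketch of the idea: each guest instruction is translated on its own, with no knowledge of its neighbours, and the results are simply appended to a block. Everything in it is an assumption for illustration. The two-instruction guest encoding is a toy, the names (toy_cpu_t, host_op_t, translate_one and so on) are hypothetical, and the “host code” is a list of function pointers rather than the native code a real translator would emit, purely to keep the example portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy guest CPU state: 16 general registers. */
typedef struct {
    uint32_t r[16];
} toy_cpu_t;

typedef void (*host_fn_t)(toy_cpu_t *cpu, unsigned rn, int32_t imm);

/* One translated operation: a host routine plus its decoded operands. */
typedef struct {
    host_fn_t fn;
    unsigned rn;
    int32_t imm;
} host_op_t;

static void host_movi(toy_cpu_t *cpu, unsigned rn, int32_t imm)
{
    cpu->r[rn] = (uint32_t)imm;
}

static void host_addi(toy_cpu_t *cpu, unsigned rn, int32_t imm)
{
    cpu->r[rn] += (uint32_t)imm;
}

/* Translate a single guest instruction in isolation.
 * Toy encoding (an assumption): bits 15-12 opcode, 11-8 register,
 * 7-0 signed immediate. 0xE = move immediate, 0x7 = add immediate.
 * Returns 1 if an op was emitted, 0 if the instruction ends the block. */
static int translate_one(uint16_t insn, host_op_t *out)
{
    int32_t imm = (insn & 0x80) ? (int32_t)(insn & 0xFF) - 256
                                : (int32_t)(insn & 0xFF);
    out->rn = (unsigned)((insn >> 8) & 0xF);
    out->imm = imm;
    switch (insn >> 12) {
    case 0xE: out->fn = host_movi; return 1;
    case 0x7: out->fn = host_addi; return 1;
    default:  return 0; /* anything else terminates the block in this sketch */
    }
}

/* Translate up to n instructions, stopping at the first unknown one. */
static size_t translate_block(const uint16_t *code, size_t n,
                              host_op_t *out, size_t max_ops)
{
    size_t count = 0;
    assert(code != NULL && out != NULL);
    for (size_t i = 0; i < n && count < max_ops; i++) {
        if (!translate_one(code[i], &out[count])) {
            break;
        }
        count++;
    }
    return count;
}

/* Execute a previously translated block. */
static void run_block(toy_cpu_t *cpu, const host_op_t *ops, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        ops[i].fn(cpu, ops[i].rn, ops[i].imm);
    }
}
```

The appeal of this shape is that translate_one never looks beyond the instruction in front of it, so it is easy to get right and easy to test. The cost is that nothing is optimized across instruction boundaries, which is where later peephole work would come in.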

The translator is now in an early testing form, which is to say that it’s mostly complete, and you can actually run it on real code, but it doesn’t work very well yet. I’ll be spending the next few days polishing things and getting the test suite running correctly, and then we can start testing for real.

Changes: