Tuesday, August 21, 2007

Lately, finding spare time to work on desmume has been hard. More importantly, finding any motivation to work on it has been really hard.

First, I want to work on stuff more related to general 3D rendering again, instead of emulators. Working on an emulator for a month or two is nice, but it's been a whole year since I started working on the desmume source code that yopyop released. It's important to note that on my priority list, 3D rendering is way more important than emulation, and I've been ignoring that fact for way too long.

Second, I've already got many of the games I wanted to be playable into that state. Mario Kart DS is probably the only exception: I'm somewhat curious about how it works and about the polygon budgets for karts and courses, but it refuses to run. Even so, I'm not happy with much of the code I currently use. For example, the 3D GFX FIFO IRQ handling is a big fat hack, the capture unit emulation is far from right, and my 2D pixel blitter implementations have to be rewritten with something like what I did with the official desmume, or maybe something a bit faster (if possible).

Third, the current debugger of desmume demotivates me every time I have to use it. It lacks breakpoint support, and some other small details that would make debugging games for hours easier and faster. For example, for homebrew development, I'd love source-level debugging against the original source code, instead of the generated code. Not to mention that it uses a plain Win32 GUI, so any addition or modification to the current one is painful and time consuming.

So basically, I'd like to rewrite the 3D core to properly handle the GFX FIFO IRQ, write a Windows Forms GUI with a new and enhanced debugger, and fix some miscellaneous stuff.

Anyway, the little work I devoted to desmume lately was mainly focused on fixing some regression bugs I introduced while changing how the 3D core works, and on fixing one homebrew and one commercial game. Fixing the homebrew was easy, as it only failed due to some DS display list commands being unhandled, and the "list cleanup" taking too much time. After 45 minutes of debugging and profiling, I got it working at 60fps all the time.

Later, I wanted to fix Dead'n'furious, as it seemed to either fail at rendering 3D or stall while getting ingame. I really didn't know which, so I started by trying to understand why it was failing. The first debugging sessions showed that it was in fact sending stuff to the 3D renderer, so it wasn't freezing, only not rendering onscreen.

I have a few switches that affect the 3D renderer on my build, namely: wireframe, disable lighting, disable blending, disable alpha test, disable texturing, and disable the whole 3D core rendering (in fact, it only disables the blit to BG0, but anyway, it's more or less the same for debugging purposes). None of them seemed to have any effect, so I debugged a bit more.
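To give an idea of what such switches can look like, here is a minimal sketch. It is not the code from my build: every name in it (`DebugSwitch`, `DebugState`, `blit3DToBG0`) is made up for illustration, and the "disable 3D" switch is modelled the way described above, by skipping only the final copy into BG0.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical debug switches (illustrative names, not desmume's).
enum DebugSwitch : std::uint32_t {
    DBG_WIREFRAME     = 1u << 0,
    DBG_NO_LIGHTING   = 1u << 1,
    DBG_NO_BLENDING   = 1u << 2,
    DBG_NO_ALPHA_TEST = 1u << 3,
    DBG_NO_TEXTURING  = 1u << 4,
    DBG_NO_3D_BLIT    = 1u << 5  // "disable 3D": only the blit to BG0 is skipped
};

struct DebugState {
    std::uint32_t flags = 0;

    // Flip one switch on or off.
    void toggle(DebugSwitch s) { flags ^= static_cast<std::uint32_t>(s); }

    // Query whether a switch is currently set.
    bool enabled(DebugSwitch s) const {
        return (flags & static_cast<std::uint32_t>(s)) != 0;
    }
};

// Copy the rendered 3D frame into the BG0 buffer unless the blit is disabled.
// Returns true when the copy actually happened.
bool blit3DToBG0(const DebugState& dbg,
                 const std::uint16_t* src, std::uint16_t* bg0,
                 std::size_t pixelCount) {
    if (dbg.enabled(DBG_NO_3D_BLIT))
        return false;  // the 3D core still runs, its output is just not shown
    std::copy(src, src + pixelCount, bg0);
    return true;
}
```

Keeping the switches as independent bits means several of them can be combined in a single debugging session, for example wireframe with texturing disabled.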

What I did next was to check the primitive group start routine (I mean glBegin :P), as lots of setup is done there and it's usually a good place to start. That gave me the first clue: the projection matrix seemed to be wrong. Specifically, the scale was wrong (abnormally big values), so the primitives (triangles / quads) became squeezed/degenerate and didn't show up. As a quick test, I changed all the projection matrices to identity matrices, so I would get something onscreen if that was the only problem in that game. I expected it to be, as I had dumps of the textures and they seemed ok, so if projection was the only problem, this would give me some results. In fact, it did.
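As a rough illustration of why the identity test is useful, the sketch below shows one way abnormally big scale values can make geometry vanish: the transformed vertex lands far outside the clip volume. It makes several assumptions that are not from the emulator: plain floats instead of the DS fixed point format, a row-major 4x4 matrix, the usual -w..w clip test, and invented helper names (`transform`, `insideClipVolume`, `scaledXY`).

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // row-major, for this sketch only

// Multiply a 4x4 matrix by a column vector.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};  // zero-initialized
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row * 4 + col] * v[col];
    return r;
}

// Standard clip test: x, y and z must all lie within [-w, w], with w > 0.
bool insideClipVolume(const Vec4& c) {
    return c[3] > 0.0f &&
           std::fabs(c[0]) <= c[3] &&
           std::fabs(c[1]) <= c[3] &&
           std::fabs(c[2]) <= c[3];
}

// Identity matrix with the X and Y scale replaced by s.
Mat4 scaledXY(float s) {
    return Mat4{ s,    0.0f, 0.0f, 0.0f,
                 0.0f, s,    0.0f, 0.0f,
                 0.0f, 0.0f, 1.0f, 0.0f,
                 0.0f, 0.0f, 0.0f, 1.0f };
}
```

With a vertex such as (0.5, 0.5, 0, 1), `scaledXY(1.0f)` (the identity) keeps it inside the clip volume, while a bogus scale like `scaledXY(4096.0f)` throws it well outside. So forcing identity matrices is a cheap way to check whether projection is the only thing standing between the game and something visible.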

After that, I just had to locate where and why those values were assigned to that matrix. That's what took the most time. First, as the failing value wasn't the first to be assigned to that matrix, I just stopped execution when the first one was written, and debugged from there. I was lucky enough to see that the "failing" values written to the matrix were calculated between the first write and the failing one. As I suspected from the beginning, it was indeed a CPU bug and not related to the 3D core. Basically, some of the registers used for the matrix write were never updated: the projection calculation was done and stored in memory, but the result was never loaded back from memory into the registers to be used later.
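The "stop at the first write, then look for the failing one" approach can be sketched as a small watch on the matrix writes. This is only an illustration of the idea, not the debugger code: `MatrixWriteWatch`, its `limit` threshold and the use of floats are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: records every value written to the projection matrix
// and remembers the index of the first write that looks implausible.
struct MatrixWriteWatch {
    float limit;                 // magnitude above which a value is suspicious
    std::vector<float> history;  // every value written so far, in order
    long firstBad;               // index of the first suspicious write, or -1

    explicit MatrixWriteWatch(float l) : limit(l), firstBad(-1) {}

    // Call this from the place where the emulated write reaches the matrix.
    void onWrite(float value) {
        if (firstBad < 0 && std::fabs(value) > limit)
            firstBad = static_cast<long>(history.size());
        history.push_back(value);
    }
};
```

Once `firstBad` is known, the interesting code is whatever the emulated CPU executed between write 0 and write `firstBad`, which narrows the search a lot compared to stepping through the whole frame.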