Subroutine 40c0 was analyzed first because it is the first time that the game waits on the Real3D status bit. This turns out to be some sort of a DEC calibration routine. Its caller has not been analyzed.

Next, the routines that initiate DMA transfers leading up to this routine were analyzed because they have a curious pattern of writing 0x88000000 *twice* after certain patterns. After writing the config space, it writes 0x88000000 *three times*.

The data written initially to Real3D RAM appears to be a minimalistic scene graph consisting of one single viewport and one single culling node that references an address in VROM which may or may not be a model. It has not yet been analyzed in detail.

//
// This routine is executed once during startup. Calibrates the DEC value.
// If the DEC value is too low (i.e., the status bit flipped too quickly),
// the game will hang a little further on in suba0e8() in a decrementer
// loop that loops until the decrementer turns positive (which happens in
// the VBL handler when DEC is reloaded with this calibrated value).
//
void sub40c0()
{
  // Measure the duration of one whole frame
  wait_for_vbl();
  uint32_t start_of_frame = read_tbl();
  wait_for_vbl();
  uint32_t end_of_frame = read_tbl();
  uint32_t frame_duration = end_of_frame - start_of_frame;

  // Compute the time that the flush and subsequent status bit flip took. This
  // is the only place in the code that this value is loaded, and it is used to
  // reload the decrementer.
  //
  // What puzzles me is why they don't just measure the time directly by
  // taking (start - end_of_frame) ?
  _dec_reload_on_vbl = frame_duration - duration;
}

  // Set up the DMA list used by the IRQ handler. The first 3 entries are the
  // transfer we make here. The IRQ handler will advance past these and see
  // the end-of-list sentinel and then do nothing.
  _5812f0[0] = bswap32(flags_and_num_bytes);
  _5812f0[1] = bswap32(src);
  _5812f0[2] = bswap32(dest_addr);
  _5812f0[3] = bswap32(0x98080000); // end of list
  _5812f0[4] = bswap32(1);
  g_dma_list = &_5812f0[0];

  // This appears to indicate that there is data in the DMA list. The IRQ
  // handler will clear this when it reaches the end-of-list sentinel.
  _dma_xfers_pending = 0xff;

In-game, when there are no textures being uploaded (including during the parts of attract mode where no textures are uploaded), this subroutine is called before DMA copies to 98xxxxxx. Then, at 3907c, a function is called to copy data to 8Exxxxxx and write 88000000.

When textures are uploaded via the VROM port during attract mode by writing 90000000, a flush (88000000) is triggered after each write. And a068 is called before each of the VROM texture port writes. Therefore, this function appears to be some sort of sync-before-Real3D-access.

2. Subroutine A0E8.

This apparently waits for the next VBL but needs to be scrutinized more carefully.

uint8_t _580e32;
uint32_t _580e70;

void suba068()
{
  if (_580e32 != 0)
    return;

  uint32_t start_time = read_tbl();

  // Value of DEC after 3300 cycles have elapsed (i.e., 3300 cycles since VBL
  // started). This value may not be coincidental. According to Charles'
  // System 24 doc, if the tile generator is operating at 424 lines per frame,
  // the breakdown is:
  //
  //   25 scanlines from /VSYNC high to /BLANK high (top border)
  //  384 scanlines from /BLANK high to /BLANK low (active display)
  //   11 scanlines from /BLANK low to /VSYNC low (bottom border)
  //    4 scanlines from /VSYNC low to /VSYNC high (vertical sync. pulse)
  //
  // On System 24, the interrupt happens on the last line and is "asserted on
  // the negative edge of H-sync before blanking is disabled and held for one
  // scanline (656 pixels) such that it is negated on the negative edge of
  // H-sync of the next scanline, line 384."
  //
  // Given:
  //
  //  - Assume 66 MHz bus frequency
  //  - TBR and DEC registers tick once every 4 bus cycles
  //  - Assume display rate of 57.52 Hz
  //  - Assume 424 scanlines per frame
  //
  // Then the number of timer ticks per scanline would be:
  //
  //  ((66e6 / 4) / 57.52) / 424 = 676.5
  //
  // The value 3300 corresponds to about 4.9 scanlines. If the IRQ is triggered
  // on /VSYNC = 1 -> 0, they could be waiting out the vsync pulse and waiting
  // for the *start* of the next active frame.
  //
  uint32_t value_after_delay = _dec_reload_on_vbl - 3300;

  // Wait for VBL if decrementer is negative, which indicates that the allotted
  // time to do things after VBL is triggered (calibrated at start-up) has
  // already passed
  if (read_dec() <= 0)
  {
    uint8_t old_value = _vbl_count;
    while (_vbl_count == old_value)
      ;
  }

  // This seems to make sure at least 3300 cycles have elapsed since VBL
  while (read_dec() > value_after_delay)
    ;

  // How much time we wasted in this bullshit subroutine ;)
  _580e70 = read_tbl() - start_time;

  // Reset VBL count
  _vbl_count = 0;

  // Disables this routine the next time it is called. This is reset in
  // suba0e8() at the end of the main application loop.
  _580e32 = 1;
}
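As a sanity check on the scanline arithmetic in the comment inside suba068(), here is a minimal C sketch that recomputes the tick counts. The function names and parameters are made up for illustration, and every input is one of the assumptions already listed (66 MHz bus, one timer tick per 4 bus cycles, 57.52 Hz refresh, 424 lines per frame), not a measured value.

```c
#include <assert.h>

/* Timer (TBR/DEC) ticks per scanline, given the assumed bus and video timing. */
static double timer_ticks_per_scanline(double bus_hz, double bus_cycles_per_tick,
                                       double refresh_hz, double lines_per_frame)
{
    assert(bus_cycles_per_tick > 0.0 && refresh_hz > 0.0 && lines_per_frame > 0.0);
    double ticks_per_second = bus_hz / bus_cycles_per_tick;
    double ticks_per_frame = ticks_per_second / refresh_hz;
    return ticks_per_frame / lines_per_frame;
}

/* Convert a tick count (such as the 3300 used in suba068) to scanlines. */
static double ticks_to_scanlines(double ticks, double ticks_per_line)
{
    assert(ticks_per_line > 0.0);
    return ticks / ticks_per_line;
}
```

With the assumed values, timer_ticks_per_scanline(66e6, 4.0, 57.52, 424.0) comes out at about 676.5, and 3300 ticks is a little under 4.9 scanlines, which is slightly longer than the 4-line vsync pulse in the System 24 breakdown.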

  // Oddly, 98080000, which looks like an address, is used as an end-of-list
  // sentinel and is stored in the length parameter
  uint32_t num_bytes = bswap32(g_dma_list[0]);
  if (num_bytes == 0x98080000)
  {
    _dma_xfers_pending = 0;
    return;
  }
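For reference, a self-contained C sketch of the DMA list as described so far: three byte-swapped words per transfer, with 0x98080000 in the length slot marking the end of the list. The dma_state struct and the function names are hypothetical stand-ins for _5812f0, g_dma_list and _dma_xfers_pending, and stepping the cursor three words at a time is an assumption based on the "advance past these" comment. The real IRQ handler has not been fully traced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_END_OF_LIST 0x98080000u /* sentinel value seen in the disassembly */

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical container for the list, its cursor and the pending flag. */
typedef struct {
    uint32_t words[8];
    const uint32_t *cursor;
    uint8_t pending;
} dma_state;

/* Queue one transfer followed by the end-of-list sentinel. */
static void dma_queue_single(dma_state *s, uint32_t flags_and_num_bytes,
                             uint32_t src, uint32_t dest_addr)
{
    assert(s != NULL);
    s->words[0] = bswap32(flags_and_num_bytes);
    s->words[1] = bswap32(src);
    s->words[2] = bswap32(dest_addr);
    s->words[3] = bswap32(DMA_END_OF_LIST);
    s->words[4] = bswap32(1);
    s->cursor = &s->words[0];
    s->pending = 0xff;
}

/* Step over one entry. Returns 1 and fills out[] with the decoded words,
   or returns 0 and clears the pending flag at the sentinel. */
static int dma_next(dma_state *s, uint32_t out[3])
{
    assert(s != NULL && s->cursor != NULL);
    uint32_t first = bswap32(s->cursor[0]);
    if (first == DMA_END_OF_LIST) {
        s->pending = 0;
        return 0;
    }
    out[0] = first;
    out[1] = bswap32(s->cursor[1]);
    out[2] = bswap32(s->cursor[2]);
    s->cursor += 3;
    return 1;
}
```

Calling dma_next() once after dma_queue_single() yields the queued transfer, and the second call hits the sentinel and clears the pending flag, which matches the behaviour described in the comments.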

Random thoughts .. After all the time I've spent working on the graphics emulation, one thing is fairly clear: the GPU on the Model 3 is almost 100% identical to the actual Real3D Pro-1000. Even the revisions seem to match up, i.e. the Step 1.0 hardware without the extra culling node information. The Real3D Pro-1000 had no tilegen. So what would be driving the equivalent of IRQ2 on the Real3D Pro-1000? There is no end-of-frame call on the Real3D Pro-1000. It just flushes the database, which updates all the culling memory etc., same as the Model 3.

Back to the tilegen. This chip basically originates from the System 24? I don't have a great deal of knowledge of how these old systems work, but am I correct in thinking there is no traditional frame buffer? I guess on those very early systems a frame buffer would have meant extra memory, which was extremely expensive back then. Didn't they work by sort of decompressing the sprites at the time the scanline on the screen was being updated? So correct synchronization was critical. The point is, how would this setup even work with a traditional frame buffer? You'd need an RGBA frame buffer (instead of RGB), where the alpha would have to blend what was in the frame buffer with the layers from the tilegen, since not every pixel is always written to. You need this to work with the alpha layers as well.

If i was making this system I'd make the tilegen draw directly into the frame buffer. So to start a frame you'd do this

I don't even own a Model 3 board .. But it should be possible to confirm by looking at it. What is driving the monitor output? A chip directly connected to the tilegen? Or is it connected to one of the Real3D chips? Sometimes these older systems had external RAMDAC chips (a small palette RAM combined with digital-to-analogue converters), but most of the time they are built into one of the main chips.

The TL;DR is, maybe IRQ2 is not being driven by the tilegen at all. Maybe it's being driven by the Real3D chips when they have finished rendering? Or when they are available for rendering again? That might explain why no single timing value works, and why games (such as The Ocean Hunter) fall apart in the middle, because memory updates start straddling frames.

I think I finally know what the status bit is. From testing ..

I think it controls where in the frame it's safe to write. What I mean by that is: say I set the status bit time to 50% of the frame. In VF3, if any writes happen after 50% of the frame time has elapsed, the game checks the status bit to see if it has changed to 1. If the writes have gone over that time, it drops the next frame. If the writes followed by a flush happen before the status time, no frame drop happens.
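To make that hypothesis concrete, here is a minimal sketch of how an emulator might model it. This is only one reading of the description: the names are hypothetical, the 50% flip point is just the example figure from the test, and none of it is confirmed hardware behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Tick (counted from VBL) at which the modeled status bit flips to 1,
   expressed as a percentage of the frame. */
static uint32_t status_flip_tick(uint32_t ticks_per_frame, uint32_t percent)
{
    assert(percent <= 100);
    return (uint32_t)(((uint64_t)ticks_per_frame * percent) / 100u);
}

/* Modeled status bit: 0 while writes are still "safe", 1 once the
   window has passed. */
static int status_bit(uint32_t ticks_since_vbl, uint32_t flip_tick)
{
    return ticks_since_vbl >= flip_tick ? 1 : 0;
}

/* Under the hypothesis, the game drops the next frame if it sees the
   bit already set when it checks after finishing its writes. */
static int game_drops_next_frame(uint32_t writes_done_tick, uint32_t flip_tick)
{
    return status_bit(writes_done_tick, flip_tick);
}
```

With a flip point at 50% of the frame, writes that finish in the first half would leave the frame rate alone, and writes that finish in the second half would cost a frame.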

I am not sure all games do this, as the Real3D has a few modes, one of which always runs at 60fps, while the other drops back to 30fps. I forget what those modes are called.