If you look through a game’s routines long enough you’re bound to find something wrong. Maybe a register was clobbered. Maybe something was written twice when it didn’t need to be. Maybe something goes horribly wrong and just happens to work out, disaster narrowly avoided. Your average player (and/or your average hacker) will likely never see these goofs, so I figured it’s time to have a thread for them.

Post whatever bugs, compiler goofs, etc. here. If anything this thread will keep me from pasting random snippets of ASM into the discord server, and hopefully this thread’ll be a good read.

Let’s get into it. I like to play around with FE5, so these examples will be 65816 assembly. I’d like to explain each of these so that someone with little to no ASM experience could hopefully follow along.

# Block Transfer To/From Anywhere

65816 has two block memory transfer opcodes, MVP and MVN, which are similar to THUMB’s ldmia+stmia for transferring data. The source and destination banks (the upper 8 bits of a pointer on the 65816) are baked into the instruction itself, while the lower 16 bits of the source and destination and the number of bytes to transfer (minus one) go in CPU registers. This poses a bit of a problem: because you need to know the banks to write the opcode, you can only copy data to/from locations known at assembly time. To overcome this, FE5 has routines that build a block transfer routine in RAM. When you need to copy data, you fill in the opcode’s bank bytes in RAM and hop to the routine. It’s quite clever in my opinion.
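To make the trick a bit more concrete, here’s a rough Python model of it. This is not FE5’s code. It assumes the RAM stub is four bytes: the MVN opcode ($54), the destination bank, the source bank, and a one-byte return that I’ve guessed to be RTL ($6B). The names `install_stub` and `patch_and_copy` and the flat `mem` array are made up for the illustration.

```python
MVN, RTL = 0x54, 0x6B          # real 65816 opcode values; RTL is an assumption
STUB = 0x0004AE                # RAM address of the MVN stub, per the post

def install_stub(mem):
    """Startup: write the fixed bytes; the bank bytes are placeholders."""
    mem[STUB:STUB + 4] = bytes([MVN, 0x00, 0x00, RTL])

def patch_and_copy(mem, src, dst, count):
    """Fill in the banks, then emulate what running the stub would do."""
    mem[STUB + 1] = dst >> 16   # machine code stores the destination bank first
    mem[STUB + 2] = src >> 16   # then the source bank
    a = count - 1               # A holds the byte count minus one
    x, y = src & 0xFFFF, dst & 0xFFFF
    src_bank, dst_bank = mem[STUB + 2] << 16, mem[STUB + 1] << 16
    while a >= 0:               # MVN copies upward, one byte at a time
        mem[dst_bank | y] = mem[src_bank | x]
        x, y, a = (x + 1) & 0xFFFF, (y + 1) & 0xFFFF, a - 1

mem = bytearray(0x30000)        # three banks of flat memory is plenty here
install_stub(mem)
mem[0x018000:0x018004] = b"FE5!"
patch_and_copy(mem, src=0x018000, dst=0x022000, count=4)
```

The point is that the banks only become known when `patch_and_copy` is called, which is exactly what a fixed MVN in ROM can’t handle.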

The first byte of each of the MVN/MVP opcodes is written to RAM on startup, along with another opcode to return from the routine:

If you don’t know 65816, this might be mumbo jumbo to you, so let’s break it down. We’re copying two routines, mvn_routine and mvp_routine, to RAM addresses $0004AE and $0004B2 respectively. We copy them end first, using a loop counter in the X register. The counter needs to start at one less than the size of the routine, because we’re looping with a BPL opcode and 0 is considered positive, so the loop still runs when the counter is 0. After each byte, we decrement the loop counter.
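If the loop is hard to picture, here’s a small Python model of that kind of countdown copy. It’s a sketch of the loop shape only, not the game’s routine: `copy_end_first` is a made-up name, the four-byte stub contents are placeholders (the $6B return byte is my assumption), and the trailing $8B stands in for the first byte of whatever follows it in ROM.

```python
def copy_end_first(ram, dest, rom, src, start_x):
    """Copy rom[src + X] to ram[dest + X] for X = start_x down to 0."""
    x = start_x
    while x >= 0:                    # bpl: keep looping while X is not negative
        ram[dest + x] = rom[src + x]
        x -= 1                       # dex
    return start_x + 1               # number of bytes written

# Hypothetical 4-byte stub followed by the first byte of the next routine.
rom = bytes([0x44, 0x00, 0x00, 0x6B, 0x8B])
ram = bytearray(0x500)

# Starting at size - 1 copies exactly the four bytes of the stub.
written = copy_end_first(ram, 0x4B2, rom, 0, start_x=4 - 1)
```

Start the counter at the full size instead and the loop runs one extra time, so one extra byte gets copied.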

Here’s the issue: when copying the MVP routine, the size wasn’t reduced by one, so the first byte of the next routine (a phb opcode) also gets copied into RAM at $0004B6, accidentally overwriting whatever was there. Man, that’s a huge explanation for such a tiny thing, right? So, what was originally at $0004B6? It’s used exactly once when setting up the sound system, and probably didn’t even need to be used. Lucky us, nothing of value was lost. Even better, the only known routines that use this block memory copier look like this:

There are some other interesting things to consider. The way the routine’s user loads the pieces of the routine as literals and writes them to fixed points in RAM is faster than what the startup routine does. The startup routine would probably be faster if it actually used MVN/MVP to copy the MVN/MVP routines. And, finally, none of these seem to be called.

That “unreachable” bit at 249CC gets jumped to as a loop break. It’s shoved in right after the ballista check and just sets the return value to true and jumps right back down to the end of the function. I initially thought it was completely unreachable so it looks stupider than it is, but still, branching to write one byte and then branching back…

The SNES has a 16-bit processor with an interesting property: Software can decide whether the CPU’s three general-purpose registers are 8 or 16 bit. It can change these sizes on the fly through the use of the rep and sep opcodes. Much like the stack, the state of how large the registers are must be restored after actions that change them.
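Here’s a tiny Python model of how those size flags behave, in case it helps. The `Cpu` class is purely illustrative and ignores emulation mode. The flag values are the real ones: bit $20 of the status register controls the accumulator, and bit $10 controls X and Y together.

```python
M_FLAG = 0x20   # set: accumulator is 8-bit
X_FLAG = 0x10   # set: X and Y index registers are 8-bit

class Cpu:
    def __init__(self):
        self.p = 0x00        # status register, everything 16-bit to start
        self.stack = []

    def sep(self, mask):     # set the named status bits
        self.p |= mask

    def rep(self, mask):     # reset (clear) the named status bits
        self.p &= ~mask & 0xFF

    def php(self):           # push the status register
        self.stack.append(self.p)

    def plp(self):           # pull it back, restoring the old sizes
        self.p = self.stack.pop()

    def a_width(self):
        return 8 if self.p & M_FLAG else 16

cpu = Cpu()
cpu.php()          # remember the caller's register sizes
cpu.sep(0x20)      # accumulator is now 8-bit
inside = cpu.a_width()
cpu.plp()          # caller's sizes are back
```

Wrapping a routine in php/plp like this is one common way to make sure the caller gets back the sizes it expects.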

There’s a routine in FE5 that forgets this rule and manages to avoid crashing the game, if only by chance.

I’ve trimmed out the parts of this snippet that aren’t needed to demonstrate what’s happening. On one path this routine can take, it encounters a sep #$20 opcode, which sets the accumulator, A, to be 8 bits. It continues executing through _A5B6 as intended. Now, the fun part of being able to change your register sizes is that certain opcodes, such as ones that load literals, also change size to match the register size. Under normal operation, the lda #$03 here loads the byte-sized value $03 into A. The other route the routine can take never sets the accumulator to 8 bits, so the CPU reads a two-byte literal instead and everything after it is decoded as very different code. Here’s a snippet of what the code looks like from that route:

Bad Intsys

```
_A5B6
  plx
  lda #$8503
  sep #$A9
  sbc $06BD9D,x
  plb
  plp
  rtl
```
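To see how the same bytes can turn into both versions, here’s a toy Python disassembler that only knows the handful of opcodes involved. The byte string is worked backwards from the listing above using the standard 65816 encodings, so treat it as my reconstruction rather than a dump from the ROM; `disassemble` and the `OPCODES` table are made up for this example, and the plx/plb/plp/rtl around the middle are left out.

```python
# opcode byte -> (text before operand, operand size in bytes, text after)
OPCODES = {
    0xA9: ("lda #$", None, ""),    # immediate: size follows the accumulator
    0x85: ("sta $", 1, ""),        # direct page
    0x9D: ("sta $", 2, ",x"),      # absolute,x
    0xE2: ("sep #$", 1, ""),
    0xFF: ("sbc $", 3, ",x"),      # long,x
}

def disassemble(code, a_is_8bit):
    out, i = [], 0
    while i < len(code):
        prefix, size, suffix = OPCODES[code[i]]
        if size is None:
            size = 1 if a_is_8bit else 2
        operand = int.from_bytes(code[i + 1:i + 1 + size], "little")
        width = size * 2           # hex digits to print
        out.append(f"{prefix}{operand:0{width}X}{suffix}")
        if code[i] == 0xE2 and operand & 0x20:
            a_is_8bit = True       # sep with bit $20 set makes A 8-bit
        i += 1 + size
    return out

code = bytes([0xA9, 0x03, 0x85, 0xE2, 0xA9, 0xFF, 0x9D, 0xBD, 0x06])
intended = disassemble(code, a_is_8bit=True)
goofed = disassemble(code, a_is_8bit=False)
```

With an 8-bit accumulator, `intended` comes out as lda #$03, sta $E2, lda #$FF, sta $06BD,x. With a 16-bit accumulator, `goofed` matches the listing above: the first lda swallows the next opcode byte as part of its literal and everything after it shifts.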

Luckily, the pops (plx, plb, plp) are all still there, so there aren’t any stack issues, and it returns fine. The end result is that the delay between button reads when exiting an item selection menu is slightly different if the unit has a weapon.

The plp opcode at the end pops the processor’s state back to what it was at the start of the routine, so it returns with the right sizes.