Writing a good compiler to target the 6502 is difficult, because the CPU is so unlike modern hardware (8-bit with 16-bit addressing, only one register that isn't crippled, zero page, no arbitrary shift amounts, no multiply, weird addressing modes, etc.). I'm always impressed when anyone gets something working :)

Sure, you can use the zero page (the first 256 bytes of memory) to act like 128 16-bit registers, but it feels like a waste of cycles most of the time. Self-modifying code is usually easier, but it's not reentrant.
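To make the zero-page idea concrete, here's a rough sketch in Python (not 6502 code) that treats those 256 bytes as 128 little-endian 16-bit pseudo-registers. The names `zp`, `set_reg` and `get_reg` are made up for illustration; on the real chip each access would be a pair of 8-bit loads or stores.

```python
# Model of the 6502 zero page as 128 16-bit pseudo-registers.
# All names here are hypothetical; this is an illustration, not an emulator.

zp = bytearray(256)  # addresses $00-$FF


def set_reg(n, value):
    """Store a 16-bit value in pseudo-register n (0..127), low byte first."""
    if not 0 <= n < 128:
        raise IndexError("only 128 pseudo-registers fit in the zero page")
    zp[2 * n] = value & 0xFF             # low byte at the even address
    zp[2 * n + 1] = (value >> 8) & 0xFF  # high byte right after it


def get_reg(n):
    """Read pseudo-register n back as a 16-bit value."""
    if not 0 <= n < 128:
        raise IndexError("only 128 pseudo-registers fit in the zero page")
    return zp[2 * n] | (zp[2 * n + 1] << 8)


set_reg(5, 0xC000)  # e.g. a pointer to $C000, held at $0A/$0B
print(hex(get_reg(5)), zp[10], zp[11])
```

Every 16-bit value has to be moved one byte at a time, which is roughly where the "waste of cycles" feeling comes from.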

When I think back to my C64 days and the sheer number of loops I wrote with self-modifying code [1], I shudder to think what a nightmare it would've been if larger systems had been written that way... But it was oh so pleasant to do small stuff on the 6502 series...

[1] For those who don't have any exposure to 6502 assembler, here's a basic subroutine to copy blocks of memory. It's been ~25 years since I touched it, so bear with me (corrections welcome, especially with respect to syntax).

Let's say we have the high byte of a 16-bit address we want to copy from in the A register, and the high byte of a 16-bit address we want to copy to in the X register (it wouldn't be unusual for the calling code to directly modify the src/dest addresses instead to set up the loop...). For simplicity the low byte of both is 0, and the number of 256-byte chunks we want to copy is in Y.

I'm sure I've forgotten some more efficient way of doing this - it was all cycle-counting all the time, and lots of ugly (uglier) tricks, but the basic mechanism of directly modifying the high bytes of addresses all over the place to deal with loops beyond 256 iterations was pretty common.

D'oh... On looking at this again today I immediately spotted one pointless thing: "outerloop". Of course you don't need to jump there to set X to 0, because if you fall through the "BNE $innerloop", it's because X has wrapped to 0. Wasted cycles. The LDX #$0 must stay unless you know X is 0 on entering the routine, but the BNE $outerloop should be BNE $innerloop (and so you might as well rename it to just loop).
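For anyone who can't picture the loop being discussed, here is a rough Python model of it. This is a reading of the description above, not the original listing: it assumes the source high byte arrives in A, the destination high byte in X, the page count in Y, both low bytes are 0, and it uses the single-loop form from the correction. `copy_pages` and the `src`/`dst` lists are hypothetical stand-ins for the two patched instructions, and the mnemonics in the comments are approximate.

```python
# Rough model of a self-modifying page copy on a 6502.
# src and dst stand in for the 3-byte instructions "LDA $hh00,X" and
# "STA $hh00,X": [opcode, low byte, high byte]. Index 2 is the high byte,
# i.e. the byte a routine would patch at src+2 / dst+2.


def copy_pages(mem, src_hi, dst_hi, pages):
    """Copy `pages` 256-byte pages; like DEY/BNE, pages == 0 acts as 256."""
    src = [0xBD, 0x00, src_hi & 0xFF]  # LDA abs,X ; high byte patched from A
    dst = [0x9D, 0x00, dst_hi & 0xFF]  # STA abs,X ; high byte patched from X
    y = pages & 0xFF                   # page counter in Y
    x = 0                              # LDX #$00
    while True:
        # loop: LDA src,X / STA dst,X
        s = ((src[1] | (src[2] << 8)) + x) & 0xFFFF
        d = ((dst[1] | (dst[2] << 8)) + x) & 0xFFFF
        mem[d] = mem[s]
        x = (x + 1) & 0xFF             # INX
        if x != 0:                     # BNE loop
            continue
        # Fell through: X has wrapped to 0, so it does not need reloading.
        src[2] = (src[2] + 1) & 0xFF   # INC src+2
        dst[2] = (dst[2] + 1) & 0xFF   # INC dst+2
        y = (y - 1) & 0xFF             # DEY
        if y == 0:                     # BNE loop falls through when Y hits 0
            return


mem = bytearray(0x10000)
mem[0x2000:0x2200] = bytes(range(256)) * 2  # two pages of test data
copy_pages(mem, 0x20, 0x40, 2)
print(mem[0x4000:0x4200] == mem[0x2000:0x2200])
```

The "+2" is simply the offset of the high address byte inside a three-byte absolute-addressed instruction (opcode, low byte, high byte), which is why incrementing it moves the whole loop to the next page.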

I did some self-modifying code just 10 years back in firmware for a chip with an 8-bit CPU and a custom sequencer for data transfers. The sequencer was flexible but somewhat inefficient, so it was incurring 30% overhead on some opcodes. Much of the time while the sequencer was doing its job, the CPU would just poll for the sequencer's completion. So what I did was split the sequencer code and move some of the work to the CPU. To synchronize the two programs, I had the sequencer do its work and then jump on a register. Initially the register was set to jump back to PC, causing it to pause. The CPU would do its bit, then set the register to cause the sequencer to jump to the bit of microcode to execute. So the sequencer had efficient little subroutines while the CPU made the decision to switch between them. End result: the whole thing ran 30% faster and I was able to convert a bunch of assembly code into C, which was easy to read. After that I spent my time pushing the chip designer to make a proper sequencer and CPU design.
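The handshake described above can be sketched as a toy model in Python. Everything here is invented for illustration (`PARK`, `jump_reg`, the routine names), and it assumes the sequencer re-parks itself before running each routine, which the description doesn't actually specify.

```python
# Toy model of a sequencer that parks by jumping through a register.
# Every name here is hypothetical; the real chip and its microcode are
# not described in enough detail to reproduce.

PARK = "park"  # stands for "jump back to the current PC"


class Sequencer:
    def __init__(self, routines):
        self.routines = routines  # name -> callable microcode routine
        self.jump_reg = PARK      # written by the CPU
        self.log = []

    def step(self):
        """One 'jump on register'. Returns False while parked."""
        target = self.jump_reg
        if target == PARK:
            return False          # jump-to-self: the sequencer just waits
        self.jump_reg = PARK      # assumption: re-park before running
        self.routines[target](self.log)
        return True


seq = Sequencer({
    "read_block": lambda log: log.append("read"),
    "write_block": lambda log: log.append("write"),
})

parked_at_start = not seq.step()  # nothing requested yet
for choice in ["read_block", "write_block"]:
    # The CPU does its share of the work, then picks the next routine.
    seq.jump_reg = choice
    seq.step()
print(parked_at_start, seq.log)
```

The point of the design is that the decision logic lives on the CPU, while the sequencer only ever runs short, straight-line routines.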

As long as you properly comment your code (and explain the magic of $src+2/$dst+2), you'll be OK.

I've written a "window server" for the Apple II (7-pixel aligned, window stack you could only add and remove rectangles from the top) with about 1K of code. It was not that difficult to reason about, but I had a much younger brain at the time.

Yeah, "nobody" commented those things in many circles. You were expected to know the common self-modification idioms.

But on top of that, a lot of software on the C64 at least (I'd like to think this didn't apply to any "professional" software, but I suspect you'd find a lot of hairy stuff there too) was not written with a macro assembler but directly into a machine code monitor with no ability to store labels or comments.

In many cases you'd be lucky to have a sheet of handwritten notes about which addresses contained which code (and yes, that meant re-writing bits and pieces of code if you wanted to insert something new).

For my part I used a machine code monitor without labels etc. for several years before I got a proper macro assembler (Turbo Assembler) with a full screen editor.

Very cool. I'm a Lisp programmer now, but this reminded me that as a teenager I was learning how to do games programming on my C64 using an incredible Forth compiler and games framework called White Lightning.

There was also a BASIC version of Lightning. I never had the Forth version, but played around with the BASIC variant.

BASIC implementations of everything in that YouTube video came with it, and it had funky things like pre-emptive multitasking for BASIC programs (fairly simple - just use an interrupt or hook into one of the BASIC entry points, count time slices, and shift some pointers around) plus separate programmable sprite animations (so you could e.g. start sliding a sprite over the screen and let your BASIC program keep doing other stuff).