Tired of the same old memory map? Recently I was thinking that in theory, it should be possible for an NES cart to handle separate pages for code and data, like the W65C816 (but only for the cart mapped area, obviously). If you're not familiar with it, this is similar to what you'd use instead of bankswitching on the SNES, when using 16-bit addressing.

It's not simple, but I think it could be done in a small $5-$6 FPGA. It would basically be emulating the 6502, but as little of it as possible, based on observations of what the CPU is doing. I'm not concerned with whether it's practical; it's just been a fun puzzle to think about and a way to get some more Verilog HDL practice. Maybe I'll post it when it's more complete, but I won't be able to test it anytime soon. If nothing else, I figured I'd share the idea for your entertainment.

You could get away with much of it based on the data bus alone, until you start branching, indexing across page boundaries, and using interrupts. For interrupts, you need to watch the full address bus, no way around that. Branching, I believe, can be handled by also watching the address bus to detect the program counter location (and to know whether the branch was taken). One tricky situation is when a branch instruction simply branches to the following opcode (which isn't as useless as it sounds; it's invaluable when you need a fractional delay in a cycle-timed loop). I believe that can be handled by a special case for the opcode fetch following a branch instruction: if it reads the following opcode address twice, then we're in the extra cycle of a branch to nowhere. For page crossing, you have to emulate the X and Y registers.

So basically it involves watching the data bus to detect the opcode, then entering a state machine based on the type of opcode (pretty much the addressing mode). The CPU accesses memory on every cycle of an instruction, and every memory access will be 1 of 4 types: Data, Program Counter, Program Counter + increment, and opcode fetch (a special case of PC+increment which also resets the state machine). The end result is that it automatically selects one page on the Data cycles, and another on the Program Counter cycles.
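To make the cycle classification a bit more concrete, here is a rough Python model of the idea (not Verilog, and not the actual design). The table `CYCLES_AFTER_FETCH` and the function `classify_cycles` are hypothetical names invented for this sketch, only three opcodes are covered, and the plain "Program Counter" (no increment) cycle type is left out for brevity.

```python
# Hypothetical table: bus-cycle types that follow the opcode fetch.
# Only three opcodes are listed; a real design would cover every addressing mode.
CYCLES_AFTER_FETCH = {
    0xA9: ["pc_inc"],                    # LDA #imm: operand byte
    0xAD: ["pc_inc", "pc_inc", "data"],  # LDA abs: addr low, addr high, data read
    0x8D: ["pc_inc", "pc_inc", "data"],  # STA abs: addr low, addr high, data write
}

def classify_cycles(bus_bytes):
    """Label each data-bus byte with (cycle_type, page), one entry per CPU cycle."""
    labels = []
    pending = []  # cycle types still expected for the current instruction
    for value in bus_bytes:
        if not pending:
            # Start of an instruction: this byte is the opcode, and it
            # decides what the following cycles of the instruction will be.
            kind = "opcode"
            pending = list(CYCLES_AFTER_FETCH[value])
        else:
            kind = pending.pop(0)
        # Data cycles select the data page; everything else is code.
        page = "data" if kind == "data" else "code"
        labels.append((kind, page))
    return labels

# LDA #$01 / LDA $8000 / STA $8000, with $55 standing in for the data byte.
trace = [0xA9, 0x01, 0xAD, 0x00, 0x80, 0x55, 0x8D, 0x00, 0x80, 0x55]
labels = classify_cycles(trace)
for kind, page in labels:
    print(kind, page)
```

Note that the type of each cycle is known before its byte appears on the bus (only the opcode byte is ever inspected), which is what would let the mapper pick the page in time.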

Yeah, that's what I meant when I said I'm not concerned if it's practical, it's complex but really doesn't allow anything new. It certainly could be MMC3-flavoured though, by having separate code and data banks per each region of memory.

The part I've so far been avoiding in this setup is DPCM. I think that could also work (and allow for a separate DPCM page), but it's not something I've looked into.

Nice, I like that idea. I remember lidnariq posted a similar concept, but narrowing down into a 1-byte looped sample is really cool. So the mapper would have its own DPCM address/length registers, which means it could address all the memory on the cart, and maybe even better, could support single-byte loop boundaries (vs the usual 16 bytes).

In this type of fully address-controlled setup, another thing that can be done is allowing multiple interrupt vectors. Convenient, if the mapper has multiple IRQ sources (maybe one reason would be to stop the DPCM channel, heheh).

At some point I persuaded myself that the "right" way to deal with DPCM streaming was to map INL's deserialized SPI ROM output idea over some fixed 4k window. You'd still have to dance between DPCM reads and normal level loading. But there's no reason to not use all the address lines you have connected to the CPU's bus, and switching it down from 4k to however much smaller you could justify would ease the sting of losing more of a fixed bank.

I think the sprite DMA might be reusable for writing to the mapper and cart memory. Because it corrupts the OAM, it would have to be done during vblank or with sprites off. That would be a nice way to update video memory though; for CHR that would be 16 tiles in 513 CPU cycles. On NTSC, if the screen was turned off just a couple of lines early, there would be time for 5 DMAs, so you could write 1kB of PPU data and update the OAM every vblank.
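As a back-of-the-envelope check of that budget, here is a small Python calculation. It assumes an NTSC vblank of 20 scanlines at 341 PPU dots per line with 3 dots per CPU cycle, and it ignores the few cycles spent triggering each DMA, so treat the result as a rough estimate only.

```python
import math

CPU_CYCLES_PER_SCANLINE = 341 / 3  # 341 PPU dots per line, 3 dots per CPU cycle (NTSC)
VBLANK_SCANLINES = 20              # assumed NTSC vblank length
DMA_CYCLES = 513                   # per-DMA cost quoted in the post
DMA_BYTES = 256                    # one DMA moves one 256-byte page
BYTES_PER_TILE = 16                # one 8x8 CHR tile, 2 bitplanes

def vblank_dma_budget(num_dmas):
    """Return (cycles needed, cycles in vblank, extra scanlines to blank)."""
    needed = num_dmas * DMA_CYCLES
    available = VBLANK_SCANLINES * CPU_CYCLES_PER_SCANLINE
    deficit = max(0.0, needed - available)
    extra_lines = math.ceil(deficit / CPU_CYCLES_PER_SCANLINE)
    return needed, available, extra_lines

needed, available, extra_lines = vblank_dma_budget(5)
tiles_per_dma = DMA_BYTES // BYTES_PER_TILE
ppu_bytes = 4 * DMA_BYTES  # four DMAs of PPU data, the fifth refreshes OAM

print(needed, int(available), extra_lines, tiles_per_dma, ppu_bytes)
```

Under these assumptions the five DMAs overrun vblank by a little under three scanlines' worth of cycles, which is in the same ballpark as turning the screen off a couple of lines early.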

4-bit DPCM and any audio stuff can be done from the 2A03's perspective as an interrupt occurring at the sample rate; I've done a successful experiment like that using a PIC18 MCU. A while back tepples came up with a brilliant minimal audio interrupt routine, simply INC $4011 / RTI. An INC can read the audio from the mapper, write it to the NES DAC, and acknowledge the interrupt all in one instruction (it means you can't use sample value $7F though, with the 7-bit DAC it'll overflow). That's as low-overhead as it gets, good enough to be usable in a game, I think.

DMA functionality is definitely interesting to me, and something I've been planning out how to use in an ideal setup. If you look at FPGAs, even a low-end part like a MachXO2 for $6 includes 8kB of embedded RAM. One could do some pretty good stuff with that.

it means you can't use sample value $7F though, with the 7-bit DAC it'll overflow

I think I remember it being pointed out that it just adjusts the encoded range. e.g. if the read-only register there contains values from 1 to 128, then DEC $4011 would DTRT; or -1 to 126 with INC $4011. Using LSR $4011 even automatically converts unsigned 8-bit numbers.
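The range shift can be checked with a few lines of Python. This is only a model of the arithmetic, assuming the mapper's read-only register is what the read-modify-write instruction reads and that only the low 7 bits of the written value reach the DAC; `dac_write` is a made-up helper name.

```python
def dac_write(op, reg_value):
    """Value landing in the 7-bit DAC after a read-modify-write of the register."""
    reg_value &= 0xFF
    if op == "inc":
        result = (reg_value + 1) & 0xFF
    elif op == "dec":
        result = (reg_value - 1) & 0xFF
    elif op == "lsr":
        result = reg_value >> 1
    else:
        raise ValueError(op)
    return result & 0x7F  # assumption: only bits 0-6 reach the DAC

# Register holds -1..126 (as 8-bit values) for INC, 1..128 for DEC.
inc_out = [dac_write("inc", s - 1) for s in range(128)]
dec_out = [dac_write("dec", s + 1) for s in range(128)]
# LSR halves any unsigned 8-bit value into the 7-bit range.
lsr_out = [dac_write("lsr", u) for u in range(256)]

# The overflow case from the quote: INC on a stored $7F wraps the DAC to 0.
overflow = dac_write("inc", 0x7F)
print(inc_out == list(range(128)), dec_out == list(range(128)), max(lsr_out), overflow)
```

In each case every DAC level from 0 to 127 is reachable; it is only the stored encoding that moves.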

Quote:

the low end like a MachXO2 for $6, includes 8kB of embedded RAM.

Their iCE40 series are a little cheaper and the next-larger-than-smallest also includes 4 or 8 KiB of block RAM. ... huh, now I'm not certain exactly what the differences are between the MachXO2 and iCE40.

4-bit DPCM and any audio stuff can be done from the 2A03's perspective as an interrupt occurring at the sample rate, I've done a successful experiment like that using a PIC18 MCU. A while back tepples came up with a brilliant minimal audio interrupt routine, simply INC $4011 / RTI. An INC can read the audio from the mapper, write it to the NES DAC, and acknowledge the interrupt all in one instruction (it means you can't use sample value $7F though, with the 7-bit DAC it'll overflow).

I refined it to dec $4011, which means you can't use $01. But in the topic CPLD square wave synthesizer, we determined that the bigger problem was that an IRQ during OAM DMA won't get seen until several samples later.

Their iCE40 series are a little cheaper and the next-larger-than-smallest also includes 4 or 8 KiB of block RAM. ... huh, now I'm not certain exactly what the differences are between the MachXO2 and iCE40.

Of the parts available at DigiKey, the iCE40 Ultra family has at most 26 I/O pins, compared to up to 108 on the MachXO2.

The attached file is a simulation of such an instance: it's a 110 Hz sine (or square) wave played at 6991 Hz (1789773 ÷ 256). I assume that DMA will block two updates every vblank (6991 Hz ÷ 60 Hz ≈ 116, i.e. once every 116 samples), but the hardware automatically drops samples if necessary.

There are audible clicks in both the sine wave and square wave, but whether that matters ... I'm not qualified to judge.
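For reference, the numbers behind that simulation can be reproduced as follows. This assumes a 60 Hz frame rate and the 513-cycle OAM DMA figure mentioned earlier in the thread; the constant names are just for this sketch.

```python
CPU_HZ = 1789773          # NTSC 2A03 clock
IRQ_PERIOD_CYCLES = 256   # one audio IRQ every 256 CPU cycles
FRAME_HZ = 60             # assumed frame rate
DMA_CYCLES = 513          # OAM DMA length quoted earlier in the thread

sample_rate = CPU_HZ / IRQ_PERIOD_CYCLES
samples_per_frame = sample_rate / FRAME_HZ
# How many sample periods one OAM DMA spans (IRQs can't be serviced meanwhile).
samples_blocked = DMA_CYCLES / IRQ_PERIOD_CYCLES

print(f"{sample_rate:.1f} Hz, {samples_per_frame:.1f} samples/frame, "
      f"{samples_blocked:.2f} sample periods per DMA")
```

So one DMA covers roughly two sample periods out of about 116 per frame, which matches the two blocked updates assumed above.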

It's a shame I left my old Squeedo project in such a sad state (I didn't start using version control until much later), because I had tested that a little bit. From what I remember, I had an audio IRQ coming in at around 30 kHz, and when I used OAM DMA I didn't seem to hear a difference. I kinda expected a 60 Hz buzz or something, but nope. My samples were all square waves; maybe that, combined with the tone frequencies being too low, covered up the problem (it doesn't matter how high the output rate is if you're resampling it to 8 kHz or whatever). This was with any missed samples simply being dropped. With OAM DMA enabled, it's effectively giving you a lower sample rate 1% of the time. I'd bet it would sound better than digitized audio did on the Genesis, heheh.

Yeah with Lattice parts I've looked extensively into the MachXO2 and XP2, and not at all into the others. Those 2 have some overlap as well, the important difference being that XP2 has multipliers and DSP blocks. The (superficial) impression I got from ICE series is that it's geared towards low current consumption, if there are any trade-offs, it's possible that it won't matter for use with NES though.
