This project began some years ago, about 2011, when I was interested on implementing SDD1 chip on an FPGA using Andreas Naive documentation about the chip. I first create the core files which decoded 2BPP and 4BPP modes, 8BPP was buggy bt then and the last mode was not implemented. I tries to contact with some guys from SD2SNES to ask if they were interested on the project but never got an answer and I gave up the poject due to lack of interested and come personal issues.

But finally, some months ago I finished it all, checked that all modes worked perfectly and decided to implement the chip on a Zedboard. My final goal is to connect the board to SNES using a custom interface board and check it on the real hardware.

I'm interested, how many master cycles does it take to decode first two bytes (one row) and the rest in your implementation?

Well, it depends on the bit-depth and the context. For 2BPP and context 3, it tooks 31 master cycle from wirte to $4801 to first 2 bytes are ready on the output FIFO, plus 1 cycle to get first byte on the DMA data bus. It takes 15 cycles to fill the pipeline, 16 master cycles to complete each pixel on 2 bitplanes and 1 cycle in reading from output FIFO.

My Decoder takes 16 master cycles for 16 bit of 2 bitplanes (plus 5 cycles for fist 2 bitplanes for read header and initialize decoder). When I created DMA module and SDD1 I was based on this post. I think, decoder should start after write to $420B and fetch next opcode (from line 2490 in post). But then remains few cycles to initialize decoder and decode the first 2 bitplanes. Also we must not forget about pause for DRAM REFRESH.Maybe SDD1 chip decode 2 bitplanes over 8 master cycles.

That sounds fast! Do you use input FIFO for decoding? Andreas Naive wondered if it was necessary, and looks it is for high decodin ratios (up to 128:1).

srg320 wrote:

I think, decoder should start after write to $420B and fetch next opcode (from line 2490 in post). But then remains few cycles to initialize decoder and decode the first 2 bitplanes.

I'm pretty sure decoding starts after writing to $4801; I take that point as 0t (start of time) to measure master cycles. If I measure time from writting to $420B, my decoder is as fast as 4 master cycles, but that's tricky, because after triggering DMA (ie, after writting to $420B) data must be present on data bus at most 7 cycles later so CPU latches data in the 8th cycle.

srg320 wrote:

Also we must not forget about pause for DRAM REFRESH

DRAM refresh doesn't affect at all to S-DD1, in fact, that cartrdige connector pin is not even routed to the chip (neither in Star Ocea nor in SFA2). When DRAM occurs, CPU is halted, so /RD and /WR strobes are drive to '1'. S-DD1 must obey to those strobes to present decoded data to the DMA.

srg320 wrote:

Maybe SDD1 chip decode 2 bitplanes over 8 master cycles.

I thought a lot about this; at first, I also thought maybe the chip had some kind of parallelism, but if you check carefully the decoding algorithm, context information is only available after decoding current pixel, so it's impossible to decode (n-1)th bit if nth bit is not decoded first to create the context. Then, if S-DD1 is designed fully synchronous, 1 pixel is decoded each clock cycle. If S-DD1 is partially synchronous, there is high risk of combinational loops, so latches must be instantiated.

That sounds fast! Do you use input FIFO for decoding? Andreas Naive wondered if it was necessary, and looks it is for high decodin ratios (up to 128:1).

I mean 5 cycles for init decoder and header + 16 cycles for 2 bitplanes = 21.I am not use FIFO, I wait end of DMA RD signal (every second) and run decoding new 2 bitplanes.

magno wrote:

I'm pretty sure decoding starts after writing to $4801

Ok, what will happen if 43x2-43x4 is writed after writing 4801, or selected more than one channel SDD1. And how SCPU will fetch next opcode after writing to 4801, if SDD1 will use ROM bus? Or, SDD1 reading ROM between SCPU access when RD and WR is high level?

I am not use FIFO, I wait end of DMA RD signal (every second) and run decoding new 2 bitplanes.

How would you fetch data from ROM if for each output pixel you needed to decode a 7-order Golomb code? You'd need 1 input byte each output pixel (ie, each master cycle), to achieve that your ROM should be 47ns time access or better. What if you must mantain this input rate because context is switching with each output pixel?

srg320 wrote:

magno wrote:

I'm pretty sure decoding starts after writing to $4801

Ok, what will happen if 43x2-43x4 is writed after writing 4801, or selected more than one channel SDD1. And how SCPU will fetch next opcode after writing to 4801, if SDD1 will use ROM bus? Or, SDD1 reading ROM between SCPU access when RD and WR is high level?

These are good questions, I should check in the real hardware, but I haven't had free time to mount the components on my interface board (between zedboard and SNES). My guess is:

srg320 wrote:

what will happen if 43x2-43x4 is writed after writing 4801

you shouldn't do that, in fact, neither SO nor SFA2 do that. But if you did, S-DD1 would start decoding from the last source address it had sniffed from SNES dara bus, I guess.

srg320 wrote:

selected more than one channel SDD1

that's not a problem, you select which DMA channel to sniff writting to $4800 and which channel to decode writting to $4801. If you trigger a decompression from a different channel you sniffed, nothing happens, ie, DMA is filled with the same byte on each beat. I checked this on emulators, so maybe is not accurate.

srg320 wrote:

how SCPU will fetch next opcode after writing to 4801, if SDD1 will use ROM bus?

SCPU is much slower than master cycle (6 or 8 cycles down), so it is easy to time-multiplex acces from SCPU and S-DD1 decompression core. But you need an input FIFO for data which will feed the decompression core.

srg320 wrote:

SDD1 reading ROM between SCPU access when RD and WR is high level?

That can happen only during DMA: DMA engine stalls the SNES CPU while DMA is in progress; the CPU resumes after all bytes are transferred. In any other cases, S-DD1 decompression core doesn't need to access ROM if no decompression is running. The only situation when collision occurs is after writting to $4801 and writting to $420B, because both decompression core and SCPU need data.Star Ocean has some padding instructions between them (PLA - PHA) for delaying start of DMA so SDD1 has time enough to read first words from ROM and begin decoding.

you shouldn't do that, in fact, neither SO nor SFA2 do that. But if you did, S-DD1 would start decoding from the last source address it had sniffed from SNES dara bus, I guess.

srg320 wrote:

selected more than one channel SDD1

that's not a problem, you select which DMA channel to sniff writting to $4800 and which channel to decode writting to $4801. If you trigger a decompression from a different channel you sniffed, nothing happens, ie, DMA is filled with the same byte on each beat. I checked this on emulators, so maybe is not accurate.

srg320 wrote:

how SCPU will fetch next opcode after writing to 4801, if SDD1 will use ROM bus?

SCPU is much slower than master cycle (6 or 8 cycles down), so it is easy to time-multiplex acces from SCPU and S-DD1 decompression core. But you need an input FIFO for data which will feed the decompression core.

The only situation when collision occurs is after writting to $4801 and writting to $420B, because both decompression core and SCPU need data.Star Ocean has some padding instructions between them (PLA - PHA) for delaying start of DMA so SDD1 has time enough to read first words from ROM and begin decoding.

Forgive me for asking, but why not tackle the SFA2 decompression similar to what was done with Star Ocean? Then a "standard" cart could be used? Or am I missing the point?

Well, it's more exciting to replicating the chip than decompressing the graphics XD It could be done, of course, but the task is more tedious and less creative. Moreover, implementing the chip could e useful for the scene to create hacks that uses it.

Moreover, implementing the chip could e useful for the scene to create hacks that uses it.

At least two large potential S-DD1 projects have been talked about here, although they are just speculation at this point (and although they aren't hacks, they do have copyright issues):

- a port of Metal Slug, as close to arcade-perfect as possible with period hardware- a re-port of Street Fighter Alpha 2, employing advanced techniques to fix the music and sound, loading pauses, and cut-down graphics and animation

Both of these projects should fit comfortably in the S-DD1's available ROM space (which I believe is 16 MB addressable plus 3.875 MB in parallel) with the graphics compressed. Neither one is especially likely to fit in an ordinary cartridge, particularly since software decompression would take too much S-CPU power to be feasible, and the SA-1 imposes an 8 MB limit. In both cases, using the MSU1 would defeat the purpose of the project.

It could be argued that neither of these projects is likely to happen, but it's nice to know that they could.

- a re-port of Street Fighter Alpha 2, employing advanced techniques to fix the music and sound, loading pauses, and cut-down graphics and animation

So the game has issues already.... humph, wasn't aware of that.

Quote:

Well, it's more exciting to replicating the chip than decompressing the graphics XD It could be done, of course, but the task is more tedious and less creative. Moreover, implementing the chip could e useful for the scene to create hacks that uses it.

But wouldn't that require a few options --- either A, someone would need to buy the FPGA SDD1 pcb (presumably a dev board), or B, wouldn't this lead to more cart destruction because some potential future game/hacks that needs the SDD1. Can the SD2Snes can run the SDD1 games?

The Star Ocean, since it's been decompressed, can play on a OEM style cart. https://youtu.be/_c2OoGkPA4o (video has no sound)Truthfully, I'd like to see it decompressed, but I do understand the drive to replicate the sdd1 chip also. If I understood how FPGA's worked, I'd probably do the same thing.

It's just that I and others feel that it was probably possible to do better. Maybe it would have been unreasonable to expect better under reasonable time and budget constraints, or for an affordable price. Maybe it was a corporate afterthought or a contractual obligation and didn't get a reasonable schedule or budget. Maybe the RAM-limited nature of the next-gen consoles made devs wary of pushing the SNES too hard for fear of making the PlayStation look bad. Maybe the programmers were just lazy or incompetent. Or maybe it really is as good as it can get on the hardware - but I doubt it.

The graphics seem to be smaller than they need to be, the screen is letterboxed, and the animations are missing frames. Preliminary calculations suggest that it may be possible to remedy all of these things with a sufficiently advanced animation engine and more ROM.

The vocals are muddy, the music is terrible, and the game has loading pauses where everything freezes. These things are intimately related, and I think it's possible to fix all of them at once with a high-bandwidth HDMA streaming scheme. Even if I'm wrong, there are multiple examples of games that handle on-the-fly ARAM loading better than this one.

The game also has some slowdown, and I don't really see why it should.

Who is online

Users browsing this forum: No registered users and 10 guests

You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum