OK so I ended up in a discussion about the SuperFX and decided it'd be better to just see what the games actually did that impacted their framerate (SuperFX games aren't exactly fast for the most part...). So here's the question: does anybody know what kind of calculations were used in those games? (I mean actually used, not just speculation because I already know of many algorithms that were common back in the day) What's usually the bottleneck? Because it'd be nice to know.

No need to post code here, just discussing what approaches they took is fine.

Given that the GSU (aka SuperFX) is only a specialized DSP without hardware support for rasterization, the bottleneck must in most cases have been triangle filling. Then, to minimize the amount of filling needed, a lot of cycles were probably spent on culling and sorting triangles before actually committing to rasterize one. Vertex transformation isn't exactly cheap either, but compared to other machines of the era (including the mc68k) it was quite a beast at multiplication.

Actually, it does have a dedicated PLOT instruction specifically meant for rasterization (it places a pixel at the given coordinates and then advances to the next pixel). It only covers the filling part, though, not the part that gives the triangle its shape, but it's something.
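To make "only the filling part" concrete, here's a minimal C sketch: a PLOT-like primitive that writes a pixel and auto-advances X, with the per-scanline edge walking (finding where each span starts and ends) still left entirely to the caller. The framebuffer layout and names here are made up for illustration, not anything taken from the actual GSU.

```c
/* Hypothetical stand-in for the GSU's PLOT: write one pixel into a
   byte-per-pixel framebuffer, then advance X (as PLOT does). */
static unsigned char fb[256 * 192];
static int plot_x, plot_y;

static void plot(unsigned char color) {
    fb[plot_y * 256 + plot_x] = color;
    plot_x++;                        /* PLOT auto-advances to the next pixel */
}

/* Fill one horizontal span [x0, x1] on line y with a solid color.
   This is the part a PLOT-style primitive helps with; computing x0/x1
   per scanline from the triangle edges is still the program's job. */
static void fill_span(int x0, int x1, int y, unsigned char color) {
    plot_x = x0;
    plot_y = y;
    while (plot_x <= x1)
        plot(color);
}
```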

It's worth noting that from what I've seen it's extremely slow at accessing memory (either ROM or RAM); those instructions eat up several more cycles than the rest, so all code must run from the instruction cache if you don't want to slow down like crazy (and that cache is only 512 bytes). I imagine avoiding memory accesses is a priority.

Also, I noticed that while 8-bit multiplication is extremely fast, 16-bit multiplication is even slower than memory accesses =/ So if vertex computation relies a lot on 16-bit multiplication, yeah, it can be slow too (not as slow as on the CPUs of that era, but still, lots of cycles lost compared to the rest of the code).
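For the curious, the classic workaround when narrow multiplies are cheap and wide ones aren't is to build the 16-bit product out of four 8-bit ones (the schoolbook decomposition). A hedged C sketch, nothing GSU-specific; on real 8-bit hardware you'd also have to juggle the carries by hand, which C's uint32_t absorbs for us here:

```c
#include <stdint.h>

/* 16x16 -> 32-bit product composed from 8x8 -> 16-bit multiplies,
   the kind of trade-off you'd weigh if the chip's 8-bit multiply is
   cheap but its 16-bit multiply is expensive. */
static uint32_t mul16_from_mul8(uint16_t a, uint16_t b) {
    uint8_t al = a & 0xFF, ah = a >> 8;
    uint8_t bl = b & 0xFF, bh = b >> 8;

    uint32_t lo = (uint16_t)(al * bl);   /* four partial products */
    uint32_t m1 = (uint16_t)(al * bh);
    uint32_t m2 = (uint16_t)(ah * bl);
    uint32_t hi = (uint16_t)(ah * bh);

    /* sum the partials at their byte offsets */
    return lo + ((m1 + m2) << 8) + (hi << 16);
}
```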

May as well get out of the way the stuff that brought about this discussion in the first place (tell him to take his time, this is a lot):

1) What algorithms are used to process the vertices? Both transformation and projection.

2) What algorithm is used to rasterize (render) the triangles? Related: is there any special calculation to discard backfacing triangles, or did it just rely on the algorithm failing early when the winding is wrong?

3) What were the biggest bottlenecks when programming the SuperFX? (I'm aware that accessing memory is horribly slow, but I'd like to know what the most serious problems were that actually arose during the programming.)

4) Not directly related to Star Fox but to the SuperFX in general: is it possible to render to sprites? (as in having the entire screen covered with sprites) I did find info about PLOT being able to render to sprites, but I have no idea what limits there are to that.
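Regarding the backface question in 2): the standard screen-space test is the sign of a 2-D cross product of two triangle edges, which flips with the winding order. A C sketch; the vertex names and the "counter-clockwise = front-facing" convention are assumptions for illustration, not anything confirmed about Star Fox's actual code:

```c
typedef struct { int x, y; } Vtx;

/* Sign of the z-component of (b-a) x (c-a) in screen space.
   It flips when the triangle's winding flips, so one comparison
   discards backfaces before any filling work is done. */
static int is_backfacing(Vtx a, Vtx b, Vtx c) {
    long cross = (long)(b.x - a.x) * (c.y - a.y)
               - (long)(b.y - a.y) * (c.x - a.x);
    return cross <= 0;   /* assuming counter-clockwise = front-facing */
}
```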

psycopathicteen wrote:

How does the FX compare against the DMA at filling pixels?

Considering the SuperFX can render during active scan while DMA can't (and that the PLOT instruction can do dithering and automatically handle pixel positioning)... I'd say it wouldn't even be a fair comparison =P

1) Was there any sort of SuperFX processor simulator that ran on the development PC and could be used for testing snippets of code, or did the code constantly have to be burnt to UV-erasable EPROMs or something else like a ROM emulator?

2) Did/Does the SuperFX CPU itself have any sort of debugging features (single step, register dump) that can be triggered? If this is too specific, was debugging done: 1) with an LED, 2) by writing to some port or memory location, 3) using a full SNES dev system and just printing to the screen, or 4) just simulated on the development PC, as in the previous question?

3) I realize you were working on the software, but do you recall any discussions about why the SuperFX boards had to start using a dedicated clock resonator circuit instead of the 21 MHz signal from the cartridge edge? My guess is that it was too unreliable due to signal strength and general contact corrosion, but I'd rather hear it from the source. Or was it a software performance reason? (The 21 MHz clock gets gated off during DRAM refresh, I think.)

Sure, there's the PLOT instruction and its related setting registers. By a dedicated rasterizer I mean something more like "here's a memory offset to 3 or more screen-space coordinates, fill the area for me", or at least something like blitter filling on the Amiga.

I wonder if someone's alluding to the difference between the programmable vertex shader (RSP) and the fixed-function rasterizer (RDP) in the Reality Coprocessor of the Nintendo 64 console. The Super FX, as I understand it, has to act as both the vertex shader and the rasterizer.

That's exactly the role the SuperFX is meant to fill! The 65816 gives it the data and tells it "render this for me". You could look at it like a programmable "blitter" =P

Speaking of which, if you don't mind I'd also like to ask a question, namely how they handled interoperability between the S-CPU and the GSU. How exactly did they split the tasks between the two processors, and which one did what? What did the S-CPU do other than read the pads, poke commands into the SPC700, update the game engine, PPU, and HDMA registers, and DMA framebuffers into VRAM? Or, other than that, was it just idly waiting for the stop flag in $3030 the rest of the time?
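For reference, the handshake being described usually boils down to the S-CPU polling the GO bit of the GSU's status flag register at $3030 until the GSU halts itself with STOP. A tiny C sketch using a plain variable as a stand-in for the memory-mapped register; the bit value follows the publicly documented SFR layout, but treat the names here as assumptions:

```c
#include <stdint.h>

/* Stand-in for the low byte of the GSU status flag register ($3030).
   On real hardware this would be a volatile read of the mapped address. */
static volatile uint8_t sfr_lo;

#define SFR_GO 0x20   /* G flag: set while the GSU is running (assumed layout) */

/* The 65816 side would spin on this predicate (or interleave other
   work between polls) until the GSU clears GO by executing STOP. */
static int gsu_running(void) {
    return (sfr_lo & SFR_GO) != 0;
}
```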

Something like:

1987: r-e the NES
1988: Think about custom co-processing mapper for NES
19xx: Demo 3-D game to Nintendo
19xx: Nintendo shows us SFC for the first time; move coding over to that....
199x: Develop portable game system HW for Nintendo, they never use it.
