The SNES only has an 8x8 multiplier (the other unit is used by Mode 7, so it is not a safe general-purpose alternative)

Wait, are you saying that you can do 16x8 or 16x16 multiplication with mode 7's multiplication and division registers? Is there any realistic way to "chain" two 8 bit multiplications together like you can do with addition or subtraction? Probably not...

Stef wrote:

(you have to count at least 15/16 cycles for load / read result time)

That's why I was thinking it wouldn't be that fast. 70 to 140 cycles sounds astronomically high to me, but on the SNES, it's more like half which is still ridiculously large, but it's at least somewhat reasonable.

Not sure if this was what you were asking, but off the top of my head: (a and b are 16-bit numbers)

This could be repurposed for the '816. The routine has a large overhead for loading T1 because it's set up for sequential calls where T1 doesn't change, but you could easily switch that to indirect addressing, removing the self-modifying code and cutting that loading overhead down by a lot. Also, the tables are split into 8-bit halves, as is the math, so that could be optimized into single word-wide tables with single 16-bit SBCs. A rewrite/restructure for the '816 could probably get it close to the 70-cycle range.
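The routine being discussed is not reproduced in this thread. Purely as an illustration, and assuming it is a table-driven multiply of the quarter-square kind (an assumption; the post only mentions tables and SBCs), the idea can be sketched in Python. `SQ4` and `mul8` are names made up for this sketch.

```python
# Quarter-square multiply: a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4).
# The identity is exact because (a+b) and (a-b) always have the same parity.

# One table lookup per term; index range 0..510 covers every sum of two bytes.
SQ4 = [(n * n) // 4 for n in range(511)]

def mul8(a, b):
    """Unsigned 8x8 -> 16 multiply using two table lookups and one subtraction."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return SQ4[a + b] - SQ4[abs(a - b)]
```

On a 6502 the table is typically split into low and high byte halves, which is where the paired 8-bit SBCs come from; a word-wide table would need only one 16-bit subtraction per product.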

You can do 8x16 --> 24 multiplication with the Mode 7 registers, but no division as far as I know. And of course you can chain them to obtain a 16x16 --> 32 multiplication:

(8 low) x 16 = x
(8 high) x 16 = y

Then just do a 32-bit addition, x + (y << 8), to get the final 32-bit result. Still, even using the Mode 7 registers, I bet you spend a large number of cycles just to do 16x16=32.
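The chaining described above can be sketched in Python. This is only a model of the arithmetic, not of the hardware registers: `mul8x16` is a hypothetical stand-in for one 8x16 multiply, and the sketch assumes unsigned operands, ignoring any signedness handling.

```python
def mul8x16(a8, b16):
    """Hypothetical stand-in for one unsigned 8x16 -> 24-bit multiply."""
    assert 0 <= a8 <= 0xFF and 0 <= b16 <= 0xFFFF
    return a8 * b16

def mul16x16(a, b):
    """Unsigned 16x16 -> 32 built from two 8x16 partial products."""
    x = mul8x16(a & 0xFF, b)   # (8 low)  x 16
    y = mul8x16(a >> 8, b)     # (8 high) x 16
    return (x + (y << 8)) & 0xFFFFFFFF  # 32-bit addition of the partials
```

For example, `mul16x16(0x1234, 0x5678)` gives the same value as `0x1234 * 0x5678`.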

Actually, on the 68000 a multiplication takes up to 72 cycles (for the signed version) and a signed division up to 140 cycles, but going by some benchmarks I did, you can assume a mean of 50 cycles for multiplication and about 90/100 cycles for division, which is not that bad.

With the Genesis 68000 CPU (7.67 MHz) I can transform (3D transformation + 2D projection) about 10000 vertices per second, which is not that bad (I expected 6000 max).

A single 3D transformation consists of:
- 9 16x16=32 multiplications
- 6 32-bit additions
- 3 16-bit additions

A single 2D projection consists of:
- 3 16-bit additions
- 1 32:16=16 division
- 2 16x16=32 multiplications
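One arrangement that matches those operation counts is sketched below in Python. It is not the poster's code: the fixed-point format (`FRAC`), the camera distance, the screen centre and the reciprocal scale are all assumptions chosen for illustration.

```python
FRAC = 8  # assumed number of fractional bits in the matrix entries

def transform(m, t, v):
    """Rotate then translate: 9 multiplies, 6 wide adds, 3 narrow adds."""
    out = []
    for row, ti in zip(m, t):
        # 3 multiplies and 2 32-bit additions per row
        acc = row[0] * v[0] + row[1] * v[1] + row[2] * v[2]
        out.append((acc >> FRAC) + ti)  # 1 16-bit addition per row
    return tuple(out)

def project(p, dist=256, cx=160, cy=112, scale=1 << 16):
    """Perspective projection: 3 adds, 1 division, 2 multiplies.
    Assumes z + dist > 0 (vertex in front of the camera)."""
    x, y, z = p
    zc = z + dist                 # addition 1
    inv = scale // zc             # the single 32:16 division
    sx = ((x * inv) >> 8) + cx    # multiply 1, addition 2
    sy = ((y * inv) >> 8) + cy    # multiply 2, addition 3
    return sx, sy
```

Dividing once and multiplying twice by the reciprocal is what keeps the projection down to a single division.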

The projection could be different, but I handled it that way for convenience. Still, that is a large number of complex operations, and I am not counting the load / store / shift operations here. If we just count the maximum cycles of mul and div, we already get (11*70) + 140 = 910, which is close to 1000 cycles per vertex! So we shouldn't be able to transform more than about 8000 vertices per second just because of these operations... but fortunately that is not the case. I wonder how many we could transform with the SNES hardware using smart interleaving of operations across the different available multiplier / divider units.
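The budget above can be reproduced with a few lines of Python. The cycle figures are the ones quoted in the post; the 95-cycle division average is an assumed midpoint of the quoted "90/100", and all names are made up for this sketch.

```python
CPU_HZ = 7_670_000          # Genesis 68000 clock quoted in the post
MULS_PER_VERTEX = 9 + 2     # transformation + projection
DIVS_PER_VERTEX = 1

def cycles_per_vertex(mul_cycles, div_cycles):
    """Cycles spent in mul/div only; loads, stores and shifts are ignored."""
    return MULS_PER_VERTEX * mul_cycles + DIVS_PER_VERTEX * div_cycles

def vertices_per_second(mul_cycles, div_cycles):
    return CPU_HZ // cycles_per_vertex(mul_cycles, div_cycles)

worst = cycles_per_vertex(70, 140)     # maximum timings
typical = cycles_per_vertex(50, 95)    # benchmark means (95 is an assumption)
```

With the maximum timings the mul/div work alone caps throughput at roughly 8400 vertices per second, while the benchmark means leave room for the measured figure of about 10000.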

Interesting. Is there an equivalent for signed multiplication (which is more useful)? I guess it only requires a different LUT.

Edit: I found the signed version, which adds some cycles at the end of the multiplication. Signed operands are processed a bit slower because of that, but it is still a fast implementation for the 6502 CPU.

Yeah, those two groups of 8-bit SBCs could be optimized down to one each for the '816. So the worst case is two 16-bit subtractions for signed overhead. Though I think a slightly more optimal routine could be written for signed input/output.
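The "two subtractions" fix-up is the usual way to get a signed product out of an unsigned multiply: for each negative operand, subtract the other operand from the high word of the result. A minimal Python sketch, assuming 16-bit inputs and a 32-bit result (`signed_mul16` is a name made up here, not the routine under discussion):

```python
def signed_mul16(a, b):
    """Signed 16x16 -> 32 via an unsigned multiply plus up to two corrections."""
    assert -0x8000 <= a <= 0x7FFF and -0x8000 <= b <= 0x7FFF
    ua, ub = a & 0xFFFF, b & 0xFFFF   # two's-complement bit patterns
    p = ua * ub                       # unsigned 16x16 -> 32 product
    if a < 0:
        p -= ub << 16                 # first 16-bit subtraction (high word)
    if b < 0:
        p -= ua << 16                 # second 16-bit subtraction (high word)
    p &= 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed value.
    return p - (1 << 32) if p & 0x80000000 else p
```

When both operands are non-negative neither correction runs, which is why signed inputs cost a few extra cycles only in some cases.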

I use multiplication for calculating the rotation of the bosses' joints. The sines and cosines are 16-bit signed values from -256 to 256, and the radius is a signed 8-bit value. The result is a signed 16-bit value.

I thought of a fast routine that uses non-mode-7 multiplication registers.
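The routine itself is not shown in the thread. As a sketch of how such a product could be built on an unsigned 8x8 multiplier, assuming the sign is handled separately and the magnitude 256 is special-cased as a shift (both are assumptions, and `hw_mul8x8` is a hypothetical stand-in for the hardware multiply):

```python
def hw_mul8x8(a, b):
    """Hypothetical stand-in for an unsigned 8x8 -> 16 hardware multiply."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b

def scale_by_sine(sine, radius):
    """sine in -256..256, radius in -128..127; returns sine * radius."""
    assert -256 <= sine <= 256 and -128 <= radius <= 127
    negative = (sine < 0) != (radius < 0)   # sign of the result
    s, r = abs(sine), abs(radius)
    if s == 256:
        mag = r << 8                # 256 does not fit in 8 bits: shift instead
    else:
        mag = hw_mul8x8(s, r)       # both magnitudes now fit in a byte
    # Note: (-256) * (-128) = 32768 is the one product that does not fit
    # in a signed 16-bit result.
    return -mag if negative else mag
```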

What really boggles my mind about slowdown is when games lag with 4 sprites, while other games can have more than 40. You mean to tell me they programmed every routine 10 times slower than necessary?

What's even more odd is that the games with heavy slowdown seem to have a higher threshold on the second lag frame as if running the game itself takes a lot of overhead. Like, it takes 4 sprites to make it run at 30fps, but with 10 sprites onscreen, it STILL runs at 30fps? WTF?

For purpose of argument, I'll assume good faith, imputing no malice where incompetence is sufficient (Hanlon's razor), nor incompetence when bona fide intractability is sufficient.

As for the difference between the games: Some games have more complicated collision detection and path finding algorithms and may thus slow down with fewer active objects. I doubt that individual spread-gun bullet sprites in Contra or even ships in Recca have very complicated movement.

As for why 10 is no slower than 4: Perhaps there is a constant overhead equivalent to seven objects, such as decoding the map into background update packets. Or perhaps the player character is as complex as two or three objects. I know the walking characters in Haunted: Halloween '85 (the player and the zombies) are the most algorithmically complex because they have to read the collision map four times (bottom center, left, and right, and head-height at leading edge), compared to everything else that reads it once (bottom center) or not at all. Incidentally, I had to do a shload of optimization to the code that parses collision slabs to get it to work well with six walkers (the player and five zombies) on the most platform-filled levels (such as the barn) without slowing down.

What's funny is that I just got 1943 from a local video game store, Game X Change, (don't know why I felt like sharing that) and I played it and it's a ton better about not slowing down than Gradius III which I also got, except that this one is on the SNES... That's what really boggles my mind.

Isn't the SNES's CPU supposed to be something like 4x faster than the one in the NES?

Now that I think about it, I think most people mistakenly identify object collision as speed-critical code, when it could actually be the BG collision that bogs the cpu down. I wonder if slopes have anything to do with it, because a lot of NES games didn't use slopes.

Edit: I changed the wording, because I'm not exactly sure if this is totally accurate.

Last edited by psycopathicteen on Wed Jan 27, 2016 8:15 pm, edited 1 time in total.
