I've been fighting with my DMA code for too long, so I'm looking for some help.

Here is the story

In my game, every sprite on screen is 4x4 tiles.
Each sprite has its own dedicated tiles in VRAM.
Every vint, I DMA the tiles of each sprite whose frame changed in the current animation (which means I could transfer up to sprite_count * 256 words, i.e. 512 bytes per sprite, per vint).

According to the DMA doc, the minimum available during vblank is 205 bytes * 86 scanlines,
so I should have all the bandwidth I need.
Unfortunately, it starts to lag a lot at 7 sprites (yes, only SEVEN!).

What seems to happen:

main loop
vint handler doesn't finish in time, so it is still running when the main loop (re)starts

Does it mean I can only transfer 205 bytes per DMA call?
How can I know if I'm out of scanlines (in which case I'd skip the rest of the DMA queue and keep it for the next vint)?
Does it mean that although you can have 80 sprites on screen, you can't have 80 DIFFERENT sprites?
Do you know how to master all of this?

Or perhaps I'm totally on the wrong track with my 1 sprite = 16 tiles?

Thanks for any help, I would like to avoid rewriting everything for nothing...

Since I have several DMAs per vint, I start the DMA in the vint handler... how else?

Of course, there is some code before each DMA to find the source address based on the sprite's properties.
I'm trying to optimize it as best I can, but I can't tell you its real cost... I don't know how to measure it, apart from disassembling the produced binary.

It's actually 204 bytes per scanline, but that shouldn't be a big enough difference to cause what you're seeing. You're probably burning too much CPU time. Every ~2.3 68K cycles, you lose one byte of DMA bandwidth.
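To put rough numbers on that, here is a minimal sketch in C that only uses the two figures quoted in this thread (204 bytes per blanked line in H40, about 2.3 68K cycles per lost byte). The function name `dma_budget_bytes` and the idea of passing the line count as a parameter are illustration only, not something from a real library.

```c
/* Figures quoted in this thread (H40 mode, display blanked). */
#define DMA_BYTES_PER_LINE   204L  /* bytes the VDP can move per blanked scanline */
#define CYCLES_X10_PER_BYTE  23L   /* ~2.3 68K cycles cost one byte (scaled by 10) */

/* Hypothetical helper: bytes of DMA bandwidth left in a blanking window of
   `lines` scanlines after the CPU has spent `cpu_cycles` 68K cycles on
   anything that is not an active DMA (setup code, pad reads, ...). */
static long dma_budget_bytes(long lines, long cpu_cycles)
{
    long total = lines * DMA_BYTES_PER_LINE;
    long lost  = (cpu_cycles * 10L) / CYCLES_X10_PER_BYTE;
    long left  = total - lost;
    return left > 0 ? left : 0;   /* never report a negative budget */
}
```

With the 86 lines assumed in the first post, `dma_budget_bytes(86, 0)` gives 17544 bytes, and every 2300 cycles of setup code takes about 1000 bytes off that.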

KanedaFr wrote: Does it mean I can only transfer 205 bytes per DMA call?

Nope. You can't cross a 128KB boundary, but you generally don't need to worry about a maximum size for DMA transfers.

KanedaFr wrote: How can I know if I'm out of scanlines (in which case I'd skip the rest of the DMA queue and keep it for the next vint)?

You could check the line portion of the H/V counter register.
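A minimal sketch of that check in C, assuming the usual layout of the HV counter word at 0xC00008 (V counter in the high byte, H counter in the low byte). The helper names and the cutoff policy are hypothetical, and the V counter is not guaranteed to increase monotonically over the whole blanking period, so the cutoff value needs to be verified on hardware or an accurate emulator for the video mode in use.

```c
#include <stdint.h>

/* VDP HV counter port: high byte = V counter (line), low byte = H counter. */
#define VDP_HV_COUNTER ((volatile uint16_t *)0xC00008)

/* Line portion of a raw HV counter value. */
static inline uint8_t hv_line(uint16_t hv)   { return (uint8_t)(hv >> 8); }

/* Horizontal portion of a raw HV counter value. */
static inline uint8_t hv_hcount(uint16_t hv) { return (uint8_t)(hv & 0xFF); }

/* Only meaningful on the console: read the current line from the VDP. */
static inline uint8_t current_line(void)     { return hv_line(*VDP_HV_COUNTER); }

/* Hypothetical policy: stop starting new DMAs once the line counter has
   reached `cutoff` (a value you pick so the next transfer still fits). */
static inline int past_cutoff(uint8_t line, uint8_t cutoff)
{
    return line >= cutoff;
}
```

In the vint handler, something like `if (past_cutoff(current_line(), cutoff)) break;` before each transfer would leave the remaining entries for the next vint.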

KanedaFr wrote: Does it mean that although you can have 80 sprites on screen, you can't have 80 DIFFERENT sprites?

You can have 80 different sprites, you just can't transfer new frames for all 80 of them every frame. In a typical platformer, you might DMA new frames for your player character and maybe certain special enemies/objects, but a lot of enemies will have only a few frames of animation and will keep those frames resident in VRAM.

If you're doing something like reading the pads before starting the DMA, the cycles used could eat into what's available for transferring data. As MoD said, roughly 2.3 68k cycles per byte. This includes code setting up and starting each DMA. You gotta count it all. If the Z80 is running and accesses 68k space, that will also steal time.

You could also extend the VBlank: for example, use the h-int on line 192 (or maybe earlier), disable the display, do the DMA in the hint handler, and re-enable the display before leaving it, rather than doing the DMA during vint.

Mask of Destiny: so the number of bytes per scanline is 204 (instead of 205 as shown in Sega.doc) for H40 mode; what is the number for H32 mode ("sega.doc" says 167)?

Extending the VBlank seems pretty extreme for 3.5KB of data per frame.

@Kaneda: The HVCounter is a reasonable source of timing info. In H40 mode, one hcounter increment corresponds to roughly 16 master clock cycles (exactly 16 cycles outside of HSync). Only problem is the nasty jump part way through the line.

If you can't easily figure out how to speed up the code that's doing all the DMA setup, you might want to try moving as much of it as possible outside of your VBlank routine. For instance, you could build the control port writes that set up the DMA transfers in a buffer during the active display and then just copy that data to 0xC00004 in your VInt handler.
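As a rough sketch of what such a buffer entry could hold for one 68K to VRAM transfer: five register writes (DMA length in $93/$94, source address in $95-$97) followed by the two words of the VRAM write command with the DMA bit set. The function name `dma_build_vram` is made up for illustration, and the sketch assumes DMA is already enabled and the auto-increment register is already set to 2.

```c
#include <stdint.h>

#define DMA_CMD_WORDS 7   /* 5 register writes + 2 words of address command */

/* Hypothetical helper: fill `out` with the control-port words for one
   68K -> VRAM DMA. `src` is the (even) 68K source address, `vram` the
   destination VRAM address, `words` the transfer length in words. */
static void dma_build_vram(uint16_t out[DMA_CMD_WORDS],
                           uint32_t src, uint16_t vram, uint16_t words)
{
    uint32_t s   = src >> 1;                       /* VDP wants a word address */
    uint32_t cmd = 0x40000080UL                    /* VRAM write + DMA bit     */
                 | ((uint32_t)(vram & 0x3FFF) << 16)
                 | ((uint32_t)(vram >> 14) & 3UL);

    out[0] = (uint16_t)(0x9300 | (words & 0xFF));        /* length, low byte   */
    out[1] = (uint16_t)(0x9400 | (words >> 8));          /* length, high byte  */
    out[2] = (uint16_t)(0x9500 | (s & 0xFF));            /* source, bits 1-8   */
    out[3] = (uint16_t)(0x9600 | ((s >> 8) & 0xFF));     /* source, bits 9-16  */
    out[4] = (uint16_t)(0x9700 | ((s >> 16) & 0x7F));    /* source, bits 17-23 */
    out[5] = (uint16_t)(cmd >> 16);                      /* command, high word */
    out[6] = (uint16_t)cmd;                   /* low word: this write starts the DMA */
}
```

The words can be computed during active display whenever a sprite's frame changes; the VInt handler then only has to write the seven words, in order, to the control port at 0xC00004.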

gasega68k wrote: Mask of Destiny: so the number of bytes per scanline is 204 (instead of 205 as shown in Sega.doc) for H40 mode; what is the number for H32 mode ("sega.doc" says 167)?

The H32 number is off by one as well (should be 166). The problem is that there is an extra refresh cycle when the display is off (or you're in VBlank) compared to when it's on. In H40 mode there are normally 5 refresh slots per line, but there are 6 whenever it's not actively rendering which leaves 204 slots. In H32 there are normally 4 refresh slots per line and 5 during inactive lines.

I have no idea why that is or why Sega's documentation didn't take that into account, but it's what I've observed in my logic analyzer captures. Maybe the active display refresh timing is pushing things a bit, but they're able to get away with it due to all the normal access?

Mask of Destiny wrote:
@Kaneda: The HVCounter is a reasonable source of timing info. In H40 mode, one hcounter increment corresponds to roughly 16 master clock cycles (exactly 16 cycles outside of HSync). Only problem is the nasty jump part way through the line.

Good to know !
What do you mean by the "jump part way ..."?

If you can't easily figure out how to speed up the code that's doing all the DMA setup, you might want to try moving as much of it as possible outside of your VBlank routine. For instance, you could build the control port writes that set up the DMA transfers in a buffer during the active display and then just copy that data to 0xC00004 in your VInt handler.

To do that, you need one single DMA per vint...
In my case, it's up to one per sprite, so it's not doable... unless I store the DMA info as a sprite attribute and write it to the registers as longs...
It could be done, I just need to remember how to write inline asm with parameters in C.
It will take more memory (4 longs per sprite), but that's probably better than lag.

Got my numbers:
including my debug code to print out the HVCounter, I get a difference of about 0x400 between the frame DMAs of 2 sprites,
so 0x400 x 16 cycles / 2.3 => 7 KB lost per frame DMA.
Since the available bandwidth is 204*86 => 17 KB, it means I could only handle 2 updates...
I think either my way of computing the numbers or my HVCounter reading is wrong.

In fact, my lag occurs in the worst possible case: ALL the sprites update their frame at the SAME time (i.e. on the same vint).

Since each animation rate is actually different (walk is updated every 4, jump every 6, ...), that would RARELY happen...
I tried in-game and was able to push up to 10 sprites without noticeable lag.
Since that will be the max needed in 2P mode, that's good news.

At the same time, I declared some internal functions as inline and converted a function to a define.
I did it as a test, but I'm not sure I gained much, since -O1 already does some of these conversions for you at compile time.

Of course, while it's enough for my current project, I may need to optimize it for future projects.
Or perhaps the only answer is that I'll never use more than 10 animated sprites on screen at the same time.
For example, shmups don't need 30 fully animated sprites, so perhaps I shouldn't lose too much time on this, unless someone has an idea to test.

It may not be the DMA that's slowing everything down, but rather how you calculate the next sprite an object needs over all the objects. Try something simple like making every object simply use the next frame and see if it gets radically faster. If so, you need to optimize the code for handling the objects, not the dma.

I made some tests, removing almost everything but the DMA
and forcing the DMA to occur every vint for every sprite.

Hcounter difference between 2 DMAs = 27 (and not 0x400, I was totally wrong).
It lags at 8 sprites on screen.
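For what it's worth, an H counter difference can be turned into an approximate number of lost DMA bytes with the figures given earlier in the thread (about 16 master clock cycles per H counter step in H40, about 2.3 68K cycles per byte). The sketch below also assumes the 68K runs at the master clock divided by 7, which is not stated in the thread, and the helper names are made up. The difference is only meaningful when both readings fall on the same scanline and not across the jump mentioned above.

```c
#define MCLK_PER_HCOUNT  16L  /* roughly, in H40 (figure from the thread)    */
#define MCLK_PER_68K      7L  /* assumption: 68K clock = master clock / 7    */
#define CYCLES_X10_PER_B 23L  /* ~2.3 68K cycles per lost byte, scaled by 10 */

/* Hypothetical helper: H counter steps -> approximate 68K cycles. */
static long hcount_to_68k_cycles(long steps)
{
    return steps * MCLK_PER_HCOUNT / MCLK_PER_68K;
}

/* Hypothetical helper: 68K cycles -> approximate DMA bytes lost. */
static long cycles_to_lost_bytes(long cycles)
{
    return cycles * 10L / CYCLES_X10_PER_B;
}
```

Under these assumptions, a difference of 27 works out to about 61 68K cycles, or roughly 26 bytes of bandwidth per transfer.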

If I only DMA 128 words per sprite instead of 256, there is no lag:
ugly sprites, but no lag.

So it means it would be hard to DMA more than 8*256 words during vint.
That means (not tested!):
<8 sprites of 32x32
<16 sprites of 16x16
<32 sprites of 8x8
on the same vint.
If the sprite DMAs are not in sync, you can go higher.

Create a VRAM transfer queue that gets processed every vint.
The queue handler needs to be as short and as fast as possible, so make good use of the (Ax)+ addressing mode.
You want to minimize the bandwidth lost to code execution.
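A minimal sketch of such a queue in C, assuming the entries are control-port words that were already computed outside the vint. The type and function names are hypothetical. The `*port = *p++` loop is the C counterpart of a post-increment read; whether the compiler really emits a `move.w (Ax)+` for it depends on the compiler and options, so check the generated code.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX_WORDS (7 * 16)   /* arbitrary capacity for the sketch */

typedef struct {
    uint16_t words[QUEUE_MAX_WORDS];  /* prebuilt control-port words */
    size_t   count;                   /* number of valid words       */
} ctrl_queue;

/* Append `n` prebuilt words; returns 0 (and queues nothing) if they
   do not fit, so the caller can retry on a later frame. */
static int queue_push(ctrl_queue *q, const uint16_t *cmd, size_t n)
{
    size_t i;
    if (n > QUEUE_MAX_WORDS - q->count)
        return 0;
    for (i = 0; i < n; i++)
        q->words[q->count++] = cmd[i];
    return 1;
}

/* Called from the vint handler: stream every queued word to the VDP
   control port with as little work per word as possible. */
static void queue_flush(ctrl_queue *q, volatile uint16_t *port)
{
    const uint16_t *p = q->words;
    size_t n = q->count;
    while (n--)
        *port = *p++;                 /* post-increment read, fixed destination */
    q->count = 0;
}
```

On the console, `port` would be `(volatile uint16_t *)0xC00004`; passing it as a parameter just makes the loop easy to try out on a PC first.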