Is there a Z80 tutorial specifically targeted at people with a 6502 background, which explains how common idioms of game state organization on the 6502 translate efficiently to the 8080 family (8080/8085, Z80, LR35902), whose indexed modes are slow (Z80) or software-defined using add (others)? I have been told the 8080 family expects arrays to be aligned to the start of a 256-byte page, so that the program can load the field number in L and the object number in H or vice versa, but then any array with fewer than 256 elements wastes the remainder of the page. For the cases I'm considering (10-30 fields of 5-20 objects), aligning them without crossing pages is easy, but aligning them to the start of a page appears trickier. Would I have to consider the RAM as an "atlas" analogous to a texture atlas, with RAM being a two-dimensional array divided into rectangular sub-arrays, each with object index as one dimension, field index as the other dimension, and a stride of 256 bytes?

Some 8080-family coders suggest organizing data for a single-instruction, multiple-data (SIMD) approach that steps through two arrays at a time using DE and HL. For example, a program would update object positions in four passes: add horizontal acceleration to horizontal velocity of all objects, then add horizontal velocity to horizontal position of all objects, then add vertical acceleration to vertical velocity of all objects, then add vertical velocity to vertical position of all objects. Pixel shaders on modern graphics programs work similarly. But that doesn't work so well when a particular operation needs to be performed only on particular types of object or only in certain circumstances, which is often the case for enemy AI code that has to determine the accelerations for each object based on collision response, displacement to the player object, and other factors.

If I have asked before, and I am asking again, it's because when I did get an answer, it was one that I couldn't understand how to apply. Hence "wrapping my head around the Z80". Would basic questions like this be better received in a forum organized around a particular platform using an 8080 family CPU, such as Spectrum Computing (or, in the case of other platforms, SMS Power or gbdev.gg8.se)?

If people are used to NES hacking, I'd recommend doing some Sega Master System game disassembly and analysis to see how an equivalent tile & palette-based system works with a Z80 as the CPU. Learning Z80 isn't insurmountable, but yeah, tracing where DE and HL get loaded from is a tricky thing when you're used to 6502 zero-page pointers and the like.

LD A,(IX+5) isn't the best example. Those IX/IY instructions are looking nice at first glance, but they are so slow that one should usually avoid using them, or use them only in not so timing critical higher functions, or, in rare cases, where they are faster than opcode constructions. But in general, using LD A,(HL) is fastest. If you want to read from HL+1, HL+2, etc. just increment HL. Or increment L, that a bit faster, as long as no 100h-byte page crossing is needed.

When comparing C64 with Z80 systems like Spectrum or CPC, I would say that Z80 is a good bit faster. Parts because it's having more register & requires less memory accesses, and parts because it's having 16bit load/store/add/inc/dec instructions. The drawback is that Spectrum and CPC need to do sprite rendering by software.

Yeah, I've spotted that 2-dimensional array indexing trick via H and L in several CPC games. It's somewhat cool, but it's also a bit messy, and INC H isn't that much faster than using ADD HL for switching to the next array entry, so, in most cases I've "cleaned up" the code and removed that technique after disassembling the game.Btw. I was always believing that it's been a 6502 technique that had crept into Z80 world when people ported C64 games to CPC. I haven't checked if the original C64 code had used the same method, but it would make more sense there (since the 6502 can't easily do something like ADD HL for incrementing 16bit array indices).

There are a few more cases where one case "see" that many Z80 games came from C64, like cases where the C64 needed a handful of opcodes to do some operation, and the programmer blindly reproduced that code - although the Z80 could do the same thing using only a single opcode.

LD A,(IX+5) isn't the best example. Those IX/IY instructions are looking nice at first glance, but they are so slow that one should usually avoid using them, or use them only in not so timing critical higher functions, or, in rare cases, where they are faster than opcode constructions. But in general, using LD A,(HL) is fastest. If you want to read from HL+1, HL+2, etc. just increment HL. Or increment L, that a bit faster, as long as no 100h-byte page crossing is needed.

I mentioned specifically random access to an object's fields, not sequential access to its fields. I doubt that LD A,(IX+5) is slower than fiveINC L instructions, even if the structure is 32-byte aligned so that it does not cross pages.

; Pre- and post-condition: HL points to the most recently accessed; field of the 32-byte-aligned structure representing one enemy's; state. Which field happens to have been most recently accessed; is not known to this fragment.ld a,0E0h ; 7and a,l ; 4: HA points to the start of the structureor a,5 ; 7: HA points to the desired fieldld l,a ; 4: HL points to the desired fieldld a,(hl) ; 7: A contains contents of desired field; Total: 29 cycles

In order to use INC L to use sequential access to simulate random access, the programmer must keep track of which field was most recently accessed, even across subroutine call boundaries. What techniques are useful toward this? Is it fruitful to write a macro to generate the correct sequences of INC, DEC, or LD/ADD/LD sequences to point L at a particular field and save that field for later invocations of the same macro? Or should code first be written in a high-level language and then translated into Z80 only once the programmer is sure of in what order each object's fields shall be visited throughout the program, in order to be able to use sequential access (INC L or DEC L) as often as possible?

Okay, 47 cycles looks bad : ) the example with 28 cycles looks better. One cycle faster would be "LD C,L / SET 2,L / INC L / LD A,(HL) / LD L,C". And of course you could try to move the most often used entries closer to index +0 so you won't need the +5 stuff too often.

The gameboy doesn't have a real Z80 (it doesn't have any IX/IY registers), so one would actually need to use HL in the above fashion.

On CPC and Spectrum it might be best/easiest/fastest to use (IX+n) for the enemy logic in some places - but, on that computers, the main CPU load will go to software rendering, so the speed of the game logic doesn't matter too much, and it's more important to use fast opcodes in the rendering functions.

If you're accessing multiple fields of the same object in the same basic block, you can rely on the previous values of the low bits of L. This makes pointer arithmetic with XOR almost as fast as IX/IY indexing.

For random accesses within the space HL has already set up, what's wrong with directly specifying values to L (or H for that matter) if the next element is further away than the nearest neighbor so single INC/DEC isn't enough ?7 cycles for directly speficied element isn't bad if you cannot reach your goal in the next INC or DEC... still beats IX+x modes by several cycles (14 vs 19). You'll still have to put your stuff in 256 byte pages though, but that's not a limitation, and crossing the page would incur update to top part which can just be INC HL (6 cycles), and similar to page crossing penalty on 65x.

For random accesses within the space HL has already set up, what's wrong with directly specifying values to L (or H for that matter) if the next element is further away than the nearest neighbor so single INC/DEC isn't enough ?

If you have (say) 16 structures each 32 bytes in size, and structures are stored in consecutive addresses aligned to 32-byte boundaries, there are eight different L values that correspond to a particular H value. Or would you recommend interleaving several different unrelated arrays, as I mentioned earlier?

tepples wrote:

Would I have to consider the RAM as an "atlas" analogous to a texture atlas, with RAM being a two-dimensional array divided into rectangular sub-arrays, each with object index as one dimension, field index as the other dimension, and a stride of 256 bytes?

In the 2D approach, you might have actor properties in L=$00-$1F of each page and other unrelated things in L=$20-$FF of the same pages. (This wouldn't work for ColecoVision and SG-1000 because only four pages exist.)

Who is online

Users browsing this forum: No registered users and 1 guest

You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum