The RESPx registers for each of the sprites are strobe registers which effectively set the x position of each sprite to the point on the scanline the TIA is displaying when those registers are written to. Put more simply, as soon as you write to RESP0, sprite 0 begins drawing and it will keep drawing in that position on every scanline. Same for RESP1.

This session we're going to have a bit of a play with horizontal positioning code, and perhaps come to understand why even the simplest things on the '2600 are still an enjoyable challenge, even for experienced programmers.

As previously noted, it is not possible to just tell the '2600 the x position at which you want your sprites to display. The x positioning of the sprites is a consequence of an internal (non-accessible) timer which triggers sprite display at the same point every scanline. You can reset the timer by writing to RESP0 for sprite 0 or RESP1 for sprite 1. And based on where on the scanline you reset the timer, you effectively reposition the sprite to that position.

The challenge for us this session is to develop code which can position a sprite to any one of the 160 pixels on the scanline!

Given any pixel position from 0 to 159, how would we go about 'moving' the sprite to that horizontal position? Well, as we now know, we can't do that directly. What we can do is wait until the correct pixel position and then hit a RESPx register. Once we've done that, the sprite will start drawing immediately. So if we delay until, say, TIA pixel 80 and then hit RESP0, sprite 0 would begin displaying at that point. Likewise, for any pixel position on the scanline, if we delay to that pixel and then hit RESP0, sprite 0 will display at the pixel where we did that.

So how do we delay to a particular pixel? It's not as easy as it sounds! What we have to do, it turns out, is keep track of the exact execution time (cycle count) of the instructions being executed by the 6502 and hit that RESPx register only at the right time. But it gets ugly, because, as we know, although there are 228 TIA colour clocks on each scanline (160 of those being visible pixels), these correspond to only 76 cycles (228/3) of 6502 processing time. Consequently there are only 160/3 = 53 1/3 cycles of 6502 time in the visible part of the scanline. Since each 6502 cycle corresponds to 3 TIA colour clocks, it would seem that the best precision with which we could hit RESPx is within 3 pixels. But it gets uglier still, and we'll soon see why.
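
The arithmetic above can be checked with a few lines of Python. This is only a model of the figures quoted in the text (228 colour clocks per line, 160 of them visible, 3 colour clocks per 6502 cycle). The constant and function names are illustrative, and colour clocks are counted from the start of the scanline.

```python
from fractions import Fraction

# Figures quoted in the text for one scanline.
COLOUR_CLOCKS_PER_LINE = 228
VISIBLE_PIXELS = 160
CLOCKS_PER_CPU_CYCLE = 3   # one 6502 cycle = 3 TIA colour clocks

def cpu_cycles(colour_clocks):
    """Exact number of 6502 cycles spanning the given number of colour clocks."""
    return Fraction(colour_clocks, CLOCKS_PER_CPU_CYCLE)

def colour_clock_at_cycle(cycle):
    """Colour clock (from the start of the line) reached after `cycle` 6502 cycles.
    Only multiples of 3 can be reached, hence the 3-pixel precision."""
    return cycle * CLOCKS_PER_CPU_CYCLE

print(cpu_cycles(COLOUR_CLOCKS_PER_LINE))  # 76
print(cpu_cycles(VISIBLE_PIXELS))          # 160/3, i.e. 53 1/3
print(colour_clock_at_cycle(40))           # 120
```

The last line is the same calculation as the SLEEP 40 example discussed next: 40 cycles times 3 colour clocks per cycle gives colour clock 120.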

The SLEEP macro has been useful to us now, to delay a set number of 6502 cycles. Consider the following code...

Surely that's a simple and neat way to position the sprite to TIA colour clock 120? The 120 comes from multiplying the 6502 cycle number (40) by 3 TIA colour clocks per 6502 cycle. The answer to the question is "yes and no". Sure, it's a neat way to hardwire a specific delay to a specific position. But say you wanted to be able to adjust the position to an arbitrary spot. We could no longer use this sort of code. Remember, SLEEP is just a macro. What it does is insert code that produces the number of cycles of delay you request. The above might look something more like this...

We don't really know what the SLEEP macro inserts, and we don't really care. It's documented to cause a delay of n cycles if you pass it n. That's all we can know about it. If we wanted to change n to n+1 we could do it at assembly time, but we couldn't use this sort of code to change the delay in real time. What we want is a bit of code which will wait a variable amount of time.

And here's where the fun really starts! There are, of course, many many ways to do this. And part of the fun of horizontal positioning code is that it's just begging for nifty and elegant solutions to doing just that. What we're going to do now is just develop a fairly simple, possibly inefficient, but workable solution.

The essence of our solution will be to use a loop to count down the delay, and to write the RESPx register immediately when the loop terminates. So the longer the delay, the more our loop iterates. In principle, it's a fine idea. In practice we soon see the severe limitations. We should be familiar with simple looping constructs; we have already used looping to count the scanlines in our kernels, for example. Here's a simple delay loop which will iterate exactly the number of times specified in the X register...

That's as simple a loop as we can get. On each iteration through the loop the value in the X register is decremented by one, and the loop continues until the Z flag is set (which happens when the last operation performed by the processor produced a zero result; in this case, the last operation is the 'dex' instruction). So as you can see, at just two instructions in size this is a pretty 'tight' loop. There's not much you can trim out of it and still have a loop! So what's the problem with using a loop like this in our horizontal positioning code? Let's have another look at this, but with cycle times added...

SimpleLoop dex ; 2
bne SimpleLoop ; 3 (2)

It has been fairly standard notation for a few years now to indicate cycle times in the fashion shown above. The number in the comment (after each semicolon) represents the number of 6502 cycles required to execute the instruction on that line. In this case, the 'dex' instruction takes 2 cycles. The 'bne' instruction takes 3 cycles if the branch is taken and 2 cycles if not. Unfortunately, life isn't always that simple. If the branch from the bne instruction to the branch target crosses a page boundary (a 256-byte boundary), then the processor takes another cycle! So we're faced with the situation where, as we add and remove code in other parts of our program, some of our loops take longer or shorter amounts of time to execute. No kidding! So when we write loops where timing is critical, we must also somehow guarantee that this sort of shifting doesn't happen! That's not our problem today, though; let's assume that our branches are always within the same page.

So what's wrong with the above? Let's go back to our correspondence between 6502 cycles and TIA colour clocks. We know that each 6502 cycle is 3 TIA colour clocks. So a single iteration of the above loop takes 5 cycles of 6502 time, or a massive 15 TIA colour clocks. No matter how many iterations of our loop we do, we can only hit the RESPx register with a granularity of 15 TIA colour clocks! Is this a disaster? No, it's not. In fact, the TIA is specifically designed to cater for this situation. Before we delve into how, though, let's analyse this loop a bit more...
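
The cycle counts above can be tallied with a short Python model of the loop. It is a sketch under a few assumptions: X has already been loaded, only the two loop instructions are counted, and the page-crossing penalty is an optional flag. The function name is illustrative.

```python
def delay_loop_cycles(x, page_cross=False):
    """Cycles spent in
           SimpleLoop  dex              ; 2
                       bne SimpleLoop   ; 3 if taken (4 across a page), 2 if not
    when entered with X = x. X is 8 bits, so x = 0 wraps and runs 256 times."""
    iterations = x if x != 0 else 256
    taken_cost = 4 if page_cross else 3
    dex_cycles = 2 * iterations
    bne_cycles = taken_cost * (iterations - 1) + 2   # last bne falls through
    return dex_cycles + bne_cycles

for x in (1, 2, 3):
    cycles = delay_loop_cycles(x)
    print(x, cycles, cycles * 3)   # iterations, 6502 cycles, TIA colour clocks
```

Each extra iteration adds 5 cycles (15 colour clocks). With `page_cross=True`, every taken branch costs one more cycle, which is exactly the drift described above.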

Since each iteration of the loop chews up 15 TIA colour clocks, we must iterate (X/15) times, where X is the pixel number at which we want our sprite to be positioned. Put another way, we need to know how many 15-pixel chunks to skip in our delay loop before we're at the correct position to hit RESPx and start sprite display. So when we come into this code with a desired horizontal position, we'll have to divide that value by 15 to give us a loop count. What's the divide instruction? There isn't one, of course!
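
As a sketch, the division splits a desired position into whole loop passes plus a leftover that the loop alone cannot reach. This ignores the fixed overhead of the code around the loop (an assumption), and the names are illustrative.

```python
CLOCKS_PER_ITERATION = 15   # 5 cycles x 3 colour clocks per loop pass

def coarse_position(x_pos):
    """Split a desired position (0..159) into whole loop passes and leftover clocks."""
    passes, leftover = divmod(x_pos, CLOCKS_PER_ITERATION)
    return passes, leftover

print(coarse_position(80))    # (5, 5)
print(coarse_position(159))   # (10, 9)
```

Across all 160 positions there are only 11 distinct pass counts (0 to 10), and the leftover of 0 to 14 colour clocks is the error this approach leaves behind.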

So how do we divide by 15?

Another of those extremely enjoyable challenges of '2600 programming. Dividing by a power of 2 is easy. The processor provides shifting instructions which shift all the bits in a byte to the left or to the right. Consider decimal: if you shifted all the digits of a number to the left by one place and added a 0 at the end of the number, you'd have multiplied by 10. Similarly in binary, if you shift a number left once and put a 0 on the end, you've multiplied by 2. Dividing by two is thus shifting to the right one digit position and adding a 0 at the 'top' of the number. Typically, multiplication in particular, and sometimes division, are achieved by clever combinations of shifting and adding numbers.
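
Here is a small Python model of those shifts on 8-bit values. It is a sketch, not 6502 code. The function names are hypothetical, and the multiply-by-10 example is just one illustration of the shift-and-add idea mentioned above.

```python
def shift_right(n, places=1):
    """lsr applied `places` times to an 8-bit value: divide by 2**places, dropping the remainder."""
    return (n & 0xFF) >> places

def shift_left(n, places=1):
    """asl applied `places` times: multiply by 2**places, keeping only 8 bits as the 6502 does."""
    return (n << places) & 0xFF

def times10(n):
    """Shift-and-add multiply: n*10 = n*8 + n*2 (only valid while the result fits in a byte)."""
    return shift_left(n, 3) + shift_left(n, 1)
```

For example, four right shifts divide by 16 (159 becomes 9), and `times10(25)` gives 250. Dividing by 15 is not a power of 2, which is why it needs a different trick.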

But we don't need to do that here. We know that there are only 160 possible positions for the sprite. Why not have a 160-byte table, with each entry giving the delay loop counter for that position? Something like this...

DON'T do things by hand when the assembler can do it for you! What I've done here is write a little 'program' to control the assembler generation of a table of data. It has a repeat loop of 160 iterations, each iteration incrementing a counter by one and putting that counter value / 15 in the ROM (with the .byte pseudo-op). This code is equivalent to writing...

In any case, the idea of having a table is to give us a quick and easy way to divide by 15. To use it, we place our number in an index register, then load the divide-by-15 result from the table, using the register to give us the offset into the table. Easier to show than explain...
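
The following Python sketch shows what the table holds and how the delay loop behaves once X has been loaded from it. It is not the original listing, which isn't reproduced here. `DIV15_TABLE` and `delay_iterations` are hypothetical names. The model tracks only the 8-bit X register, so it also exposes the 'loop 0 times' problem.

```python
# Contents of the divide-by-15 table: one byte per horizontal position 0..159.
DIV15_TABLE = bytes(pos // 15 for pos in range(160))

def delay_iterations(pos):
    """Iterations of 'dex / bne' after loading X from DIV15_TABLE[pos].

    X is an 8-bit register, so dex on 0 wraps to 255 and a table value of
    0 makes the loop run 256 times rather than zero."""
    x = DIV15_TABLE[pos]
    count = 0
    while True:
        x = (x - 1) & 0xFF       # dex
        count += 1
        if x == 0:               # Z set: bne falls through
            return count
```

Positions 15 to 29 give a count of 1 and position 30 gives 2, as expected. Positions 0 to 14, however, load X = 0 and spin 256 times.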

It's good, and it's bad. Bad because it can't cope with 'loop 0 times'; in fact, it will loop 256 times. So let's add one to all the entries in the table, which will 'fix' this problem. Just change the '.byte .POS / 15' to '.byte (.POS / 15) + 1'. But I think we're digressing, and what I really wanted to introduce was the concept of looping to delay for a certain (variable) time, and then hitting RESPx at the end of the loop. You can see the problems this method introduces, though: we had to find a way to divide by 15, and we still only had 15 colour clock resolution in our positioning. There are other (and arguably better) ways to do horizontal positioning, but let's not make the better the enemy of the good. What we're really after right now is a working solution.
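
The same Python model with the '+ 1' fix applied shows that the 256-iteration case disappears. It is a sketch, and the names are hypothetical.

```python
# Table after the fix: '.byte (.POS / 15) + 1' gives values 1..11, never 0.
DIV15_PLUS_ONE = bytes(pos // 15 + 1 for pos in range(160))

def delay_iterations(pos):
    """dex / bne iterations: with X >= 1 the loop runs exactly X times."""
    x = DIV15_PLUS_ONE[pos]
    count = 0
    while True:
        x = (x - 1) & 0xFF       # dex
        count += 1
        if x == 0:               # Z set: bne falls through
            return count

# Only 11 distinct delays exist for 160 positions: the coarse 15-clock steps.
distinct = sorted(set(delay_iterations(p) for p in range(160)))
print(distinct)
```

Every position now gets one extra pass (5 cycles) compared with the unfixed table. That is a constant shift rather than a position-dependent error. With only 11 distinct delays, the sprite can only land in 11 coarse spots.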

So in theory, our positioning code so far consists of dividing the x position by 15, looping (skipping 15 colour clocks each loop) and then hitting the RESP0 register to start drawing the sprite. Is this all there is to it? Yes, in a nutshell. But the devil is in the detail. Let's integrate what we have so far into a kernel which constantly increments the desired X position for the sprite, then attempts to set the x position for the sprite each frame (see the source code and sample binary).

Now this is very interesting. Clearly our sprite is moving across the screen as our desired position is incrementing. But it's moving in very big chunks. We have a bit of optimising to do before we have a sprite positioning system capable of pixel-precise horizontal positioning. But it's a start, and we understand it (I hope!).

There are some observations to make about this code and binary. I've introduced a little more 6502, which we can examine now...

inc SpriteXPosition; increment the desired position by 1 pixel
ldx SpriteXPosition
cpx #160 ; has it reached 160?
bcc LT160 ; this is equivalent to branch if less than
ldx #0 ; otherwise reload with 0
stx SpriteXPosition
LT160
jsr PositionSprite; call the subroutine to position the sprite

This is the bit of code which adjusts the desired position, loads it into the X register and calls a 'subroutine' to do the actual positioning. This is our first introduction to the 'bcc' instruction, and to the 'jsr' and 'rts' (in the subroutine itself) instructions. We have previously encountered the Z flag and the use of flags in the processor's status register to determine whether branches are taken or not. The delay loop uses exactly this. The Z flag isn't the only flag set or cleared when operations are performed by the processor. Sometimes the 'carry flag' is also set or cleared: specifically, by arithmetic operations such as addition and subtraction, and also by comparisons (which are essentially achieved by doing an actual subtraction but not storing the result to the register). In this case, we've compared the X register with the value 160 (cpx #160). This will clear the carry flag if the X register is LESS than 160, or set the carry flag if the X register is GREATER than or EQUAL to 160. I've always used the carry flag like this for unsigned comparisons. In the code above, we're saying 'if the X register is >= 160, then reset it to 0'. All branch instructions cost 3 cycles if taken, 2 if not taken, and an additional cycle if the branch taken crosses a page boundary. Branches can only be made to code within -128 to +127 bytes of the branch. For longer 'jumps' one can use the 'jmp' instruction, which is unconditional.
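
Here is a Python mirror of the position update above, showing how the carry flag from `cpx` acts as an unsigned "greater than or equal" test. It is a sketch only. The helper names are hypothetical, and values are masked to 8 bits as on the 6502.

```python
def cpx_carry(x, operand):
    """Carry flag after CPX #operand: set when X >= operand (unsigned), clear when X < operand."""
    return (x & 0xFF) >= (operand & 0xFF)

def next_position(pos):
    """Mirror of the inc / ldx / cpx #160 / bcc LT160 / ldx #0 / stx sequence."""
    x = (pos + 1) & 0xFF         # inc SpriteXPosition (8-bit memory), then ldx
    if not cpx_carry(x, 160):    # bcc LT160: branch taken while x < 160
        return x
    return 0                     # ldx #0 / stx SpriteXPosition
```

Starting from 0, the position runs 1, 2, ... 159 and then wraps back to 0.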

For long conditional branches, use this sort of code...

cpx #160
bcs GT160 ; NOT less than 160 (bcs is a GREATER or EQUAL comparison)
jmp TooFarForLT ; IS less than 160
GT160
; lots of code
TooFarForLT; etc

But I digress! The 'jsr' instruction mnemonic stands for "Jump to SubRoutine". A subroutine is a small section of code somewhere in your program which can be 'called' to do a task, after which program execution continues from where the call was made. Subroutines are useful to encapsulate often-used code so that it doesn't need to be repeated multiple times in your ROM. When the 6502 'calls' a subroutine, it keeps track of where it is calling FROM, so that when the subroutine returns, it knows where to continue code execution. This 'return address' is placed on the 6502's 'stack', which we will learn about very soon now. The stack is really just a bit of our precious RAM where the 6502 stores these addresses, and sometimes other values. The 6502 uses as much of our RAM for its stack as it needs, and each subroutine call we make requires 2 bytes (the return address) which are freed (no longer used) when the subroutine returns. If we 'nest' our subroutines, by calling one subroutine from within another, then each nested level requires an additional 2 bytes of stack space, and our stack 'grows' and starts taking increasing amounts of our RAM! So subroutines, though convenient, can also be costly. They also take a fair number of cycles for the 6502 to do all that stack manipulation: in fact it takes 6 cycles for the subroutine call (the 'jsr') and another 6 for the subroutine return (the 'rts'). So it's not often inside a kernel that we will see subroutine usage!

As noted, the 6502 maintains its stack in our RAM area. It has a register called the 'stack pointer' which gives it the address of the next available byte in RAM for it to use. As the 6502 fills up the stack, it decrements this pointer (thus, the stack 'grows' downwards in RAM). As the 6502 releases values from the stack, it increments this pointer. Generally we don't play with the stack pointer, but in case you're wondering, the only way to set it to a value of our choosing is to transfer that value from the X register with the 'txs' instruction. If you've been following closely, you will have noticed I added a bit to the initialisation section!

ldx #$FF
txs ; initialise stack pointer

Without that initialisation, the stack pointer could point anywhere in RAM (or even at TIA registers), and when we called a subroutine, the 6502 would attempt to store its return address wherever the stack pointer was pointing. Probably with disastrous consequences!
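
For the curious, here is a toy Python model of what jsr and rts do to the stack once the pointer has been initialised to $FF. It models only the stack page, not the rest of the 6502. Two details it relies on: on a 6502, JSR pushes the address of its own last byte (high byte first), and RTS adds one to the address it pulls. On the 2600, page-one addresses $0180-$01FF are mirrors of our RAM at $80-$FF, so a stack pointer of $FF puts the stack at the very top of our RAM. The class and method names are hypothetical.

```python
class Stack6502:
    """Toy model of the 6502 hardware stack: page $01, pointer post-decrements on push."""

    def __init__(self):
        self.memory = {}
        self.sp = 0xFF               # ldx #$FF / txs

    def push(self, byte):
        self.memory[0x0100 + self.sp] = byte & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 + self.sp]

    def jsr(self, jsr_address):
        # JSR pushes the address of its own last byte (jsr_address + 2), high byte first.
        ret = jsr_address + 2
        self.push(ret >> 8)
        self.push(ret & 0xFF)

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        return ((hi << 8) | lo) + 1  # RTS adds 1 to resume after the JSR
```

A single call uses two bytes ($01FF and $01FE) and leaves the pointer at $FD. The return restores it to $FF, so each extra level of nesting costs another two bytes.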

Positioning sprites is a complex task. This session we've started to explore the problem, and have some working code which does manage to roughly position the sprite at any horizontal position we ask for. Next session we're going to dig into much more robust horizontal positioning code, and learn how the TIA provides the fine control we need to make our positioning precise to the individual TIA pixel. Once we've achieved that, we can pretty much forget about how this works forevermore, and use the horizontal positioning code as a black box. Or perhaps a woodgrain box might be more appropriate.

See you next time!

8.0 Horizontal Motion
Horizontal motion allows the programmer to move any of the 5 graphics objects relative to their current horizontal position. Each object has a 4-bit horizontal motion register (HMP0, HMP1, HMM0, HMM1, HMBL) that can be loaded with a value in the range of +7 to -8 (negative values are expressed in two's complement form). This motion is not executed until the HMOVE register is written to, at which time all motion registers move their respective objects. Objects can be moved repeatedly by simply executing HMOVE. Any object that is not to move must have a 0 in its motion register. With the horizontal positioning command confined to positioning objects at 15 color clock intervals, the motion registers fill in the gaps by moving objects +7 to -8 color clocks. Objects can now be placed at any color clock position across the screen. All 5 motion registers can be set to zero simultaneously by writing to the horizontal motion clear register (HMCLR).

(Andrew, I know we're getting ahead of you here, but...well...H_D started it!) Here's the one problem with those (and this comes directly from my experience with JoustPong)...unless I'm very sadly mistaken (wouldn't be the first time) you can't READ the horizontal position from anywhere...so that means you have 3 general options:

1. "Don't care" where an object is. This is how my first (and current) Alpha of JoustPong works...it resets the ball to center, and from there my program only knows what direction the ball is headed (since I set it, and only change the direction when it hits something) and when it hits something. That's why I use "playfield bars" as goals, really...if the ball hits the playfield, I check which direction it's going, and then I know who got a point (and if it hits a player, I know which player, so I set the motion accordingly). But at this point, I don't even have a way of telling if the ball is off the side of the screen!

2. Shadow the current position with a variable. I don't know if this technique actually works well or not, but theoretically you could keep a variable, set it whenever you set the horizontal position of an object, and make sure it's updated by the object's speed every time the object moves.
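
Option 2 can be sketched in Python. The sketch makes three assumptions: the shadow starts out equal to the object's true position, every HMOVE applies the full motion held in the register, and positions wrap around the 160 visible pixels. The direction convention (positive HMxx values move the object left) comes from the TIA documentation rather than this thread. The function names are hypothetical.

```python
def hm_motion(hm_value):
    """Signed motion held in D7-D4 of an HMxx value (two's complement, +7..-8).
    Positive values move the object LEFT, negative values move it right."""
    nibble = (hm_value >> 4) & 0x0F
    return nibble - 16 if nibble >= 8 else nibble

def shadow_after_hmove(shadow_x, hm_value, width=160):
    """Update a shadow copy of an object's position after one HMOVE."""
    return (shadow_x - hm_motion(hm_value)) % width
```

For example, with $10 in the motion register each HMOVE moves the object one pixel left, so a shadow of 10 becomes 9.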

3. Precisely reposition the object from the value of a variable every frame. I CAN'T WAIT FOR ANDREW TO CONTINUE THIS LESSON! Because this would come in really useful for my current game.

I'm dying to hear about the technique...I don't plan to reuse sprites, but it's getting to the point where I really want to know exactly where that damn ball is.

Here's the sprite positioning code from Qb. The stella archive is always a great place to search for this sort of stuff. Next session we'll analyse this code (and possibly other variants) carefully. For now, see if you can figure out how it works.

Hint: We covered how we needed to divide by 15. This code does a divide by 16 which is ALMOST a divide by 15. The trick is to analyse the differences between the two divisions and do the divide by 16, and adjust the result based on those differences. Neato lateral-thinking code, which is what the '2600 is all about.
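
The idea in the hint can be checked with a little arithmetic. Writing x = 16q + r gives x = 15q + (q + r). So the divide-by-16 quotient is already the divide-by-15 quotient, or one less than it, and q + r is the matching remainder before a single correction. The Python below is a sketch of that identity only (Qb's routine itself isn't reproduced in this thread), and `div15_via_div16` is a hypothetical name.

```python
def div15_via_div16(x):
    """Return (x // 15, x % 15) for 0 <= x <= 159 using only a divide by 16,
    a mask, an add and one compare-and-subtract."""
    assert 0 <= x <= 159, "one correction step is only enough for 0..159"
    q = x >> 4          # divide by 16 (four right shifts)
    r = x & 0x0F        # remainder of the divide by 16
    r += q              # x = 15*q + (q + r), so q + r is the remainder w.r.t. 15
    if r >= 15:         # q + r <= 24 for x <= 159, so at most one fix-up
        q += 1
        r -= 15
    return q, r
```

For every position from 0 to 159 this agrees with an ordinary divide by 15. For example, 159 gives q = 9, r = 15 from the divide by 16, then 24 after the add, and finally (10, 9).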

Cool. My next two tasks in JoustPong are: A. get a tighter kernel, and B. use that kernel as a proof of concept for a "Poorlords" playfield. But to get any further with that I'll need C. to use something like this.

Re: "great place to search for this sort of stuff"...how do you search it? As far as I can tell the search routine there is very broken. I use Google, which is annoying because it doesn't sort the results chronologically. (And also, sometimes it's very hard to figure out what all the variables mean, which is why I've had such a hard time figuring out skipdraw)

PositionSprites
; Set the horizontal position of the two sprites, based upon their coordinates
; This uses the tricky 2600 method of positioning (RESPx, HMPx, etc)
; Algorithm invented a looong time ago by persons unknown
; Re-invented Feb2001 by yours truly, then optimised according to code by Thomas Jenztsch
sta WSYNC
sta HMCLR ; clear any previous movement
ldx #2 ; sprite index

I'm starting to try to integrate this and get it working before understanding it fully (my first attempts aren't so hot...it makes the screen go crazy), but it's weird...this is meant to be a subroutine, right? But PlayerX holds P0's position, and PlayerX+1 holds P1's? Are the other variables 1 byte each? And if this IS a reusable subroutine, isn't it odd that it sets the sprite index (which I think selects P0 vs P1) inside the subroutine? Wouldn't that generally be set outside, by the calling portion of the program?

This subroutine sets the positions of BOTH sprites in one go. The PlayerX variable is two bytes, holding one byte for each player. Feel free to change it - you only need the positioning code inside the loop!

Oh boy...hope this doesn't make it harder to adapt to the Ball, which is what I need it for now.

I'm still alarmed at what weird results I got when I naively tried to slap this into my code...hopefully tonight I can take a look at it and post a heavily commented version up to here...

I traced all the arithmetic, which in effect is a lot of division and "mod"ing by 16, but I'm not quite clever enough to figure out how it equals division and mod by 15. Still, Y is position/15 and A is like position%15; you take the negative of A, move it into bits D7-D4 of the right motion register, then do the little "jiggle" loop for the value of A to roughly position the thing, then use HMOVE to do the fine positioning one time.
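
The "take the negative and move it into D7-D4" step can be modelled as follows. This only illustrates the HMxx encoding from the Stella guide excerpt quoted earlier (4 bits, +7 to -8, two's complement, in the upper nibble), together with the convention that positive values move an object left. `hm_byte` and `right_shift_value` are hypothetical names, and the sketch doesn't try to reproduce how Qb's routine chooses the value.

```python
def hm_byte(motion):
    """Pack a signed motion (-8..+7) into D7-D4 of an HMxx value, two's complement.
    Positive values move the object left, negative values move it right."""
    if not -8 <= motion <= 7:
        raise ValueError("HMxx motion must be in the range -8..+7")
    return (motion & 0x0F) << 4

def right_shift_value(pixels):
    """Value to write to move an object `pixels` to the RIGHT: the negated count."""
    return hm_byte(-pixels)
```

For instance, moving 3 pixels right means writing -3, which packs to $D0. Moving 1 pixel left is +1, which packs to $10.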

However, what confused me is the "jsr Ret"... should this code have

Ret
RTS

or something? I tried this after the code block above:

JMP SkipRet
Ret;all done
RTS
SkipRet

but the ROM totally freaks out....so instead I replaced it with four 'bit 0' instructions, to get 12 cycles.

What should the "Ret" routine look like?

You had it correct. The subroutine just immediately returns, taking 6 cycles for the call and 6 cycles for the return, giving you a 12-cycle delay at a cost of three bytes. Replacing it with three 4-cycle, four 3-cycle or six 2-cycle instructions is just fine. You could replace the delay section (all 15 cycles) with a SLEEP 15 if you wanted.
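
Here is the same comparison as a small Python table. The per-instruction figures are standard 6502 timings: nop is 2 cycles and 1 byte, zero-page bit is 3 cycles and 2 bytes, absolute bit is 4 cycles and 3 bytes, and jsr and rts take 6 cycles each, with jsr being 3 bytes. Using absolute bit as the 4-cycle instruction is just one example, and the names are illustrative.

```python
# (cycles, bytes) per instruction, standard 6502 timings.
NOP = (2, 1)        # nop
BIT_ZP = (3, 2)     # bit 0      (zero page)
BIT_ABS = (4, 3)    # bit $xxxx  (absolute); one example of a 4-cycle instruction
JSR_RTS = (12, 3)   # jsr Ret + rts: 6 + 6 cycles, 3 bytes at the call site

def cost(instruction, count):
    """Total (cycles, bytes) for `count` copies of an instruction."""
    cycles, size = instruction
    return cycles * count, size * count

options = {
    "jsr Ret": cost(JSR_RTS, 1),   # plus the single shared 'Ret: rts' byte
    "4 x bit 0": cost(BIT_ZP, 4),
    "3 x bit abs": cost(BIT_ABS, 3),
    "6 x nop": cost(NOP, 6),
}
for name, (cycles, size) in options.items():
    print(f"{name:12} {cycles:3} cycles {size:3} bytes")
```

All four burn 12 cycles. The jsr trick is the most compact at the call site, which is why it appears in tightly packed positioning code.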

I.e., are overscan, vertical sync, and vertical blanking all valid times to do game logic?

Yup, you can use them as you want to (except for horizontal fine positioning (HMOVE), which doesn't work during VSYNC).

Usually I start with using overscan for collision detection, scoring and controller, VSYNC only for updating the framecounter and VBlank for preparing the kernel (setting up graphic pointers, horizontal positioning etc.). Later I might have to move some code around, but it is a good start.

Umm...I think I'm slow or something, but I'm having a hard time figuring this code out, and I couldn't find any place where anyone "analyse[d it] carefully" - is that still in the works? Or if someone went through this, would someone be so kind as to tell me where? (Two years late, I know...)

I still don't get how RESP works. From what I read, the object is moved to the cycle where data is written to RESPx, but when I set the ball to go to cycle 28 (marked by a PF color change), it appears about four clocks later, as shown in the code and screenshot below.

I'm sure I could do empirical tests to find out where objects land on each cycle, but I'd rather understand what's going on. Can anyone explain how to determine where an object will be moved using RESP?

You might be seeing a delay in the playfield color register. If you change your demo a bit and use a player instead of a color register you'll see the ball is shifted left by a pixel from the player's position. I might have to read up on Andrew Towers' TIA Hardware Notes again to see if he has an explanation for this.

It has nothing to do with the playfield colour. It's just the normal offset for the positions of the objects. A RESP will set the object's position to the pixel at which the write happens. But the TIA needs a couple of cycles from the pixel clock to set up the actual output. Therefore all objects will be delayed by a couple of pixels.

The offset is 4 pixels for missiles and the ball, 5 pixels for single-width players, and 6 pixels for double- and quadruple-width players. You need to take these offsets into account when calculating the position of an object.
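
These offsets translate directly into a small lookup. The sketch uses exactly the figures quoted in this reply (4, 5 and 6 colour clocks). The names are illustrative, and wrapping at 160 assumes that positions wrap around the 160 visible pixels.

```python
# Extra colour clocks between the pixel at which RESPx is written and where the object starts.
RESP_OFFSET = {
    "missile": 4,
    "ball": 4,
    "player_single": 5,
    "player_double": 6,
    "player_quad": 6,
}

def appears_at(write_pixel, obj):
    """Visible pixel at which the object starts, given the pixel the RESPx write hit."""
    return (write_pixel + RESP_OFFSET[obj]) % 160

def write_pixel_for(target, obj):
    """Inverse: where the write must land for the object to start at `target`."""
    return (target - RESP_OFFSET[obj]) % 160
```

This matches the observation above that the ball turned up about four clocks after the write. To place a single-width player at pixel 20, the write would need to land at pixel 15.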

It seems to me that this notion of 4 to 6 color clocks of offset latency is pretty fundamental to understanding how to achieve proper horizontal positioning.
I wonder why it was not mentioned in Andrew's tutorial? Or maybe I didn't notice?