Don't hesitate to participate by giving suggestions or, even better, by providing a faster version.

Ideally we are looking for two kinds of "faster" versions:
- The faster one that does not use too much memory (good for games)
- The insanely faster one that can gobble half of the memory (usable for demo records and stuff like that)

The last published version was down to 822. Twilighte took a look and managed to reduce it to nearly 800, and since I had a day off, I spent some time on the code today: I optimized it further, down to 793 (yeah, we are under 800!), and added dedicated code paths for totally vertical and totally horizontal lines.

I can't remember where I got this code, but it was supposed to be very, very fast. I wonder if it could be adapted to Oric asm, with all the screen optimizations you can usually do (you know, the tables), because the curset function is soooo slow.

After the discussions about 1337, I decided to take a look at 8-bit Bresenham implementations. The version in C= Hacking is very buggy (actually, I believe it does not even assemble).

So the last published version was down to 796 units, and the new code is now down to 676. Not impressive, I know, but possibly the setup code could be improved. I'm not quite sure how I could improve the inner loops.

Dbug wrote:
So Chema, could you try to integrate that one in 1337 and tell us if it improves the framerate in any perceivable way?

OK, I integrated it into the shipdemo test program and it improves the time by 1 "unit" (1/100th of a second). This is a fairly constant gain, but it is most noticeable in some models, where there are more and longer lines to draw.

Would need a more accurate timing to be sure, but this could be a good first approximation.

I think, however, there are some "artifacts", some extra pixels at the end of some lines and even a single pixel in the middle of a line which is drawn 1 pixel up. This only happens sometimes.

Would a method for "crunching" all the pixels in a single scan and plot them with only one sta improve this even further?

Algarbi wrote:
Wow, from 856 to 644 is a huge leap. Well done Dbug, I am sure you can still improve this.

Continue the good work

He still has to fix the artifacts, so that might go up a few cycles, but it's definitely a nice gain. I think more savings are certainly possible, but the more you save, the more effort is required to get those savings.

I think the only significant improvement will be in devising a way to calculate/write all pixels for a byte in one shot, possibly reducing the number of OR operations with screen memory.

I played with the idea of calculating how many horizontal bits you write for each x. Then you just perform a table lookup of the bits and OR the byte with the screen destination. However, tables of possible bit patterns are a problem, since the size of the table increases by 1 each time you add a horizontal pixel (see the bit pattern tables below). There may also be a different number of pixels per byte depending on the current x value, so I didn't really figure out a practical way to do this, except for horizontal lines, where I don't need to perform as many lookups.

To do this for horizontal lines you still have to draw the ends of the line separately, but everything in between is just a loop writing bytes with all pixels set. The ends can potentially be done with a lookup, but it requires a table for each end. This offers a bigger gain on the CoCo/MC-10, where there are 8 pixels / byte in hi-res, the CPUs have 16-bit index registers, and the screen layout makes for easy math. I'm not sure about the Oric, with the 6502, 6 pixels / byte, and its screen layout.

Basic algorithm on CoCo is something like this (not exactly 100% C but you'll get the idea):

#define BYTESPERLINE 32
#define PIXELSPERBYTE 8
If X2 < X1 then swap points
if (X1 / PIXELSPERBYTE) == (X2 / PIXELSPERBYTE) then special case because only 1 byte is involved
; note that shifts work well for the math on this screen layout
; if you have to use preincrement then subtract one at end of following line
; calculate starting address of first byte
Address = ScreenBase + (Y * BYTESPERLINE) + (X1 / PIXELSPERBYTE)
; lookup pixels to set and write leftmost byte
*(Address++) |= (*(LeftTable + (X1 & 7))) ; set bits in left byte
; number of whole bytes between the two end bytes
I = (X2 / PIXELSPERBYTE) - (X1 / PIXELSPERBYTE) - 1
; loop for middle bytes/pixels if needed (test takes place before a write)
WHILE (I-- > 0)
*(Address++) = 255 ; 255 = set all bits on middle bytes
; lookup pixels to set and write rightmost byte
*(Address) |= (*(RightTable + (X2 & 7))) ; set bits in right byte
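
If it helps, here is a minimal compilable C sketch of that horizontal-line idea. It is only an illustration under my own assumptions: the `screen` array stands in for video memory, the most significant bit is taken to be the leftmost pixel, and the names `hline`, `left_table` and `right_table` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_LINE  32
#define PIXELS_PER_BYTE 8
#define SCREEN_LINES    192

/* Hypothetical stand-in for video memory (1 bit per pixel, MSB = leftmost). */
static uint8_t screen[BYTES_PER_LINE * SCREEN_LINES];

/* Bits to set in the leftmost byte, indexed by x1 & 7. */
static const uint8_t left_table[8]  = {0xFF, 0x7F, 0x3F, 0x1F, 0x0F, 0x07, 0x03, 0x01};
/* Bits to set in the rightmost byte, indexed by x2 & 7. */
static const uint8_t right_table[8] = {0x80, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE, 0xFF};

void hline(int x1, int x2, int y)
{
    if (x2 < x1) { int t = x1; x1 = x2; x2 = t; }   /* swap so x1 <= x2 */
    assert(x1 >= 0 && x2 < BYTES_PER_LINE * PIXELS_PER_BYTE);
    assert(y >= 0 && y < SCREEN_LINES);

    uint8_t *p = screen + y * BYTES_PER_LINE + x1 / PIXELS_PER_BYTE;
    /* Whole bytes strictly between the two end bytes (-1 if both ends share a byte). */
    int middle = x2 / PIXELS_PER_BYTE - x1 / PIXELS_PER_BYTE - 1;

    if (middle < 0) {
        /* Special case: the whole line lives in one byte. */
        *p |= left_table[x1 & 7] & right_table[x2 & 7];
        return;
    }
    *p++ |= left_table[x1 & 7];          /* left end */
    while (middle-- > 0)
        *p++ = 0xFF;                     /* middle bytes: all pixels set */
    *p |= right_table[x2 & 7];           /* right end */
}
```

The single-byte case simply ANDs the two table entries together, so no third table is needed for short lines.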

The Oric version would have smaller tables due to fewer pixels / byte. Attribute bits would need to be set to whatever you want to use.
The math, however, might be ugly unless you can just look up values... which I think I did anyway, as it was faster.
; Left end of horizontal line
00111111
00011111
00001111
00000111
00000011
00000001

; Right end of horizontal line
00100000
00110000
00111000
00111100
00111110
00111111
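
For what it's worth, the two end tables above and the "just look up values" idea for the divide-by-6 could be generated along these lines. This is a rough sketch, not tested on real hardware: it assumes a 240-pixel-wide screen at 6 pixels / byte, and `build_tables`, `byte_of_x`, `pixel_of_x` and the `attribute_bits` parameter are names I invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define ORIC_WIDTH           240
#define ORIC_PIXELS_PER_BYTE 6

static uint8_t byte_of_x[ORIC_WIDTH];    /* x / 6: byte offset within a screen line */
static uint8_t pixel_of_x[ORIC_WIDTH];   /* x % 6: pixel index within that byte     */
static uint8_t left_table[ORIC_PIXELS_PER_BYTE];
static uint8_t right_table[ORIC_PIXELS_PER_BYTE];

/* attribute_bits is OR-ed into every mask; it must not touch the 6 pixel bits. */
void build_tables(uint8_t attribute_bits)
{
    assert((attribute_bits & 0x3F) == 0);

    /* Replace the division and modulo by 6 with two lookups. */
    for (int x = 0; x < ORIC_WIDTH; x++) {
        byte_of_x[x]  = (uint8_t)(x / ORIC_PIXELS_PER_BYTE);
        pixel_of_x[x] = (uint8_t)(x % ORIC_PIXELS_PER_BYTE);
    }
    for (int p = 0; p < ORIC_PIXELS_PER_BYTE; p++) {
        /* Left end: pixel p and everything to its right. */
        left_table[p]  = (uint8_t)((0x3F >> p) | attribute_bits);
        /* Right end: pixel p and everything to its left. */
        right_table[p] = (uint8_t)(((0x3F << (5 - p)) & 0x3F) | attribute_bits);
    }
}
```

Calling `build_tables(0)` reproduces exactly the twelve patterns listed above; passing a non-zero `attribute_bits` is where you would add whatever upper bits your display mode needs.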

I hope I'm making sense here but it's late, I'm tired, and that might be the formula for chocolate milk for all I know.

Just an FYI, the C64 version of Elite doesn't draw circles with a normal circle routine. It looks like it's drawing 4 lines / quarter, since lines are probably a little faster.

<edit>
Notice the last three bits of the X represent the pixel number within a byte and the top bits represent the byte number on a given screen line on the CoCo.

The tables can overlap by a byte if you put the right table first and the left table second. It saves a byte.
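
To illustrate the overlap with the 6-pixel patterns listed earlier: the last right-end entry and the first left-end entry are both 00111111, so one array of 11 bytes can serve as both tables. A small sketch, where `end_masks`, `RIGHT_TABLE`, `LEFT_TABLE` and `end_mask` are names I made up:

```c
#include <assert.h>
#include <stdint.h>

/* Right table (6 entries) followed by left table (6 entries), sharing 0x3F. */
static const uint8_t end_masks[11] = {
    0x20, 0x30, 0x38, 0x3C, 0x3E,   /* right end, pixel index 0..4            */
    0x3F,                           /* right end index 5 == left end index 0  */
    0x1F, 0x0F, 0x07, 0x03, 0x01    /* left end, pixel index 1..5             */
};

#define RIGHT_TABLE (end_masks)
#define LEFT_TABLE  (end_masks + 5)

/* Mask for a run of pixels first..last (inclusive) inside a single byte. */
uint8_t end_mask(int first, int last)
{
    assert(first >= 0 && first <= last && last < 6);
    return (uint8_t)(LEFT_TABLE[first] & RIGHT_TABLE[last]);
}
```

In assembly the same trick is just two labels pointing 5 bytes apart into one block of data.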

initialize x, dx, etc.
xold = x
take a step in x: LSR X
have we hit the end of a column? If so, then plot and check on y
is it time to take a step in y?
if not, take another step in x
if it is, then let a=x EOR xold
plot a into the buffer
let xold=x
keep on going until we're finished

It saves a lot of time for lines where dX > 2 * dY on the C64 and the Atari 2600 (~33% saving there). Maybe it can somehow be adapted for the Oric?
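
Here is how I read those steps, written as a C sketch rather than 6502 code. It makes several assumptions of my own: 8 pixels / byte with the MSB as the leftmost pixel, a `screen` array standing in for video memory, and only the case x1 <= x2, y1 <= y2, dx >= dy is handled. The name `chunky_line` and the error-term details are mine, not taken from the original routine.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_LINE 32
#define SCREEN_LINES   192

/* Hypothetical stand-in for video memory. */
static uint8_t screen[BYTES_PER_LINE * SCREEN_LINES];

void chunky_line(int x1, int y1, int x2, int y2)
{
    int dx = x2 - x1, dy = y2 - y1;
    assert(x1 >= 0 && x2 < BYTES_PER_LINE * 8 && y1 >= 0 && y2 < SCREEN_LINES);
    assert(dy >= 0 && dx >= dy);

    uint8_t *p   = screen + y1 * BYTES_PER_LINE + x1 / 8;
    uint8_t x    = (uint8_t)(0xFF >> (x1 & 7)); /* current pixel and all bits right of it */
    uint8_t xold = x;                           /* same mask at the start of the run      */
    int err = dx / 2;

    for (int i = 0; i < dx; i++) {
        x >>= 1;                       /* take a step in x (the LSR) */
        err -= dy;
        int step_y = err < 0;          /* is it time to take a step in y? */
        if (step_y)
            err += dx;
        if (x == 0 || step_y) {
            *p |= (uint8_t)(xold ^ x); /* plot the whole run with one write */
            if (x == 0) {              /* hit the end of a column */
                p++;
                x = 0xFF;
            }
            if (step_y)
                p += BYTES_PER_LINE;
            xold = x;
        }
    }
    /* Flush the last run, including the end pixel itself. */
    *p |= (uint8_t)(xold ^ (x >> 1));
}
```

The point is that screen memory is only touched once per run of pixels instead of once per pixel, which is why the gain grows as the line gets flatter.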