After turning around the problem in all directions, the only way I see that can lead to less flickering (due to the simulation of double buffering by drawing alternatively on odd and even lines) and better performance, is to draw all the pixels at the same time, without performing any overdraw (would flicker).

To do that, it means that I have to modify the polygon routine so the first phase just compute/collect all the data for each polygon, store that in some intermediate structure, and then draw it all in one go, from left to right, top to bottom.

The routine I'm using at the moment is very simple in it's principle. All I have is two tables, one called "min-x" and one called "max-x", which records the left and right extremities of each scanline in the triangle.

DIM MINX(200)
DIM MAXX(200)
FOR Y=0 TO 199 ' For each line
CURSET MINX(Y),Y,1
DRAW MAXX(Y)-MINX(Y),0,1
NEXT Y

Simple enough, isn't it ?

Now the problem is that when erasing and redrawing you just see an horrible clear/draw/clear/draw/clear/draw flickering effect, so a solution to that is to use the previous MIN/MAX values to erase only what needs to be erased, and draw only what needs to be drawn compared to what is already on the screen.

It works fine if you have only one polygon drawn. If you have many polygons, you still see them being drawn overlapped, so it's still ugly.

(and the fact that we are trying to achieve something visually nice, we have "dithering" on the polygons, used to simulate some shades of grey. And these values are different for each polygon because they depend of the orientation of the polygon toward the light source, etc... so just "completing the missing parts" is far from being trivial)

Assuming I can manage to find a way to store all the information necessary to display the next frame, it would be possible to have something like that:

DIM COUNT(200)
DIM MINX(200)
DIM WIDTH(200,999)
DIM STYLE(200,999)
FOR Y=0 TO 199 ' For each line
X=MINX(Y)
FOR I=0 TO COUNT(Y) ' For each segment
PATTERN STYLE(Y,I)
CURSET X,Y,1
W=WIDTH(Y,I)
DRAW W,0,1
X=X+W
NEXT I
NEXT Y

The change here, is that instead of calling the drawing code for each triangle, we have all the information for all the triangles stored in the tables. And then we can draw from left to right and top to bottom, without any overlapping.

The issue here, is that if done correctly, the new code would be faster if there are more polygons, and many of them overlap, because the additional complexity would lead to less overdrawing (ie: less screen update), but it is very hard to achieve, mostly because of how to organize the data structure, and the memory consumption.

The existing code that draw triangle per triangle just uses the two 200 bytes tables minx and maxx (plus some additional tables that I use in all my graphic routines, so they don't count here as additional size).

The new code needs to be able to compute the information of each scanline of the triangle and insert that in some larger structure that contains the description of a complete scanline... and then it means that technically we may have to keep one segment for each triangle if they are all visible (up to 240 vertical triangles), and for each need to store the width of the segment, and the rendering pattern.

The big question is: How to store that so it fits the following criteria:
- Fits in memory
- Efficient to process when inserting segments
- Efficient to process when displaying

The most efficient speed-wise is to store a fixed structure with a pre-allocated amount of potential segments to draw. It also happens to be the less efficient memory wise.

I will try to think about this, but indeed it is not a simple question. I guess any memory structure you use would be either memory hungry (tables) or quite difficult to manage (insert new segments in a linked list).

I did not see any problem with the current routine even with complex ships, but agree that if there are too many overlapping triangles it is not very efficient.

Sorry for the little off-topic now, but having a look at Stephen Judd's article in C= hacking about obj3d and lib3d I spotted a paragraph that might be relevant:

What we really want to do is to have the pattern stored to the screen,
without hosing nearby parts of the screen. A very crude and cumbersome
way of doing this would be to do something like

That might be OK for some lame-o PC coder, but we're a little smarter
than that I think. Surely this can be made more efficient? Of course
it can, and stop calling me Shirley. It just requires a little bit
of thinking, and the result is pretty nifty.
First we need to specify what the problem is. There are
three elements: the BITP endpoint bits, the FILLPAT fill pattern bits,
and the BPOINT bitmap screen bits. What we need is a function which
does

In the above, 'y' represents a bit in the fill pattern and 'x' represents
a bit on the screen. They might have values 0 or 1. If the bitmask
(i.e. 00000111) has a 0 in it, then the bitmap bit should pass through.
If the bitmask has a 1, then the *pattern* bit should pass through.
A function which does this is

(pattern EOR bitmap) AND (NOT bitmask) EOR pattern

This first tells us that the bitmask should be 11111000 instead of
00000111. This method only involves one bitmap access, and doesn't
involve any temporary variables. In short, it merges one bit pattern
into another, using a mask to tell which bits to change. So let's
look at that bitmap merge code again:

And there it is. Pattern EOR bitmap AND not bitmask EOR pattern.
Note that because the pattern is loaded first, .Y can be reused as
an index into the bitmap, and X can stay right where it is (to be
again used shortly). With these considerations, you can see that
trying to mask off the bitmap would be very cumbersome indeed!
Note that if the left and right endpoints share a column,
their two bitmasks need to be combined before performing the merge.
This just means EORing the masks together -- remember that they
are inverted, so AND won't work, and EOR is used instead of ORA
in case bits overlap (try a few examples to get a feel for this).

Not sure if this apply to your routine or if it can save some cycles (something worth it, at least). I think the rest of his poly routine is similar to yours. He is using self modifying code to make the fill routine start at a given LDA instruction and patching an RTS at the appropriate point. The only difference is that he includes the hidden-face check in the code.

I wonder if you could also include this check in yours. I know that keeping the dot product would let us include shading (as you stated), but now I am only computing the sign of the Z coordinate and it is taking quite a noticeable time, so imagine computing also X and Y... Maybe we can leave this as an option for later?