First, I want to thank everyone who e-mailed me in the last week - I got about
20 messages of people, most of which fully agreed with the first article. I also
got a lot of information on some of the subjects I proposed, especially realtime
raytracing: There are some very cool people working on that; from one of them
(Thomas Ludwig) I even received a ready-to-be-published article describing lots
of ways to speed up a realtime raytracer.

Nevertheless, I won't write about realtime
raytracing in my first chapter, since I want to experiment with raytracers
before writing about them - But I promise there will be an article on them. My
appetite is definitely wettened. :)

Programming for PDAs

The first topic that I want to write about is 'PDA programming'. More
specifically, coding for WindowsCE. Let me explain
why I like PDA's: First, they are first-class geek devices. :) Second, they
resemble old PC's: No floating point instruction set, no hardware accelerated
3D, and most importantly: There's almost no software for them. And that's odd,
since they sell very well: Compaq recently estimated that the demand for their
popular iPaq
outnumbered stocks 25 times.

I myself contributed a bit to the CE coding scene too: I wrote Lemmings for CE, and last
night, I finished a little 3D
engine. I'll write a bit about both.

By the way, I used Microsoft Embedded Visual Tools (the capitalization isn't
correct, but as long as Microsoft is not spelled as MicroWarez or something like
that, I think they show some weird attitude by calling the thing eMbedd3d v1sUa1
t00ls or something like that). This is an integrated IDE and compiler for all
Windows CE devices; it's a free download (available here, ~300Mb).

Lemmings

When writing software for a palm computer, you have to consider more than just
the hardware limitations like speed and the lack of an FPU. Data input for
example is a huge problem: Basically, you have a touch screen, and that's it.
Another problem is the small screen: The iPaq has a 240x320 display, and that's
not just small, those are odd dimensions as well. So, a horizontal shoot 'm up
is not a good idea; it looks bad on the narrow display, and continuesly pounding
the touch screen is not a good idea. Lemmings, on the other hand, requires only
mouse input, and the screen size can be lived with if the levels are designed
properly. Other game types that work well on PDA's are RPG's, like Zelda,
puzzles, board games and racing games.

I wrote the first version of Lemmings for the Philips Nino. That's an older
Windows CE device, with the same display dimensions, but with less memory (8Mb),
a slower processor (~80Mhz, I believe) and a 2bit display.

The processor speed is no problem. Lemmings is not a very processor-intensive
game, and I had no problem at all getting it to run fast enough.

Memory, on the other hand, is another matter. If you have only 8Mb, a level that
takes up 250Kb is a problem, especially if you want four of those. That pretty
much rules out side way scrolling. I tried side way scrolling, by the way, and
it just didn't work: The tapping on extra on-screen buttons just take the fun
out of the game; a level that is small enough to fit on one screen is simply
more fun to play.

To take maximum advantage of the available memory, I considered using a trick
that I once used for a side way scrolling shoot 'm up: A c-buffer. Now
what has a c-buffer to do with a shoot 'm up, I hear you ask? :) Well, here's
what: Suppose, you have eight layers. Each layer, except for the last one, has
large black areas, where you can see the next layer. The filled pixels between
these black areas can thus be stored as spans of pixels, plus their position and
length. The average Lemmings level contains a lot of black pixels, so this way
of storing the level actually compresses it to less than a quarter of the
original size. Nice. :) I didn't intend to use multiple layers for the Lemmings
game, so displaying a level is straightforward. For the parallax scroller, I use
a c-buffer to determine what parts of each scanline is already filled with
pixels, so the next layer is effectively 'clipped' against the drawn layers,
resulting in zero overdraw. The number of layers that can be displayed this way
of course depends on the complexity of each layer, but in practice, ten or more
layers are easily fast enough.

The next problem is the 2bit display. Four colors is just not enough for some
decent graphics, and so I hacked around a bit. A typical LCD display is very
slow; a blinking rectangle for example is only perceived as 'blinking' if the
rate at wich it is turned on and off is less than ten times a second. The Nino
actually allowed much faster writes to the display buffer than that, so I
figured that if I would blink twenty times a second, I would effectively get a
greyscale. :) This works, indeed, so now I can display sixteen greyscales; the
flicker is noticeable, but not annoying, and the overall effect is much better
than 4 colors.

Then came the iPaq. That machine is four times faster than a Nino, has four
times the memory, and a sixteen bit display. To make things worse, the iPaq has
a newer version of WindowsCE (3.0 instead of 2.01), and the display buffer is
rotated by 270 degrees. Porting Lemmings to this device is easy, but in an ideal
world, one would like Lemmings to run on both devices, without modifications to
the game source. So, I wrote a little lib (called EasyCE) that offers the game
a 16bit framebuffer, and displays it correctly on both the Nino (using greyscale
emulation) and the iPaq (using a rotated blitter) and all other WinCE devices.
The Lemmings source code contains no platform specific code, and no #ifdef's.
It's even easy to port to Win32, using OpenPTC, for example.

I consider Lemmings to be 'done', and I added it to my portfolio. :)
I've added the sources for Lemmings, so if you feel
like doing a windows port, go ahead. Be my guest. Should be a doddle.
(Editor's Note: The images and sounds that work with the demo are not included in the above zip file.
If you would like those, contact The Phantom here)

A 3D Engine For The iPaq

As I already mentioned, the iPaq does not have a floating point coprocessor. And
no 3D hardware, of course. So there was a clear challenge for the prophet of
software rendering, a.k.a. "The
Phantom". :)

About two years ago I was still working for Lost Boys Interactive,
a multimedia company located in Amsterdam.
At that time, I was working on a software rendering engine that should have come
close to the performance of a VooDoo (how foolish). Just before the cancelling
of the project, it could display a 50K poly 3DS model of a bull at 15 frames per
second, on a P200.With phong shading enabled, it
still ran at 12fps. I want that on my iPaq. :) This became even more important
when someone released a 3DS file of the iPaq in all it's glory; 50K polygons
describing every subtle bend and hole that I touched so often on my beloved
companion. Wouldn't it be nice to be able to draw this model in realtime on my
iPaq? And, the model just asks for it, on the display of the rendered iPaq,
there should be another iPaq, and so on...

So, I need some good 3D code. However, nobody writes fixed point code any more.

[A bit about fixed point, for those of you who are not familiar with it: Fixed
point math basically means that you use whole numbers only. To get the desired
accuracy, you use large whole numbers. For example, instead of 1.734 you could
also write 1734, without losing any information. Of course, you need to
remember that you multiplied by 1,000, so if you multiply two of those numbers,
you need to divide the result by 1000 (because the result would be 1,000,000
times to large, instead of 1,000). On a computer, multiplying by powers of 2 is
more efficient than multiplying by powers of 10, so you don't multiply by 1,000
but for example by 1024 (2^10). In that case, you are working with 22.10 fixed
point math; you use 10 bits for the fractional part, and 22 for the whole part.
So, after a multiply of two of those numbers, you shift the result 10 bits to
the right to get the desired 22.10 fixed point answer.]

The problem with fixed point math is of course keeping the precision high
enough, while not running out of bits. Multiplying two numbers illustrates this
problem: 22.10 times 22.10 results in an intermediate result that is 44.20,
that's 64bit. If you don't have 64bits, you loose something: The highest 32 bits
of the whole part. So, the temporary value is in fact 12.20 fixed point, and in
such an intermediate value you can only fit the result of a multiply of two 6.10
fixed point values. That means that there's little space for your whole numbers,
even if you use only 10 bits for the fractional part. Fixed point math is thus
continuesly juggling of bits (and I haven't even mentioned negative numbers).

So, now I have my 3D engine. It's extremely stable, sub-pixel and sub-texel
accurate, wich basically means that there are no shaky pixels anywhere. So far
it's just a spinning cube, but I will add a nice 3DS loader tonight, and phong shading tomorrow, so I will have
my spinning iPaq on an iPaq on an iPaq before the weekend. And all this with
integer math only.

[Sub-pixel and sub-texel accuracy: Not all lines starting at (1,1) and going to
(100,2) look the same. Actually, they do if you always pass your line code
integer coordinates, but if you pass floating point coordinates, it becomes
slightly more complicated. The example line goes one pixel down, and a hundred
pixels to the right. So, where's the scanline change? Visually, it does matter
if the 'step' is at x=10 or x=90. You can make the line appear to move smoothly
if you take this into account. So, a line that starts at (1,1.9) and ends at
(100,2.0) will make the step almost immediately, while a line that starts at
(1.0) and ends at (2.9) will make it just before reaching x=100 (note that 2.9
will still be rounded to 2).
The same goes for texture U and V.
When you rasterize a polygon, you would normally interpolate an x position from
the first scanline to the last scanline that each edge covers. A scanline has an
integer y position, so if you want to be more accurate, you take the difference
between the integer y position and the real y position into account. The x
position of the edge on the first scanline is thus corrected as follows:

float diff = (fy - iy);
x += diff * deltax;

where fy is the floating point y coordinate of the first scanline, iy the
integer coordinate of the first scanline, and deltax is the value that you add
to x when you go to the next scanline.

The same happens when you start drawing the actual spans; the left side of the
span is stored as a floating point value, while you start drawing on an integer
position. So:

float diff = (fx - ix);

u += diff * deltau;
v += diff * deltav;

where deltau and deltav are the values that you add to u and v when you proceed
to the next pixel.]

Of course, there are easier ways to solve the accuracy problems, even when using
fixed point numbers. You could use a 64bit intermediate value. There are nice
data types in C and C++ that support this. I didn't want to use this right away
though, because I have no idea how this would get compiled (I don't want to
replace the floating point emulator by a 64bit emulator), and I don't want to
use assembly. Assembly would have solved the whole problem, since virtually
every 32 bit processor supports 64 bit intermediate results, just for the
purpose of fixed point math; but sadly not every WinCE device has the same
processor, wich means that I would need to write assembly for the Hitachi SH3
and SH4 processor, the MIPS and Intel's ARM and StrongARM.
That's not platform independent, and I don't like that.

So I'm happy with the current results. I'm going to stress-test it as soon as
possible.

You can download the 3D engine here, the zip file
contains project files for the Embedded Visual Tools and Visual C++. I've also
included an executable, in case you're really lazy. :)

Final Words

That's it for now. I hope that I have made clear what's so interesting about
PDA's: Old skool coding, made neccessary by lots of contraints. The WinCE
platform can theoretically be coded more or less like a Win32 machine, but if
you want to write good code for it, you need to write radically different code.
And if you do, WinCE PDA's (especially the iPaqs) are awesome machines.