Recently it came to be that I was hired for contract work on a port of a
Playstation game to the PC (which shall remain nameless). At the time, it
sounded like a fairly simple task. Todays computers are, after all, much faster
than the Playstation, right? Yeah, well, this is true for any computer bought
this year, but due to some bizarre twist of fate that wasn't to be our target
platform. It seems the company I'm working for is a huge fan of Deer
Hunter--not because they have ever *played* the game, but because of its
tremendous success. If you look on the back of the box, I believe you'll see
the minimum requirements for that game: P133, 16MB ram. Note, this doesn't
include a 3d accelerator, because, well, how many people buying games at Walmart
would have one? So, of course, if Deer Hunter has those requirements, so must
the Playstation port as well. Perfectly sound reasoning, right? Well, I guess
it is from a management point of view.

Well, also due to a sore lack of reasoning on my part I took the job (along with
a few friends). < STUPIDITY >I mean, a P133 has got to be faster than the
Playstation, right? And D3D software mode can't be all that slow,
right? < /STUPIDITY > As you may have guessed, the only option was to implement a
software rasterizer of our own. For speed concerns we decided to use 8-bit
paletted mode for the renderer. We need every little ounce of speed we can get,
and none of us are by any means another Carmack or Abrash. This raised another
issue, however. The Playstation supports both translucency and Gouraud shading,
which we'd need to have in our port as well. How does one do these things in
8-bit paletted mode? Well, that's the topic of this article. Herein details
the method we used to implement our rasterizer, as well as its limitations and
possible alternatives.

First, you may ask why you'd ever need to implement a software rasterizer, and
an 8-bit one, no less. With the exception of my present situation, I can't
really think of a definite answer to that question. God forbid any of you are
ever in my present situation (as it turns out, the software rasterizer is the
least of our problems... all I'll say is that spaghetti C and PSX assembly
commented in Japanese is not a pretty sight). However, the technique I'll
describe here is very effective in creating unique effects both in 3d and 2d,
allowing for things that simply cannot be done in real-time non-paletted modes.
Also, it never hurts to have lower minimum requirements, as long as the visual
quality stays relatively the same (which is quite possible, if implemented
correctly). A lot of the methods covered in this article you may already know,
but perhaps haven't really found any decent use for up until now. Other things
(hopefully some things) you may not have even thought of as yet, but will find
useful.

Let's start with the basic problem: we want to create an 8-bit game, but not
have it look like it was made back in the 80's (bright, distinct, ugly colors).
That's what we'd get if we went with a standard 332 palette for our game (3 bits
for red, 3 for green, and 2 for blue). A 332 palette has 256 colors (duh), but
most of those colors are spread out into a range of colors that just aren't very
useful and/or don't fit together well (for instance there are 7 shades of pure
red, but only 2 shades of gray [not counting black and white]). This palette
does have one primary advantage, though: all of the blend operations that can
be performed in 16+ bit modes can be performed equally well in this mode (with
notable gradient loss, however). This is definitely the easiest method of
implementing translucency and shading in an 8-bit game, but the detail loss is
prohibitively large. I would not suggest ever using this method but for the
most simple of games. Anyways, these days it is all about visual quality, and
users just won't stand for primitive graphics, even if it does mean reduced
hardware requirements.

The only alternative is to use a palette optimized for the graphics in your
game. Creating an optimized palette is a fairly simple task and is handled by
most graphics manipulation packages (Photoshop, Debabelizer, etc.). I believe
there is also quite a bit of shareware that will allow you to do this (correct
me if I am wrong). With an optimized palette, the game will suddenly look a
whole lot better. Even if the textures vary drastically in colors from one
another, an optimized palette will allow for much better quality than any form
of fixed palette (such as 332). But then, how do we implement translucency and
shading using an optimized palette? Unlike 332 mode, there is no quick
mathematical trick to determine what the blend of two colors will be, so there
is no efficient way to do translucency and shading. Of course, what I just said
is not completely true, otherwise I wouldn't be writing this article.

There are actually two ways (that I know about) of blending two colors together
from an arbitrary optimized palette in an efficient manner (ie. without
searching through the entire palette for the closest blend match during every
color blend). One is to use an ordered optimized palette, and the other is to
use a 2d blend look-up table. The former entails creating the optimized
palette, then sorting the entries in such a manner that allows two color indices
to be combined via some mathematical operation between the two (preferably a
simple one, like an add or an or), the result of which will be another palette
index. This method is, however, quite inaccurate with most palettes (although
completely accurate with some palettes), and is inflexible. Since I decided not
to use this technique, I'll not be covering it here. The latter involves
precomputing a look-up table for all of the possible blends between two color
indices. This method has the benefit that it is 100% flexible (*all* possible
blends between two palette indices can be described by the table), and that it
is fairly fast. It is not perfect, however, since it requires 256x256 (64K)
entries for each blend type, which has the dual effect of being hard on memory
conservation and cache efficiency. Both of these problems will be discussed as
well.

The easy part -- implementation. For every polygon or sprite we wish to draw
translucently, there is a translucency map associated with it. To keep things
simple, the translucency map is allocated as single linear 64K byte array. This
linear array represents a 2d 256x256 array, so in order to index an element at
row r and column c we simply need to shift the most significant index left by 8,
like so: A[(r<<8)+c] (this is assuming that the array is stored such that the
first row is 0-255, second row is 256-511, etc.). This is an useful operation,
so lets just make a macro for it:

#define INDEX_TRANSMAP(T, r, c) (T)[((r)<<8)+(c)]

From this point on, all a translucency blend requires is the following:
*destinationPtr = INDEX_TRANSMAP(transMap, *sourcePtr, *destinationPtr);
If, for example, the translucency map transMap describes an average blend, the
output value of INDEX_TRANSMAP will be the closest match in the palette to that
blend. Although this does not guarantee the optimal output, with most palettes
it will be pretty damn close.
The only thing left is generating the translucency map. This too, is a fairly
simple task, but should not be done at run-time, since generation can take quite
a while (it's O(n^3), where n is likely 256, the size of the palette).
Basically, given a source palette and destination palette (they need not be the
same) for every source/destination index pair, the blend is computed and the
destination palette is searched for the closest match to the output. The only
thing that needs to be specified is how each pair is to be blended. Thankfully,
I already have a decent utility for creating translucency maps that should be
included at the end of this article. It is capable of setting opacity
percentages on a per-color basis, defining custom blend equations, and a preview
window of the resulting blend.
Below I've listed some of the pros and cons of using a translucency map.

Pros

Any blend method possible within the palette can be represented with a blend
look-up table.

The calculation time is constant regardless of what kind of blend is
performed. In non-paletted blends you are limited to blending operations that
can be performed in real-time, like adds, ors, multiplies, etc. Anything more
complex, such as divides and non-linear operations would usually be considered
too expensive. With a translucency map, no such constraint exists.

Source and destination color keying is unnecessary since it can be encoded
directly into the translucency map. This eliminates one test and jump from the
innermost loop of the drawing routine. Depending on how tight the loop is, this
can make a big difference. For instance, in my own blitting code, translucent
blits were 30% to 50% faster than plain (non-translucent) color-keyed blits. Of
course, results may vary...

Simulated alpha channel. If the sprites/textures maintain a sort of
color/translucency value consistency, a translucency map can be created that
allows certain portions of the sprite/texture to be more or less opaque than
other portions. For example, say we had a bitmap of flames rendered against a
black background. We could apply a translucency map to it that forced the
higher-intensity areas of the bitmap to be more opaque than the less intense
areas. Anything completely black would be transparent, while the flames would
solidify near the brighter center portions. Granted, this is not really alpha
channel, but in many cases it looks very nice.

Light maps. Grayscale bitmaps can be used with a translucency map that uses
the source pixel's intensity as an intensity modifier for the destination pixel
(either a lighten or a darken amount).

Gouraud shading for RGB or grayscale vertex colors. Instead of using
INDEX_TRANSMAP(transMap, *sourcePtr, *destinationPtr) for the blend method, you
can use INDEX_TRANSMAP(transMap, *sourcePtr, interpolatedColor), where
interpolatedColor is the color interpolated from the vertex colors during
Gouraud shading. The vertex colors can be represented by any fixed RGB palette
like 332 or by grayscale, and interpolated just like vertex colors in
non-paletted modes. RGB vertex colors sacrifice granularity for color (which
isn't so bad since it's used on top of a texture), while grayscale sacrifices
colored light for smoother gradients (which isn't so bad either, since colored
gouraud shading doesn't look that great in any mode). The vertex colors can
also be used with different shading blend maps to allow for different effects
(such as sharper lights or fades from one color to a different color).

Fog. The depth value of a pixel can be used as an index into a translucency
map that fades to the fog color.

Anti-aliased fonts. Take white fonts anti-aliased against a black
background, then use a translucency map that sets output opacity based on input
intensity. Voila, anti-aliased fonts. Different translucency maps can be used
to change the color and general look of the font.

Cons

Translucency map size. Each translucency map is 64K... That means that
having just 16 of them would take up a meg of memory. Eh, who cares? I'd be
hard pressed to need a whole 16 separate translucency maps, and even if I did,
that's only a small portion of what my game would take up. Memory gets cheaper
all the time.

Cache coherency. This is a tricky one. If you're blasting the cache
already, there really isn't much need to worry about the translucency map
blasting it as well. If you aren't then this could be a problem, but still may
not be. I guess you'd really have to performance analyze it to be sure. At any
rate, on my old P166 this didn't seem to be a problem.

Discrete translucency levels. Each translucency blend type requires a
separate translucency map, and you can only have a finite number of them. I'd
say this would be a problem if you were running in true color, where discrete
levels of color in general aren't too noticable. Since this is in paletted
mode, though, finer levels of translucency don't really help very much, since
there are only so many colors in the palette to go around.

Destination reads. Since a destination read is required for every pixel,
on-hardware surfaces are a no no. Video memory reads are prohibitively slow, so
all of the images/textures, including the backbuffer, must be in system memory.
Of course, this is generally the case for any sort of non-hardware supported
blends, since destination surface reads are required regardless of display mode.

Closing

Ok, now I'm getting tired of writing this... so I hope I've gotten most of the
relevant details across. If I haven't, then feel free to email me at
jd_wild@uclink4.berkeley.edu with any questions. I doubt many will use this
information since everybody seems to concentrate mostly on 3d hardware these
days, but just on the off-chance that someone out there is still trying to make
a good ole 2d game (or even a 3d game with modest hardware requirements), I
thought I'd share this.

Well, it has been brought to my attention that a little more detailed
description of the translucency map implementation might be helpful. This is my
own fault, I should have realized this before. I'm sorry if I over-simplified
the problem, but I'll attempt to remedy that here.

First, I would really suggest downloading article_transmaps.zip, as
experimenting with different translucency maps can give you an idea of what can
be done with them. If you're interested in generating translucency maps
yourself (without this tool), then you are pretty much on your own, because I
will not be covering the topic here any further. However, if you'd really like
to check out how it is done for yourself, just email me with a request for the
utility source code -- I'll be glad to send it your way.

As for actually using translucency maps, there are a number of ways that it can
be done. The basic mechanism is quite simple. Whenever performing any blitting
or rasterizing operation, the main purpose of the routine you're working with is
to write bits (presumably the correct ones) to the destination surface. In a
standard blit, this would merely require copying lengths of scanlines of a
bitmap to a destination area of the same size, like so (assume that all surfaces
being used are 8-bit):

This loop iterates through the entire height of the source image, each time copying the line
of pixels to the destination with memcpy. Note: pitch refers to the actual byte-width of the
surfaces being used, which need not be the same as the blit width (and usually won't be). But
this function is only useful for straight, unaltered blits from one surface to another. It is
thus not very helpful to us. If we wish to alter the bitmap during blit on a per-pixel level, as
we would need to for color-keyed blits, we'll need to handle each pixel individually, like so:

The code above is more complicated than it needs to be due to it being somewhat optimized.
The basic premise is the same as the previous though, except that instead of using memcpy to copy
an enter row all at once, we iterate through each byte (pixel) in the row one at a time. This is
slower than the previous method, since pentium string instructions cannot be used to copy the
entire line at once, but we have to do this if we wish to analyze the surface on a per-pixel
basis. For color-keying we do just that. Before it is written, each pixel is first checked to
see if it is the same as the color key. If it is, we skip it.

Now to the point. If we wish to do translucency map blends all we need change is: one, remove
the color-key check because it will now be redundant, and two, replace the copy "*dstPtr = *srcPtr;" with
"*dstPtr = INDEX_TRANSMAP(transMap, *srcPtr, *dstPtr);". The resulting code is:

Thus, each destination pixel will now be set to the blend (as defined by
transMap) of the source
pixel with the destination pixel. Depending on what you've defined transMap to
be, this single blit
function might be used for average blits, color-keyed blits, additive blits,
whatever. The blit function now requires a couple extra reads (one from the
destination surface, and another from the translucency map), a shift and two
adds. This speed loss is mostly (if not completely) offset by the speed gained
from removing the conditional (which, in tight loops, can tend to muck up the
pipeline). And if you're still worried about speed, you can unroll the loop to
make it a bit faster still (which from my tests gains about 15% to 20% in
performance).

Now, how do I apply this to my 3d game? Simple. In the polygon rasterizer
there should be a line very similar to the copy in the color-keyed blit,
"*dstPtr = *srcPtr;". To get the same effects available with translucent sprite
blits, all we need to do is replace that line the same way we replaced the that
line in the blit routine. dstPtr will remain the same (it is just a pointer to
a byte in the destination surface), but now srcPtr may refer to a location in
texture memory (or, in the case of non-texture polygons, can be merely replaced
by a color). Now each pixel drawn for the polygon will be blended with the
background as defined by the translucency map being used.

For effects other than translucency, we just alter the parameters being
blended. For example, if we want to implement gouraud shading, we'd use
INDEX_TRANSMAP(transMap, *srcPtr, interpolatedColor), where interpolatedColor is
the color (either grayscale or RGB, scaled to the range 0 to 255) interpolated
from the vertex colors during Gouraud shading. You may have noticed, however,
that this method has one limitation: you cannot have a translucent Gouraud
shaded polygon. You can have one or the other, but not both. But, if we're
willing to sacrifice a little more speed, it is quite possible to accumulate
translucency map blends, like so: "INDEX_TRANSMAP(transMap1,
INDEX_TRANSMAP(transMap2, srcColor, blendColor), dstColor);". This has the
affect of first blending srcColor with blendColor (which could be the Gouraud
texture/interpolated color blend), then feeding that output to another blend
with dstColor (which could be the translucency blend). Thus, for a little speed
loss, we can do two blends at once. Of course, there is no reason why more than
two couldn't be done at a time.

I hope this helps explain a little more clearly the ideas I brought up in my
article. The other techniques I mentioned (such as anti-aliased fonts,
lightmaps, and simulated alpha) are all implemented with the standard
source/destination blend detailed. Of course, there are many other things that
can be done with translucency maps, so I encourage those who are interested to
experiment. As before, if you have any questions or comments, email me:
jd_wild@uclink4.berkeley.edu.