Improving a sprite-based rendering procedure

So you are developing a 2D game and, suddenly, you discover that the
rendering procedure is slow. What would you do? Would you put fewer elements in
your levels or would you render them more efficiently? Since you are here, I
suppose that you would pick the latter. So, here we go, I am going to tell you
how to render fewer things while having more.

The first part of this post describes how much work is done during the
rendering of a scene. Then I will compare it with the results of an
easy-to-implement improved procedure. And finally I will point you to an
even better procedure. This final procedure has been used in Andy's Super Great Park
and in the work-in-progress Plee the Bear.

Background

In the first versions of Plee the Bear we were not really worried about the
speed of the rendering procedure, nor the speed of any other procedure. Keeping
in mind that premature optimization is the root of all evil, we had to make things
work before making them work fast. That was some years ago. Then the game
grew, we began to put a lot of stuff in the levels, and finally the time to
think about speeding things up came. That is the subject of today's
post: how the rendering procedure evolved as the game grew.

The initial procedure was as simple as possible. Elements were rendered from
the background to the foreground, as is. Having something drawn on the screen
was a sufficient result at that time.
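To make this concrete, here is a minimal sketch of that back-to-front procedure with a counter of how many times each pixel is written. This is not the engine's code: the tiny 8x4 screen, the `(x, y, width, height)` element format and the `count_writes` name are assumptions made for the illustration.

```python
def count_writes(screen_w, screen_h, elements):
    """Return a grid where each cell holds how many times that pixel is written.

    `elements` is a hypothetical list of (x, y, width, height) rectangles,
    ordered from the background to the foreground.
    """
    writes = [[0] * screen_w for _ in range(screen_h)]
    for x, y, w, h in elements:  # background first, foreground last
        for py in range(max(0, y), min(screen_h, y + h)):
            for px in range(max(0, x), min(screen_w, x + w)):
                writes[py][px] += 1  # every element is drawn, hidden or not
    return writes


scene = [
    (0, 0, 8, 4),  # background covering the whole screen
    (2, 1, 4, 3),  # a middle ground decoration
    (3, 2, 2, 2),  # a foreground sprite
]
writes = count_writes(8, 4, scene)
for row in writes:
    print("".join(str(n) for n in row))
```

Each digit printed is the number of writes for one pixel, which is the same measure as the one used for the game scenes in this post.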

So, how much work does this procedure do? Let's see how many times each
pixel of the screen is written in a given scene. We will use the very beginning
of the first act of the forest of Plee the Bear, just when the player can start
to control Plee:

Not surprisingly, with three layers of rain plus the background, each pixel
is written at least 4 times, most of them 5 or 6 times and some are written up
to 9 times. And once the rain is gone, the range goes from 1 to 6 writes:

An interesting thing in these two pictures is that even parts hidden by the
middle ground decorations are rendered.

Improving the rendering procedure

The improvement we wanted to introduce then was to avoid rendering elements
that will be hidden by other elements. The idea is to maintain a representation
of the empty parts of the screen whilst considering the elements from the
foreground toward the background. For each element there are two steps. First,
if the element intersects the empty parts of the screen, we split it into
sub-elements that will cover only the empty parts of the screen. Then, if the
initial element is opaque, we update the emptiness of the screen.

To keep things simple, we represent the parts of the screen with
axis-aligned boxes. Elements are considered as opaque if there is no alpha
transparency in the source image and if they are not rotated.
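The two steps above can be sketched as follows. This is a simplified illustration and not the code of the engine: boxes are assumed to be `(left, top, right, bottom)` tuples with exclusive right and bottom edges, the `is_opaque` flag is assumed to be already known for each element, and all function names are made up for the example.

```python
def intersect(a, b):
    """Intersection of two boxes (left, top, right, bottom), or None if empty."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if left < right and top < bottom:
        return (left, top, right, bottom)
    return None


def subtract(box, hole):
    """Split `box` into at most four boxes covering `box` minus `hole`."""
    cut = intersect(box, hole)
    if cut is None:
        return [box]
    left, top, right, bottom = box
    pieces = []
    if top < cut[1]:
        pieces.append((left, top, right, cut[1]))       # strip above the hole
    if cut[3] < bottom:
        pieces.append((left, cut[3], right, bottom))    # strip below the hole
    if left < cut[0]:
        pieces.append((left, cut[1], cut[0], cut[3]))   # strip left of the hole
    if cut[2] < right:
        pieces.append((cut[2], cut[1], right, cut[3]))  # strip right of the hole
    return pieces


def visible_parts(screen, elements):
    """`elements` is a list of (box, is_opaque), from background to foreground.

    Returns the sub-boxes to draw, ordered from background to foreground.
    """
    empty = [screen]  # disjoint boxes not yet covered by an opaque element
    to_draw = []
    for box, is_opaque in reversed(elements):  # foreground toward background
        # Step 1: keep only the parts of the element lying on empty boxes.
        for region in empty:
            part = intersect(box, region)
            if part is not None:
                to_draw.append(part)
        # Step 2: an opaque element hides everything behind it.
        if is_opaque:
            empty = [piece for region in empty for piece in subtract(region, box)]
    to_draw.reverse()  # draw back to front so transparent elements still blend
    return to_draw


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


screen = (0, 0, 8, 4)
scene = [
    ((0, 0, 8, 4), True),   # opaque background
    ((2, 1, 6, 4), True),   # opaque middle ground decoration
    ((3, 2, 5, 4), False),  # foreground sprite with transparency
]
parts = visible_parts(screen, scene)
print(sum(area(p) for p in parts), "pixels written instead of",
      sum(area(box) for box, _ in scene))
```

On this toy scene the background is split into three boxes around the decoration, so 36 pixels are written instead of 48. The transparent sprite does not update the emptiness of the screen, which is why the decoration behind it is still drawn entirely.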

Let's come back to the game to see how many times the pixels are written
with this procedure. During the rain:

Pixels are written from 2 to 8 times. Contrary to the original procedure,
some of them are drawn only 2 or 3 times. The number of pixels drawn more than 3
times has been greatly reduced. And after the rain:

Here the range becomes 1 to 5 writes per pixel, and most pixels are written
once or twice. Contrary to the original algorithm, we have more pixels written
once than three times.

The benchmark

Finally, for all this work to be useful there must be an increase in
performance. That is: more frames rendered per second. To keep a uniform
sequence of rendered items among the tests, we use a demo script that runs in
the game. Here are the results:

One can see that the new procedure greatly increases the number of frames
per second, which is exactly what we wanted.

Can we have more?

Yes! We can do better. You may have noticed in the captures above that some
parts of the screen seem to be written several times even if the foreground
seems opaque. The main reason is that these foreground sprites have some
transparent pixels on one of the edges of their box. Thus, the procedure does
not consider any opaque box for them.

Contrary to the previous procedure this one cannot be executed at run time
(unless you accept levels that take several minutes to load). For our
games, we managed to insert the procedure in the level editor, as an
optimization step executed when the level is compiled. Then the game engine
just has to read the computed opaque boxes and apply them in the initial
procedure.
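This post does not detail how the opaque boxes are computed, so take the following only as one plausible sketch and not as what the level editor does. It assumes the sprite is given as rows of alpha values from 0 to 255 and looks for the largest axis-aligned box made only of fully opaque pixels, using the classic histogram-and-stack technique. The `largest_opaque_box` name and the `threshold` parameter are made up for the example.

```python
def largest_opaque_box(alpha, threshold=255):
    """Largest box of pixels whose alpha is >= threshold.

    `alpha` is a list of rows of alpha values. The result is a
    (left, top, right, bottom) tuple with exclusive right and bottom edges,
    or None if no pixel is opaque enough.
    """
    if not alpha:
        return None
    width = len(alpha[0])
    heights = [0] * width  # per column: opaque run ending at the current row
    best, best_area = None, 0
    for y, row in enumerate(alpha):
        for x in range(width):
            heights[x] = heights[x] + 1 if row[x] >= threshold else 0
        stack = []  # column indices whose heights are strictly increasing
        for x in range(width + 1):
            h = heights[x] if x < width else 0  # final 0 flushes the stack
            while stack and heights[stack[-1]] >= h:
                top_h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                if top_h * (x - left) > best_area:
                    best_area = top_h * (x - left)
                    best = (left, y + 1 - top_h, x, y + 1)
            stack.append(x)
    return best


# A sprite whose borders contain transparent pixels: as a whole it is not
# opaque, but a large opaque box can still be found inside it.
sprite_alpha = [
    [0,   255, 255, 255, 0],
    [0,   255, 255, 255, 0],
    [128, 255, 255, 255, 0],
    [0,   0,   255, 0,   0],
]
opaque_box = largest_opaque_box(sprite_alpha)
print(opaque_box)
```

Such a box, stored with the sprite when the level is compiled, could then be used in place of the "whole sprite is opaque" test when updating the empty parts of the screen.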

- collect the sprite drawing commands (position, rotation, scale, color,
texture...).
- handle z-order to be able to use transparency, have correct
foreground/background...
- minimize the number of draws per pixel.
- minimize the number of texture bindings.
- send GPU drawing commands in correctly sized batches.
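As an illustration of how z-order and texture bindings interact, here is a minimal sketch that sorts the collected commands by depth, then groups neighbours sharing a texture into one batch. It assumes that sprites on the same z level may be drawn in any order, which is not true in every game. The `(z, texture, name)` command format and the file names are hypothetical.

```python
from itertools import groupby


def make_batches(commands):
    """Group draw commands into batches that each need one texture binding.

    `commands` is a list of (z, texture, name) tuples. The result is a list
    of (texture, [name, ...]) batches to draw in order.
    """
    # Sorting by z keeps the back-to-front order; sorting by texture inside a
    # z level puts commands using the same texture next to each other.
    ordered = sorted(commands, key=lambda c: (c[0], c[1]))
    return [(texture, [name for _, _, name in group])
            for texture, group in groupby(ordered, key=lambda c: c[1])]


commands = [
    (0, "sky.png", "sky"),
    (1, "trees.png", "oak"),
    (1, "rocks.png", "boulder"),
    (1, "trees.png", "pine"),
    (2, "rocks.png", "pebble"),
]
batches = make_batches(commands)
for texture, names in batches:
    print(texture, names)
```

Here the five commands end up in four batches: the two trees share a binding, but the two rocks cannot because they are on different z levels.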

The main problem is that these optimizations are not all compatible with
each other, and the best trade-off between them is GPU dependent...

To simplify, the number of needed textures can be reduced at packaging time
using a bin packer algorithm on sprite animations. This is why I started the
nanim project: http://devnewton.bci.im/projects/na...
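
To give an idea of what such a packing step looks like, here is a minimal sketch of a shelf packer, one of the simplest bin packing heuristics. It is not taken from nanim; the `pack_shelves` name, the fixed atlas width and the sprite names are assumptions made for the example.

```python
def pack_shelves(sizes, atlas_width):
    """Place sprites in horizontal shelves of a single atlas texture.

    `sizes` maps a sprite name to its (width, height). Returns a tuple
    (positions, atlas_height) where positions maps each name to its (x, y).
    """
    positions = {}
    x = y = shelf_height = 0
    # Tallest sprites first, so that each shelf wastes little vertical space.
    for name, (w, h) in sorted(sizes.items(), key=lambda item: -item[1][1]):
        if w > atlas_width:
            raise ValueError(f"{name} is wider than the atlas")
        if x + w > atlas_width:  # no room left: open a new shelf below
            x, y = 0, y + shelf_height
            shelf_height = 0
        positions[name] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
    return positions, y + shelf_height


sizes = {
    "plee_walk": (32, 48),
    "plee_jump": (32, 40),
    "rain_drop": (8, 16),
    "stone": (24, 24),
}
positions, atlas_height = pack_shelves(sizes, atlas_width=64)
print(positions, atlas_height)
```

The four sprites then live in one 64x72 texture, so drawing any of them needs the same binding.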