5.
The Bitmap modes (mode 3, 4, 5)

5.1.
Introduction

In this chapter, we'll look at the bitmap modes. Bitmap modes are a
good place to start because there is a one-to-one relation between
the contents of memory and the pixels on the screen. The essentials
of all the bitmap modes will be discussed briefly, with a closer look
at what you can do in mode 3 as an example. We'll also see a bit of
page flipping (mode 4), which allows for smoother animation.

The chapter will close with a section on how to deal with data and
computer memory in general. Because GBA programming is very close to
the hardware, you need to know these things. If you've been
programming (in C or assembly) for a long time and have already gained
a good understanding of data, datatypes and memory you can probably
skip it; for the rest of you, I would strongly urge you to read it,
because it is very important for all the chapters to come.

5.1.1.
Bitmap 101

Fig 5.1: Link (24x24 bitmap).

In fig 5.1 you can find a bitmap of one of
the game characters that made Nintendo great. This is probably how
most people think of bitmaps: a grid of colored pixels. In order to
use bitmaps in a program we need to know how they're arranged in
memory. For that we use fig 5.2 (below);
this is a zoomed out version of fig 5.1,
with a pixel grid imposed over it and some numbers.

A bitmap is little more than a w×h matrix of colors
(or color-indices), where w is the number of columns (the width)
and h the number of rows (the height).
A particular pixel can be referred to with a coordinate pair:
(x, y). By the way, the y-axis of the GBA points
down, not up. So pixel (0, 0) is in the top-left
corner. In memory, the lines of the bitmap are laid out sequentially,
so that the following rule holds: in a w×h bitmap,
the pixel (x, y) is the
(w×y + x)-th pixel. This is true for all C
matrices, by the way.

Fig 5.2 shows how this works. This is a
w=24 by h=24 bitmap, at 8bpp (8 Bits
Per Pixel (=1 byte)). The numbers in yellow indicate
the memory locations; you can count them for yourself if you don't
believe me. The first pixel, (0, 0), can be found at location 0.
The last pixel of the first row (23, 0) is at
w−1 (=23 in this case). The first pixel of the second
row (0, 1) is at w (=24) etc, etc, till
the last pixel at w×h−1.
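The indexing rule can be sketched in a few lines of C. The names `bmp_index` and `bmp8_get` are hypothetical helpers made up for this illustration; they assume a tightly packed bitmap with one byte per pixel.

```c
#include <stdint.h>

// Index of pixel (x, y) in a bitmap that is `width` pixels wide,
// counted in pixels from the top-left corner.
static inline int bmp_index(int x, int y, int width)
{
    return width*y + x;
}

// Read a pixel from an 8bpp bitmap (one byte per pixel), so the
// pixel index is also the byte offset.
static inline uint8_t bmp8_get(const uint8_t *bmp, int width, int x, int y)
{
    return bmp[bmp_index(x, y, width)];
}
```

For the 24x24 example, `bmp_index(23, 0, 24)` gives 23 and `bmp_index(0, 1, 24)` gives 24, matching the numbers above.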

Fig 5.2a: zoom out of
fig 5.1, with pixel offsets.

Fig 5.2b: zoom out of
fig 5.1, with pixel values.
Zero omitted for clarity. Palette on the lefthand side.

Note, however, that when you use another bitdepth, the addresses change
too. For example, at 16bpp (2 bytes per pixel), you'd
need to multiply the pixel-number by 2. Or use another datatype for
your array. The general formula is left as an exercise for the reader.

Usually it's not actually the width (i.e., the number of pixels in a
row) that's important, but the pitch. The pitch is defined
as the number of bytes in a scanline. For 8bpp images the pitch and
width will usually be the same, but for, say, 16bpp images (2 bytes
per pixel) the pitch is the width times two. There's another catch:
memory alignment. Alignment will be covered in
a later section, but the upshot is that
systems generally have a ‘preferred’ type size and can
better deal with data if the addresses are a multiple of that
type size. This is why Windows BMPs' scanlines are always aligned to
32bit boundaries.
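To make the difference between width and pitch concrete, here is a small sketch. The function names are hypothetical; the 32-bit rounding is the rule that BMP files use for their scanlines, and the offset helper assumes a bitdepth of at least 8.

```c
#include <stdint.h>

// Pitch in bytes of a tightly packed scanline (no padding).
static inline uint32_t pitch_packed(uint32_t width, uint32_t bpp)
{
    return (width*bpp + 7)/8;
}

// Pitch in bytes when every scanline is padded to a 32-bit boundary,
// as in Windows BMP files.
static inline uint32_t pitch_aligned32(uint32_t width, uint32_t bpp)
{
    return ((width*bpp + 31)/32)*4;
}

// Byte offset of pixel (x, y), given the pitch. Assumes bpp >= 8.
static inline uint32_t pixel_offset(uint32_t x, uint32_t y,
    uint32_t pitch, uint32_t bpp)
{
    return y*pitch + x*(bpp/8);
}
```

A 25 pixel wide 8bpp image has a packed pitch of 25 bytes, but an aligned pitch of 28; using the width instead of the pitch to step through such an image would skew every scanline after the first.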

5.2.
The GBA bitmap modes

Video modes 3, 4 and 5 are the bitmap modes. To use them, put 3, 4
or 5 in the lowest bits of
REG_DISPCNT and
enable BG2. You may wonder why we start with mode 3,
rather than mode 0. The reason for this is that bitmaps are a lot
easier to come to terms with than tilemaps. And this is the
only reason. The truth of the matter is that the bitmap
modes are just too slow to be used for most conventional GBA games.
I can't give an exact figure, but if someone told me 90% or more
of GBA games used tilemodes and not bitmap modes, I wouldn't be
surprised. The only time when bitmap modes would be beneficial
would be either for very static screens (introductory demos) or
very dynamic screens (3D games like Starfox or Doom).
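Switching to a bitmap mode comes down to a single register write. The sketch below only computes the value, so that it can also be compiled on a PC; the `DCNT_` names follow a common naming convention but are assumptions here, and `dispcnt_bitmap` is a hypothetical helper.

```c
#include <stdint.h>

#define DCNT_MODE3  0x0003  // the mode number lives in bits 0-2
#define DCNT_MODE4  0x0004
#define DCNT_MODE5  0x0005
#define DCNT_BG2    0x0400  // enable background 2

// On hardware the display control register sits at 0400:0000h:
//   #define REG_DISPCNT (*(volatile uint16_t*)0x04000000)
// Here the value is only computed, not written anywhere.
static inline uint16_t dispcnt_bitmap(unsigned mode)
{
    return (uint16_t)((mode & 7) | DCNT_BG2);
}
```

On the GBA itself you would then write something like `REG_DISPCNT = DCNT_MODE3 | DCNT_BG2;` at the start of `main()`.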

The bitmap modes have the following characteristics:

Table 5.1: Bitmap mode characteristics

mode  width  height  bpp  size       page-flip
3     240    160     16   1x 12C00h  No
4     240    160     8    2x 9600h   Yes
5     160    128     16   2x A000h   Yes

What width, height and bpp mean should be clear by now; the size
that the bitmap requires is simply width × height
× bpp/8. Page flipping may need some more explanation,
but first we'll look at some examples of mode 3 graphics.
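The size column is easy to check for yourself. A minimal sketch, with `bitmap_size` as a hypothetical helper name:

```c
#include <stdint.h>

// Size in bytes of a width x height bitmap at the given bitdepth.
static inline uint32_t bitmap_size(uint32_t width, uint32_t height, uint32_t bpp)
{
    return width*height*bpp/8;
}
```

This gives 12C00h for mode 3, 9600h for mode 4 and A000h for mode 5. Two mode 3 pages would need 25800h bytes, which is more than the 18000h bytes (96 KB) of VRAM that the GBA has.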

5.2.1.
Drawing primitives in mode 3

We've already seen how to plot pixels, now it's time for some lines
and rectangles. Horizontal lines are piss-easy: because the pixels
are in adjacent memory, all you need is a simple loop from the
starting x to the final x. Vertical lines are
nearly as easy: while the pixels aren't right next to each other,
they do have a fixed offset between them, namely the pitch. So again
a simple loop is all you need. Rectangles are essentially multiple
horizontal lines, so those are easy as well.

Diagonal lines are a little trickier, for a number of reasons.
Diagonal lines have a slope that indicates how many horizontal steps
you need to take before moving to the next scanline. That would only
work if the absolute value were lower than one, otherwise you'd get
gaps between pixels. For higher slopes, you need to increment
vertically, and plot horizontally.

Another point is how to make the
routine fast enough to be of real use. Fortunately, these things
have all been figured out in the past already, so we'll just use the
results here. In this case, we'll use a
Bresenham
Midpoint algorithm for the line drawing, modified to deal with
horizontal and vertical lines separately. While I could explain what
the routine does exactly, it is out of the scope of the chapter,
really.
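For the curious, here is a minimal sketch of a plain integer Bresenham line on a 16bpp canvas. It is the textbook all-octant variant, not the tuned routine from the demo: it does not treat horizontal and vertical lines separately and it does not clip. The name `bmp16_line_sketch` is hypothetical; the pitch is taken in bytes.

```c
#include <stdint.h>
#include <stdlib.h>

// Plain integer Bresenham line on a 16bpp canvas.
// dstBase: first pixel of the canvas; dstPitch: bytes per scanline.
// No clipping: both end points must lie inside the canvas.
void bmp16_line_sketch(int x1, int y1, int x2, int y2, uint16_t clr,
    uint16_t *dstBase, uint32_t dstPitch)
{
    int stride = (int)(dstPitch/2);          // scanline length in halfwords
    int dx =  abs(x2 - x1), sx = (x1 < x2) ? 1 : -1;
    int dy = -abs(y2 - y1), sy = (y1 < y2) ? 1 : -1;
    int err = dx + dy;                       // combined error term

    for(;;)
    {
        dstBase[y1*stride + x1] = clr;
        if(x1 == x2 && y1 == y2)
            break;
        int e2 = 2*err;
        if(e2 >= dy) { err += dy; x1 += sx; }   // step horizontally
        if(e2 <= dx) { err += dx; y1 += sy; }   // step vertically
    }
}
```

Because the loop steps in whichever direction the error term asks for, shallow and steep slopes are handled by the same code and no gaps appear between pixels.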

Two points I have ignored here are normalization and clipping.
Normalization means making sure the routine runs in the
right direction. For example, when implementing a line drawing
routine that runs from x1 to x2 via an
incrementing for loop, you'd best be sure that x2 is
actually higher than x1 in the first place.
Clipping means cutting the primitive down to fit inside
the viewport. While this is a good thing to do, we will omit it
because it can get really hairy to do it well.

The code below is an excerpt from toolbox.c from the
m3_demo and contains functions for drawing lines, rectangles
and frames on a 16bpp canvas, like in mode 3 and mode 5.
dstBase is the base-pointer to the canvas and
dstPitch is the pitch. The rest of the parameters should be
obvious.
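A minimal sketch of such routines, assuming the parameters described above. This is a reconstruction for illustration rather than the verbatim toolbox.c listing, and the frame routine is left out. Lines include both end points; for the rectangle, the right and bottom edges are exclusive, which is a choice made for this sketch.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

// Horizontal line from (x1, y) to (x2, y), end points included.
void bmp16_hline(int x1, int y, int x2, u16 clr, void *dstBase, u32 dstPitch)
{
    if(x2 < x1) { int tmp = x1; x1 = x2; x2 = tmp; }    // normalize
    u16 *dst = (u16*)((uint8_t*)dstBase + (u32)y*dstPitch) + x1;
    for(int ix = 0; ix <= x2 - x1; ix++)
        dst[ix] = clr;                  // adjacent pixels: simple loop
}

// Vertical line from (x, y1) to (x, y2), end points included.
void bmp16_vline(int x, int y1, int y2, u16 clr, void *dstBase, u32 dstPitch)
{
    if(y2 < y1) { int tmp = y1; y1 = y2; y2 = tmp; }    // normalize
    u16 *dst = (u16*)((uint8_t*)dstBase + (u32)y1*dstPitch) + x;
    for(int iy = y1; iy <= y2; iy++)
    {
        *dst = clr;
        dst += dstPitch/2;              // next scanline is one pitch further
    }
}

// Filled rectangle; right and bottom edges are exclusive.
void bmp16_rect(int left, int top, int right, int bottom, u16 clr,
    void *dstBase, u32 dstPitch)
{
    if(right < left) { int tmp = left; left = right; right = tmp; }
    if(bottom < top) { int tmp = top; top = bottom; bottom = tmp; }
    for(int iy = top; iy < bottom; iy++)
    {
        u16 *dst = (u16*)((uint8_t*)dstBase + (u32)iy*dstPitch) + left;
        for(int ix = 0; ix < right - left; ix++)
            dst[ix] = clr;
    }
}
```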

These functions are very general: they will work for anything that
has 16bit colors. That said, it may be annoying to have to add the
canvas pointer and pitch all the time, so you could create an
interface layer specifically for mode 3 and mode 5.
The ones for mode 3 would look something like this:
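A minimal sketch of a mode 3 pixel plotter and screen fill. To keep it runnable on a PC, VRAM is replaced by a host-side union, which is an assumption of this sketch; on hardware `vid_mem` would simply be a pointer to 0600:0000h and the 32-bit pointer a cast of it.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

#define M3_WIDTH   240
#define M3_HEIGHT  160
#define M3_SIZE    (M3_WIDTH*M3_HEIGHT)   // in pixels

// Host stand-in for VRAM so the sketch also runs on a PC. On hardware:
//   #define vid_mem ((u16*)0x06000000)
// The union gives a 16-bit and a 32-bit view of the same memory.
static union { u16 hw[M3_SIZE]; u32 w[M3_SIZE/2]; } vram;
#define vid_mem  (vram.hw)

// Plot a single pixel.
static inline void m3_plot(int x, int y, u16 clr)
{
    vid_mem[y*M3_WIDTH + x] = clr;
}

// Fill the whole screen, two pixels per write.
void m3_fill(u16 clr)
{
    u32 two = clr | ((u32)clr << 16);   // same color in both halfwords
    u32 *dst = vram.w;                  // on hardware: (u32*)vid_mem
    for(int ii = 0; ii < M3_SIZE/2; ii++)
        *dst++ = two;
}
```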

Now, note what I'm doing here: instead of treating VRAM as an
array of 16bit values which are appropriate for 16bpp colors,
I'm using a 32bit pointer and filling VRAM with a 32bit variable
containing two colors.
When filling large chunks of memory, it makes no difference if I fill
it in N 16bit chunks, or ½N 32bit chunks.
However, because you only use half the number of iterations in the
latter case, it's roughly twice as fast. In C, it's perfectly
legal to do something like this and often actually useful.
This is why it's important to know the principles of
data and memory. Also note that I'm using
pointer arithmetic here instead of array indices. While the compiler
generally makes the conversion itself, doing it manually is still often
a little faster.

While this method is already twice as fast as the ‘normal’
method, there are actually much faster methods as well. We will meet
these later, when we stop using separate toolkit files
and start using tonclib, the code library for tonc. Tonclib
contains the functions described above (only faster), as well
as 8bpp variations of the bmp16_ routines and interfaces
for mode 4 and mode 5.

Below you can find the main code for m3_demo, which uses
the m3_ functions to draw some items on the screen.
Technically, it's bad form to use this many magic numbers, but
for demonstration purposes it should be okay. The result can be seen
in fig 5.3.

5.2.2.
A dash of mode 4

Mode 4 is another bitmap mode. It also has a 240×160
frame-buffer, but instead of 16bpp pixels it uses 8bpp pixels. These
8 bits are a palette index to the background palette
located at 0500:0000h. The color you'll see on screen
is the color found in the palette at that location.

Pixels of a bitdepth of 8 mean you can only have 256 colors at a time
(instead of 32768 in the case of 15bpp), but there are benefits as
well. For one, you can manipulate the colors of many pixels by simply
changing the color in the palette. An 8bpp frame-buffer also takes up
half as much memory as a 16bpp buffer. Not only is it faster to fill
(well, in principle anyway), but there is now also room for a second
buffer to allow page flipping. Why that's
useful will be covered in a minute.

There is, however, one major downside to using mode 4, which stems
from a hardware limitation. With 8bit pixels, it'd make sense to
map VRAM as an array of bytes. This would be fine if it weren't for
the rather annoying fact that VRAM does not allow
byte-writes! Now, because this is a very important point, let
me repeat that: You cannot write to VRAM in
byte-sized chunks!!! Byte-reads are ok, but writes have
to be done in 16-bit or 32-bit chunks. If you do write
in bytes to VRAM, the halfword you're accessing will end up with that
byte in both its bytes: you're setting two pixels at once.
Note that this no-byte-write rule also extends to PALRAM and OAM,
but there it doesn't cause trouble because you won't be using that
as bytes anyway.

So how to plot single-pixels then? Well, you have to read the whole
halfword you're trying to access, mask off the bits you don't want to
overwrite, insert your pixels and then write it back. In code:
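A minimal sketch of such a read-mask-write plotter. The page is simulated with a host-side array so that the code also runs on a PC; that array, and `vid_page` pointing at it, are assumptions of this sketch. On hardware `vid_page` would point at the page in VRAM that you are writing to.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;

#define M4_WIDTH   240
#define M4_HEIGHT  160

// Host stand-in for one mode 4 page (two 8bpp pixels per halfword).
static u16 page_mem[M4_WIDTH*M4_HEIGHT/2];
static u16 *vid_page = page_mem;

void m4_plot(int x, int y, u8 clrid)
{
    // Divide by 2 because the pointer is u16 and the pixels are u8.
    u16 *dst = &vid_page[(y*M4_WIDTH + x)/2];
    if(x & 1)
        *dst = (u16)((*dst & 0x00FF) | (clrid << 8));   // odd pixel: high byte
    else
        *dst = (u16)((*dst & 0xFF00) | clrid);          // even pixel: low byte
}
```

The GBA is little endian, so the even pixel (the one at the lower address) sits in the low byte of the halfword.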

As you can see, it's a little more complicated than
m3_plot(). It takes a lot longer to run as well.
Still, once you have a pixel plotter, you can create other rendering
routines with ease. The basic code for drawing lines, rectangles,
circles and the like is pretty much independent of how pixels are
formatted. For example, drawing a rectangle is basically plotting
pixels in a double loop.
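A minimal sketch of that double loop, with the plotter passed in as a function pointer. The `plot_fn` signature, the name `rect_generic` and the tiny 8x8 test surface are all hypothetical; they only serve to show that the rectangle code does not care how the pixels are stored.

```c
// Signature that every pixel plotter is assumed to share in this sketch.
typedef void (*plot_fn)(int x, int y, unsigned clr);

// Generic filled rectangle; right and bottom edges are exclusive.
void rect_generic(int left, int top, int right, int bottom,
    unsigned clr, plot_fn plot)
{
    for(int iy = top; iy < bottom; iy++)
        for(int ix = left; ix < right; ix++)    // the ix loop: one scanline
            plot(ix, iy, clr);
}

// Tiny 8x8, 8bpp test surface with its own plotter.
static unsigned char tiny[8*8];

static void tiny_plot(int x, int y, unsigned clr)
{
    tiny[y*8 + x] = (unsigned char)clr;
}
```

Calling `rect_generic(1, 1, 3, 4, 7, tiny_plot)` fills a 2x3 block of the test surface with color 7; swapping in another plotter is all it takes to draw on another kind of surface.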

This is the generic template for a rectangle drawing routine. As long
as you have a functional pixel plotter, you're in business. However,
business will be very slow in mode 4, because of the complicated
form of the plotter. In all likelihood, it'll be so slow as to make it
useless for games. There is a way out, though. The reason
m4_plot() is slow is because you have to take care not to
overwrite the other pixel. However, when you're drawing a horizontal
line (basically the ix loop here), chances are that you'll
have to give that other pixel the same color anyway, so you needn't
bother with read-mask-write stuff except at the edges. The
implementation of this faster (much faster) line algorithm
and subsequently rectangle drawer is left as an exercise for the
reader. Or you can seek out tonc_bmp8.c in tonclib.

VRAM vs byte writes

You cannot write individual bytes into VRAM (or PALRAM or OAM for
that matter). Halfwords or words only, please. If you want to
write single bytes, you have to read the full (half)word, insert
the byte, and put it back.

Every kind of graphics surface needs its own pixel plotter.
In principle, more complicated (multi-pixel) shapes are surface
independent. For example, a line routine follows the same
algorithm, but simply uses a different plotter for drawing pixels.
These generic forms are great in terms of re-usability and
maintainability, but can be disastrous when it comes to
speed. Creating surface-specific renderers may be extra work,
but can on occasion save you up to a factor 100 in speed.

5.2.3.
Complications of bitmap modes

While I could go on to discuss more complicated matters, such as
drawing rectangles and blits and text, there's very little reason to
do so at this juncture. As I said before, the bitmap modes are
useful to learn some basic functionality, but for most practical
purposes, you're better off with tiled modes.

The primary issue is speed. Even simple primitives
such as the ones shown here can take a lot of time, especially
if you're not careful in your implementation. For example, a full
mode 3 screen-wipe would take about 60% of a VBlank at best!
Bad implementations of a screen-wipe,
like doing it with a rectangle drawer that calls a non-inline
pixel-plotting function, could take as much as 10 frames. And
then you still have to draw all your backgrounds and
sprites and do the game logic. The phrase ‘crawling horror’
somehow springs to mind at the thought of this.

Aside from that, bitmap modes can use only one background and have no
hardware scrolling to speak of. Also, though this is jumping the
gun a bit, the bitmap overlaps the memory that contains the sprite
tiles,
which starts at 0601:0000h. For that reason, you will
only be able to use sprite-tiles 512 to 1023 when in modes 3-5.

Page flipping can alleviate some of these items, but that's not
available in mode 3. It is in mode 5, but that uses only a small
portion of the screen, so gaming with only that looks awkward.
As for mode 4, well, that's one of those places where you will
really see what programming close to the hardware means:
it doesn't allow you to write to VRAM in byte-sized chunks!
The only way to have a single-pixel resolution is to combine 2
adjacent pixels and write those, which costs a lot of extra time.

So basically, use the bitmap modes for testing and/or static images,
but not much else unless you know the tilemodes can't do what you
want.

Bitmap modes are not for gaming

Do not get too comfortable with bitmap modes. They're nice for
gbadev introductory sections because they are easier to work with
than tile modes, but they are not suitable for most types of
games because the GBA simply can't push pixels fast enough.
Tinker with them to get a feel for IO registers and the like,
then move on.

5.3.
Page flipping

Fig 5.4: Page flipping procedure.
No data is copied, only the ‘display’ and
‘write’ pointers are swapped.

Page flipping is a technique that eliminates nasty artifacts
like tearing in animation. There are two things going on at the same
time in an animation: placing the pixels on the bitmap (writing), and
drawing the bitmap on screen (displaying). Software takes care of
writing, updating the positions of characters etc; hardware does the
displaying: it simply takes the bitmap and copies it to the screen. The
problem is that both these processes take time. What's worse, they
happen at the same time. And when the game state changes in mid draw,
the bottom section will be of the current state, while the top section
will represent the previous state. Needless to say, this is bad.

Enter page flipping. Instead of using one single bitmap to write
and display, you use two. While one bitmap is displayed, you write
all you need onto a second bitmap (the back-buffer). Then, when you're
finished, you tell the hardware to display that second bitmap and you
can prepare the next frame on the first. No artifacts whatsoever.

While the procedure works great, there are some snares. For the
first, consider this. Given are the pointers to the two pages
page1 and page2. Right now,
page1 is displayed and
page2 is being prepared; so far so good. But when you
switch to the second page, this only makes page2 the
display-page; you have to make page1 the write-page
yourself! The solution to this problem is simple: use a write-buffer
pointer, but it can catch you off-guard if you're new to this stuff.

The second problem concerns a little nasty in the age-old method of
animation. The canonical animation does this. Frame1: draw object.
Frame2: erase old object, draw object in new state. This doesn't work
for page flipping since Frame2 is written on an entirely different
bitmap than Frame1, so trying to erase Frame1's old object doesn't work.
What you need to erase is the object from 2 frames ago. Again, easy
solution, but you have to be aware of the problem. (Of course, erasing
the entire frame each time would work too, but who's got the time?)

Pageflipping, not double buffering

Another method of smoother animation is double buffering: draw on
a secondary buffer (the backbuffer) and copy it to the screen when
finished. This is a fundamentally different technique than page
flipping! Even though both use two buffers, in page flipping you
don't copy the backbuffer to the display buffer, you make
the backbuffer the display buffer.

What the GBA does is page flipping, so refer to it as such.

5.3.1.
GBA page flipping

The second page of the GBA is located at location 0600:A000h.
If you look at the size required for mode 3, you'll see why it doesn't
have page-flipping capabilities: there's no room for a second page. To set
the GBA to display the second page, set bit 4 of REG_DISPCNT
(REG_DISPCNT{4}).
My page flipping function looks a little like this:
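A minimal sketch of such a flip. To keep it runnable on a PC, the write-page is tracked as a plain address value and the register is a host-side variable; both are assumptions of this sketch, as are the names `vid_page_addr` and `reg_dispcnt`. On hardware you would keep a real pointer and do the XOR through a cast.

```c
#include <stdint.h>

#define MEM_VRAM   0x06000000u
#define VID_FLIP   0x0000A000u   // distance between the two pages
#define DCNT_PAGE  0x0010u       // bit 4 of REG_DISPCNT selects the page

// Host stand-in for the display control register. On hardware:
//   #define REG_DISPCNT (*(volatile uint16_t*)0x04000000)
// Start value: mode 4, BG2 on, first page displayed.
static uint16_t reg_dispcnt = 0x0404;

// Address of the write-page. The first page is on display, so
// writing starts on the second. On hardware this would be a pointer:
//   vid_page = (u16*)((u32)vid_page ^ VID_FLIP);
static uint32_t vid_page_addr = MEM_VRAM + VID_FLIP;

// Swap the roles of the two pages; returns the new write-page address.
uint32_t vid_flip(void)
{
    vid_page_addr ^= VID_FLIP;   // toggle the write-page
    reg_dispcnt   ^= DCNT_PAGE;  // toggle the displayed page
    return vid_page_addr;
}
```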

The code is relatively straightforward. vid_page is the
pointer that always points to the write-page. I had to pull a little
casting trickery to get the XOR to work (C doesn't like it
when you try it on pointers). On the GBA, the steps for page flipping
are perfectly xorrable operations. Sure, you could just put the
equivalent in an if-else block, but where's the fun in
that :P?

5.3.2.
Page flipping demo

What follows is the code (sans data) for the pageflip demo.
The actual part concerned with page flipping is very small. In
fact, the actual flip is merely a call to vid_flip()
once every 60 frames = 1 second (point 3). We'll also have to set the
video mode to something that actually has pages to flip, which in this
case is mode 4.

What we'll have to do as well is load the data that will be displayed
on these two pages. I'm using the standard C routine
memcpy() for the copy, because that's the standard way of
copying things in C. While it's faster than manual loops, it does come
with a few snares that you need to
be aware of before using it everywhere. Tonclib comes with faster
and safer routines, but we'll get to those when it's time.

Loading a bitmap is very simple in theory, but the bitmap(s)
I'm using are only 144x16 in size, while the VRAM page's pitch is 240
pixels wide. This means that we'll have to copy each scanline
separately, which is done at point (1). Note that I'm copying
frontBitmap to vid_mem_front and
backBitmap to vid_mem_back, because those
are the starting locations of the two pages.
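The per-scanline copy can be sketched as follows. The helper name `blit_rows` is hypothetical, and both pitches are in bytes.

```c
#include <stdint.h>
#include <string.h>

// Copy `rows` scanlines of srcPitch bytes each into a destination
// whose scanlines are dstPitch bytes apart.
void blit_rows(void *dst, uint32_t dstPitch,
    const void *src, uint32_t srcPitch, uint32_t rows)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    for(uint32_t iy = 0; iy < rows; iy++)
        memcpy(d + iy*dstPitch, s + iy*srcPitch, srcPitch);
}
```

For the bitmaps in this demo that would be something like `blit_rows(vid_mem_front, 240, frontBitmap, 144, 16)`. Both 144 and 240 are multiples of 4, so every scanline starts on a word boundary; whether `memcpy()` then stays clear of byte writes still depends on the C library, so treat that as an assumption.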

Since these are mode 4 bitmaps, they'll also need a palette. Both
bitmaps use frontPal, but instead of using
memcpy() to copy it to the background palette memory,
I'm using a u32-array because … well, just because I guess.

5.4.
On data and how to use it

This section is a little boring (ok, very boring) but it needs to
be said. While books and tutorials on C may use data for whatever
purpose, they often gloss over what data actually is at the
lowest level, and how to deal with it correctly. As you'll be dealing
directly with hardware and memory here, it is important that you
are aware of these items, preferably even understand them, so that
they don't bite you in the ass at some later point.

The first two subsections are about how to get graphics into your
game, something that you'll really need to know. After that I'll
discuss a few nasty and highly technical things that may or may not
cause problems later on. These are optional and you can
skip to the data-loading/interpreting demo
at any time. That said, I urge you to read them anyway because they
may save you a lot of debugging time.

Relax, it's only 1s and 0s

When you get right down to it, everything on computers is merely a
big mess of bits without any purpose by itself. It is the
interaction between hardware and software that makes sequences of
bits appear as valid executable code, a bitmap, music or whatever.

5.4.1.
Yes, we have no files

This may be a good point to say a few words on data.
Strictly speaking, everything is data, but in this case I'm
referring to data that on PC games would be separate
from the executable: graphics, music, maybe scripts and text-files
and what not. This all works fine on a PC, but not so fine on the
GBA because there is no file system. This means that you
cannot use the standard file I/O routines (fscanf(),
fread(), etc) to read the data, because there are no
files to read them from.

All the game's data has to be added directly to the binary. There are
a number of ways to do this. The most common way is to convert the
raw binary files to C-arrays, then compile those and link them to the
project. Well, the most common among homebrewers is probably
converting to C-arrays and #including them, but that's something
that you should never do. Also popular are assembly arrays.
These are a useful alternative to C arrays because a) they
can't be #included and b) because they bypass the compilation
step and compilation of arrays is very intensive. Of course, you would
have to know how to work with the assembler. Another nice thing about
the assembler is that you can include binary files directly into them,
eliminating the need for a converter. Lastly, while the GBA doesn't
have a native file system, you can always write your own. A common
one is GBFS by
the gbadev forum FAQ maintainer, tepples. Using a file system is
actually the recommended method, but for now, I'll stick to
C-arrays because they are the easiest to use.

Ahem. Actually, we do have files

There were no files in the past, but in July of 2006,
Chishm gave us
libfat, which is a FAT-like file system for GBA and NDS.
It is distributed via DKP as well, so chances are you have it already.

Where do my arrays go?

By default, arrays go into IWRAM. You know, the one that's only 32 kb
long. Now, a mode 3 bitmap is 240x160x2 = 77 kb. Obviously,
trying to put a 77 kb object into a 32kb section would fit nicely
into the bad things category. To avoid this, put it in the read-only
section (ROM), which is much larger. All you have to do for this is
add the ‘const’ keyword to the definition
if you're using C, or the ‘.rodata’
directive in assembly. Note that for multiboot programs ROM actually
means EWRAM, which is only 256 kb long. The latter would fit three
mode 3 bitmaps; more would again be bad unless you use compression.

Note that what I said about arrays is true for all arrays,
not just data arrays: if you want any kind of large array (like a
backbuffer for mode 3), it would also default to and kill
IWRAM. But you can't make it const because then you'd not be able to
write on it. GCC has attributes that let you choose where things are
put – in EWRAM for instance. Here are the commonly seen #defines
for the attributes that can be used for specific section placement.
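A minimal sketch of what such #defines and their use may look like. The section names are the ones I would expect from a devkitARM-style linker script, which is an assumption: check the linker script of your own toolchain. The array names are hypothetical. On anything other than an ARM target the macros expand to nothing, so the sketch also compiles on a PC.

```c
#include <stdint.h>

#if defined(__arm__)
// Section names depend on the linker script (assumed here).
#define IWRAM_DATA  __attribute__((section(".iwram")))
#define EWRAM_DATA  __attribute__((section(".ewram")))
#define EWRAM_BSS   __attribute__((section(".sbss")))
#else
// On a PC there are no such sections; the macros expand to nothing.
#define IWRAM_DATA
#define EWRAM_DATA
#define EWRAM_BSS
#endif

// Read-only data: const sends it to ROM on a cartridge build.
const uint16_t title_colors[4] = { 0x0000, 0x001F, 0x03E0, 0x7C00 };

// Large writable buffer: too big for IWRAM, so ask for EWRAM instead.
uint16_t m3_backbuffer[240*160] EWRAM_BSS;
```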

Data that you don't expect to change in your game should be defined
as constant data using the ‘const’ keyword,
lest it trash your IWRAM.

Converted and const arrays in C++

There are two little snags that you can trip on if you're using
(converted) data arrays in C++. The first is that tools that generate
the arrays will output C-files, not C++-files. This is not a problem
in itself because those files will be compiled just the same. What
is a problem is that C++ uses something known as
Name mangling
to allow overloading and stuff like that. C doesn't and as a result,
the name that the C++ file looks for isn't the same one as in the
C file and you get undefined references. To fix this, use
`extern "C"' in front or around the declarations of the
stuff in the C files.

Another problem with C++ is that const-arrays are considered static
(local to the file that contains it) unless you add an external
declaration to it. So if you just have
`const u8 foo[]= { etc }' in a file, the array will be
invisible to other files. The solution here would be to add the
declaration inside the file itself as well.
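Both snags can be handled with one header. The sketch below shows a hypothetical foo.h and foo.c in a single listing; the array contents are made up for illustration.

```c
// --- foo.h (hypothetical header) ---
// The extern "C" wrapper switches off name mangling when the header
// is read by a C++ compiler; a C compiler never sees it.
#ifdef __cplusplus
extern "C" {
#endif

extern const unsigned char foo[4];

#ifdef __cplusplus
}
#endif

// --- foo.c (would start with: #include "foo.h") ---
// Because the extern declaration is visible here, the const array
// keeps external linkage, also when compiled as C++.
const unsigned char foo[4] = { 0x01, 0x02, 0x03, 0x04 };
```

Any other file that needs the array then simply includes foo.h.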

5.4.2.
Data conversion

It's rather easy to write a tool that converts a binary file to
a C or asm array. In fact, devkitARM comes with two that do just
that: raw2c.exe and bin2s.exe. It also comes
with the basic tools for gbfs by the way. But being able to attach
binary files to your game is only part of the story. Consider a
bitmap, for example. In principle, a bitmap is a binary file just
like any other. There's nothing inherently graphical about it,
and it doesn't magically appear as a bitmap whenever you use it by
itself. Yes, when you double-click on it, an image viewer may pop
up and display it, but that's only because there's some serious work
by the OS going on underneath. Which we don't have here.

Most files will follow a certain format that tells you what they are
and how to use them. For bitmaps, that usually means width, height,
bitdepth and a few other fields as well. The point is that they're
not directly usable. You can't just attach, say, a BMP file to your
project and copy it to VRAM and think that everything will work
out. No, you have to convert it to a GBA-usable format.
Now, you can do this internally (on the GBA itself), or externally
(on the PC and attach the converted data to the project). Because
the latter is a much more efficient use of GBA resources, that is
the usual procedure.

There are many conversion tools, one might almost say too many. Some
are one-trick ponies: a single file-type to a single graphics mode
for example. Some are very powerful and can handle multiple file-types,
multiple files, different conversion modes with lots of options on the
side, and compression. It should be obvious which are of the most value.

A good one is gfx2gba.
This is a command-line tool so that it can be used in a makefile, but
there is a GUI front-end for it as well. This tool has the Good Things
I mentioned earlier, plus some map-exporting options and palette
merging, but the input file must be 8bit and I hear that while it does
compress data, the array-size is still given as its uncompressed size
for some unfortunate reason. This tool comes with the HAM
installation, and is quite common, so definitely recommended.
Unfortunately, there seems to be another tool with the same name.
You'll want the v0.13 version by Markus, not the other one.

Personally, I use
Usenti,
but then I pretty much have to because it's my own tool. This is actually
a bitmap editor with exporting options thrown in. It allows different
file-types, different bitdepths, different output files, all modes,
some map-exporting stuff, meta-tiling, compression and a few others.
It may not be as powerful as big photo-editing tools such as Photoshop,
Gimp and the like, but it gets the job done. If you're still drawing
your graphics with MS-Paint, please stop that and use this one instead.
The exporter is also available separately in the form of the open source
project called
(win)grit,
which comes in a command-line interface (grit) and a GUI (wingrit).
As of January 2007, it is also part of the devkitPro distribution.

Bitmap conversion via CLI

There are many command-line interfaces available for graphics
conversion, but to make them function you need the correct flags.
Here are examples for gfx2gba and grit, converting a bitmap
foo.bmp to a C array for modes 3, 4 and 5. This is just an
example, because this is not the place for a full discussion on them.
Look in their respective readme's for more details.

Below, you can see a partial listing of modes.c, which
contains the bitmap and the palette used in the bm_modes demo
discussed at the end of this section,
as exported by Usenti. It is only a very small part of the file because
at over 2700 lines it is way too long to display here, which wouldn't
serve much of a purpose anyway. Note that both are u32-arrays, rather
than the u8 or u16-arrays you might encounter elsewhere. What you need
to remember is that it doesn't matter in what kind of an array
you put the data: in memory it'll come out the same anyway.

Well, that's not quite true. Only with u32-arrays is proper
data alignment guaranteed, which is
a good thing. More importantly, you have to be careful with the
byte-order of multi-byte types. This is called the endianness of types. In
a little endian scheme, least significant bytes will go
first and in a big endian, most significant bytes will go
first. See table 2 for an example using 0x01,
0x02, 0x03 and 0x04. The GBA
is a little endian machine, so the first word of the
modesBitmap array, 0x7FE003E0 is the
halfwords 0x03E0 (green) followed by 0x7FE0
(cyan). If you want more examples of this, open up VBA's memory
viewer and play around with the 8-bit, 16-bit and 32-bit settings.
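The byte-order rules can be sketched in portable C by building the values from individual bytes, so the result does not depend on the machine you compile it on. The helper names are hypothetical; the byte sequence is the first word of the bitmap as it would sit in memory on a little endian machine.

```c
#include <stdint.h>

// Read a 16-bit value stored least significant byte first.
static inline uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

// Read a 32-bit value stored least significant byte first.
static inline uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Read a 32-bit value stored most significant byte first.
static inline uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// The word 0x7FE003E0 as four consecutive bytes in GBA memory.
static const uint8_t first_word[4] = { 0xE0, 0x03, 0xE0, 0x7F };
```

Here `read_le32(first_word)` gives 0x7FE003E0, while `read_le16(first_word)` and `read_le16(first_word + 2)` give 0x03E0 and 0x7FE0: the same four bytes, just grouped differently.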

The key point here: the data itself doesn't change when you use
different data-types for the arrays, only the way you represent
it does. That was also the point of the bm_modes demo:
it's the same data in VRAM all the time; it's just used in a
different way.

Those 2700 lines represent a 77kb bitmap. One single bitmap. In
all likelihood, you'll need at least a couple of them to make anything
worthwhile. Most games have lots of data in them, not only graphics
but maps and sound and music as well. All this adds up to a huge amount
of data, certainly too much for just EWRAM and maybe even for a full
cart. That is why compression is also important. The
GBA BIOS has decompression routines for
bit-packing, run-length encoding, LZ77 and Huffman. Converters
sometimes have the appropriate compressors for these routines,
which can drastically shrink the amount of memory used. Usenti and
(win)grit support these compressors. So does gfx2gba, which even has
some more. A tool that just does compression on binary files (but does
it very well) is GBACrusher.
I won't go into compression that much (or at all), but you can read up
on the subject
here.

Understanding data

It is vital that you understand what data is, how the different
datatypes work. Preferably endianness and alignment too. Emulators and
hex editors can help you with this. Once you have compilation working,
just make a few random arrays and see what they look like in the VBA
memory viewer for a while.

5.4.3.
#including code or data considered harmful

Fig 5.6: even Bart knows …

Most non-trivial projects will have multiple files with code and data.
The standard way of dealing with these is to compile these separately
and then link the results to the final binary. This is the recommended
strategy.
However, most other tutorials and much of the example code you can find
on the web do something else: they #include everything into the main
source file and compile that. This is not a recommended practice
and should be avoided.

“But why not? It seems to work fine, and it's so easy!”

Yes, it is easy; and it does seem to work. The main problem is that
it isn't scalable.
For small projects (a handful of files) you probably won't notice,
but as projects grow to hundreds and perhaps thousands of files,
you will run into some very annoying problems.
The main issue is what #include actually does. It copies the whole
included file into the includer to form a single larger file. This
leads to the following issues.

Massive files to compile. So, #including creates one big file.
If you have a lot of stuff, you'll have one very big
file. This will cost large amounts of memory and slows down
compilation. As the project grows, what starts as a compile time of
a second can grow to several, then minutes and perhaps even hours.

At some point, there was also the problem that the compiler couldn't
handle files exceeding 4MB, putting a limit on how much you
could #include in a C file. I'm not sure if this is still an issue.

Recompiling the world. The main problem is that when you #include
everything, you need to recompile everything as well. One
change anywhere, no matter how small, causes everything
to be recompiled. For small projects (say, a handful of files), a full
rebuild would take a few seconds so it's not a problem. But larger
projects can have hundreds or thousands of files, and the time is not
measured in seconds, but in minutes or perhaps hours.
Sure it's a good excuse to go sword
fighting, but terribly annoying if you want to do something
productive.

Bloat. Even if your own code and data are relatively small in
number, you're probably using some code library for API functions.
Normally, these are pre-compiled and only the functions used are
linked into your binary. But if those worked by #include as well
(in other words, if their creators had followed the practice I'm
warning against), every function in that library would be included
as well, including the ones you're not using. This increases the filesize,
and increases the problems mentioned above.

Undeclared identifiers, multiple definitions and circular
dependencies. In a nutshell, C requires that you declare an
identifier before it's referenced, and it can only be defined once.
The first point means that the order of inclusions starts to matter:
if, say, fileB.c needs something from fileA.c, the
latter needs to be included before the former to get a compile. The
second means that you could only #include a file once in the whole
project: if fileB.c and fileC.c both need stuff
from fileA.c, you can't #include it in them both because
when they're #included in main.c, fileA.c is
effectively #included twice and the compiler will balk.

These points can technically be overcome by being careful. But,
again, when projects grow, it gets increasingly difficult to keep
track of which comes before what and why. There
is, however, one point at which it will definitely go wrong, namely
when there are circular dependencies: fileB.c needs
fileA.c and vice versa. Each file would require the
other to go first, which simply isn't possible because it'd cause
multiple definitions.

Data alignment. I'll get to what this means in a minute, but
right now know that copy routines work better if the data is
aligned to 32-bit boundaries (even for byte and halfword arrays).
Some of them won't even work properly if this isn't the case.
This is usually guaranteed if you compile separately, but if the
arrays are #included and no steps have been taken to force alignment,
you simply never know.

It's not much of a problem nowadays because most graphics converters
force data-alignment, but you still need to know about it. Because
data alignment is a fairly esoteric concept, alignment bugs are next
to impossible to track down unless you're aware of the problems
misalignment can bring.

So please, do yourself a favor and do not #include every file you have
into main.c or its moral equivalent. Put function and variable
definitions in separate source files to be compiled separately and
linked later. The #include directive is only to be used for files
with preprocessor directives, declarations and type definitions.
Oh, and inline functions. Those are okay there too.

5.4.4.
Proper build procedure

Separate compilation

So what do you do instead? Well, for starters keep all the code
and data in separate source files. Compile these separately by
invoking gcc on each file. This gives you a list of object files.
These you then link together. In batch files, you'd need to
add extra commands for each file, but a properly set up makefile
uses a list of object files, and the makefile's rules will then
take care of the rest automatically. Using the makefile of the
second demo as a reference,
you'll get something like this:
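Here is a minimal sketch of such a makefile fragment, put together from
the description that follows. The variable names PROJ, CC, LD, CFLAGS
and LDFLAGS are assumptions on my part and are expected to be set
elsewhere in the makefile; other steps are omitted.

```makefile
# Partial makefile for multiple source files (sketch; some steps omitted).
# PROJ, CC, LD, CFLAGS and LDFLAGS are assumed to be defined elsewhere.

# Three targets for compilation
OBJS := foo.o bar.o boo.o

# Compile step: static pattern rule, each .o is built from its own .c
# (recipe lines must start with a tab character)
$(OBJS) : %.o : %.c
	$(CC) -c $< $(CFLAGS) -o $@

# Link step: $^ expands to all prerequisites, i.e. every object file
$(PROJ).elf : $(OBJS)
	$(LD) $^ $(LDFLAGS) -o $@
```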

The OBJS variable contains the names of three object
files, which would be the targets of compiling foo.c,
bar.c and boo.c. Remember, makefiles list rules by
target, not by prerequisite. The compilation step uses a static
pattern rule, which for each ‘.o’ file in
OBJS compiles the ‘.c’ file with the same
title. This is what runs the compiler for our three source files.
In the linking step the automatic variable $^ expands
to the prerequisites of the rule, which is the list of all object
files, and this is how the files are all linked together. If you
need more files, add them to the OBJS list.

Note that the devkitPro and tonc template files take care of these
things automatically. Just put the source files into the right directory
and you're good to go.

Symbols, declarations and definitions

If you have been doing everything via #include, you should consider
refactoring all of your stuff to separate source files. No, let me
rephrase that: you need to do this because you'll benefit
from it in the end. If you're already well into your project, this is
going to suck because it's boring and time consuming and most likely
it's not even going to work properly when you try the first
build afterwards. I expect you'll get a whole slew of errors,
particularly these three:

`foo' undeclared

redefinition of `foo'

multiple definition of `foo'

To understand what these mean, you need to know a little bit more about
how C (and indeed programs) actually work.

As I said before, there aren't really things like programs, bitmaps,
sound on computers; it's all just bits. Bits, bits and more bits. What
makes a sequence of bits work as a program is the way it is fed to
the CPU, VRAM, and other sections. Somewhere in the build process,
there has to be a translation of all the C code to data and
machine instructions. This, of course, is the compiler's job.

But wait, there's more. C allows you to compile each file separately,
and then link them later into the actual program. This is a good
idea, because it allows you to save time by only compiling the
files that you have recently modified, and it makes code libraries
possible, which are little more than a bunch of
precompiled source files. If you're not convinced that this is a
good idea, consider what it would take without it. You'd have to have
all the source code that you wanted to use (including
for things like printf() and all the API code), and
compile all those megabytes of source files each time. Sounds like
fun? No, I didn't think so either.

However, you need a little more bookkeeping to make this all work.
Because everything is just bits, you'd need a way to find out
where the function or data you want to use actually is. The
contents of the compiled files (the object files) aren't just raw
binary; they also contain symbols. This is just a word for the
group of things that have actual binary information attached to
them. Among other things, the object file keeps track of
the symbol's name, section, size, and where its content
is in the object file. A function is a symbol, because it contains
instructions. A variable is also a symbol, as is data for bitmaps,
sound, maps et cetera. Preprocessor #defines, typedefs and
struct/class declarations are not symbols, because they
don't have actual content in them; they only allow you to structure
your code better.

The other bookkeeping note is that each source/object file is a
separate entity. In principle, it knows nothing about the outside
world. This makes sense because it limits the dependency on other
files, but it does create a little problem when you want to make
files work together. This is where declarations come in.

You may have noticed that C is pretty strict when it comes to
names of stuff. Before you can use anything, it requires you to
mention what it is beforehand. For example, if you use a function
foo() in your code and you never defined its code,
or even if you put it after the call to foo(), the
compiler will complain that it doesn't know what you're talking
about. That is, it will say that ‘`foo' is undeclared’.
You have to admit it has a right to stop there: how would it know
how to use the thing if you never told it what it was?

The code snippet below gives an example of when a reference is and
is not declared, and why it's important to have a declaration.
Function a() calls foo(), which is not
known at the time, so an error is produced. Function
b() calls foo() after its definition and with the right
argument, so all is well. Function c() also calls foo() when it is
known, but still gives an error because
foo() just happens to require an integer as an argument.
If the declaration wasn't mandatory and the call in
a() was allowed, foo() would have been
processing the wrong kind of information at runtime. There are ways
around such problems, of course: languages like PHP, VB and others
work fine without mandatory declarations, but the cost for that is
speed and possibly a lot more runtime errors.

//# C requires identifiers to be declared or defined before first use.

// ERROR: `foo' is undeclared.
void a()
{
    foo();
}

// Definition of foo(). Now the system 'knows' what foo is.
void foo(int x)
{
    // code
}

// foo is known and used correctly: no errors.
void b()
{
    foo(42);
}

// foo is known but used incorrectly. Compiler issues error.
void c()
{
    foo();
}

Now back to our separate files, and the difference between declarations
and definitions of symbols. A definition is something of
actual content: it is what actually forms the symbol. Examples are the
value(s) in variables, and the code in functions. A
declaration is just an empty reference. It just says that
there is something in the project with a particular name, and
indicates how that something is supposed to be used: whether it's a
function or variable, what datatype, which arguments, that sort of
thing. This is how you can use symbols from other object files.

You should be familiar with what a definition looks like. A declaration
looks very similar. The basic variable declaration is the variable name
and attributes (type, const, section) preceded by extern.
For functions, replace the code block by a semi-colon.
You can also add extern there, but it's not required.
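As a small illustration of the two forms, here is a sketch with a
couple of made-up symbols (fooData, fooCount and foo_sum() are
hypothetical names). Everything sits in one file here so it can be
compiled as is; normally the declarations would go into a header and
the definitions into a single source file.

```c
#include <assert.h>

/* Declarations: names and types only, no content. */
extern const unsigned short fooData[4];  /* variable: needs 'extern'       */
extern int fooCount;
int foo_sum(void);                       /* function: ';' replaces the body */

/* Definitions: the actual content behind each symbol.
 * Each of these may appear only once in the whole project. */
const unsigned short fooData[4] = { 1, 2, 3, 4 };
int fooCount = 4;

int foo_sum(void)
{
    int sum = 0;

    assert(fooCount <= 4);               /* stay inside fooData */
    for (int i = 0; i < fooCount; i++)
        sum += fooData[i];
    return sum;
}
```

Any other source file that repeats the three declarations can call
foo_sum() or read fooData; the linker connects those references to the
one definition.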

Now, a definition is also a declaration, but this does not work
the other way. How could it? The declaration is supposed to be empty.
The distinction is subtle, but it's the reason you might get multiple
definition errors when linking the files together. Think of what would
happen if you have the definition of function foo()
in multiple files. Each file itself would know what foo()
is because definitions are also declarations, so it would pass the
compilation stage. So now you have multiple object files, each
containing a symbol called foo. But then you try to
link them into one file. The linker sees different versions of
foo, and stops because it doesn't know which one you are
actually trying to use. The moral here is that you can have as many
declarations as you want, but there can be only one definition.

Another point I should raise is that the declaration
defines how a symbol is to be dealt with, as it is the only point of
reference if the definition is in another file. This means that, in
theory, you could have a variable var defined as an int,
but declared as a short, or even a function! While not exactly
recommended, it is an interesting possibility.

Lastly: the distinction of what should go in source files, and
what in headers. Source files can actually contain anything, so that's
an easy one. Remember that they will contain everything after the
preprocessor step anyway, because that's what #include really does.
So what matters is what you put in headers. The purpose of header
files is to have a place for all the non-symbol stuff that you
want to use in different source files. That means declarations,
#defines, macros, typedefs, struct/class descriptions.
It also means (static) inline functions, because these don't form
symbols either, but are integrated in the functions that call them.
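To make that division a bit more concrete, here is a sketch of what
could go into a header and what into the matching source file. The
names (gfx.h, gfx.c, POINT, gfxPalette, gfx_index(), rgb15()) are all
hypothetical, and the #ifndef guard is just the usual convention for
protecting a header against double inclusion. Both parts are shown in
one listing so it compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* ---- what a hypothetical "gfx.h" could contain ---- */
#ifndef GFX_H
#define GFX_H

#define SCREEN_WIDTH 240                      /* preprocessor define   */
typedef uint16_t COLOR;                       /* typedef               */
typedef struct POINT { int x, y; } POINT;     /* struct description    */

extern const COLOR gfxPalette[2];             /* variable declaration  */
int gfx_index(POINT p);                       /* function declaration  */

/* static inline: merged into its callers, so it creates no
 * externally visible symbol that could clash at link time. */
static inline COLOR rgb15(int r, int g, int b)
{
    return (COLOR)(r | (g << 5) | (b << 10));
}

#endif /* GFX_H */

/* ---- what the matching "gfx.c" could contain ---- */
const COLOR gfxPalette[2] = { 0x0000, 0x7FFF };   /* definition */

int gfx_index(POINT p)                            /* definition */
{
    assert(p.x >= 0 && p.x < SCREEN_WIDTH);
    return p.y * SCREEN_WIDTH + p.x;
}
```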

Summary

All this stuff about separate compilation, declarations, and
definitions is rather important for C programming, but the preceding
text may be a little much to take in at once. So here is a summary of
the most important points.

Symbols. Symbols are those parts of the code that form actual
binary content in the final program. This includes functions,
variables, data, but not preprocessor or type description stuff.

Declarations/definitions. A definition of a symbol is where
the actual content is. A declaration just says that something of
a certain name exists, but will be added to the project later.
Multiple (identical) declarations may exist, but there can be only
one definition in the project. Definitions are also declarations.

Source/object files are self-contained entities. They contain
the definitions of the symbols that are in the code, and a list of
references to outside symbols, as indicated by the declarations.

Header files contain meta-data, not symbols. Header files
cannot be compiled, but are intended to contain the ‘glue’
that allows different sources to work together (i.e., declarations)
and stuff that makes writing the sources easier (like #defines and
macros). They are meant to be included in multiple files, so they
cannot create symbols because that would lead to multiple
definitions.

Potential problems during compilation or linking:

`foo' undeclared. Compiler error. The identifier `foo' is not
known at this point. Check the spelling, or add the appropriate
declaration or header file containing the declaration.

redefinition of `foo'. Compiler error. The identifier has a
previous declaration or definition conflicting with the current one
in the same file or included headers. Usually accompanied by a message
pointing to the previous definition.

multiple definition of `foo'. Linker error. The symbol
name `foo' is shared by multiple object files. Replace all but one
of the definitions of `foo' in the source files with the appropriate
declarations. Usually accompanied by a message indicating the
object file with the other definition(s).

5.4.5.
Data alignment

Data alignment is about the ‘natural’ memory addresses of
variables. It is often beneficial to have a variable of a certain
length to start at an address divisible by that length. For example,
a 32-bit variable likes to be put at addresses that are a multiple of
4. Processors themselves also have certain preferred alignments.
Addressing will work faster if you stick to their native types and
alignment (say, 32-bit everything for 32-bit CPUs). For PCs it is not
required to do any of this, it'll just run slower. For RISC systems,
however, things must be aligned properly or data gets mangled.

In most cases, the compiler will align things for you. It will put
all halfwords on even boundaries and words on quad-byte boundaries.
As long as you stick to the normal programming rules, you can remain
completely oblivious to this alignment stuff. Except that you
won't always stick to the rules. In fact, C is a language that
allows you to break the rules whenever you feel like it. It trusts you
to know what you're doing. Whether that trust is always justified is
another matter :P

The best example of breaking the rules is pointer casting. For example,
most graphics converters will output the data as u16-arrays, so you
can copy it to VRAM with a simple for-loop. You can speed up copying
by roughly 160% if you copy by words (32-bit) rather than
halfwords (16-bit). Run the txt_se2
demo and see for yourself. All you have to do for this is one or two
pointer casts, as shown here.
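A sketch of the two copy variants follows. The data values are made up,
and the destination is passed in as a parameter so that the code can
also be tried on a PC with an ordinary buffer; on the GBA it would be
VRAM. Note that reading a u16 array through a u32 pointer steps outside
the strict ISO C aliasing rules, which is exactly the kind of
rule-breaking meant here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

#define fooSize 16   /* size of fooData in bytes (hypothetical data) */

/* Converted data as a u16 array. The aligned attribute (a GCC extension)
 * guarantees the word alignment that the second routine relies on. */
const u16 fooData[fooSize/2] __attribute__((aligned(4))) =
{
    0x0000, 0x001F, 0x03E0, 0x7C00, 0x7FFF, 0x1234, 0x5678, 0x7ABC
};

/* Version 1: copy by halfwords; fooSize/2 iterations. */
void copy_by_halfwords(void *dstv)
{
    u16 *dst = (u16*)dstv;
    const u16 *src = fooData;

    for (int ii = 0; ii < fooSize/2; ii++)
        dst[ii] = src[ii];
}

/* Version 2: copy by words; fooSize/4 iterations.
 * Both pointers must be word-aligned for this to be valid. */
void copy_by_words(void *dstv)
{
    u32 *dst = (u32*)dstv;
    const u32 *src = (const u32*)fooData;    /* the pointer cast */

    assert(((uintptr_t)dst & 3) == 0 && ((uintptr_t)src & 3) == 0);

    for (int ii = 0; ii < fooSize/4; ii++)
        *dst++ = *src++;
}
```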

Both these routines copy fooSize bytes from
fooData to VRAM. The second version is much faster, though,
because there are half as many loop iterations and also because the
ARM CPU is just better at dealing with 32-bit chunks. The only danger
here is that while fooData will be halfword aligned,
it need not be word aligned, which is a requirement for the
second version. For those readers that think casts like this and
mis-alignment only happen to other people, think again: the
faster copy routines (memcpy(), CpuFastSet(),
and DMA too) cast to word-pointers implicitly. Use them (and you
should) and you run the risk of misalignment.

There are many ways of ensuring proper alignment. The easiest way is
to not mix converted data with the rest of your stuff. That is, don't
#include data-files. This should suffice. Another method is to
convert to u32-arrays in the first place. In assembly files, you can
control alignment by using the ‘.align n’
directive, where n aligns to 2^n bytes. C itself
doesn't allow manual alignment, but there is an extension for this
in GCC: ‘__attribute__(( aligned(4) ))’. Add
that to the definition and it'll be word aligned. This is often
#defined as ALIGN4 in some headers. Files in GBFS are
also always correctly aligned.

Struct alignment

One other area where alignment can cause problems is in struct
definitions. Look at the following code. Here we have a struct
named FOO consisting of one byte, b, one word
w and one halfword h. So that's 1+4+2=7 bytes for the
struct, right? Wrong. Because of the alignment requirement, w
doesn't immediately follow b but leaves 3 bytes of padding.
When defining arrays of this type, you'll see that there
are also two padding bytes after h, because otherwise later
array-entries would run into trouble.
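The struct in question can be sketched as follows, with the padding
written out in the comments. The typedefs and the compile-time checks
are my additions; the offsets assume an ABI where a 32-bit word needs
4-byte alignment, which is the case for the GBA and for typical PC
compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

struct FOO
{
    u8  b;   /* offset 0, followed by 3 padding bytes          */
    u32 w;   /* offset 4: a word must start on a word boundary */
    u16 h;   /* offset 8, followed by 2 padding bytes          */
};

/* Compile-time checks of the layout (C11 static_assert). */
static_assert(offsetof(struct FOO, w) == 4, "3 padding bytes after b");
static_assert(offsetof(struct FOO, h) == 8, "h follows w directly");
static_assert(sizeof(struct FOO) == 12,     "2 padding bytes after h");
```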

The real size is actually 12 bytes. Not only is this almost
twice the size, if you ever try to copy the array using a hard-coded 7
rather than sizeof(struct FOO), you completely mess it
up. Take this lesson to heart. It's a very easy mistake to make and
difficult to detect after the fact. If you were unaware of this fact
and you've already done some GBA coding, check your structs (or
classes) now; chances are there are gaps that shouldn't be there.
Simply rearranging some of the members should suffice to make them fit
better. Note that this is not specific to the GBA: structs on PCs
may behave the same way, as I noticed when I was writing my TGA
functions.

There are ways of forcing packing, using the
‘__attribute__((packed))’ attribute. If the
FOO struct had that, it really would be 7 bytes long.
The downside of this is that the non-byte members could be
mis-aligned and have to be put together byte for byte (the compiler
does this for you). This is very much slower than the non-packed
version, so only use this attribute if you have no other choice.
What happens with mis-aligned (half)words otherwise, I can't tell you,
but I'm sure it's not pretty.

Forcing alignment and packing

GCC has two attributes that allow you to force alignment of arrays,
and remove member-alignment in structs.
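A sketch of how the two attributes can be used follows. ALIGN4 is the
name mentioned earlier; PACKED, fooBytes, BAR and FOO_PACKED are names
I made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Shorthand for the two GCC attributes. */
#define ALIGN4  __attribute__((aligned(4)))
#define PACKED  __attribute__((packed))

/* Force word alignment on a byte array. */
const u8 fooBytes[8] ALIGN4 = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Force word alignment on a struct type. */
typedef struct BAR { u16 a, b; } ALIGN4 BAR;

/* Remove member alignment: no padding between or after members. */
struct FOO_PACKED
{
    u8  b;
    u32 w;
    u16 h;
} PACKED;

static_assert(sizeof(struct FOO_PACKED) == 7, "packed: 1+4+2 bytes");
static_assert(_Alignof(BAR) == 4, "BAR is word-aligned");
```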

Devkits and struct alignment

As far as I've been able to tell, structs have always had word
alignment. This was useful because it made copying structs
faster. C allows you to copy structs with a single assignment,
just like the standard data types. Because of the word-alignment,
these copies are fast because GCC will make use of ARM's block-copy
instructions, which are much faster than copying member by member.

However, this does not seem to be true under devkitARM r19 (and
presumably higher) anymore. The new rule seems to be
“structs are aligned to their largest member”. This
does make more sense as a struct of two bytes would actually be
two bytes long. However, it does mean that GCC will now call
memcpy() for non-aligned structs. Apart from it
being a function with quite a bit of overhead (i.e., it's
very slow if you want to copy a single small struct),
it will actually fail to produce correct results in some
cases. The problem is that for low-number copies it will copy by the
byte, which is something you cannot do for VRAM, PALRAM or OAM.
For example, objects that we'll see later use a struct of four
halfwords; using a struct-copy there, something I am very fond
of doing, screws up everything. The only way to make it work
properly is to force word-alignment on the struct.

The rules for struct alignment have changed since devkitARM r19.
Instead of being always word-aligned, they are now aligned as
well as their members will allow. If this means they're not
necessarily word-aligned, then they will use
memcpy() for struct-copies, which is slow for
small structs, and may even be wrong (see
next section). If you want to
be able to do struct copies quickly and safely, either force
alignment or cast to other datatypes.
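As a sketch of the ‘force alignment’ option, here is a struct of four
halfwords with word alignment added. OBJ4 and obj_copy() are
hypothetical names, and what code the compiler actually generates for
the assignment depends on the compiler version and settings; the
attribute only guarantees the alignment.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;

#define ALIGN4  __attribute__((aligned(4)))

/* Four halfwords. Without the attribute, the struct would only get
 * the alignment of its largest member, a halfword. */
typedef struct OBJ4
{
    u16 attr0, attr1, attr2, fill;
} ALIGN4 OBJ4;

static_assert(_Alignof(OBJ4) == 4, "OBJ4 is word-aligned");
static_assert(sizeof(OBJ4) == 8,   "four halfwords, no padding");

/* Struct copy by plain assignment. Because both sides are known to be
 * word-aligned, the compiler is free to copy in word-sized chunks. */
void obj_copy(OBJ4 *dst, const OBJ4 *src)
{
    *dst = *src;
}
```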

5.4.6.
Copying, memcpy() and sizeof

There are many different ways of copying data on this platform. Arrays,
struct-copies, standard copiers like memcpy(), and GBA
specific routines like CpuFastSet() and DMA. All of these
have their own strengths and weaknesses. All of them can be affected by
misalignment and the no-byte-write rule. I discuss some of them in the
txt_se2 demo.

I've chosen to use memcpy() in the early demos for
a number of reasons. The main one is that it is part of the standard
C library, meaning that C programmers should already be familiar with
it. Secondly, it is somewhat optimized (see the
txt_se2 demo for details). However, there are two potential
pitfalls with the routine. The first is data alignment (yes, that
again). If either the source or the destination is not
word-aligned, you're in trouble. Secondly, if the number of bytes is
too small, you're in trouble too.

Both of these have to do with the basic function of
memcpy(), namely to be a fast byte copier. But
as you know, you can't copy single bytes to VRAM directly.
Fortunately, it has an optimised mode that uses an unrolled
word-copy loop if two conditions are satisfied:

When both source and destinations are word aligned.

When you are copying more than 16 bytes.
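If you want to be warned when a copy does not meet these two
conditions, you could wrap memcpy() in a small check like the one
sketched here. The function names are made up, and the 16-byte
threshold is simply the one quoted above; the exact limit may differ
between versions of the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if a copy meets both conditions listed above:
 * source and destination word-aligned, and more than 16 bytes. */
int word_copy_expected(const void *dst, const void *src, size_t size)
{
    int aligned = (((uintptr_t)dst | (uintptr_t)src) & 3) == 0;
    return aligned && size > 16;
}

/* memcpy() with a debug-time guard; the assert disappears when
 * compiling with NDEBUG defined. */
void *checked_memcpy(void *dst, const void *src, size_t size)
{
    assert(word_copy_expected(dst, src, size));
    return memcpy(dst, src, size);
}
```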

This is usually the case so I figured it'd be safe enough for the
demos. There are also look-alikes in tonclib that do the same thing
only better, namely memcpy16() and
memcpy32(), but these are in assembly so I thought I
wouldn't lay them on you so soon. Highly recommended for later
though.

On a related subject, there is also memset() for
memory fills. Be careful with that one, because that will only
work with bytes. Tonclib also includes 16- and 32-bit versions of
this routine, but also in assembly.
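To show what a halfword fill does differently, here is a plain C
sketch. fill16() is a made-up name and not the tonclib routine, which
is written in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

/* Fill 'count' halfwords at dst with the 16-bit value 'value'. */
void fill16(u16 *dst, u16 value, size_t count)
{
    assert(((uintptr_t)dst & 1) == 0);   /* must be halfword-aligned */

    while (count--)
        *dst++ = value;
}
```

With fill16(buf, 0x001F, n) every halfword becomes 0x001F. A call like
memset(buf, 0x1F, 2*n) repeats the byte 0x1F instead, so every halfword
ends up as 0x1F1F.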

The last thing I want to discuss is the sizeof()
operator. In other tutorials you will see this being used to find
the size in bytes of arrays, which is then used in
memcpy(). It's a good procedure but will not always
work. First, sizeof() actually gives the size of the
variable, which need not always be the array itself. For
example, if you use it on a pointer to the array, it'll give the
size of the pointer and not of the array. The compiler
never complains, but you might when hardly anything is copied.
Secondly, sizeof() is an operator, not a
function. It is resolved at compile-time, so it needs to be able
to find the size at that time as well. To do this, either
the declaration (in the header) should indicate the size, or
the array definition (in the source file) should be visible.
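A short sketch of the first pitfall, using a made-up array. Where the
array definition is visible, sizeof gives the size of the whole array;
once the array has been passed to a function, all that is left is a
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

const u16 fooData[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* Resolved at compile time: 8 halfwords = 16 bytes. */
static_assert(sizeof(fooData) == 16, "size of the whole array");

/* The array definition is visible here, so this is the array size. */
size_t size_via_array(void)
{
    return sizeof(fooData);
}

/* Here 'data' is a pointer, so this is the size of a pointer,
 * no matter how large the array it points to is. */
size_t size_via_pointer(const u16 *data)
{
    return sizeof(data);
}
```

On the GBA a pointer is 4 bytes, so passing size_via_pointer(fooData)
to memcpy() would copy only two of the eight halfwords.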

Bottom line: you can use sizeof(), just pay attention to
what you use it on.

Okay, that was the long and boring, yet necessary,
section on data. Congratulations if you've managed to stay awake till
this point, especially if you've actually understood all of it.
It's okay if you didn't though; in most cases you won't run into the
problems discussed here. But remember this section in case you do
run into trouble when copying and can't find the cause in the code; it
might save you a few hours of debugging.

5.4.7.
Data interpretation demo

The bm_modes demo is an example of how the same data can give
different results depending on interpretation (in this case, modes 3,
4 and 5). In the demo code, I make one copy into VRAM, and
switch between the modes using Left and Right. The results can be seen
in figs 5.7a-c.

I've arranged the data of the bitmap in such a way that the name of
the current mode can be read clearly, and the mode's boundaries in
memory are indicated as well. Because the data intended for the other
modes is still present, but not interpreted as intended, that part of
the bitmap will look a little shitty. And that's partly the point of
the demo: when filling VRAM, you need to know how the GBA will use the
data in it, and make sure it'll be used that way. If the bitmap ends
up being all garbled, this is the likely suspect; check the bitdepth,
dimensions and format (linear, tiled, compressed, etc) and if
something conflicts, fix it.

Now, sometimes this is not as easy as it sounds. The general
procedure for graphics is to create it on the PC, then use an exporter
tool to convert it to a raw binary format, then copy it to VRAM. If
the exporter has been given the wrong options, or if it can't handle
the image in the first place, you'll get garbage. This can happen
with some of the older tools. In some cases, it's the bitmap editor
that is the culprit. For paletted images, a lot depends on the
exact layout of the palette, and therefore it is vital that
you have a bitmap editor that allows total control over the palette,
and leaves it intact when saving. MS-Paint for example does neither.
Even very expensive photo editing tools don't, so be careful.

For this image, I used <plug>my own bitmap editor
Usenti</plug>,
which not only has some nice palette control options and tiling
functions, but a built-in GBA graphics exporter as well. To make the
background be the same color in all modes, the two bytes of the
16bit background color of modes 3 and 5 had to serve as palette
entries for mode 4, both using that 16bit color again. In this case,
the color is 0x080F, sort of a brownish color. The bytes
are 8 and 15, so those are the palette entries where the color goes
too. Normally you don't have to worry about switching bitdepths
mid-game, but knowing how to read data like this is a useful debugging
skill.
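The byte arithmetic for the background color can be checked with a few
lines of C. The helper names are made up; the only fact used is that a
16-bit value splits into a low and a high byte, and that each byte is
one pixel (a palette index) in mode 4.

```c
#include <assert.h>
#include <stdint.h>

#define BG_COLOR 0x080F   /* the 16-bit background color of the image */

/* The two bytes that make up a 16-bit value. */
static inline uint8_t low_byte(uint16_t c)  { return (uint8_t)(c & 0xFF); }
static inline uint8_t high_byte(uint16_t c) { return (uint8_t)(c >> 8);   }

/* In mode 4, one 16-bit background pixel is read as two 8-bit pixels,
 * so palette entries 15 and 8 must both hold BG_COLOR. */
void check_bg_color(void)
{
    assert(low_byte(BG_COLOR)  == 15);   /* 0x0F */
    assert(high_byte(BG_COLOR) == 8);    /* 0x08 */
}
```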

Conclusions

Now we've seen some of the basics of the GBA bitmap modes: the
properties of modes 3, 4 and 5, page flipping, rudimentary drawing
for mode 3 and one of the most important rules of VRAM interactions:
you cannot write to VRAM in bytes. There is much more that can be
said, of course. Bitmap graphics is a rich subject, but going into
more detail right now may not be the best idea. For one, the bitmap
modes are very rarely used in games anyway, but also because there
are other things to talk about as well. Things like button input,
which is what the next chapter is about.

This chapter also discussed a few things about handling data, a very
important topic when you're this close to the hardware. Datatypes
matter, especially when accessing memory through pointers, and you
need to be aware of the differences between them, and the
opportunities and dangers of each. Even if you don't remember every
little detail in the data section, at least remember where to look
when things go screwy.

Before continuing with further chapters, this may be a good time to
do some experimenting with data: try changing the data arrays and see
what happens. Look at the different data interpretations, different
casts, and maybe some intentional errors as well, just to see what
kinds of problems you might face at some point. It's better to make
mistakes early, while programs are still short and simple and you
have fewer potential problems.

Or not, of course :P. Maybe it's worth waiting a little
longer with that; or at least until we've covered basic input, which
allows for much more interesting things than just passive images.