Monday, January 14, 2013

So, as output is in theory solved by having the DMA send lines of 16-bit halfwords, we need to focus on storage and memory.
As with horizontal resolution, different choices can lead to different modes, but I’ll show some generic principles; you can adapt them and create your own mode. Or reuse one.
There are several general ways to store pixels in order to output them.

The first, direct approach is a frame buffer.

A frame buffer is a chunk of memory storing pixels exactly as they will be output, so that you output to the screen what you’ve got in memory. Sometimes things can be a little messier, with color planes (i.e. one framebuffer each for R, G and B) and banks.

What resolution can we use if we want it to fit in the current memory size?

We can, of course, use Flash for static images. But using only static images is quite restrictive for a game console.

Let’s assume 2 bytes per pixel. We have 192 kB of memory on the STM32F4, but we can only use 128 kB, as the other 64 kB is core-coupled memory and won’t be seen by the DMA. So we can store 64k pixels at 16 bits each.

For a 4:3 aspect ratio, that’s sqrt(3/4 × 64 × 1024) ≈ 221 lines.

We could thus store one screen of 294x221 pixels, or two screens of (only!) 221x147 if we want double buffering!

Which begins to enter the domain of lame for a 32-bit game console.

For a better resolution of 640x480 @ 12bpp (stored as 16-bit halfwords), we would need 640x480x2 = 614,400 bytes, which is about 5 times the usable RAM and more than half the Flash size.

For this, we need to compress the image in RAM and decompress it at the 25 MHz pixel clock.

So we will need some translation function that prepares a line of pixels from the frame buffer while the DMA outputs the previous line buffer, and then swaps those front and back buffers, exactly timed at 31,000 times per second.
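A minimal sketch of this double line buffering, with made-up names (the real kernel’s API may differ):

```c
#include <stdint.h>

#define LINE_PIXELS 640

/* Two line buffers: while the DMA streams one out to the pins,
   the CPU prepares the next line in the other; roles swap at
   every h-sync (~31,000 times per second). */
uint16_t line_a[LINE_PIXELS], line_b[LINE_PIXELS];

/* Buffer the CPU should fill while line `y` is being output. */
uint16_t *cpu_buffer_for_line(int y)
{
    return (y & 1) ? line_a : line_b;
}

/* Buffer the DMA reads for line `y`: always the one the CPU
   filled during the previous line. */
uint16_t *dma_buffer_for_line(int y)
{
    return (y & 1) ? line_b : line_a;
}
```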

Needless to say, we won’t use JPEG.

Note that we need this compressed data to be manipulated by our game, so PNG and the like (really LZ77/LZW in-memory pixel storage) would be too CPU intensive, as well as impractical to manipulate for non-static images.

First, we can use fewer bits per pixel by using indexed colors. For example, we could use a 256-color palette at 1 byte per pixel, giving nice colors and practical output. See the Wikipedia article about indexed color: it’s well explained and has nice parrot pictures. Arrr!
The display function will translate the pixel color IDs through a table of pixel colors and store the result in the buffer. Generally this is done in hardware by means of a RAMDAC (in: pixel IDs, out: VGA signal, inside: some RAM for the palette plus a DAC, hence RAMDAC; see the Wikipedia article). But those chips are becoming hard to find and expensive, that would be an additional chip, and it would force us to use indexed color, so no.
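In software, that translation is just a table lookup per pixel. A naive sketch (names are illustrative):

```c
#include <stdint.h>

/* A software "RAMDAC": expand one line of 8-bit palette indices
   into the 16-bit values the DAC resistors will see. This is the
   naive byte-per-pixel loop discussed below. */
void expand_line_8bpp(const uint8_t *src, const uint16_t *palette,
                      uint16_t *dst, int npix)
{
    for (int i = 0; i < npix; i++)
        dst[i] = palette[src[i]];
}
```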

Is it feasible in software?

Let’s calculate how many lines per second we could output.
Let’s consider a naive byte-per-pixel algorithm, not a 32-bit word-aware method.

So the STM32 @ 168 MHz, at roughly 6 cycles per pixel, can output 168e6/(6x640) = 43,750 lines/sec. That’s more than the 31 kHz horizontal refresh rate, so it’s possible (note that this includes hblank periods)!

(That takes about 70% of the CPU power, not counting vblank periods where we’re not outputting video, compared to none when we’re using a framebuffer. That’s a serious memory/CPU tradeoff, but if we can do better, having 50% of a 168 MHz CPU isn’t so bad after all.)

Note that with a 16-color palette, that’s half the RAM as well (4 bits per pixel); with 4 colors, that’s 2bpp, with a quarter of the pixel reads and the whole palette kept in registers, so no palette reads at all. Combined with word writes, that can make it much faster.
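A sketch of the 16-color variant with combined word writes, under the assumptions stated in the comments:

```c
#include <stdint.h>

/* 16-color mode: two pixels per source byte, and two 16-bit
   pixels combined into one 32-bit store. Assumes little-endian
   (as on the Cortex-M4) and a 32-bit-aligned destination;
   names are illustrative. */
void expand_line_4bpp(const uint8_t *src, const uint16_t *palette,
                      uint32_t *dst, int npix)
{
    for (int i = 0; i < npix / 2; i++) {
        uint8_t b = src[i];
        /* high nibble = left pixel, stored in the low halfword */
        dst[i] = palette[b >> 4] | ((uint32_t)palette[b & 0x0f] << 16);
    }
}
```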

Note also that with a palette you can manipulate the palette independently of the pixel data, so you can do fadeouts quite easily by switching palettes (not for free, but you’re already doing the translation work anyway).
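For instance, a fadeout can scale each palette entry toward black every frame while the pixel data stays untouched. A sketch, assuming a 4-4-4 color layout (R in bits 8–11, G in 4–7, B in 0–3):

```c
#include <stdint.h>

/* Fade one 4-4-4 palette entry: `t` goes from 15 (full
   brightness) down to 0 (black). Bit layout is an assumption. */
uint16_t fade444(uint16_t c, int t)
{
    uint16_t r = ((c >> 8) & 0xf) * t / 15;
    uint16_t g = ((c >> 4) & 0xf) * t / 15;
    uint16_t b = ((c >> 0) & 0xf) * t / 15;
    return (r << 8) | (g << 4) | b;
}
```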

But that is quite expensive, and another method will be used first.

Tiled backgrounds
Another technique, often used for backgrounds and very similar to text mode, goes further: instead of having a palette of pixels, let’s have a palette of sub-images, composed to make a bigger image: one for a tree top, one for a tree bottom, one for grass. Repeat many times and you have a big forest with 3 small images + a map.

It’s similar to text modes in that instead of doing it with letters (a buffer of characters on screen + small bitmaps representing the letters), you do it with color images (which can be letters).

Nice editors exist for tiled data, and we will use one to compose our images.

Storing such an image means storing the tiles + a tilemap referencing them. The bigger the tiles, the fewer bits you need to store the tilemap, and the more you need to store the tiles. Note that tiles can be stored with a palette as well.
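Rendering one line from tiles + tilemap can be sketched like this, with a hypothetical layout (8x8 tiles, 8-bit pixels, one byte per tilemap entry):

```c
#include <stdint.h>

#define TILE 8   /* tile width and height, in pixels */

/* Expand one screen line of a tiled background. */
void tile_line(const uint8_t *map_row,   /* tile indices for this row */
               int map_w,                /* tiles per row */
               const uint8_t *tileset,   /* TILE*TILE bytes per tile */
               int row_in_tile,          /* 0..TILE-1 */
               uint8_t *dst)             /* map_w * TILE output pixels */
{
    for (int tx = 0; tx < map_w; tx++) {
        const uint8_t *tp = tileset
            + map_row[tx] * TILE * TILE  /* start of this tile */
            + row_in_tile * TILE;        /* line inside the tile */
        for (int x = 0; x < TILE; x++)
            *dst++ = tp[x];
    }
}
```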

Many other choices can be made, and combining them is possible, but we have few cycles to spare, so let's consider only tiles for now.

While the preceding post was about generic video generation, this post will specify what is used by the bitBox console for video generation.

First, the DAC: it will be a simple DAC made of resistors. An R-2R ladder could be used; it’s nice to only need a few resistor values when manufacturing. Well, that’s nice, but for now we’ll use fewer resistors since we will manufacture by hand (duh), so a plain resistor DAC will be used. I first tried an 8-bit RRRGGGBB scheme.

That’s what the Uzebox (the 8-bit homebrew console; it’s great and has been a great inspiration) used with an 8-bit microcontroller, but here we have the capacity (CPU- and memory-wise) to do a little more.

How many colors should we be able to display?

It’s a question of balance: more bits in the DAC look better, but more bits mean more CPU to build the signal, a bigger RAM / Flash to store the nice graphics, and more hardware complexity.

I finally settled for 4096 colors, which is 4-4-4 = 12 bits + 4 unused bits on a 16-bit output bus. The use of a palette will be defined by the software, so let’s not talk about that now.

15 bits could also have been done, but I think 12 bits will provide nice colors anyway. The games won’t be photorealistic, so we’re aiming for vivid colors, not realistic ones.
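Packing three 4-bit channels into the 16-bit bus could look like this; the exact bit order here is an assumption (in reality it depends on which GPIO pins feed which resistors):

```c
#include <stdint.h>

/* Pack 4-bit R, G, B into the 12 useful bits of the 16-bit
   output bus; the 4 high bits stay unused. */
#define RGB444(r, g, b) \
    ((uint16_t)((((r) & 0xf) << 8) | (((g) & 0xf) << 4) | ((b) & 0xf)))
```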

Then, how many pixels should we be able to output? That’s a software thing! Nothing in hardware sets the number of pixels: vertically, it’s how often we fire the h-sync; horizontally, it’s how fast we make the pixel levels vary.

Let's try defining a first video mode (all by software).

We should build on standard VGA timings, which makes it easier for VGA screens to sync because it’s a standard resolution, as well as being compatible with many screens.

The universal resolution is 640x480 @ 60 Hz, which is supported by quasi-everything (even HDMI supports it, though of course we are not generating HDMI with a few resistors).
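The standard 640x480@60 timings, as constants (values from the usual VGA timing tables; a 25.175 MHz pixel clock divided by 800 pixels per line gives the ~31.47 kHz line rate):

```c
/* Standard 640x480@60 VGA timings, in pixels and lines. */
enum {
    H_VISIBLE = 640, H_FRONT = 16, H_SYNC = 96, H_BACK = 48,
    H_TOTAL   = H_VISIBLE + H_FRONT + H_SYNC + H_BACK,   /* 800 */
    V_VISIBLE = 480, V_FRONT = 10, V_SYNC = 2, V_BACK = 33,
    V_TOTAL   = V_VISIBLE + V_FRONT + V_SYNC + V_BACK,   /* 525 */
};
```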

Note, however, that this is only the resolution the screen thinks it gets. For example, there is no difference between varying the pixel levels twice as slowly and having horizontally twice-larger pixels: it’s the same thing. Likewise, if you output each line twice, you effectively get half the vertical resolution. That gives you, for example, 320x240 @ 12 bits if you also halve the pixel clock to 320 pixels per line.

You can also “forget” to send anything for 20 lines before and 20 lines after your signal, so you’ll have black borders and 320x200. Which has the nice property of needing a 64k frame buffer if we use 1 byte per pixel, or 128k for double buffering… but more on that later.
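Putting the line doubling and the 20-line borders together, the mapping from VGA output line to framebuffer line could be sketched as (an illustrative helper, not an existing kernel function):

```c
/* Map a VGA line (0..479) to a 320x200 framebuffer line: each
   line is doubled (480 -> 240), and 20-line black borders above
   and below leave 200 visible lines. Returns -1 for a black line. */
int fb_line(int vga_y)
{
    int y = vga_y / 2;           /* line doubling */
    if (y < 20 || y >= 220)      /* black borders */
        return -1;
    return y - 20;
}
```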

The next thing to consider is how to store pixels in memory and how to output them.

Outputting can be done by bit-banging, i.e. writing the pixels ourselves, clocked by the instruction stream of the processor.

The problem is that we won’t have much time left to do anything else. While the main CPU is perfectly good at outputting bytes or halfwords, it is much more powerful than that, and all those cycles could be spent doing more useful things, such as adding 4 bytes in parallel or running nice effects. It would be nice if we had a small bit of silicon on the MCU able to move data from memory to a peripheral (here, the GPIOs).

As a matter of fact, we do! It’s called DMA, for Direct Memory Access. The STM32F4 has two of them.
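To give an idea, a memory-to-GPIO transfer in CMSIS register style might be set up roughly like this. The stream, GPIO port and buffer name are placeholders, not the actual bitBox configuration; check the reference manual for the real stream/channel/trigger mapping:

```c
#include "stm32f4xx.h"           /* CMSIS device header */

extern uint16_t line_buffer[640];

/* Rough sketch: stream one line of pixels from RAM to the GPIO
   output data register, one halfword per trigger. In reality the
   stream must be paced by a timer's DMA request so pixels come
   out at the pixel clock, and clocks must be enabled first. */
void start_line_dma(void)
{
    DMA2_Stream1->PAR  = (uint32_t)&GPIOA->ODR;  /* destination: pins */
    DMA2_Stream1->M0AR = (uint32_t)line_buffer;  /* source: pixels in RAM */
    DMA2_Stream1->NDTR = 640;                    /* one transfer per pixel */
    DMA2_Stream1->CR   = DMA_SxCR_DIR_0          /* memory -> peripheral */
                       | DMA_SxCR_MINC           /* increment memory address */
                       | DMA_SxCR_PSIZE_0        /* 16-bit peripheral size */
                       | DMA_SxCR_MSIZE_0        /* 16-bit memory size */
                       | DMA_SxCR_PL             /* very high priority */
                       | DMA_SxCR_EN;            /* go */
}
```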

Friday, January 11, 2013

Software VGA generation from a chip is quite simple, as well as quite tricky to achieve.
Simply said, to output a VGA signal you should think of the display as a cathode ray tube, scanning from top left to bottom right in lines, and being shut off while going back to the left or back up to the first position, in a Z pattern (let’s think progressive scan here).
Then, to output a VGA signal, you need to generate three varying red, green and blue signals (0 to 0.7 volts, 0 meaning black and 0.7 full color), as well as H-sync (to tell the tube to go back left) and V-sync (to tell the tube to go back to the top).
Nice tutorials are available, so instead of copying and paraphrasing them here, I’ll just link to them. Great links for VGA and video signal generation:
- http://www.javiervalcarce.eu/wiki/VGA_Video_Signal_Format_and_Timing_Specifications
- http://neil.franklin.ch/Projects/SoftVGA/
- and finally a GREAT tutorial for video generation : http://www.lucidscience.com/pro-vga%20video%20generator-1.aspx
- Or a search engine, using “VGA signal timings” as search terms
Composite is a little trickier, with its separate luma + chroma signals.
The principle is very simple; what can be tricky is getting the timing done perfectly (or not too badly), because you’re trying to generate three 20 MHz signals on a microcontroller… as well as (hopefully) running a simple game!

So the idea is to deliver a simple, cheap, home-reproducible hardware base that is versatile to hack.
Video signal and sound generation and processing will be done in software, so the exact characteristics (screen resolution, tile-based engine, frame buffer or even 3D raster, number of sound voices) will be defined by kernel software and will evolve as the hardware is pushed by the software.

Kernels are just driver sets that allow simpler game development by abstracting the lower-level VGA generation (graphics signal generation) into libs.
The aim is to be simple and cheap, while using up-to-date hardware (not in the sense of powerful, that’s not the point, but easy to find and cheap).

Running at 168 MHz, with 192 kB RAM, 1 MB Flash memory, fast DMAs and a 32-bit Thumb-2 Cortex-M4F instruction set with SIMD and float instructions, this little beast seems to have what it takes to bring us into the world of the homemade SNES (well, almost!). It’s also about $10-15, even if the whole platform will be more expensive (whole car vs engine).

Hi, this is a personal blog relating my adventures in developing a simple DIY console based on an ARM chip. Its base will be a single chip, the STM32F4 from STMicroelectronics.
The minimal hardware design will hopefully allow for hackability, as quasi-everything will be based on this chip + software rendering of the video signal.
More on this later!