Thursday, April 5, 2012

I've often wondered how hard it is to output a PNG file directly, without using a library or a standard tool like pnmtopng. (I'm not sure when you'd actually want to do this; maybe for a tiny embedded system with a web interface.)

I found that constructing a simple, uncompressed PNG does not require a whole lot of code, but there are some odd details I got wrong on the first try. Here's a crash course in writing a minimal PNG encoder. We'll use only a small subset of the PNG specification, but I'll link to the full spec so you can read more.

The example code is not too fast; it's written in Python and has tons of string copying everywhere. My goal was to express the idea clearly, and let you worry about coding it up in C for your embedded system or whatever. If you're careful, you can avoid ever copying the image data.

We will assume the raw image data is a Python byte string (non-Unicode), consisting of one byte each for red, green, and blue, for each pixel in English reading order. For reference, here is how we'd "encode" this data in the much simpler PPM format.

I lied when I said we'd use no libraries at all. I will import Python's standard struct module. I figured an exercise in converting integers to 4-byte big endian format would be excessively boring. Here's how we do it with struct.

import struct

def be32(n):return struct.pack('>I', n)

A PNG file contains a sequence of data chunks, each with an associated length, type, and CRC checksum. The type is a 4-byte quantity which can be interpreted as four ASCII letters. We'll implement crc later.

The IHDR chunk, always the first chunk in a file, contains basic header information such as width and height. We will hardcode a color depth of 8 bits, color type 2 (RGB truecolor), and standard 0 values for the other fields.

The actual image data is stored in DEFLATE format, the same compression used by gzip and friends. Fortunately for our minimalist project, DEFLATE allows uncompressed blocks. Each one has a 5-byte header: the byte 0 (or 1 for the last block), followed by a 16-bit data length, and then the same length value with all of the bits flipped. Note that these are little-endian numbers, unlike the rest of PNG. Never assume a format is internally consistent!

Since a DEFLATE block can only hold 64 kB, we'll need to split our image data into multiple blocks. We will actually want a more general function to split a sequence into chunks of size n (allowing the last chunk to be smaller than n).

def pieces(seq, n):return [seq[i:i+n] for i in xrange(0, len(seq), n)]

PNG wants the DEFLATE blocks to be encapsulated as a zlib data stream. For our purposes, this means we prefix a header of 78 01 hex, and suffix an Adler-32 checksum of the "decompressed" data. That's right, a self-contained PNG encoder needs to implement two different checksum algorithms.

We're almost done, but there's one more wrinkle. PNG has a pre-compression filter step, which transforms a scanline of data at a time. A filter doesn't change the size of the image data, but is supposed to expose redundancies, leading to better compression. We aren't compressing anyway, so we choose the no-op filter. This means we prefix a zero byte to each scanline.

Actually, a PNG file may contain any number of IDAT chunks. The zlib stream is given by the concatenation of their contents. It might be convenient to emit one IDAT chunk per DEFLATE block. But the IDAT boundaries really can occur anywhere, even halfway through the zlib checksum. This flexibility is convenient for encoders, and a hassle for decoders. For example, one of many historical PNG bugs in Internet Explorer is triggered by empty IDAT chunks.

Here are those checksum algorithms we need. My CRC function follows the approach of code fragment 5 from Wikipedia. For better performance you would want to precompute a lookup table, as suggested by the PNG spec.