Previously we allocated two strings per byte, plus another for the
offset. Instead, we now write all of this directly to stdout.
Also stop recalculating the ANSI escape code for every single byte.
Together, these changes produce a significant speedup.
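A minimal sketch of both ideas follows. It assumes a hex-dump style output
in which each byte is colored by category. The color scheme and the
names `_color_for`, `COLORED_HEX`, `dump` and `dump_to_bytes` are
hypothetical and do not come from this change. Python still creates
small temporaries, so the sketch only shows the shape of the
optimization: a 256-entry table built once (a byte has only 256
values), and output written piece by piece to a buffered stream
instead of being assembled into strings per byte.

```python
import io
import sys

RESET = b"\x1b[0m"

def _color_for(b: int) -> bytes:
    """Hypothetical byte categories; the real tool's scheme may differ."""
    if b == 0:
        return b"\x1b[90m"  # bright black for NUL
    if 0x21 <= b <= 0x7E:
        return b"\x1b[36m"  # cyan for printable ASCII
    if b in (0x09, 0x0A, 0x0D, 0x20):
        return b"\x1b[32m"  # green for whitespace
    return b"\x1b[33m"      # yellow for everything else

# Build every colored hex representation once, up front, rather than
# recomputing the escape sequence for each byte of the input.
COLORED_HEX = [_color_for(b) + b"%02x" % b + RESET for b in range(256)]

def dump(data: bytes, out, width: int = 16) -> None:
    """Write a colored hex dump of `data` directly to binary stream `out`."""
    for offset in range(0, len(data), width):
        out.write(b"%08x: " % offset)
        for b in data[offset:offset + width]:
            out.write(COLORED_HEX[b])  # table lookup, no per-byte formatting
            out.write(b" ")
        out.write(b"\n")

def dump_to_bytes(data: bytes, width: int = 16) -> bytes:
    """Convenience wrapper that captures the dump in memory."""
    buf = io.BytesIO()
    dump(data, buf, width)
    return buf.getvalue()

if __name__ == "__main__":
    # sys.stdout.buffer is a BufferedWriter, so the many small writes
    # are batched before they reach the OS.
    dump(b"Hello\x00\xff\n", sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Writing to a buffered stream keeps the number of system calls low while
avoiding intermediate per-byte strings. The lookup table replaces a
branchy per-byte computation with a single index.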