Tremor offers two modes of operation: normal mode and low precision mode. Normal mode requires 64-bit intermediate results in arithmetic operations, whereas low precision mode requires only 32-bit intermediates. Testing both modes against oggdec, the standard Linux command-line Vorbis decoder, shows that normal mode has an RMS error of 0.71 bits, whereas low precision mode has an RMS error of 58 bits. (I performed the test using Lepidoptera by Epoq from vorbis.com as the sample track, decoding to 16-bit, 44.1kHz stereo.) The result for low precision mode is consistent with user complaints of audible distortion.
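For readers who want to repeat this kind of comparison, here is a minimal sketch of the core measurement. It assumes both decoders have already written their output as interleaved 16-bit PCM of equal length and that the samples are loaded into memory; the function name and that setup are my own illustration, not part of Tremor or oggdec. The function returns the mean squared difference, and the RMS error is its square root.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean squared difference between two decodes of the same track,
 * measured in units of one 16-bit sample step squared.
 * The RMS error is the square root of the returned value.
 * Hypothetical helper: name and interface are illustrative only. */
double mean_squared_error(const int16_t *ref, const int16_t *test, size_t n)
{
    double sum_sq = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Widen before subtracting so the difference cannot overflow. */
        double diff = (double)ref[i] - (double)test[i];
        sum_sq += diff * diff;
    }
    /* An empty input has no error by convention. */
    return n ? sum_sq / (double)n : 0.0;
}
```

Passing the result to sqrt() from math.h gives an RMS figure of the kind quoted above.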

The good news for Vorbis on the DM642 is that using 48-bit intermediate results gives output very close to normal mode, with an RMS error of 1.0 bits. The DM642’s mpylir instruction multiplies a 16-bit quantity by a 32-bit quantity and shifts the result to fit within 32 bits. This allows a decoder with quality almost indistinguishable from normal Vorbis output, but with performance as fast as Tremor’s low precision mode.
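The 16-bit by 32-bit multiply can be modelled in portable C, which is handy for checking accuracy on a PC before moving to the DSP. This is a sketch only: the rounding constant and the shift of 15 are my assumption about how mpylir behaves and should be checked against TI's instruction set reference, and the function name is hypothetical.

```c
#include <stdint.h>

/* Portable model of a 16-bit x 32-bit fractional multiply.
 * ASSUMPTION: the rounding constant (1 << 14) and the right shift of 15
 * are taken from my reading of the mpylir description, not verified on
 * hardware. The product of a 16-bit and a 32-bit value fits in 48 bits,
 * so a 64-bit intermediate is more than enough here. */
static inline int32_t mul16x32_round(int16_t coeff, int32_t sample)
{
    int64_t product = (int64_t)coeff * (int64_t)sample;

    /* Right-shifting a negative value is implementation-defined in C;
     * this relies on the arithmetic shift that common compilers provide.
     * The result exceeds 32 bits only when coeff is -32768 and sample
     * is INT32_MIN. */
    return (int32_t)((product + (1 << 14)) >> 15);
}
```

With the coefficient treated as a Q15 fraction, a coefficient of 16384 behaves as a multiplication by one half.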

Last year I did some experiments with the Theora video decoder on a Texas Instruments DM642 DSP. A royalty-free video decoder is very attractive for embedded devices, but even after some major restructuring for performance, some problems remained.

The main problem is that, unlike MPEG video, Theora video is not packed in the bitstream in the raster order in which it is displayed on screen, but in Hilbert curve order. This is not a problem in itself, but Theora’s DC prediction and post-processing loop filter are both defined in raster order. Because the decoder must go over the data once in Hilbert curve order and once in raster order, Theora decode requires higher memory bandwidth than MPEG decode.
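To see why the two orders clash, it helps to look at where a Hilbert curve lands in raster terms. The sketch below uses the textbook Hilbert curve mapping on a small power-of-two grid; it is not taken from the Theora sources, and the orientation and block grouping that Theora actually uses are defined by its specification and may differ. All function names are illustrative.

```c
#include <stdio.h>

/* Rotate or flip a quadrant so that the sub-curves join end to end. */
static void hilbert_rotate(int s, int *x, int *y, int rx, int ry)
{
    if (ry == 0) {
        if (rx == 1) {
            *x = s - 1 - *x;
            *y = s - 1 - *y;
        }
        int tmp = *x;
        *x = *y;
        *y = tmp;
    }
}

/* Map position d along a Hilbert curve to (x, y) in an n x n grid.
 * n must be a power of two and d must be in the range 0 .. n*n - 1. */
void hilbert_d2xy(int n, int d, int *x, int *y)
{
    int t = d;

    *x = 0;
    *y = 0;
    for (int s = 1; s < n; s *= 2) {
        int rx = 1 & (t / 2);
        int ry = 1 & (t ^ rx);
        hilbert_rotate(s, x, y, rx, ry);
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}

/* Print the raster index (y * n + x) visited at each step of the curve. */
void print_hilbert_raster_indices(int n)
{
    for (int d = 0; d < n * n; d++) {
        int x, y;
        hilbert_d2xy(n, d, &x, &y);
        printf("%d ", y * n + x);
    }
    printf("\n");
}
```

Calling print_hilbert_raster_indices(4) shows that the raster indices do not increase steadily along the curve. A pass in curve order and a pass in raster order therefore walk through memory in different patterns.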

The encoder faces a similar problem. Andrey N. Filippov describes an FPGA implementation of the Theora encoder, and comments on the high memory bandwidth required. The solution in the article is to implement a custom SDRAM controller with knowledge of the Theora data structures, an option not available on a DSP.

There are other minor problems remaining. The DM642 has instructions to assist video encoding and decoding, but these are optimised for MPEG and may not easily apply to Theora. For example, the avg2 instruction averages two pairs of 16-bit values, but it uses the formula (x + y + 1) >> 1, whereas Theora’s half-pixel predictor uses the formula (x + y) >> 1.
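The two averaging formulas differ by one exactly when x + y is odd, so a rounding average can be corrected into a truncating one at the cost of a few extra operations. The sketch below shows this in plain C for non-negative pixel values; the function names are mine, and it models only the two formulas quoted above, not the actual behaviour of avg2 on packed registers.

```c
#include <stdint.h>

/* Rounding average: the formula quoted above for avg2. */
static inline uint16_t avg_round(uint16_t x, uint16_t y)
{
    return (uint16_t)(((uint32_t)x + y + 1u) >> 1);
}

/* Truncating average: the formula quoted above for Theora's
 * half-pixel predictor. */
static inline uint16_t avg_trunc(uint16_t x, uint16_t y)
{
    return (uint16_t)(((uint32_t)x + y) >> 1);
}

/* Recover the truncating average from a rounding one. The two results
 * differ by 1 exactly when x + y is odd, which is when the lowest bits
 * of x and y differ. */
static inline uint16_t avg_trunc_from_round(uint16_t x, uint16_t y)
{
    return (uint16_t)(avg_round(x, y) - ((x ^ y) & 1u));
}
```

The correction costs an XOR, an AND and a subtract per average, which eats into whatever the dedicated instruction would have saved.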

Where does this leave Theora decode on a DSP? The DM642 is just capable of decoding NTSC-quality video (640×480, 30fps), provided that the bitrate is controlled. The good news is that the newer DaVinci architecture provides extra memory bandwidth through a DDR2 memory controller, plus the possibility of splitting the workload to place bitstream decode on the ARM processor and frame reconstruction on the DSP.