Reed-Solomon Codec

June 08, 2012

Some Background

I'm working on an image processing project for the university, whose purpose is to embed
(an extract) a print-scan resilient watermark into an image. This project has (sadly) gotten me
acquainted with Matlab, from which I quickly ran way into the friendlier realms of
Scipy and friends (Skimage rocks, by the way).
I must say I really learned to appreciate the Scipy/Numpy gang in the last two weeks :)

If it wasn't already obvious, it's time to admit I'm a n00b when it comes to signal processing and
applied math in general. I know the Fourier transform in broad terms, and have heard of
discrete cosine transform
and wavelets some time ago... but it's not my cup of tea,
to say the least. Luckily for clueless people like me, Matlab (and its kin) enables us
to summon the dark powers of mathematics, without ever having to know what we're doing. Hurrah!

So I'm DFT'ing, DCT'ing and DWT'ing like a pro, embedding my watermark using CDMA/spread-spectrum
techniques in the frequency domain, and then inverting the process... and all I know is
I'm supposed to keep myself in the mid-frequency range, i.e., in a certain region of the matrix.
Math for n00bs.

My original idea was to take a string, encode it as a QR code,
and then embed this QR image into the host image. I thought it would be a nice shortcut, as it
provided me with a synchronization pattern and error correction out of the box, but it quickly
turned out QR codes generate a payload that's too big for unobservable embedding.
So I set out to find some error correcting code (ECC) library for python, but it proved to be
a really difficult task. I found some packages, most of them are haven't been maintained in over
7 years nows, and all of them make use of extension modules that failed to compile. Then there's
zfec, but for the life of me, I couldn't figure out how to
use it as a simple encoder/decoder.

The Library

I almost gave up and resorted to triplicating my payload (at the bit level), and using
majority-selection for each bit, when I came across an amazing python tutorial (with runnable
code) that covers Reed Solomon codes and QR in depth.
I simply extracted the code, added a usable API, wrote some examples and quickly
uploaded it to PyPI, so now there's a pure-python
Reed-Solomon encoder/decoder: pip install reedsolo.

The library should support python 2.4-3.2, using strings or bytes. I really can't verify the
correctness of the algorithm (it's beyond me), but it seems to work so I'm fine with it. Here's
a short demo:

>>> fromreedsoloimportRSCodec>>> rs=RSCodec(10)# 10 bytes of ECC will be added to the output, ... # which allows us to correct up to 5 byte-level errors>>> rs.encode([1,2,3,4])'\x01\x02\x03\x04,\x9d\x1c+=\xf8h\xfa\x98M'>>> rs.encode("hello world")'hello world\xed%T\xc4\xfd\xfd\x89\xf3\xa8\xaa'>>> rs.decode('hello world\xed%T\xc4\xfd\xfd\x89\xf3\xa8\xaa')'hello world'

It's pure python and highly unoptimized... I think someone acquainted with Numpy a little
more than I am could improve it blindfolded by a factor 10, but even now, on my dinosaur machine,
it encodes a 400kB message in 2.9 seconds and decodes it in 1.9 seconds. I'll drink to that.
By the way, it seems that the library can only handle messages that are less than 255 bytes
long... but then you can simply encode/decode in chunks. I'll include it in later versions.

I think a good ECC library for python is very useful... if anyone wants to join in on it,
feel free to drop me a line at the comments or just
fork the repo.