4/30/2013

04-30-13 - Packing Values in Bits - Flat Codes

One of the simplest forms of packing values in bits is storing a value whose range is not a power of 2
and whose values are all equally probable.

You have a value in [0,N). Ideally every code length would be the same, log2(N), which is fractional
when N is not a power of 2.
With plain bit output we can't write fractional bits, so we will lose some efficiency. But how much, exactly?

You can of course trivially write a symbol in [0,N) using log2ceil(N) bits, that is, rounding up to the next
integer bit count. But that leaves some codewords unused, and each wasted codeword can be used to shorten
one of the codes you actually need. E.g. for N = 5, start with log2ceil(N) bits :

0 : 000
1 : 001
2 : 010
3 : 011
4 : 100
x : 101
x : 110
x : 111

The first five codes are used for our values, and the last three are wasted.
Rearrange to interleave the wasted codewords :

0 : 000
x : 001
1 : 010
x : 011
2 : 100
x : 101
3 : 110
4 : 111

Now, since we have adjacent codes where one is used and one is not, we can shorten those codes by one bit
and still have a prefix code. That is, if we see the two bits "00" the value must be 0, because "001"
is wasted. So simply don't send the third bit in that case :

0 : 00
1 : 01
2 : 10
3 : 110
4 : 111

(This is a general way of constructing shorter prefix codes when you
have wasted values.) You can see that the number of wasted values we
had at the top is the number of codes that can be shortened by one bit.
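The shortened table for N = 5 matches the standard truncated binary code: with k = floor(log2(N)) and u = 2^(k+1) - N (the wasted count, when N is not a power of 2), the first u values get k bits and the remaining values are written as value + u in k+1 bits. Below is a minimal Python sketch of that construction. The function names are hypothetical, bits are kept as a '0'/'1' string for readability rather than packed into machine words, and it is one possible code assignment, not necessarily the author's.

```python
def flat_code_params(n):
    """Return (k, u) for a flat (truncated binary) code over [0, n).

    k = floor(log2(n)); u = 2**(k+1) - n is the number of values that get
    k bits. When n is not a power of 2, u is also the number of wasted
    codewords in the naive log2ceil(n)-bit layout.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    k = n.bit_length() - 1
    u = (1 << (k + 1)) - n
    return k, u


def _bits(x, width):
    """Big-endian binary string of x padded to width ('' when width == 0)."""
    return format(x, "b").zfill(width) if width else ""


def encode(value, n):
    """Encode value in [0, n) as a '0'/'1' string."""
    k, u = flat_code_params(n)
    if not 0 <= value < n:
        raise ValueError("value out of range")
    if value < u:
        return _bits(value, k)          # short code: k bits
    return _bits(value + u, k + 1)      # long code: k+1 bits, placed after the short prefixes


def decode(bits, pos, n):
    """Decode one value from bits starting at pos; return (value, new_pos)."""
    k, u = flat_code_params(n)
    v = int(bits[pos:pos + k], 2) if k else 0
    pos += k
    if v < u:
        return v, pos                   # a k-bit prefix below u is a complete codeword
    v = (v << 1) | int(bits[pos])       # otherwise read the one extra bit
    return v - u, pos + 1


if __name__ == "__main__":
    print([encode(v, 5) for v in range(5)])
```

For N = 5 this gives 00, 01, 10, 110, 111, the same table as above. When N is a power of two, u = N and every value simply gets k bits. The decoder never needs more than one extra bit after the first k, because any k-bit prefix below u is already a complete codeword.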

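The worst case is an excess of about 8.6% of a bit (roughly 0.086 bits) per symbol. It appears
periodically, once for each power of two, because the excess depends only on where N falls between
consecutive powers of two.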
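The 8.6% figure can be checked directly from the code lengths. Write k = floor(log2(N)) and x = N / 2^k, so 1 <= x < 2. The flat code gives 2^(k+1) - N values k bits and the remaining 2N - 2^(k+1) values k+1 bits, so the average length and the excess E over log2(N) work out as:

```latex
\begin{aligned}
\bar{L}(N) &= \frac{(2^{k+1}-N)\,k + (2N-2^{k+1})(k+1)}{N} = k + 2 - \frac{2^{k+1}}{N} \\
E(N) &= \bar{L}(N) - \log_2 N = 2 - \frac{2}{x} - \log_2 x, \qquad x = N/2^k \in [1,2) \\
E'(x) &= \frac{2}{x^2} - \frac{1}{x \ln 2} = 0 \;\Rightarrow\; x^* = 2\ln 2 \approx 1.386 \\
E(x^*) &= 2 - \frac{1}{\ln 2} - \log_2(2\ln 2) \approx 0.0861
\end{aligned}
```

E is zero at both ends of each octave (x = 1 and x -> 2) and depends only on x, which is why the same curve repeats for every power of two. In terms of f = N / 2^(k+1) = x/2, that is, N relative to the next power of two, the worst case sits at f = ln(2).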

The actual excess bits per symbol for some small N :
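The per-symbol excess for small N can be computed straight from the code lengths. Here is a minimal Python sketch; the function names are hypothetical, and it prints the flat-code excess next to the excess of the naive log2ceil(N)-bit code for comparison.

```python
import math


def flat_code_excess(n):
    """Average flat-code length minus log2(n), for n equally likely values in [0, n)."""
    k = n.bit_length() - 1                 # floor(log2(n))
    short = (1 << (k + 1)) - n             # values that get k bits
    long_ = n - short                      # values that get k+1 bits
    avg_len = (short * k + long_ * (k + 1)) / n
    return avg_len - math.log2(n)


def naive_excess(n):
    """Excess of always writing log2ceil(n) bits."""
    return (n - 1).bit_length() - math.log2(n)


if __name__ == "__main__":
    print(" N   flat    log2ceil")
    for n in range(2, 17):
        print(f"{n:2d}  {flat_code_excess(n):.4f}  {naive_excess(n):.4f}")
```

Both columns are zero at powers of two; in between, the flat code never does worse than the naive code, since it only shortens codewords.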

The worst case is actually approached as N -> large, because at higher N the fraction
f = N / 2^log2ceil(N) can get closer to the worst-case value ln(2), about 0.693. At lower N, the integer
steps mean you miss the worst case and so waste less. This is perhaps a bit surprising; you might expect
the worst case to be at something like N = 3.
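A brute-force scan makes that limit visible: for each octave [2^k, 2^(k+1)), find the N with the largest excess and compare its f = N / 2^(k+1) with ln(2). A minimal Python sketch follows. It uses the closed form k + 2 - 2^(k+1)/N - log2(N) for the excess, and the function names are hypothetical.

```python
import math


def flat_code_excess(n):
    """Flat-code excess over log2(n): k + 2 - 2**(k+1)/n - log2(n), with k = floor(log2(n))."""
    k = n.bit_length() - 1
    return k + 2 - (1 << (k + 1)) / n - math.log2(n)


def worst_in_octave(k):
    """Return (n, excess) for the n in [2**k, 2**(k+1)) with the largest excess."""
    return max(((n, flat_code_excess(n)) for n in range(1 << k, 1 << (k + 1))),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    for k in range(1, 17):
        n, e = worst_in_octave(k)
        f = n / (1 << (k + 1))          # n relative to the next power of two
        print(f"k={k:2d}  N={n:6d}  f={f:.4f}  excess={e:.5f}")
    ln2 = math.log(2)
    limit = 2 - 1 / ln2 - math.log2(2 * ln2)
    print(f"limit: f -> ln(2) = {ln2:.4f}, excess -> {limit:.5f}")
```

Doubling N leaves x = N/2^k unchanged, so each octave contains every ratio the previous one had plus new ones in between; the per-octave worst case can therefore only stay the same or grow as k increases. For example, the worst N in [8,16) is 11, with f = 11/16 = 0.6875, already close to ln(2).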