Asymmetric Numeral System

The source code with demo is available:
ANSCoder
The Asymmetric Numeral System (ANS) was introduced by Jarek Duda for several different purposes.
We consider here only the data compression aspect. Those who want more information can try
to read Jarek's latest article.
The first demonstration of an ANS coder was published by the author at
Wolfram in March 2008, which means it is not likely that this method will ever be
patented, because the time for filing a patent is running out and the author has confirmed several
times that he has no intent to patent this method.

According to online news, this
coding approach is used in the Zstandard compressor from Facebook.

Encoding/decoding table

state    A    B    C
  1      2    3    5
  2      4    6   10
  3      7    8   15
  4      9   11   20
  5     12   14   25
  6     13   17   30
  7     16   21    -
  8     18   22    -
  9     19   26    -
 10     23   28    -
 11     24    -    -
 12     27    -    -
 13     29    -    -
 14     31    -    -

According to the method we need to create a single lookup table, as shown above.
Encoding may start from any row and from any symbol of the alphabet (A, B, C). If the first symbol
in the message is B and the first row is 1, we take the new row number from the crossing of row 1 and column B,
which is 3, and move to that row. If the next symbol is A, we read the next row number from the
crossing of column A and row 3. This number is 7, and we can go on until we reach the end of
our table.
Decoding is as elementary as encoding. Given any number, let's say 16, we can extract symbol A
and the previous row, 7. Moving back to element 7 in the table we can extract another symbol A
and another row, 3. Finally, from element 3 we extract B and row 1, which is the top
of the table and the end of the message. Successful encoding and decoding is possible if

The left column of the table contains all sequential numbers.

Each number in columns A, B, C is unique (no repetition).

The number in every cell is larger than the number of its row.

The numbers in each column are sorted.
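The table walk described above can be sketched in a few lines of Python. The table values are copied from the article; the function names are my own:

```python
# The article's lookup table, stored column by column: TABLE[sym][row-1]
# is the value at (row, sym); the rarer symbols simply have shorter columns.
TABLE = {
    "A": [2, 4, 7, 9, 12, 13, 16, 18, 19, 23, 24, 27, 29, 31],
    "B": [3, 6, 8, 11, 14, 17, 21, 22, 26, 28],
    "C": [5, 10, 15, 20, 25, 30],
}

def encode_step(row, symbol):
    """One encoding step: the next state is the table entry at (row, symbol)."""
    return TABLE[symbol][row - 1]

def decode_step(state):
    """One decoding step: every number appears in exactly one column,
    so the (symbol, previous row) pair is unambiguous."""
    for symbol, column in TABLE.items():
        if state in column:
            return symbol, column.index(state) + 1
    raise ValueError(f"{state} is not in the table")

# The walk from the text: B from row 1 gives 3, then A gives 7.
state = 1
for s in "BA":
    state = encode_step(state, s)
print(state)            # 7
print(decode_step(7))   # ('A', 3) -- decoding recovers the last symbol first
```

Note that decoding steps backwards through the table, so symbols come out in reverse order of encoding.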

The row number obtained after every step is called the state. Given the state we can restore the
message. This encoder becomes an entropy encoder when the numbers in the columns are selected so
that dividing the row number by the value in the column matches the probability of occurrence
of the corresponding symbol. In this table, if we divide a row number by the element sitting in
column A at that row, we get approximately 0.45. The ratio for B is 0.35 and for C is 0.2.
The state grows on each step roughly according to the formula

state(i) = state(i-1) / P(symbol)

which means that at the end of encoding we have a number of magnitude close to

1 / [P(B) * P(A) * P(A) ...]

so its length in bits is about the sum of log2(1/P(symbol)) over the message,
which is, obviously, in accordance with entropy.
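We can check this growth rate against the walk from the text, where encoding B, A, A starting from row 1 ended at state 16. A small sketch, assuming the article's probabilities 0.45/0.35/0.2:

```python
from math import prod

# Probabilities that the article's table was built to approximate.
p = {"A": 0.45, "B": 0.35, "C": 0.2}

# 1 / (P(B) * P(A) * P(A)) predicts the magnitude of the final state
# after encoding "BAA"; the table walk in the text actually ends at 16.
expected = 1 / prod(p[s] for s in "BAA")
print(round(expected, 1))  # 14.1 -- close to the actual final state 16
```

The prediction and the real state differ slightly because the table only approximates the probabilities.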
The only part missing from this brief introduction is how to manage the state when
we reach the bottom of the table. That can be read in the article, which also contains the
ANSCoder example.
In addition to what has already been explained, we need to manage the output of bits during
encoding in a way that is synchronized with decoding. Let us get back to our example. This
concept is close or similar to the one used in the original sample of ANSCoder written by
Jarek Duda. According to the concept, the decoder reads a certain number of bits to refill
the state buffer to the same size before taking every step. As you can see from the table,
in decoding the state becomes smaller on every step, so if the decoder always reads just
enough bits to keep the state buffer the same size, we never make a wrong choice and decoding
becomes very simple: read bits, process symbol, read bits, process symbol, and so on.
Let us show the decoding process before we explain the encoding. Presume we have the
following binary expression to decode:

11000011110

According to the main property of an ANS coder, we decode backwards. Based on the size of our
table we set the state buffer size to 5 bits. On the first step we read the last 5 bits
of the encoded expression above, which is 11110, and address the table with this number,
which is 30. We extract symbol C and obtain the new state 6, or 110 in binary. To refill our
state buffer to 5 bits we read another 2 bits from the expression and get
11000, which is 24. For state=24 we extract symbol A and make the new state 11,
which is 1011 in binary. We read one more bit and get 10110, which is
22. Then we do the same thing: we extract B, get state=8, read one more bit, get 16, extract A, get 7,
read the last two bits and get the end-of-message marker 11111. We know that this is the
end of the message because there are no more bits to read and the state matches the default marker.
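These steps can be sketched as a small streaming decoder. The table, the 5-bit state buffer, and the 11111 marker follow the article; the function name and the exact refill bookkeeping are my own sketch, not the author's ANSCoder code:

```python
# Lookup table, 5-bit state buffer, and end/start marker from the article.
TABLE = {
    "A": [2, 4, 7, 9, 12, 13, 16, 18, 19, 23, 24, 27, 29, 31],
    "B": [3, 6, 8, 11, 14, 17, 21, 22, 26, 28],
    "C": [5, 10, 15, 20, 25, 30],
}
STATE_BITS = 5
MARKER = 31  # 11111

def decode(bits):
    """Consume a bit string from the right, refilling the 5-bit state
    buffer after every extracted symbol, until the marker reappears."""
    pos = len(bits) - STATE_BITS
    state = int(bits[pos:], 2)        # the last 5 bits
    symbols = []
    while state != MARKER:
        # every value appears in exactly one column, so this is unambiguous
        symbol = next(s for s, col in TABLE.items() if state in col)
        symbols.append(symbol)
        state = TABLE[symbol].index(state) + 1   # previous row number
        # read just enough bits to make the state 5 bits wide again
        take = min(STATE_BITS - state.bit_length(), pos)
        if take:
            state = (state << take) | int(bits[pos - take:pos], 2)
            pos -= take
    # symbols were recovered in reverse order of encoding
    return "".join(reversed(symbols))

print(decode("11000011110"))  # ABAC
```

Running it on the expression above reproduces the walk-through exactly: 30 gives C, 24 gives A, 22 gives B, 16 gives A, and refilling after the last step yields the marker 11111.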


Encoding process

step   state before output   output bits   state after output   processed symbol   state after processing
  1            31                 11                7                   A                    16
  2            16                  0                8                   B                    22
  3            22                  0               11                   A                    24
  4            24                 00                6                   C                    30

The decoding process is much simpler than the encoding. In encoding we need to output bits in a
way that supports the simple rule we used in decoding. The encoding process is shown in the table
above. We can start from any 5-bit number, but we prefer the
distinguished end/start marker 31 (11111). The concept for encoding is a little bit more
complicated: we output bits before processing a symbol. As you can see, we cannot process 31, since there is no
row number 31, so we output as many bits as necessary in order to process the symbol. We can find that out
because we know which symbol we are processing. It is A, and we need to output the two least significant bits, which is
11. The whole process is shown in the table. The number before we output bits is 31;
the two bits 11 go to the stream, and the resulting state is 7. We process
7 with symbol A and get the new state 16. The next symbol is B. According to the table we cannot process B
with state=16, so we need to output one least significant bit, 0, and make the new state 8.
After processing B with state=8 we get 22, and so on. We have to output least significant bits because
we need to reduce the number. After we process our very last symbol C we have state=30 and we output it
completely, so our output buffer looks like the binary expression from the decoding example.
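The encoding side can be sketched the same way. Again the table and marker are from the article, while the function name and the "shift until the state fits the column" test are my own reading of the rule explained above:

```python
# Lookup table, 5-bit state buffer, and end/start marker from the article.
TABLE = {
    "A": [2, 4, 7, 9, 12, 13, 16, 18, 19, 23, 24, 27, 29, 31],
    "B": [3, 6, 8, 11, 14, 17, 21, 22, 26, 28],
    "C": [5, 10, 15, 20, 25, 30],
}
STATE_BITS = 5
MARKER = 31  # 11111

def encode(message):
    """Encode left to right, starting from the marker and emitting least
    significant bits whenever the state is too large to index the
    current symbol's column, then flush the final state."""
    state = MARKER
    chunks = []
    for symbol in message:
        column = TABLE[symbol]
        k = 0                              # how many LSBs must be shifted out
        while (state >> k) > len(column):
            k += 1
        if k:
            chunks.append(format(state & ((1 << k) - 1), f"0{k}b"))
            state >>= k
        state = column[state - 1]          # one table step
    chunks.append(format(state, f"0{STATE_BITS}b"))  # flush the last state
    return "".join(chunks)

print(encode("ABAC"))  # 11000011110
```

This reproduces the table above step by step: 31 drops 11 to become 7 and encodes A to 16, 16 drops 0 to become 8 and encodes B to 22, and so on, ending with the flushed state 30 (11110). The result is exactly the bit string decoded in the earlier example.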

The code was written in May 2008. This explanation was added and edited some time later.
Andrew Polar.