Out of curiosity, I've been interested lately in Jarek Duda's ANS (Asymmetric Numeral Systems).

The intention is to experiment with an entropy decoder suitable for low-performance CPUs.

By low-performance, I mean a CPU without built-in multiplication capability (multiplication can be emulated, but it's much slower).

I understand that binary arithmetic decoders can work properly with just multiplications and comparisons, making them quite fast and efficient.

(General arithmetic decoders, on the other hand, require divisions.)
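As a sanity check on that claim, here is a toy shift-based binary arithmetic coder in the style of the classic CACM'87 coder (my own reconstruction, not fpaqa's actual code): the probability is a fixed-point fraction out of 4096, the range split is one multiply and one shift, and the decode decision is a single comparison.

```python
# Toy binary arithmetic coder (CACM'87 style): multiply + compare only.
TOP, HALF, QTR = 1 << 16, 1 << 15, 1 << 14

def encode(bits_in, p0):
    lo, hi, pending, out = 0, TOP - 1, 0, []
    def emit(b):
        nonlocal pending
        out.append(b)
        out.extend([1 - b] * pending)   # flush deferred opposite bits
        pending = 0
    for b in bits_in:
        # the only arithmetic: one multiply and one shift, no division
        mid = lo + (((hi - lo + 1) * p0) >> 12) - 1
        if b == 0:
            hi = mid
        else:
            lo = mid + 1
        while True:                     # renormalization
            if hi < HALF:               # top bit settled to 0
                emit(0)
            elif lo >= HALF:            # top bit settled to 1
                emit(1); lo -= HALF; hi -= HALF
            elif lo >= QTR and hi < HALF + QTR:
                pending += 1            # near-convergence (E3 case)
                lo -= QTR; hi -= QTR
            else:
                break
            lo, hi = 2 * lo, 2 * hi + 1
    pending += 1                        # flush 2 disambiguation bits
    emit(0 if lo < QTR else 1)
    return out

def decode(code, n, p0):
    code = list(code) + [0] * 16        # zero padding for the tail
    val = int("".join(map(str, code[:16])), 2)
    lo, hi, pos, out = 0, TOP - 1, 16, []
    for _ in range(n):
        mid = lo + (((hi - lo + 1) * p0) >> 12) - 1
        if val <= mid:                  # the decision is one comparison
            out.append(0); hi = mid
        else:
            out.append(1); lo = mid + 1
        while True:                     # mirror the encoder exactly
            if hi < HALF:
                pass
            elif lo >= HALF:
                lo -= HALF; hi -= HALF; val -= HALF
            elif lo >= QTR and hi < HALF + QTR:
                lo -= QTR; hi -= QTR; val -= QTR
            else:
                break
            lo, hi = 2 * lo, 2 * hi + 1
            val = 2 * val + code[pos]; pos += 1
    return out
```

The divide-free part is the `mid` computation; a general (multi-symbol) arithmetic decoder would instead need a division to locate the symbol within the cumulative frequencies.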

My understanding is that ANS trades these requirements for the following ones:

1) Exclusive use of look-up tables (with the risk that they become very large)

2) Decoding in the reverse direction of encoding

3) Designed for fixed probability rates (as a consequence of 1 & 2), but Matt found a way around it by saving both the symbol and its probability into an intermediate table, and encoding them in reverse order. (I still have to understand how the look-up table can be modified on the fly, though...)

So it's not an "all win".

So far, I've read Jarek's paper (well, tried to...), Matt's fpaqa source code and DCE comments, and Andrew Polar's blog entry. I've also played with the Wolfram demo.

And I still don't get it.

I feel I can make it work in some simple cases, but I don't understand why it works, and therefore how to use it in more general circumstances.

My current understanding is as follows:

As the encoder receives new symbols to encode, the "state" value grows, according to a translation table.

The higher the probability, the less the state value increases.

The intuition is that a large number in the translation table results from either a few low-probability symbols or many high-probability symbols (or, of course, a mix of both).

Decoding is simply a reverse operation, traversing the translation table in the other direction.

So any number in the table represents a deterministic sequence of symbols.
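One way I found to check this grow-and-reverse picture is the range variant (rANS) with an unbounded (big-integer) state, so renormalization stays out of the picture entirely. The scaled frequencies below are my own toy parameters for the 0:75% / 1:25% distribution I use further down:

```python
# rANS with an unbounded state: no renormalization, so the "grow and
# reverse" behaviour is visible in isolation.

L    = 4             # total slots
Ls   = [3, 1]        # scaled frequencies: p(s) = Ls[s] / L
base = [0, 3]        # symbol 0 owns slots 0..2, symbol 1 owns slot 3

def encode(symbols, x=L):
    for s in symbols:
        # x grows by a factor of about L / Ls[s] = 1 / p(s):
        # ~4/3 for a '0', ~4 for a '1'
        x = (x // Ls[s]) * L + base[s] + (x % Ls[s])
    return x

def decode(x, n):
    out = []
    for _ in range(n):
        slot = x % L                          # which symbol's slice holds x?
        s = 0 if slot < base[1] else 1
        out.append(s)
        x = Ls[s] * (x // L) + slot - base[s] # exact inverse of encode
    return out[::-1]                          # symbols come out backwards

msg = [0, 0, 1, 0, 1, 0, 0, 0]
assert decode(encode(msg), len(msg)) == msg
```

Each encode step adds roughly -log2(p) bits to the state, which is exactly the symbol's information content; the decode step peels symbols off in reverse order, one at a time.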

Obviously, the translation table is not infinite, so at some point the "state" value needs to be "resized" so that it remains within the table.

That's where I get lost.

fpaqa recommends keeping the "state" value between 2^N and 2^(N+1).

The Wolfram demo keeps it between 2^N and 2^(N+2), updating 2 bits at a time.

Andrew Polar keeps it below 2^N, resizing by any number of bits, to remain within [2^(N-1)..2^N[.

So there are clearly several possibilities.
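All three conventions look to me like the same underlying mechanism with different intervals: shift low bits out of the state until it falls back into the target range, and pull the same bits back in on the other side. A tiny sketch of that symmetry (my own illustration, not taken from any of the three sources):

```python
# "Resizing" the state is just pushing its low bits out; the other side
# pulls the same bits back in, recovering the exact state. The bound
# (2^(N+1), 2^(N+2), ...) only changes how many bits move at a time.

def push_out(x, bound):
    """Shift low bits out until x < bound."""
    bits = []
    while x >= bound:
        bits.append(x & 1)
        x >>= 1
    return x, bits

def pull_in(x, bits):
    """Shift the saved bits back in (reverse order)."""
    while bits:
        x = 2 * x + bits.pop()
    return x

x0 = 109
x, bits = push_out(x0, 8)       # x lands in [4, 8): here x == 6
assert pull_in(x, bits) == x0   # the original state is recovered
```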

The decoding side is well described in the Wolfram demo:

The "state" value decreases, outputting one symbol at a time.

When the "state" reaches a value within [2^(N-2)..2^N[, 2 more bits are read from the bitstream, bringing the new "state" back within [2^N..2^(N+2)[.

The translation table below 2^N is not shown. (Does it even exist?)

The corresponding encoding side, on the other hand, is mysterious.

To better study it, I've been playing with a simple, trivial example, using the following binary distribution:

0: 75% ; 1: 25%

The good thing is that it allows playing with N=2, which is easier to manage by hand, on paper.

It's easy to get the Wolfram demo to generate a working solution,

but I wish I could understand how it is generated, and how it works, in both directions (encoding and decoding).
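For what it's worth, here is a tANS table I managed to build by hand for exactly this 0:75% / 1:25%, N=2 case. The spread of symbols over the four states is my own arbitrary choice (the Wolfram demo may well generate a different one), and I renormalize one bit at a time rather than the demo's two:

```python
# Tiny tANS sketch for 0:75%, 1:25%, table size L = 4 (N = 2).
# States live in [L, 2L) = [4, 8). All tables are hand-built.

L  = 4
Ls = [3, 1]                         # scaled frequencies (3/4 and 1/4)

spread = {4: 0, 5: 0, 6: 1, 7: 0}   # my arbitrary symbol spread

# Decode table: state -> (symbol, reduced state in [Ls[s], 2*Ls[s])).
# The k-th occurrence of symbol s (in state order) maps to Ls[s] + k.
decode_table, seen = {}, [0, 0]
for x in range(L, 2 * L):
    s = spread[x]
    decode_table[x] = (s, Ls[s] + seen[s])
    seen[s] += 1

# Encode table: per symbol, the exact inverse of the decode table.
encode_table = [{}, {}]
for x, (s, x2) in decode_table.items():
    encode_table[s][x2] = x

def encode(symbols, x=L):
    bits = []                       # bits behave as a stack (LIFO)
    for s in symbols:
        while x >= 2 * Ls[s]:       # renormalize down into [Ls[s], 2*Ls[s])
            bits.append(x & 1)
            x >>= 1
        x = encode_table[s][x]      # table step back up into [L, 2L)
    return x, bits

def decode(x, bits, n):
    out = []
    for _ in range(n):
        s, x = decode_table[x]      # table step down
        out.append(s)
        while x < L:                # renormalize back up into [L, 2L)
            x = 2 * x + bits.pop()
    return out[::-1]                # symbols come out in reverse

msg = [0, 1, 0]
x, bits = encode(msg)
assert decode(x, bits, len(msg)) == msg
```

The decoder ends back at the initial state with the bit stack empty, which is also how I convince myself the construction is self-consistent.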

Has anyone tried this before?