Hi,

I am currently trying to figure out how to determine the correct binary value in the finite precision implementation of arithmetic coding (AC). I'd like to implement it with the side information about the number of symbols encoded so that I do not have to use an end of stream symbol.

I was not able to find such an implementation anywhere, so I'd like to implement it myself.

Lets go by example:

I have an alphabet with two symbols {a, b} and would like to encode the message "abab" with symbol probabilities P(a) = 0.8, P(b) = 0.2. Now going by infinite precision arithmetic coding I end up at the interval [0.7424, 0.768], use 7 bits and yield the codeword "1100000" (by using using the binary representation of the lower bound truncated after the 7th bit + adding 2^-7 to make it lie within the interval), corresponding to the probability 0.75.

Now using the usual E1-E3 scaling operations, for finite precision AC I get the following sequence of intervals:

Message to be encoded:

"abab":

i) [0, 1)

ii) [0, 0.8 ) (Encoded: "a")

iii) [0.64, 0.8 ) (Encoded: "b")

iv) [0.28, 0.6) (Scaling: E1) => Output "1"

v) [0.06, 0.7) (Scaling: E3)

vi) [0.06, 0.572) (Encoded: "a")

v) [0.4696, 0.572) (Encoded: "b")

vi) [0.3784, 0.788 ) (Scaling: 2x E3)

where the step vi) is not really required.

Now what do we have? We outputted a single bit "1" and are left with three unresolved E3 scalings. Which bit-sequence do we add to the output? It has to be "100000" to get the same codeword as above. But I am struggling with finding the rule to get the correct codeword.

I figured that the correct codeword length can be found by

where numel(bin_seq) gives the number of bits already outputted (here: one) and E3_count is the number of E3 scalings unresolved and ceil is the round up function.Code:ceil(-log2(HIGH - LOW)) + 1 + numel(bin_seq) + E3_count;

But I am still unsure about finding the correct codeword from the interval I ended up at. Anyone got an idea? I mean the codeword has to correspond to a probability within [0.64, 0.8), this can be deduced from ii).

Before I write unnecessary text, does anyone know how to correctly find the final codeword? That would be a great help, thank you :)