Hi!

I'm trying to implement a simple range encoder, and despite lots of material on the Net on this topic, I'm starting to feel lost. There are lots of examples, all of them different, and there are implementation choices to make on nearly every line one writes down.

So, I would very much appreciate it if anybody could answer some of my questions:

First of all I've chosen one implementation to follow: "Range Encoder" by Arturo Campos, just because it was mentioned in the Arithmetic Coding tutorial which I'm following.

My concern is primarily about compression ratio and then about speed. Speed is very important too, but not at the expense of any significant loss in compression ratio.

So, the Q.1 is: was my choice right at all (range coding instead of plain arithmetic coding, and Arturo Campos' implementation as a reference)?

As for theory and practice, I understand everything except the encoder renormalization phase. I have so many questions here that I don't even know where to start. Let's start from the beginning:

Question 2: Range initialization value

In Campos' code the range initial value is 0x80000000. In some other implementations I've walked through, the init value is something like DWORD(-1), which is 0xFFFFFFFF. Neither option makes sense to me. As far as I understand, we decide to work with 32-bit integers without the last bit, to prevent overflow. What would make sense here is something like 0x7FFFFFFF, the largest 31-bit number. So why 0x80000000? Why 0xFFFFFFFF?...

The other questions are about the code. Here it is:

Encoder renormalize:

    While ( range <= Bottom_value )             // a byte is settled, we can still output
        If ( low < ( 0FFh << Shift_bits ) )     // top byte of low is below 0FFh: no carry can reach the saved byte
            Output_byte ( buffer )              // output the saved byte
            For ( ; help != 0 ; --help )        // flush all underflow bytes pending for output
                Output_byte ( 0FFh )            // no carry, so they stay 0FFh
            Buffer = low >> Shift_bits          // save the high byte of low for output
        Else If ( ( low And Top_value ) != 0 )  // check the carry bit
            Output_byte ( buffer + 1 )          // the +1 comes from the carry, which propagates into the saved byte
            For ( ; help != 0 ; --help )        // flush all underflow bytes pending for output
                Output_byte ( 00h )             // 0FFh's become 0's (the carry!)
            Buffer = low >> Shift_bits          // save the high byte of low for output
        Else                                    // top byte is 0FFh but no carry yet: a carry could still occur, which we can't afford to have missed
            ++Help                              // so just count one more pending 0FFh byte
        Range = range << 8                      // shift out the byte which we have output or will output later
        Low = ( low << 8 ) And ( Top_value - 1 )   // shift low too, clearing the carry bit
    Repeat the loop condition.

(I've restructured this from the flattened flowchart in the article; in particular I read the last Else branch as ++Help, since ++Buffer there would make no sense to me.)

Question 3: The renormalization loop condition

In the Wikipedia article on Range encoding, the loop condition is written like this:

// check if left-most digit is same throughout range

while(low / 10000 == (low + range) / 10000)

I do understand what happens here: for all further (low, range) values that we will get in later iterations, the first digit will stay the same, and hence we can output it already, etc. I suspect that While ( range <= Bottom_value ) has to mean the same thing, but I really don't understand why. The Bottom_value is 0x00800000.

Question 4: Overflow, underflow, and how they are handled

The darkest part for me. As I understand it, while iterating low and range we can run into two situations, each of which needs special handling:

a) Overflow: when low + new_value exceeds 0x7FFFFFFF. Then we have to handle it somehow.

b) Underflow: when low and range get so close to each other that no bits are left between them to represent the interval and thus continue the iterations.

So, I absolutely don't understand how these situations are handled in the inner if ( low < ( 0FFh << Shift_bits ) ) { ... } branch, and why it looks the way it does. Could anyone explain?

Thank you in advance,

Ilya.