Results 1 to 5 of 5

Thread: Questions on range encoder implementation

Threaded View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Join Date
    Mar 2012
    Thanked 0 Times in 0 Posts

    Questions on range encoder implementation


    I'm trying to implement a simple range encoder and despite lots of material on the Net on this topic, I start feeling that I'm getting lost. There are lots of examples, all of them different and lots of implementation choices nearly in every line one would wright down.

    So, I would appreciate very much if anybody could answer some of my questions:

    First of all I've chosen one implementation to follow: "Range Encoder" by Arturo Campos, just because it was mentioned in the Arithmetic Coding tutorial which I'm following.

    My concern is primarily about compression ratio and then about the speed. The speed is very important too, but not at the expense of any significant compression ratio losses.

    So, the Q.1 is - if my choice was right at all (rate coding instead of simple arithmetic coding, and Arturo Campos implementation as a reference)

    As for theory and practice, I understand everything except encoder renormalize phase. I have so many questions here that I don't even know where to start. Lets start from the beginning:

    Question 2: Range initialization value:

    In Campos' code the range initial value is 0x80000000. In some other implementations I've walked through init values are something like DWORD(-1), which is 0xFFFFFFFF. Both options have no sense to me. As far as I understand we decide to work with 32bit integers without last bit to prevent overflow. What would make sense here is something like 0x7FFFFFFF - the largest 31bit number. Why 0x80000000? Why 0xFFFFFFFF?...

    The other questions are about the code. Here it is:

    Encoder renormalize:
    • While ( range <= Bottom_value ) As long as it's not above than Bottom_value (it means we can still output)
      • if ( low < ( 0FFh << Shift_bits ) ) If higher part is 0FFh a carry will occur which we can't afford
      • Yes We had a range with a potential carry (which we can't capture with our precision)
        • Output_byte ( buffer ) Output the byte
        • For ( ; help != 0 ; --help ) Output all underflow bytes pending for output
          • Output_byte ( 0FFh )
        • Buffer = low >> Shift_bits Assign to the byte to output the high bytes of low
      • No Now check for actual carry
        • if ( ( low And Top_value) != 0) Check carry bit
        • Yes
          • Output_byte ( buffer + 1 ) The +1 comes from the carry which propagates till the byte saved
          • For ( ; help != 0 ; --help ) Output all underflow bytes pending for output
            • Output_byte ( 00h ) 0FFh's become 0's (the carry!)
          • Buffer = low >> Shift_bits Assign to the byte to output the high bytes of low
        • No
          • ++Buffer
      • Range << 8 Shift out byte which we have already outputted or are going to be outputted (in case of 0FFh)
      • Low << 8
      • Low And ( Top_value - 1 ) Clear carry bit
      • Repeat

    Question 3:
    the loop condition.

    In wikipedia article on Range encoding, the loop condition is written like this:

    // check if left-most digit is same throughout range
    while (low / 10000 == (low + range) / 10000)

    I do understand what happens here: for all further (low, range) values that we will get in further iterations the first digit will stay the same and hence we can output it already, etc... I suspect that this: While ( range <= Bottom_value ) has to mean the same, but I really don't understand why. The Bottom_value is 0x00800000.

    Question 4: The overflow and underflow and handling of them

    The most dark part for me. As I understand, we can get two situations when iterating low and range, each of which we have to handle exceptionally:

    a) Overflow: when low + new_value will become more than 0x7FFFFFFF. Then we have to handle it somehow.
    b) Undeflow: when low and range become so close to each other that there is no more bits left between them to represent the interval and thus continue iterations.

    So, I absolutely don't understand how these situations are handled in inner if ( low < ( 0FFh << Shift_bits ) ) { ... } branch and why does it look like this. Could anyone explain?

    Thank you in advance,
    Last edited by ilya; 16th March 2012 at 20:52.

Similar Threads

  1. Greetings, Questions, and Benchmarks
    By musicdemon in forum Data Compression
    Replies: 4
    Last Post: 8th January 2012, 22:45
  2. Questions about compression
    By 0011110100101001 in forum Data Compression
    Replies: 12
    Last Post: 8th December 2011, 02:31
  3. Hashing a large range of integers to a smaller range
    By mtwaha in forum Data Compression
    Replies: 3
    Last Post: 12th April 2011, 00:27
  4. Replies: 7
    Last Post: 19th March 2011, 11:50
  5. How fast should be a range coder ?
    By Cyan in forum Data Compression
    Replies: 33
    Last Post: 16th November 2009, 17:02

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts