Results 1 to 9 of 9

Thread: Finite Precision Arithmetic Coder: How to know the right interval

  1. #1
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    4
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Finite Precision Arithmetic Coder: How to know the right interval

    Hi,

    I am currently trying to figure out how to determine the correct binary value in the finite precision implementation of arithmetic coding (AC). I'd like to implement it with the side information about the number of symbols encoded so that I do not have to use an end of stream symbol.
    I was not able to find such an implementation anywhere, so I'd like to implement it myself.

    Lets go by example:
    I have an alphabet with two symbols {a, b} and would like to encode the message "abab" with symbol probabilities P(a) = 0.8, P(b) = 0.2. Now going by infinite precision arithmetic coding I end up at the interval [0.7424, 0.768], use 7 bits and yield the codeword "1100000" (by using using the binary representation of the lower bound truncated after the 7th bit + adding 2^-7 to make it lie within the interval), corresponding to the probability 0.75.
    Now using the usual E1-E3 scaling operations, for finite precision AC I get the following sequence of intervals:

    Message to be encoded:
    "abab":
    i) [0, 1)
    ii) [0, 0.8 ) (Encoded: "a")
    iii) [0.64, 0.8 ) (Encoded: "b")
    iv) [0.28, 0.6) (Scaling: E1) => Output "1"
    v) [0.06, 0.7) (Scaling: E3)
    vi) [0.06, 0.572) (Encoded: "a")
    v) [0.4696, 0.572) (Encoded: "b")
    vi) [0.3784, 0.788 ) (Scaling: 2x E3)

    where the step vi) is not really required.

    Now what do we have? We outputted a single bit "1" and are left with three unresolved E3 scalings. Which bit-sequence do we add to the output? It has to be "100000" to get the same codeword as above. But I am struggling with finding the rule to get the correct codeword.
    I figured that the correct codeword length can be found by
    Code:
    ceil(-log2(HIGH - LOW)) + 1 + numel(bin_seq) + E3_count;
    where numel(bin_seq) gives the number of bits already outputted (here: one) and E3_count is the number of E3 scalings unresolved and ceil is the round up function.

    But I am still unsure about finding the correct codeword from the interval I ended up at. Anyone got an idea? I mean the codeword has to correspond to a probability within [0.64, 0.8), this can be deduced from ii).

    Before I write unnecessary text, does anyone know how to correctly find the final codeword? That would be a great help, thank you :)

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,207
    Thanks
    187
    Thanked 955 Times in 493 Posts
    If its done only once, at the coder flush, can't you just write a loop to find the shortest possible codeword that fits within the interval?
    I don't think its possible to just calculate the length, for example LOW=0111, HIGH=1111, -log(HIGH-LOW)+1=4,
    but the shortest code would be "10" or something (can be also "1" or "0" depending on implementation).

  3. #3
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    4
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    If its done only once, at the coder flush, can't you just write a loop to find the shortest possible codeword that fits within the interval?
    I don't think its possible to just calculate the length, for example LOW=0111, HIGH=1111, -log(HIGH-LOW)+1=4,
    but the shortest code would be 10 or something.
    Thank you for your reply! I dont want to find the length, I already know how to calculate it. I need to find a way to actually derive the correct codeword that lies in the desired interval without using an EOF symbol.

    but the shortest code would be 10 or something
    only in cases where you already know what you are going to code in the future. Else you need the sufficient condition of codewordlength = ceil(-log(HIGH-LOW))+1.


    can't you just write a loop to find the shortest possible codeword that fits within the interval?
    Well, I could write a loop where I test whether the decoder would decode the correct message, because the problem only arises at the end which keeps the complexity low in most cases. But this does not seem like an unnecessary workaround. To give context, I want to encode binary messages that are rather short (but too long for efficient huffmancode) and the overhead of an EOF symbol is crucial in this situation.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,207
    Thanks
    187
    Thanked 955 Times in 493 Posts
    >> but the shortest code would be 10 or something

    > only in cases where you already know what you are going to code in the future.
    > Else you need the sufficient condition of codewordlength = ceil(-log(HIGH-LOW))+1.

    Not quite.
    If its actually EOF, we can tweak i/o to return 0s or 1s when reading after EOF.
    Eg. "0" flush with padding of 1s would turn into "0111(1)" and would fit into interval.

    As to "10", any "10xx(x)" fits, and we'd be able to return unnecessary bits after decoding.

    >> can't you just write a loop to find the shortest possible codeword that fits within the interval?

    > Well, I could write a loop where I test whether the decoder would decode
    > the correct message, because the problem only arises at the end which keeps
    > the complexity low in most cases.

    Yes. But its not necessary to test full decoding - you just need to find the right sub-interval.
    I meant something like this:
    x = HIGH;
    n = ceil(-log2(HIGH));
    for( l=0; l<n; l++ ) {
    mask = (1<<l)-1;
    nmask = ~mask;
    xl = x & nmask;
    xh = x | mask;
    if( (xl>=LOW) && (xh<HIGH) ) break;
    }


    > To give context, I want to encode binary messages that are rather short
    > (but too long for efficient huffmancode) and the overhead of an EOF symbol
    > is crucial in this situation.

    I posted some similar coders here:
    https://encode.su/threads/3084-Simpl...ll=1#post59643

    Although these are byte-aligned.

  5. The Following User Says Thank You to Shelwien For This Useful Post:

    InfiniteAC (13th August 2019)

  6. #5
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    4
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Maybe it is still unclear what I mean, so here is another take on it, at least I believe you are doing something else:
    At the end of my encoding process I have my interval [LOW, HIGH]. Now I have another implementation of AC where I simply use doubles for simplicity and do not use any scaling operations (naive AC). This works for my current problem, because I can restrict myself to short messages and no extreme probabilities. But for some future development I need an AC that does not need to encode an EOF symbol, because I always know how many symbols will be encoded (it is a fixed number known prior both by the encoder and the decoder).

  7. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,207
    Thanks
    187
    Thanked 955 Times in 493 Posts
    > I need an AC that does not need to encode an EOF symbol, because I always know how many symbols will be encoded

    The thread I linked contains working implementations of various coders that do exactly that.
    Most recent version even implements self-termination (AC EOF is created by adjusting flush code to cause decoding error at file EOF).
    Modifying it to work with bit alignment should be simple enough, I can make a demo if necessary.
    Basically just have to remove byte alignment loop from rangecoder_v0e/sh_v2f.cpp:
          while( byte(bits>>24)!=0xFF ) outbit(1);

    (rangecoder is the same thing as AC, just optimized for bytewise i/o).

  8. The Following User Says Thank You to Shelwien For This Useful Post:

    InfiniteAC (14th August 2019)

  9. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,207
    Thanks
    187
    Thanked 955 Times in 493 Posts
    Actually here it is: http://nishi.dreamhosters.com/u/bitalign_v0.rar

    I made some small changes to sh_v2f_static (like adding a flush at LF and bitoffset index in frq file).
    Surprisingly, tail cutting actually reduced the compressed data size, so compressed file size for book1 is 433033 (but there's also the index).

    So a script like this:
    Code:
    coder c book1 1 1_frq
    coder d 1 2 1_frq 
    md5sum 2 book1
    
    echo line 0:
    coder d0 1 con 1_frq
    
    echo line 1:
    coder d1 1 con 1_frq
    
    echo line 16621:
    coder d16621 1 con 1_frq
    works like this:
    Code:
    C:\9A6R-bitalign_ari\!arc\bitalign_v0>test.bat
    0a0fdbaf0589c9713bde9120cbb20199 *2
    0a0fdbaf0589c9713bde9120cbb20199 *book1
    line 0:
    <Y 1874>
    
    line 1:
    <A T. HARDY>
    
    line 16621:
    THE END

  10. The Following User Says Thank You to Shelwien For This Useful Post:

    InfiniteAC (14th August 2019)

  11. #8
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    4
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Thank you very much, I will have a look at it later

  12. #9
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    679
    Thanks
    214
    Thanked 209 Times in 129 Posts
    Alternatively you could just continue scaling (renormalization) until reaching [0,1) ...
    Implementing AC/RC you should also remember about possibility of staying around renormalization boundary e.g. [0.499999,0.500001) - there is used readjustment to handle it (loosing ~0.5 bits each time) - see nice article https://en.wikipedia.org/wiki/Range_encoding

    Also there are lots of available implementations, see e.g. references in https://sites.google.com/site/powturbo/entropy-coder

Similar Threads

  1. Replies: 95
    Last Post: 27th May 2019, 10:07
  2. Witten-Cleary arithmetic coder
    By Marco_B in forum Data Compression
    Replies: 1
    Last Post: 31st March 2017, 15:38
  3. Division-free arithmetic coder
    By Bulat Ziganshin in forum Data Compression
    Replies: 17
    Last Post: 19th May 2016, 02:12
  4. Minimal Ashford arithmetic-coder termination
    By Ethatron in forum Data Compression
    Replies: 18
    Last Post: 15th January 2011, 14:38
  5. flzp_ac2 (flzp + an order-2 arithmetic coder)
    By inikep in forum Data Compression
    Replies: 4
    Last Post: 25th June 2008, 21:37

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •