Thread: Terminating an Arithmetic Coded Stream

  1. #1
    Member
    Join Date
    Apr 2020
    Location
    United States of America
    Posts
    3
    Thanks
    4
    Thanked 0 Times in 0 Posts

    Question Terminating an Arithmetic Coded Stream

    I’ve been lurking on this forum for a while and I finally got around to reading some papers and articles on arithmetic coding to get started. This past week, I implemented my first arithmetic coder which I tested together with a fixed order-0 model for ASCII, an adaptive order-0 model, an adaptive order-1 model, and an adaptive order-2 model. The heart of the encoder/decoder is largely based on Mark Nelson’s example which operates on 8 bits at a time and has an end-of-stream symbol included in the model.
    Running the algorithm with some verbose logging added shows that the decoder needs to read a few more bits than the encoder outputs. With unsigned 64-bit integral types, the decoder has to read in 33 bits (precision/2 + 1) at the very beginning of the file before it can decode even the first symbol, yet the encoder may produce an output shorter than 33 bits.
    At the end of the stream, where the end-of-stream symbol acts as a terminator, the encoder can output just a few bits and terminate the encoding process, but the decoder keeps reading until all of the input bits have been shifted into the high positions. In both the start-of-stream and end-of-stream cases, I modified the decoder to keep reading past the end of the file, pretending that any non-existent bits are 0.
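    Roughly, the workaround looks like this (just a sketch with made-up names, not my actual code):
    Code:
    #include <cstdint>
    #include <vector>

    // Sketch of the workaround: a bit reader that fabricates 0 bits once the
    // real input is exhausted, so the decoder can keep shifting at start-up
    // and after the terminator without special cases.
    class ZeroPaddedBitReader
    {
    public:
        explicit ZeroPaddedBitReader(const std::vector<uint8_t>& buf) : buf_(buf) {}

        int ReadBit()
        {
            if (pos_ >= buf_.size() * 8)
                return 0;                          // past EOF: pretend the bit is 0
            const size_t byte = pos_ / 8, bit = 7 - (pos_ % 8);
            ++pos_;
            return (buf_[byte] >> bit) & 1;
        }

    private:
        const std::vector<uint8_t>& buf_;
        size_t pos_ = 0;
    };

    // Decoder start-up with a 64-bit state: prime the code register with
    // precision/2 + 1 = 33 bits before decoding the first symbol.
    uint64_t PrimeCodeRegister(ZeroPaddedBitReader& in)
    {
        uint64_t code = 0;
        for (int i = 0; i < 33; ++i)
            code = (code << 1) | static_cast<uint64_t>(in.ReadBit());
        return code;
    }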
    This is a problem because the encoded stream may be embedded into other streams with the end of the encoded stream directly adjacent to another stream. The decoder may end up consuming bits meant for another purpose. And depending on how the IO is implemented, the consumed bits may be lost—unavailable for that other purpose. The HP Laboratories paper suggests either padding the end of the encoded stream with extra bits or prefixing the stream with the length of the data. The former is what I’m trying to avoid as it is essentially the opposite of compression (adding spurious zero-entropy bits to the encoded data). The latter would not be possible if the consumer does not know in advance what the length of the data will be, and even so I suspect it would not avoid the problem of reading extra bits as there is a possibility of a very good model compressing so well that the encoded bits are few enough to disrupt our estimate of when we should stop reading.
    I do have some ideas to make sure the decoder only reads the bits it needs instead of reading as many as it can to fill up the bit buffer, but they involve a lot of extra CPU cycles and force me to peel away a lot of the nice abstractions between the model and the decoder.

  2. #2
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    767
    Thanks
    236
    Thanked 244 Times in 149 Posts
    Stream entropy coders emit whole renormalization units to the bitstream, rounding the true number of bits up to the flushed ones - on average you lose half of a unit in such a flush at termination: 1/2 bit for 1-bit renormalization, 8 bits for 16-bit renormalization.
    AC/RC additionally needs this "readjustment" ( https://en.wikipedia.org/wiki/Range_encoding ) to prevent ranges too short to renormalize; each readjustment costs about 1/2 bit on average.
    ANS instead has to store the final state, but that can be compensated by putting information into the initial state ... the final flush can be optimized to reduce this cost.
    In practice we can often combine multiple symbol streams into one to avoid the rounding up in the final flush, unless e.g. parallel decoding is required.
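    For illustration, a simplified byte-wise output path (not a complete or carry-correct coder, just to show where the rounding up happens):
    Code:
    #include <cstdint>
    #include <vector>

    // Simplified byte-wise range coder output path: renormalization and the
    // final flush can only emit whole bytes, never fractional bits.
    struct ByteRenormEncoder
    {
        uint32_t low = 0;
        uint32_t range = 0xFFFFFFFFu;
        std::vector<uint8_t> out;

        void Renormalize()
        {
            while (range < (1u << 24))          // top byte of 'low' is settled
            {
                out.push_back(uint8_t(low >> 24));
                low <<= 8;
                range <<= 8;
            }
        }

        void Flush()
        {
            // Only whole bytes can be written, so on average about half of
            // the renormalization unit carries no information.
            for (int i = 0; i < 4; ++i)
            {
                out.push_back(uint8_t(low >> 24));
                low <<= 8;
            }
        }
    };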

  3. Thanks:

    LiKenun (27th April 2020)

  4. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,842
    Thanks
    288
    Thanked 1,244 Times in 697 Posts
    > The latter would not be possible if the consumer does not know in advance
    > what the length of the data will be,

    The best solution is still keeping the lengths of data strings somewhere -
    it's required for indexing anyway, and allows the coder to work without EOF symbols.

    Otherwise, only once EOF is decoded can we calculate the number of bits
    required for the encoder flush and return any unnecessary prefetched bits to the input stream.
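    Something like this (rough sketch, made-up names):
    Code:
    #include <cstddef>
    #include <cstdint>

    // A bit source that can give back over-read bits: after decoding EOF the
    // decoder computes how many bits the encoder really flushed and returns
    // the surplus for whatever stream follows.
    class BitSource
    {
    public:
        BitSource(const uint8_t* data, size_t size) : data_(data), size_(size) {}

        int ReadBit()
        {
            if (pending_ > 0)                   // re-deliver pushed-back bits
            {
                --pending_;
                return (pushedBack_ >> pending_) & 1;
            }
            if (bitPos_ >= size_ * 8)
                return 0;                       // zero-pad past real input
            const size_t byte = bitPos_ / 8, bit = 7 - (bitPos_ % 8);
            ++bitPos_;
            return int((data_[byte] >> bit) & 1);
        }

        // Push back the last 'count' bits (assumes nothing is already pending).
        void Unget(uint32_t bits, int count)
        {
            pushedBack_ = bits;
            pending_ = count;
        }

    private:
        const uint8_t* data_;
        size_t size_;
        size_t bitPos_ = 0;
        uint32_t pushedBack_ = 0;
        int pending_ = 0;
    };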

    > I do have some ideas to make sure the decoder only reads the bits it needs
    > instead of reading as many as it can to fill up the bit buffer, but it also
    > involves a lot of extra CPU cycles and forces me to peel away a lot of the
    > nice abstractions between the model and the decoder.

    Sure, it's possible to read only enough code bits for the code interval to fit
    into one of the symbol ranges. For a bitwise AC it doesn't change anything API-wise.
    Also, it's the only solution if you really don't want to read after EOF.

    https://encode.su/threads/3174-Finit...ll=1#post61256
    https://encode.su/threads/3084-Simpl...ll=1#post59643

  5. Thanks:

    LiKenun (27th April 2020)

  6. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,842
    Thanks
    288
    Thanked 1,244 Times in 697 Posts
    Here I implemented a minimum-lookahead decoder.
    However it would be fairly hard to simulate the decoder behaviour in the encoder (since it depends on future symbols).
    I can do it by creating a decoder coroutine and feeding it output bits as input... but in the end it would
    still be necessary to read all bits written before the encoder flush anyway, and then some bits to stay within the range,
    so the total number should end up the same.
    Attached Files

  7. Thanks (2):

    kampaster (13th April 2020), LiKenun (27th April 2020)

  8. #5
    Member
    Join Date
    Apr 2020
    Location
    United States of America
    Posts
    3
    Thanks
    4
    Thanked 0 Times in 0 Posts
    I’ve yet to read up on RC/ANS since I only recently grasped the concept of arithmetic coding. I completed my Fenwick tree code just yesterday and am planning to rewrite the whole arithmetic coding project so that the code doesn’t look like spaghetti.

    required parallel decoding
    Either data parallelism or workload parallelism would imply giving up compression, no? I’ve taken a second look at the encoder/decoder I’ve written, and the process seems entirely serial, with scant opportunities even for vectorization. If I introduce parallelization, the data would have to be chunked and compressed independently. That would work very well for a pre-computed model but maybe not for an adaptive model, since the adaptations would only apply to a particular chunk. I believe this is how Zstandard and typical solid compression modes operate.
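    Roughly what I have in mind for the chunked case (a sketch; compressChunk here is a stand-in for the real coder):
    Code:
    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <thread>
    #include <vector>

    // Independent-chunk compression: each chunk gets a fresh adaptive model
    // and its own terminated output, so chunks can be encoded or decoded in
    // parallel at the cost of per-chunk model warm-up and flush overhead.
    std::vector<std::vector<uint8_t>> CompressChunked(
        const std::vector<uint8_t>& input, size_t chunkSize,
        const std::function<std::vector<uint8_t>(const uint8_t*, size_t)>& compressChunk)
    {
        const size_t chunks = (input.size() + chunkSize - 1) / chunkSize;
        std::vector<std::vector<uint8_t>> out(chunks);
        std::vector<std::thread> workers;
        for (size_t i = 0; i < chunks; ++i)
        {
            const uint8_t* p = input.data() + i * chunkSize;
            const size_t len = std::min(chunkSize, input.size() - i * chunkSize);
            workers.emplace_back([&out, i, p, len, &compressChunk]
                                 { out[i] = compressChunk(p, len); });
        }
        for (auto& w : workers)
            w.join();
        return out;
    }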

    The best solution is still keeping the lengths of data strings somewhere -
    its required for indexing anyway, and allows for the coder to work without EOF symbols.

    Otherwise only once EOF is decoded, we could calculate the number of bits
    required for encoder flush and return any unnecessary prefetched bits to input stream.
    Sure, its possible to only read enough code bits for code interval to fit
    into one of symbol ranges.
    For my rewrite exercise, I’m thinking the code will have to optimize for three different situations:
    1. The input stream is “seekable.” It doesn’t matter whether we read beyond the end of the encoded data; we can always tell the underlying stream to rewind so that we effectively “return any unnecessary prefetched bits to input stream.” No length hint needs to be given to the decoder.
    2. The input stream is not seekable. We read the smallest unit of data possible (a byte) and have the decoder query the model for possible symbols (without updating the model). If more than one symbol is returned, the decoder reads another byte from the input stream to increase the precision until only one symbol remains. I anticipate that this will be excruciatingly slow. (I’ll review your minimum look-ahead decoder for inspiration; perhaps you have a more efficient method.)
    3. The input stream is not seekable, but the application calling the encoder/decoder kept a record of the compressed length. The compressed length is given to the decoder as a hint so that it knows exactly when to stop reading bytes, with or without an EOF; it will perform just as well as in case #1 (sketched below).
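    For case #3, the length hint could be handled by something as simple as this (a sketch with made-up names):
    Code:
    #include <cstddef>
    #include <cstdint>

    // The caller supplies the compressed length, so the reader serves exactly
    // that many bytes and pads with zeros afterwards, never touching bytes
    // that belong to the next stream.
    class LengthLimitedByteSource
    {
    public:
        LengthLimitedByteSource(const uint8_t* data, size_t compressedLength)
            : data_(data), remaining_(compressedLength) {}

        uint8_t ReadByte()
        {
            if (remaining_ == 0)
                return 0;                 // pad instead of over-reading
            --remaining_;
            return *data_++;
        }

    private:
        const uint8_t* data_;
        size_t remaining_;
    };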

    ANS instead has to store the final state, but that can be compensated by putting information into the initial state ... the final flush can be optimized to reduce this cost.
    For a bitwise AC it doesn't change anything API-wise.
    Also, it's the only solution if you really don't want to read after EOF.
    I’ve read a bit about ANS and bitwise AC, but I haven’t gone far enough to understand either. For bitwise AC, I’m also aware that the encoder/decoder state is much simpler and that there’s a multiplication-free algorithm.

    I take some time to digest new concepts, so bear with me for replying so slowly. My last compression algorithm was written for a high school calculator, and it was RLE.

  9. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,842
    Thanks
    288
    Thanked 1,244 Times in 697 Posts
    > 1. The input stream is “seekable.” [...] No length hint is required to be given to the decoder.

    Known length of the compressed stream has some additional effects -
    fewer flush bits have to be written if reads after EOF are known
    to always return 1s or always 0s.
    Correctly decoding with arbitrary lookahead bits requires a narrower
    code interval and thus more bits actually encoded.

    "No length hint" means adjusting the flush for random lookahead...
    but if all stream positions are known, it automatically means
    that lengths are known.

    I suppose you can experiment with decoder convergence -
    it's possible with Huffman and ANS, but likely not with AC.

    > 2. The input stream is not seekable. We read the smallest unit of data
    > possible (a byte) and have the decoder query the model for possible
    > symbols—without updating the model.

    That's kinda what I previously posted.
    The AC decoder normally has a fixed-size lookahead compared to the encoder,
    so with the max AC range set to 256 we can reduce this lookahead to 1 byte...
    but such a low precision would create a lot of redundancy.

    > I anticipate that this will be excruciatingly slow.

    Decoding speed is ~440kb/s in my implementation.

    > (I’ll review your minimum look-ahead decoder for inspiration; perhaps you have a more efficient method.)

    Probably not; there's a specific number of bits required to identify each symbol's interval.

    > I’ve read some about ANS

    I think ANS is not a good choice for this.
    It has exactly the same problems as AC here (single-byte state is possible,
    but causes too much redundancy, otherwise there's a lookahead issue),
    and in addition has encoder and decoder working in different directions.

  10. Thanks:

    LiKenun (28th April 2020)

  11. #7
    Member
    Join Date
    Apr 2020
    Location
    United States of America
    Posts
    3
    Thanks
    4
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Known length of the compressed stream has some additional effects -
    fewer flush bits have to be written if reads after EOF are known
    to always return 1s or always 0s.
    Correctly decoding with arbitrary lookahead bits requires a narrower
    code interval and thus more bits actually encoded.

    "No length hint" means adjusting the flush for random lookahead...
    but if all stream positions are known, it automatically means
    that lengths are known.

    I suppose you can experiment with decoder convergence -
    it's possible with Huffman and ANS, but likely not with AC.
    I’ll do some experimentation during my rewrite.

    Quote Originally Posted by Shelwien View Post
    That's kinda what I previously posted.
    The AC decoder normally has a fixed-size lookahead compared to the encoder,
    so with the max AC range set to 256 we can reduce this lookahead to 1 byte...
    but such a low precision would create a lot of redundancy.

    > I anticipate that this will be excruciatingly slow.

    Decoding speed is ~440kb/s in my implementation.

    > (I’ll review your minimum look-ahead decoder for inspiration; perhaps you have a more efficient method.)

    Probably not; there's a specific number of bits required to identify each symbol's interval.
    I’m guessing this might be possible to speed up if the limiting factor is not IO. If the cumulative frequency tables are small enough, using SSE/AVX intrinsics on a flat array might be faster than using a Fenwick tree for looking up symbols. SSE/AVX seems to give a nice boost to linear search (https://schani.wordpress.com/tag/c-o...rch-sse2-simd/) and it also speeds up binary tree searches (https://stackoverflow.com/questions/...tree-traversal). But my intuition is that a ton of calls to
    stream.ReadByte()
    is going to limit the performance most.
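    For example, something along these lines for a 16-symbol cumulative table (an untested SSE2 sketch; it assumes counts stay below 32,768 so signed compares are safe):
    Code:
    #include <cstdint>
    #include <immintrin.h>

    // Find the first index i with cum[i] > target in a small non-decreasing
    // cumulative-frequency table of 16 entries, using SSE2 instead of a
    // Fenwick-tree walk. __builtin_ctz is GCC/Clang; use _tzcnt_u32 on MSVC.
    int FindSymbolSSE(const uint16_t cum[16], uint16_t target)
    {
        const __m128i t  = _mm_set1_epi16(static_cast<short>(target));
        const __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(cum));
        const __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(cum + 8));
        // Per lane: 0xFFFF where cum[i] > target, else 0 (signed compare).
        const __m128i gtLo = _mm_cmpgt_epi16(lo, t);
        const __m128i gtHi = _mm_cmpgt_epi16(hi, t);
        // Pack the lane masks to bytes, then to one 16-bit mask; bit i maps to symbol i.
        const unsigned mask =
            static_cast<unsigned>(_mm_movemask_epi8(_mm_packs_epi16(gtLo, gtHi)));
        return mask ? __builtin_ctz(mask) : 16;   // 16 = target beyond the table
    }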

    Quote Originally Posted by Shelwien View Post
    I think ANS is not a good choice for this.
    It has exactly the same problems as AC here (single-byte state is possible,
    but causes too much redundancy, otherwise there's a lookahead issue),
    and in addition has encoder and decoder working in different directions.
    I see what you mean (https://fgiesen.wordpress.com/2015/1...s-in-practice/):
    Once you have this small piece of scaffolding, you really can use rANS as a drop-in replacement for a conventional arithmetic coder. There’s two problems with this, though: if you use this to encode an entire large file, your symbol buffer can get pretty huge, and you won’t get a single bit of output until you’ve processed the entire input stream.
    The solution is simple (and can also be found in the aforementioned fpaqa): instead of accumulating all symbols emitted over the entire input stream and doing one big flush at the end, you just process the input data in chunks and flush the coder periodically, resetting the rANS state every time. That reduces compression slightly but means the encoder memory usage remains bounded, which is an important practical consideration. It also guarantees that output is not delayed until the end of stream; finally, lots of compressors are already chunk-based anyway. (For the opportunity to send incompressible data uncompressed rather than wasting time on the decoder end, among other things.)
    Having to encode in the opposite direction of decoding would pretty much mean that the length is known and that the compressed data will be chunked. But I suppose it’s useful for certain scenarios like file systems or network packets which have limited lengths to begin with.
