
Thread: Fast Bit I/O

  1. #1
    The Founder encode's Avatar

    Fast Bit I/O

    A thread to collect some Bit I/O optimization ideas.

    Of course, an optimized Bit I/O implementation depends on how the program reads bits - i.e. bit by bit, large integer values, etc.

    For common LZSS/LZ77 implementation we need to read:

    1. Bit-by-Bit - for 0/1 literal/match(copy) flags
    2. 8-bit values - for literals
    3. Small integer values and/or Bit-by-Bit, depending on actual implementation - for match lengths
    4. Small integers and/or large ones - for match offsets

    For example, we may use templates with inline code:

    template<int CNT>
    inline void putbits(int val)
    {
    // ...
    }

    // ...

    putbits<8>(ch);

    ...

    We may load more than 8 bits into "bitbuf" at each loop iteration...

    ...
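    The templated putbits idea above can be sketched like this - a minimal msb-to-lsb writer with a 64-bit cache. The names (BitWriter, flush) and the buffering details are illustrative assumptions, not taken from any particular codec:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: CNT is a compile-time constant, so the compiler
// can specialize and unroll the flush loop for each field width.
struct BitWriter {
    std::vector<uint8_t> out; // finished bytes
    uint64_t bitbuf = 0;      // pending bits, newest in the low end
    int bitcnt = 0;           // number of pending bits in bitbuf

    template<int CNT>
    void putbits(uint32_t val) {
        static_assert(CNT >= 1 && CNT <= 32, "field width per call");
        bitbuf = (bitbuf << CNT) | (val & ((uint64_t(1) << CNT) - 1));
        bitcnt += CNT;
        while (bitcnt >= 8) {             // flush whole bytes
            bitcnt -= 8;
            out.push_back(uint8_t(bitbuf >> bitcnt));
        }
    }

    void flush() {                        // pad the tail with zero bits
        if (bitcnt > 0) {
            out.push_back(uint8_t(bitbuf << (8 - bitcnt)));
            bitcnt = 0;
        }
    }
};
```

    Since the bit count is a template parameter, calls like putbits<1>(flag) or putbits<8>(ch) compile down to a couple of shifts with no per-bit loop - which is the point of using templates here.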

  2. #2
    Member
    Why use bit I/O for literals?

  3. #3
    The Founder encode's Avatar
    It depends on the Bit I/O implementation. Bit and byte writes/reads must be synchronized. Keep in mind that in a good LZ77 encoder we must minimize the literal count by utilizing 2-byte (or even 1-byte) matches; even with a 3-byte MIN_MATCH and a decent window size, the literal count for many files will be fairly small.

  4. #4
    Administrator Shelwien's Avatar
    This is how zlib does it:
    Code:
    uint getbits( uint need ) {
      uint val = bitbuf;
      while( bitpos<need ) {
        val |= uint(*inpptr++) << bitpos;
        bitpos += 8;
      }
      bitbuf = val >> need;
      bitpos -= need;
      return val & ((1<<need)-1);
    }
    
    void putbits( uint val, uint len ) {
      if( bi_valid+len > 16 ) {
        bi_buf |= val << bi_valid;
        pending_buf[pending++] = bi_buf & 0xff;
        pending_buf[pending++] = bi_buf >> 8;
        bi_buf = val >> (16-bi_valid);
        bi_valid += len - 16;
      } else {
        bi_buf |= val << bi_valid;
        bi_valid += len;
      }
    }
    I don't like the implementation (though in fact, I haven't seen _anything_ implemented properly in zlib),
    but the code layout there (lsb-to-msb) is different from what I normally use, and that could be a decoding speed
    optimization (shifting the new value can be faster than shifting the cache, then adding the new value).

    Anyway, afaik, bitfield i/o should normally be very similar to rangecoder i/o (accumulating bitfields in a cache
    register, outputting when it overflows). Of course, it's also possible to read/write bitfields directly in the data,
    but that would be _much_ slower.

    As to possible optimizations, I guess we can try msb-to-lsb vs lsb-to-msb layouts, aligned reads/writes
    (see how zlib writes 16 bits at a time? but we can do 32 bits now), and replacing branches with binary logic.

    But overall it's very simple anyway, and I really wonder whether it would matter on modern cpus...
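    The "aligned reads" idea can be sketched as follows: the same lsb-to-msb layout as the zlib code above, but the refill loads 32 bits at once instead of byte-by-byte. This is an illustrative rewrite, not zlib's code; it assumes a little-endian machine and an input buffer with a few bytes of padding past the end:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical lsb-to-msb reader with a 32-bit refill. Since getbits
// never asks for more than 32 bits, one refill always suffices.
struct BitReader {
    const uint8_t* inpptr;
    uint64_t bitbuf = 0;
    uint32_t bitpos = 0;

    explicit BitReader(const uint8_t* p) : inpptr(p) {}

    uint32_t getbits(uint32_t need) {   // need <= 32
        if (bitpos < need) {
            uint32_t w;
            std::memcpy(&w, inpptr, 4); // unaligned little-endian load
            inpptr += 4;
            bitbuf |= uint64_t(w) << bitpos;
            bitpos += 32;
        }
        uint32_t val = uint32_t(bitbuf & ((uint64_t(1) << need) - 1));
        bitbuf >>= need;
        bitpos -= need;
        return val;
    }
};
```

    Compared to the zlib getbits above, the refill branch is taken at most once per call (and roughly once per 32 output bits on average), instead of looping per input byte.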

  5. #5
    Member RichSelian's Avatar
    Quote Originally Posted by encode View Post
    It depends on the Bit I/O implementation. Bit and byte writes/reads must be synchronized. Keep in mind that in a good LZ77 encoder we must minimize the literal count by utilizing 2-byte (or even 1-byte) matches; even with a 3-byte MIN_MATCH and a decent window size, the literal count for many files will be fairly small.
    For literals left in LZ77, we can still use some low-order predictors to achieve good compression, but how do we compress LZ77's offsets? I don't see any ideas better than an order-0 coder.
    In my lz77-ari compressor, I got the best compression when I set min_len to 8 or 7; a small min_len results in a bad compression ratio.

  6. #6
    Programmer osmanturan's Avatar
    You may want to see how FFMPEG handles bitstreaming. IMO, video decoding is one of the most speed-demanding I/O workloads. In one file (sorry, I don't remember which one exactly), there is a class which holds several 4x 32-bit integers (128 bits in total). In this form you can use SSE or other tricks. But it's not compatible with deflate's weird bit streaming (LSB-to-MSB). I believe it's the most efficient form of bit streaming. Of course, one should benchmark to measure the speed properly.
    BIT Archiver homepage: www.osmanturan.com

  7. #7
    Administrator Shelwien's Avatar
    > For literals left in LZ77, we can still use some low-order predictors to achieve good compression,
    > but how do we compress LZ77's offsets?
    > I don't see any ideas better than an order-0 coder.

    Not sure where you're looking then :)
    Check lzma.

    > In my lz77-ari compressor, I got the best compression when I set min_len to 8 or 7; a small min_len results in a bad compression ratio.

    I'm not sure whether it would work without parsing optimization,
    but lzma encodes distances in a length context, plus offset alignment, plus "state" (a quantized match-type history).
    Also it supports full matches starting from 2 bytes, and even 1-byte rep0-matches.

  8. #8
    Member RichSelian's Avatar
    lzma encodes distances in a length context? I tried that in my compressor, replacing the order-0 model, and got worse compression...

  9. #9
    Administrator Shelwien's Avatar
    Well, it's not so simple. Lzma splits the distances into 3 parts (exponent, uncompressed mantissa and compressed alignment)
    and uses different contexts for all of these.
    There are also 3 distance ranges (1-4, 5-128, >128) which are encoded differently (the 3 parts mentioned are from the last case).
    Anyway, you can put your matches into .rec format from http://encode.su/threads/1381-.rec-toolkit and compare it directly.

  10. #10
    Member biject.bwts's Avatar
    I don't know about fast, but I think most Bit I/O routines miss the fact that you should be able to handle files with bit strings of any length, as well as write bit files of any length.

    My code bit_byts.cpp, used in most of my programs (I change it every so often, so I'm not sure which version is really the latest), handles bit strings of any length while actually using only normal byte files of 8 bits each, in a bijective way.

    Actually, if you wrote code to, say, handle 64- or 256-bit chunks at a time, it's not hard to think of every file as a group of, say, 256-bit chunks no matter what its length in bytes really is. With the proper bijective interface you don't have to worry about odd-length files if you decide to code a Huffman or arithmetic coder using 2^16 base tokens.

    Or if 19-bit tokens are your thing, no problem.

  11. #11
    Member RichSelian's Avatar
    Quote Originally Posted by Shelwien View Post
    Well, it's not so simple. Lzma splits the distances into 3 parts (exponent, uncompressed mantissa and compressed alignment)
    and uses different contexts for all of these.
    There are also 3 distance ranges (1-4, 5-128, >128) which are encoded differently (the 3 parts mentioned are from the last case).
    Anyway, you can put your matches into .rec format from http://encode.su/threads/1381-.rec-toolkit and compare it directly.
    But what's the relationship between distance and length? I used to print all <length, distance> pairs to watch their frequencies. Short lengths and distances happen more frequently, but I don't see a relationship between them. I just think they are two independent variables.

  12. #12
    Administrator Shelwien's Avatar
    > Short lengths and distances happen more frequently

    In other words, short length/long distance (and the reverse) are less probable,
    which means that we should allocate longer codes for them.

    To be specific, lzma uses Min(len,5) as a context for the "posslot" (which is log2(distance)*2 + the next distance bit),
    and then the "posslot" as a context for distances < 128; that's all.

    Most distance bits are encoded without any compression at all (i.e. with probability 0.5).
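    The posslot formula above can be sketched like this - an illustrative reimplementation, not lzma's actual code. The slot is twice the top bit's index plus the bit just below it, so every power-of-two range splits into two slots (__builtin_clz assumes GCC/Clang):

```cpp
#include <cassert>

// Hypothetical helper: map a distance to its lzma-style position slot.
static int pos_slot(unsigned dist) {
    if (dist < 2) return int(dist);          // slots 0 and 1 are the distances themselves
    int top = 31 - __builtin_clz(dist);      // index of the highest set bit
    return 2 * top + int((dist >> (top - 1)) & 1);
}
```

    Only the slot (at most 6 bits) is modeled adaptively; the bits below the two leading ones are the mantissa/alignment part, mostly sent at probability 0.5 as described above.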

  13. #13
    The Founder encode's Avatar
    Quote Originally Posted by Shelwien View Post
    In other words, short length/long distance (and the reverse) are less probable,
    which means that we should allocate longer codes for them.
    It's not that simple!

    It's easier to find a 3-byte match at a long distance - say, at a 128K distance. Also, long matches at short distances are very common with binary files.

    Another thing is that we may want to skew the probability distribution by discarding short matches at long distances - i.e. it's probably more beneficial to encode literals instead of such a match, or to code a literal plus the next, longer match, etc. Anyway, such things depend on the coding scheme you use. Some time ago I performed some tests with my match finder and found that the longest match we may find is usually located at a large distance. Take into account that the "longest match we may find" is frequently just MIN_MATCH long...

    Quote Originally Posted by RichSelian
    I got the best compression when I set min_len to 8 or 7; a small min_len results in a bad compression ratio.
    You're doing something wrong! Unless you use an order-4 or higher CM for literal coding, such a long MIN_MATCH shouldn't be optimal - it's inefficient. The catch is probably in the offset coding! Well, first, try to limit the offset range for short matches, say:
    2-byte match - max 128..512
    3-byte match - max 2048..8192
    4-byte match - max 256K..unlimited
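    The match-acceptance filter suggested by the list above could look like this. The exact thresholds are assumptions picked from within the quoted ranges; they would need tuning per corpus:

```cpp
#include <cassert>

// Hypothetical filter: reject short matches at long distances,
// since a literal (or a later, longer match) is usually cheaper.
static bool accept_match(int len, unsigned offset) {
    if (len <= 2) return offset <= 256;     // 2-byte match: max 128..512
    if (len == 3) return offset <= 4096;    // 3-byte match: max 2048..8192
    return true;                            // 4+ bytes: any offset
}
```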

    As to offset coding, try PAQqy style:
    Code:
    --offset; // keep offset of 0..WINDOW_SIZE-1
    
    int cxt=1;
    for (int i=WINDOW_BITS-1; i>=0; --i)
    {
      const int y=(offset>>i)&1;
      Encoder::Encode(y, counterX[cxt].P());
      counterX[cxt].Update(y);
      cxt+=cxt+y;
    }
    It's the simplest, order-0 way. Better to do something more complex - a la LOGgy style: just separate the offset into a position slot and a position footer. Check out the LZX documentation (attached to this post)!
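    A minimal sketch of that slot/footer split (simplified LZX-style; names hypothetical): the slot is the bit length of the offset and would be coded with adaptive counters, while the footer is the remaining low bits, cheap to send raw at probability 0.5:

```cpp
#include <cassert>
#include <utility>

// Encoder side: split an offset into (slot, footer).
static std::pair<int, unsigned> split_offset(unsigned offset) {
    int slot = 0;
    for (unsigned v = offset; v > 0; v >>= 1) ++slot;  // bit length of offset
    unsigned footer = (slot > 1) ? offset - (1u << (slot - 1)) : 0;
    return { slot, footer };                           // footer has slot-1 bits
}

// Decoder side: reassemble the offset from (slot, footer).
static unsigned join_offset(int slot, unsigned footer) {
    return (slot > 1) ? (1u << (slot - 1)) | footer : unsigned(slot);
}
```

    Real LZX additionally merges some small slots and models the low 3 footer bits of small offsets (the "aligned offset" part); the sketch keeps only the basic exponent/mantissa decomposition.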

    Good luck!
    Attached Files

