Results 1 to 14 of 14

Thread: Fast arithcoder for compression of LZ77 output

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    i work on lz77 compressor. using arithmetic coder here looks promising but it's quite slow for my purposes. does anyone knows the ways to make its usage faster?

    that i found: normalizing total to 2^n removes division in coder and one divison in decoder - 2x faster. because lz77 output contains fixed-width bit sequences (lowere bits of dist/length) i plan to encode them in a separate bit stream instead of sending through the arith coder. but may be there is some better way, in particular some fast way to encode such bit fields in arithmetic coder?

    and last problem - is it possible to change rangecoder to output more than one byte a time? how can i determine limits on total value imposed by this particular encoder/decoder implementation?

    the code i use is derived from quad with a little modification of using total=2^bits:

    <code>
    void Encode(unsigned int cum, unsigned int cnt, unsigned int bits) {
    low += (cum * (range >>= bits));
    range *= cnt;
    while (range < (1 << 24)) {
    range <<= 8;
    ShiftLow();
    }
    }

    void ShiftLow() {
    if ((low ^ 0xff000000) >= (1 << 24)) {
    unsigned int c = static_cast<unsigned int>(low >> 32);
    putc(buffer + c);
    c += 255;
    for (; help > 0; help--)
    putc(c);
    buffer = static_cast<unsigned int>(low) >> 24;
    }
    else
    help++;
    low = static_cast<unsigned int>(low) << 8;
    }
    </code>

    ps: is it possible to format the code here?

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    I just implemented a VERY fast arithmetic encoder!

    Currently Im under experiments...

    However, I implemented a bit-oriented encoder, similar as used with LZMA, but with MANY speed optimizations!

    Quote Originally Posted by Bulat Ziganshin
    ps: is it possible to format the code here?
    Nope, actually... Damn forum!

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Bulat Ziganshin
    and last problem - is it possible to change rangecoder to output more than one byte a time?
    Not really understand the question.

    Quote Originally Posted by Bulat Ziganshin
    how can i determine limits on total value imposed by this particular encoder/decoder implementation?
    With QUADs range encoder the max total value is ((1 << 24) - 1).

  4. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    By the way, PPM, CM, and ARI is my power side!

  5. #5
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    I just implemented a VERY fast arithmetic encoder!
    i guess that it is just the same as in my code above? (using >>=bits instead of /=total)

    Quote Originally Posted by encode
    However, I implemented a bit-oriented encoder, similar as used with LZMA, but with MANY speed optimizations!
    really, i need to encode only one value whic represents either char or match/dist range and has about 1000 possible values

    Quote Originally Posted by encode
    Not really understand the question.
    rangecoder differs from standard arithcoder in that it ouputs one byte a time instead of one bit. i guess that it is possible to output 2-4 bytes at time with limits to total value and/or less encoding precision. but i dont understand aritjhcoding enough to change the coder and calculate new limits on total (btw, how i can calculate that this coder has 2^24 limit?)

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Bulat Ziganshin
    i guess that it is just the same as in my code above? (using >>=bits instead of /=total)
    Not really. With a bit-oriented encoding we can completely eliminate many things - for example check out the LZMA or fpaq0 source.

    Quote Originally Posted by Bulat Ziganshin
    really, i need to encode only one value whic represents either char or match/dist range and has about 1000 possible values
    With large values, you can encode each value bitwise (encode each bit of a large value "separately"). Again, check out the LZMA source or fpaq0 for bit tree coding.

    Quote Originally Posted by Bulat Ziganshin
    rangecoder differs from standard arithcoder in that it ouputs one byte a time instead of one bit. i guess that it is possible to output 2-4 bytes at time with limits to total value and/or less encoding precision. but i dont understand aritjhcoding enough to change the coder and calculate new limits on total (btw, how i can calculate that this coder has 2^24 limit?)
    The fastest thing is the caryless range coder by Dmitry Subbotin - check for PPMd sources!

    AFAIK, the exact MaxFreq depends on given implementation. For example:
    With Shindlers range encoder (as used with QUAD) MaxFreq: 2^24
    With Subbotins caryless range encoder this value is: 2^16
    With some special high-precision coders like CL-D MaxFreq is 2^31.

    Anyway, the Shindlers type coder is the most used overall (LZMA uses it too).

    Check out the aridemo from compression.ru!

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Bulat Ziganshin
    rangecoder differs from standard arithcoder in that it ouputs one byte a time instead of one bit.
    Not really, it performs renormalisation in larger units (bytes) instread of bits and is thus faster.

    Quote Originally Posted by Bulat Ziganshin
    i guess that it is possible to output 2-4 bytes at time with limits to total value and/or less encoding precision.
    Nope. We can make encoder carryless - at the cost of compression (a few bytes) as Subbotin does. fpaq0 is also carryless, by the way.

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    The fastest thing is the caryless range coder by Dmitry Subbotin - check for PPMd sources!
    10x! i thought that coder from your sources is carryless, but now i see the difference. i will try it, except for distances there is no things which needs more than 16 bits in my code

    i want to know how to determine arithcoder maxfreq from its sources and how to change coder to ouput more than one byte at a time.

    Quote Originally Posted by encode
    With large values, you can encode each value bitwise (encode each bit of a large value "separately").
    and you think that this will be faster than encoding 16 bits at once?

    btw, lzpm decompresses in my test two times slower than lzma. are you said that its faster?

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    Not really, it performs renormalisation in larger units (bytes) instread of bits and is thus faster.
    yes, and i want to know how to do it even in larger units

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Bulat Ziganshin
    btw, lzpm decompresses in my test two times slower than lzma. are you said that its faster?
    Im talking about 0.02!!! (Unreleased)

    Quote Originally Posted by Bulat Ziganshin
    and you think that this will be faster than encoding 16 bits at once?
    Yepp!

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Bulat Ziganshin
    yes, and i want to know how to do it even in larger units
    Again, check the aridemo package from compression.ru!

  12. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    Quoting: Bulat Ziganshin
    and you think that this will be faster than encoding 16 bits at once?
    Yepp!
    i cant believe starring at carryfree sources, i count ~20-30 cycles for encoding 16-bit value. are you think that 16 encoding operations for individual bits may be faster?

    lets omit shifting operation (which should be the same for both encoders, yes?) and we left with

    low += (cum * (range >>= bits));
    range *= cnt;

    which should require about 10 cycles. how much cycles require encoding of *each* bit in your coder?

    in ppmd, binary coder is
    inline UINT rcBinStart(UINT f0,UINT Shift) { return f0*(range >>= Shift); }

    i.e. its only two times faster than encoding full 16 bits. and with rcBinCorrect operations, time is about the same. i think that economy is mainly due to less calls to normalizing procedure because we know that code cant overflow

    [QUOTE=Bulat Ziganshin
    and you think that this will be faster than encoding 16 bits at once?
    Yepp![/QUOTE]

    i cant believe starring at carryfree sources, i count ~20-30 cycles for encoding 16-bit value. are you think that 16 encoding operations for individual bits may be faster?

    lets omit shifting operation (which should be the same for both encoders, yes?) and we left with

    low += (cum * (range >>= bits));
    range *= cnt;

    which should require about 10 cycles. how much cycles require encoding of *each* bit in your coder?

    in ppmd, binary coder is
    inline UINT rcBinStart(UINT f0,UINT Shift) { return f0*(range >>= Shift); }

    i.e. its only two times faster than encoding full 16 bits. and with rcBinCorrect operations, time is about the same. i think that economy is mainly due to less calls to normalizing procedure because we know that code cant overflow

    Quote Originally Posted by Bulat Ziganshin
    and you think that this will be faster than encoding 16 bits at once?
    Yepp![/QUOTE

    i cant believe starring at carryfree sources, i count ~20-30 cycles for encoding 16-bit value. are you think that 16 encoding operations for individual bits may be faster?

    lets omit shifting operation (which should be the same for both encoders, yes?) and we left with

    low += (cum * (range >>= bits));
    range *= cnt;

    which should require about 10 cycles. how much cycles require encoding of *each* bit in your coder?

    in ppmd, binary coder is
    inline UINT rcBinStart(UINT f0,UINT Shift) { return f0*(range >>= Shift); }

    i.e. its only two times faster than encoding full 16 bits. and with rcBinCorrect operations, time is about the same. i think that economy is mainly due to less calls to normalizing procedure because we know that code cant overflow

    Quote Originally Posted by Bulat Ziganshin
    and you think that this will be faster than encoding 16 bits at once?
    Yepp![/QUOTE

    i cant believe starring at carryfree sources, i count ~20-30 cycles for encoding 16-bit value. are you think that 16 encoding operations for individual bits may be faster?

    lets omit shifting operation (which should be the same for both encoders, yes?) and we left with

    low += (cum * (range >>= bits));
    range *= cnt;

    which should require about 10 cycles. how much cycles require encoding of *each* bit in your coder?

    in ppmd, binary coder is
    inline UINT rcBinStart(UINT f0,UINT Shift) { return f0*(range >>= Shift); }

    i.e. its only two times faster than encoding full 16 bits. and with rcBinCorrect operations, time is about the same. i think that economy is mainly due to less calls to normalizing procedure because we know that code cant overflow

    <div class=""quoting"">Quoting: encode]<div class=""quoting"">Quoting: encode
    Again, check the aridemo package from compression.ru! ]<div class=""quoting"">Quoting: encode]<div class=""quoting"">Quoting: encode
    Again, check the aridemo package from compression.ru!
    go..

  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Currently I'm too lazy to explain some details. However, while encoding a large value you must get the cumFreq using binary or linear search.

    for (int i = 0; i < s; i++)
    cum += cnt[i];

    Using bits you can eliminate such thinks. Yes you must code say 16 bits separetely, however, it is worth it. Check the LZMA decompression speed!

    In my coder I eliminate divisions, many things are done via bit shifts. Counters are also completely elininated as with LZMA, instead, I use an array of probabilities and after coding of each bit I directly modify the probability of a next bit.

    Anyway, I'm still working. Next version of LZPM 0.02 with the new arithmetic encoder wil be released soon!

    I think, you must make a few experiments and choose what will work for you!

  14. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    the cld coder is exactly what i asked i should compare them all

Similar Threads

  1. Zhuff - fast compression
    By Cyan in forum Data Compression
    Replies: 38
    Last Post: 5th February 2014, 11:27
  2. balz v1.00 - new LZ77 encoder is here!
    By encode in forum Forum Archive
    Replies: 61
    Last Post: 17th April 2008, 23:57
  3. LZ77 speed optimization, 2 mem accesses per "round"
    By Lasse Reinhold in forum Forum Archive
    Replies: 4
    Last Post: 11th June 2007, 22:53
  4. Fast LZ compression
    By encode in forum Forum Archive
    Replies: 35
    Last Post: 25th April 2007, 02:35

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •