Results 1 to 18 of 18

Thread: QUAD 1.12 - the FINAL version is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Okay! It's here: http://quad.sourceforge.net/

    The Linux version will be available later today. The AMD64 version, I hope, will be also available soon!

    Looking back, I can say QUAD has progressed EXTREMELY quickly in a short period of time - the new Max compression mode in many cases is as fast as the old Normal mode. And the new Normal mode is just fantastic!

    However, while working on QUAD I got the idea to write a pure LZ77 coder - without PPM and other context-related stuff - to get the fastest possible decompression and high binary compression with a tiny decompressor. To be continued...


  2. #2
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Great! It compresses faster than HASH3!

    Quote Originally Posted by encode
    I got some idea to write a pure LZ77-coder
    Even faster decompression than current QUAD? Nice! How will removing PPM affect compression?

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Black_Fox
    Even faster decompression than current QUAD?
    Notably faster!

    Quote Originally Posted by Black_Fox
    How will removing PPM affect compression?
    Moving towards LZ, we can achieve much higher binary compression and much faster decompression. However, compression will be slower.

    I must consult Bulat for additional LZ77 ideas.

    What I expect:

    Dictionary size: 64 KB ... 4 MB, probably sliding window
    Min match length: 2 (or 3)
    Order-0 literal encoder, possibly bit-oriented as with LZMA
    Binary tree or hash/context table search

    The context/hash table can look like the current QUAD's tab[], but instead of the context we will use the following bytes - 64...128 offsets for the current 2...3 bytes - so we are able to find a MINMATCH instantly.
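A minimal sketch of such a table (hypothetical names and sizes - this is not QUAD's actual tab[]): one row of recent positions per 2-byte value, so any non-empty slot already guarantees a MINMATCH-byte match and only needs extending.

```c
#include <stdint.h>

#define MINMATCH 2
#define ROW_SIZE 64                 /* 64...128 offsets per key, as above */
#define NUM_ROWS 65536              /* one row per possible 2-byte value  */

/* tab[key] holds the ROW_SIZE most recent positions where the 2-byte
   sequence `key` occurred; head[] is a cyclic write index per row.
   Positions are stored as pos+1 so that 0 marks an empty slot. */
static uint32_t tab[NUM_ROWS][ROW_SIZE];
static uint8_t  head[NUM_ROWS];

static uint16_t key2(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void update(const uint8_t *buf, uint32_t pos) {
    uint16_t k = key2(buf + pos);
    tab[k][head[k]] = pos + 1;
    head[k] = (uint8_t)((head[k] + 1) % ROW_SIZE);
}

/* Every non-empty slot matches MINMATCH bytes by construction,
   so we only extend each candidate and keep the longest. */
static uint32_t find_match(const uint8_t *buf, uint32_t pos, uint32_t end,
                           uint32_t *best_pos) {
    uint16_t k = key2(buf + pos);
    uint32_t best = 0;
    for (int i = 0; i < ROW_SIZE; i++) {
        if (tab[k][i] == 0) continue;          /* empty slot           */
        uint32_t p = tab[k][i] - 1;
        if (p >= pos) continue;                /* only past positions  */
        uint32_t len = MINMATCH;
        while (pos + len < end && buf[p + len] == buf[pos + len]) len++;
        if (len > best) { best = len; *best_pos = p; }
    }
    return best;
}
```

The table above costs 16 MB at these sizes; a real coder would tune NUM_ROWS/ROW_SIZE against the target cache and dictionary.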


  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Ilia!

  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test with ENWIK8...


    Compression results:

    QUAD v1.10

    Compression Time = 00:02:22.203

    Compressed Size = 29,731,541 bytes


    QUAD v1.10 -x

    Compression Time = 00:04:06.250

    Compressed Size = 29,152,166 bytes


    QUAD v1.11

    Compression Time = 00:02:19.734

    Compressed Size = 29,649,604 bytes


    QUAD v1.11 -x

    Compression Time = 00:04:12.406

    Compressed Size = 29,110,519 bytes


    QUAD v1.12

    Compression Time = 00:01:12.406

    Compressed Size = 29,649,604 bytes


    QUAD v1.12 -x

    Compression Time = 00:01:44.907

    Compressed Size = 29,110,519 bytes



    Decompression results:

    QUAD v1.10

    Decompression Time = 00:00:27.969
    (Compressed DEFAULT mode)

    Decompression Time = 00:00:27.750
    (Compressed MAX mode)


    QUAD v1.11

    Decompression Time = 00:00:30.594
    (Compressed DEFAULT mode)

    Decompression Time = 00:00:30.344
    (Compressed MAX mode)


    QUAD v1.12

    Decompression Time = 00:00:28.000
    (Compressed DEFAULT mode)

    Decompression Time = 00:00:27.625
    (Compressed MAX mode)



    Amazing speed improvement!

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    i also see a 2x speed improvement on text files (including those preprocessed with dict) and 1.5x on executables, so now quad is 1.5 times faster and 5% better than lzma on my text file!

    what is even more fascinating is that decompression time is only 2 times (for text) or 1.5 times (for exe) less than compression time, so it seems the compression format now puts serious limits on further speedup of the fast mode, while max mode doesn't show serious compression improvements compared to the fast one

    but before finishing this project (i mean releasing the last version with backward compatibility) you can still add some more speed to it. first, you can make compression even faster than decompression! just add multithreading support

    second, the compressor is now less memory-bound and more cpu-cycle-bound. you can avoid losing cpu cycles by using zip's lazy matching scheme
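The lazy scheme Bulat refers to can be sketched like this (names hypothetical; a brute-force matcher stands in for QUAD's real match finder): before committing to the match at the current position, the parser probes the next position, and if that match is longer it emits a literal and defers.

```c
#include <stdint.h>

/* Brute-force match finder standing in for the real search:
   returns the best length and writes the match position. */
static uint32_t find_longest(const uint8_t *buf, uint32_t pos, uint32_t end,
                             uint32_t *match_pos) {
    uint32_t best = 0;
    for (uint32_t p = 0; p < pos; p++) {
        uint32_t len = 0;
        while (pos + len < end && buf[p + len] == buf[pos + len]) len++;
        if (len > best) { best = len; *match_pos = p; }
    }
    return best;
}

/* Zip-style lazy evaluation: if the match starting one byte later is
   longer, emit a literal now and take the better match next time. */
static void lazy_parse(const uint8_t *buf, uint32_t end,
                       void (*emit_literal)(uint8_t),
                       void (*emit_match)(uint32_t pos, uint32_t len)) {
    uint32_t pos = 0;
    while (pos < end) {
        uint32_t mp = 0, mp2 = 0;
        uint32_t len = find_longest(buf, pos, end, &mp);
        if (len >= 2 && pos + 1 < end &&
            find_longest(buf, pos + 1, end, &mp2) > len) {
            emit_literal(buf[pos]);          /* next position wins: defer */
            pos++;
        } else if (len >= 2) {
            emit_match(mp, len);
            pos += len;
        } else {
            emit_literal(buf[pos]);
            pos++;
        }
    }
}
```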

    third, you can try to return r[x] to the compressor again, and give two versions - one with and one without r[x] - to the benchmark publishers. they have more modern computers, where an additional 64kb array will probably not overload the L2 cache too much

    moreover, i have an idea for combining the best of both worlds - you can save this index in the first 16-byte element of the hash row, and save in these 16 bytes the last two or three hash elements together with their first bytes (because the first elements will be used much more often than the rest, it is worth saving more information about them).

    so, the shift operation will mean copying several values inside this 16-byte header and saving the oldest element in the rest of the hash row, which is organized in a cyclic way
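One possible layout for that 16-byte header (all names hypothetical - a sketch of the idea, not QUAD's data structure): the newest positions plus a one-byte pre-check live in the header, and the shift spills the oldest entry into the cyclic tail.

```c
#include <stdint.h>

/* Hypothetical 16-byte hash-row header: the three newest positions plus
   the first byte at each, so the hot entries (and a cheap one-byte
   pre-check before dereferencing) fit in one cache-friendly block. */
typedef struct {
    uint32_t pos[3];    /* three most recent match positions          */
    uint8_t  first[3];  /* first byte at each position, for pre-check */
    uint8_t  tail_head; /* cyclic write index into the rest of row    */
} RowHeader;            /* 3*4 + 3 + 1 = 16 bytes                     */

/* The "shift": the new entry goes in front, and the oldest header
   entry spills into the cyclic tail of the hash row. */
static void row_insert(RowHeader *h, uint32_t *tail, uint32_t tail_len,
                       uint32_t pos, uint8_t first_byte) {
    tail[h->tail_head] = h->pos[2];
    h->tail_head = (uint8_t)((h->tail_head + 1) % tail_len);
    h->pos[2] = h->pos[1];  h->first[2] = h->first[1];
    h->pos[1] = h->pos[0];  h->first[1] = h->first[0];
    h->pos[0] = pos;        h->first[0] = first_byte;
}
```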

    one more idea - because you know which 16-byte blocks you will need to fetch from memory, you can try to prefetch them before they are actually used. the same is true for the decompressor - if you decode the index before the length, you can prefetch the appropriate pointer from tab, so decoding the length will overlap with the expensive memory access

    otoh, you can try to get a benefit even with your current scheme. let's imagine a 3-stage decompressor which works like a modern pipelined CPU

    the first stage decodes len/index and fetches the appropriate pointer Ptr from tab. the second stage fetches the data pointed to by Ptr from buf. and the third stage does the actual string copying

    the idea is, like a CPU does, to slice the stages across various parts of the process. we decode len3/match3 and fetch Ptr3 from memory. then, not waiting while Ptr3 arrives, we use Ptr2 (fetched on the previous step) to fetch buf[Ptr2]. then, again not waiting for buf[Ptr2], we copy the string from buf[Ptr1] to the current pos. and then the cycle repeats:

    we decode len4/match4 and fetch Ptr4 from memory. then we use Ptr3 (fetched on the previous step) to fetch buf[Ptr3]. then we copy the string from buf[Ptr2] to the current pos. voila!
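A toy model of that stage rotation in C (this is only an illustration of the idea, not QUAD's decoder: tokens are assumed already entropy-decoded into (src, len) pairs resolved against out[], and `__builtin_prefetch` is gcc/clang-specific):

```c
#include <stdint.h>

/* Software-pipelined LZ copy: request the source of token i from memory
   while the copy for token i-2 runs, so the cache-miss latency of the
   fetch overlaps useful work instead of stalling the copy loop. */
typedef struct { uint32_t src; uint32_t len; } Token;

static void pipelined_copy(uint8_t *out, uint32_t *out_pos,
                           const Token *tok, int ntok) {
    for (int i = 0; i < ntok + 2; i++) {
        if (i < ntok)                        /* stage 1: request source */
            __builtin_prefetch(out + tok[i].src);
        /* stage 2 (token i-1): its cache line is in flight */
        if (i >= 2) {                        /* stage 3: copy token i-2 */
            const Token *t = &tok[i - 2];
            for (uint32_t j = 0; j < t->len; j++)    /* byte copy: LZ   */
                out[*out_pos + j] = out[t->src + j]; /* may overlap     */
            *out_pos += t->len;
        }
    }
}
```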

    another possibility is to use two threads - one decoding len/index/chars into a special buffer of, say, 4kb length, and another using these values to do the actual copying. it would be great for 2-core/HT cpus, but not for old-fashioned ones

    we may think how to apply these ideas to encoding, too

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Yes, there are still some ideas to check. However, currently I'm experimenting with pure LZ77 compression - check the LZARI post for info.

    In addition, I want to see the MFC results, finally!

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    btw, correct me if i'm wrong, but it seems that we should credit Ross Williams for the ROLZ idea - his LZRW4 algorithm seems to be exactly ROLZ: http://www.cbloom.com/src/lzrw.zip

    he also describes there why ROLZ should be superior to LZ77

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Bulat Ziganshin
    btw, correct me if i'm wrong, but it seems that we should credit Ross Williams for the ROLZ idea - his LZRW4 algorithm seems to be exactly ROLZ: http://www.cbloom.com/src/lzrw.zip
    Yepp!

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    But actually, even Ross Williams is not the inventor of ROLZ - the idea was born somewhere in the far past...

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    i don't think so. at that moment it was a novel idea, and you can see in the previous lzrw's how he travelled to it. then lzp was born on top of it, as an idea of using a variable-length context but saving only one, the latest, match in each context. btw, i think that a combined lzp+rolz - with 2-4 matches saved for each context (or, to be exact, for each hash value) - may be superior to both methods. you can try it

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Read this first:
    http://cbloom.com/papers/fenlzp.html

    Quote Originally Posted by Bulat Ziganshin
    btw, i think that combined lzp+rolz - with a 2-4 matches saved for each context (or, to be exact, for each hash value) - may be superior to both methods. you can try it
    This will slow down the decompression. However, it's probably worth a try... Anyway, for now I'm with LZ77...

  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    it also describes one more idea - symbol ranking:

    Basically I have extended your LZP idea, that the most recent matching
    context is a good predictor of the next symbol. If that first symbol is
    wrong, I search back at the same order for the next different symbol and
    offer that as a second suggestion
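The second-suggestion search Fenwick describes can be sketched as follows (a hypothetical order-1 context stands in for LZP's real variable-length context, and all names are invented for illustration):

```c
#include <stdint.h>

/* Symbol-ranking sketch: the symbol after the most recent occurrence of
   the current context is the rank-0 prediction; if it is wrong, scan
   further back in the same context for the next *different* symbol and
   offer it as the rank-1 prediction. Context here is just the previous
   byte (order-1) for brevity. Returns how many predictions were found. */
static int rank_predict(const uint8_t *buf, uint32_t pos, uint8_t pred[2]) {
    if (pos < 2) return 0;
    uint8_t ctx = buf[pos - 1];               /* order-1 context       */
    int n = 0;
    for (uint32_t q = pos - 1; q-- > 0; ) {   /* q = pos-2 ... 0       */
        if (buf[q] != ctx) continue;          /* previous occurrence?  */
        uint8_t s = buf[q + 1];               /* symbol that followed  */
        if (n == 1 && s == pred[0]) continue; /* need a different one  */
        pred[n++] = s;
        if (n == 2) break;
    }
    return n;
}
```

As Bulat notes below, this cycles through possible *chars* in the same context, where ROLZ would cycle through positions in the buffer.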

  14. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    btw, i think it should be enough to significantly speed up quad to update the codes only once per 10-100 probability updates

    also, you haven't even tried to save 4 more characters in each hash entry

    and one more idea - when you find a long match with a large index (say >32), throw that match away from the hash table, because the current string can replace it up to their common length, and saving other matches will be more useful in this case

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Peter Fenwick
    If you go back to Shannon's 1951 paper, your LZP method is basically his
    first method, while mine is his second method. The probabilities of
    succeeding on the 1st, 2nd, 3rd... symbol are very similar indeed to the
    symbol probabilities coming out of the MTF stage in block sorting. For
    "paper1" the corresponding probabilities are -

    symbol rank      0      1     2     3     4     5     6     7
    block-sort   58.3%  11.3%  5.5%  3.7%  2.8%  2.3%  1.8%  1.6%
    "shannon"    58.8%  11.6%  5.8%  3.8%  2.7%  2.0%  1.6%  1.4%

  16. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    yes, yes, it's a symbol ranking scheme, which cycles through possible *chars* in the same context, while rolz cycles through various positions in the buffer

  17. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Nice speed improvement in quad 1.12.
    http://cs.fit.edu/~mmahoney/compression/text.html#2561

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Thank you, Matt! (However, it looks like the main table still contains the older and slower version of QUAD - 1.11)

