
Thread: QUAD 1.11 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Okay, what I've done with this version:

    One of the most important changes is to Normal mode. QUAD in Normal mode is now faster and at the same time compresses better. I hope that with this new Normal mode QUAD can really outperform Fast LZMA not only in compression ratio, but in speed as well.



    With Max mode QUAD also now achieves higher compression, while operating at slightly higher speed.



    Although for myself I call this version some kind of final, this shouldn't be the end of the story. While working on QUAD 1.11 I tried lots of different 'optimal' parsing schemes, including pricing schemes and a kind of dynamic programming approach. Anyway, the current improved weighted flexible parsing scheme shows the best results so far. However, I believe that further improvements are still possible. But here we must run more tests, and probably someday people more experienced with optimal parsing, like Malcolm Taylor or Igor Pavlov, will bring some ideas...



    Anyway, for now, I'm very curious about the results of QUAD 1.11, especially the speed tests at Ultimate Command Line Compressors and MFC. However, it looks like Werner Bergmans was kidnapped by aliens...
    In addition, I'm very curious about the compression speed of QUAD in BOTH modes at the Large Text Benchmark.

    P. S.
    Matt, if you're reading this, please email me an AMD64 binary of QUAD - of course, only when you get some spare time! Thank you!



    quad.sourceforge.net


  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by encode
    2 Matt, a short reference:

    QUAD uses ROLZ compression (Reduced Offset LZ). It makes use of an order-2 context to reduce the set of offsets that are matched against. This can be regarded as a fast, large-dictionary LZ. Literals and match lengths fit in a single alphabet which is coded using an order-2-0 PPM with full exclusion. Match indexes are coded using an order-0 model. QUAD uses a 16 MB dictionary. For selectable compression speed and ratio, QUAD uses different parsing schemes: in Normal mode (default) QUAD uses lazy matching; in Max mode (-x option) QUAD uses a variant of flexible parsing. In addition, QUAD has an E8/E9 transformer for better executable compression, which is always enabled.
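
    A simplified sketch of the reduced-offset scheme described above (illustrative only, not the actual QUAD source; names and sizes are assumptions):

    #include <cstdint>

    const int TABSIZE = 64;            // slots per context (power of two) - illustrative value
    static int tab[65536][TABSIZE];    // recent positions, keyed by the 2-byte context
    static int r[65536];               // ring head per context

    // order-2 context = the two bytes preceding position i (so i >= 2)
    inline int ctx2(const uint8_t* buf, int i) {
        return buf[i - 2] | (buf[i - 1] << 8);
    }

    // after coding position i (literal or match), remember it for its context
    inline void update(const uint8_t* buf, int i) {
        int x = ctx2(buf, i);
        r[x] = (r[x] + 1) & (TABSIZE - 1);
        tab[x][r[x]] = i;
    }

    // search the reduced offset set: a match is coded as (length, ring index),
    // so only the small index needs to be transmitted, not a full offset
    // (empty-slot and MIN_MATCH handling omitted for brevity)
    int find_match(const uint8_t* buf, int i, int n, int& index) {
        int x = ctx2(buf, i), best = 0;
        for (int m = 0; m < TABSIZE; m++) {
            int p = tab[x][(r[x] - m) & (TABSIZE - 1)];
            int len = 0;
            while (i + len < n && buf[p + len] == buf[i + len]) len++;
            if (len > best) { best = len; index = m; }
        }
        return best;
    }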

  3. #3
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 153 Times in 18 Posts
    Quick test on SFC with -x...

    0.853.520 A10.quad
    1.492.973 AcroRd32.quad
    0.828.018 english.quad
    3.823.899 FlashMX.quad
    0.619.701 FP.quad
    1.906.649 MSO97.quad
    0.842.299 ohs.quad
    1.003.856 rafale.quad
    0.680.623 vcfiu.quad
    0.595.545 world95.quad

    TOTAL = 12.647.083 Bytes

    It compresses better than 1.07b (12.685.013 - taken from MC) and it seems to be faster, again! Awesome work Ilia!

    Quote Originally Posted by encode
    However, it looks like Werner Bergmans was kidnapped by aliens...
    I'm looking forward to the next MC update, too.

    @Matt:
    If you find some time, it'd be great if you could add CCM 1.20a to the "Large Text Compression Benchmark".

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Ilia!

  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test with ENWIK8...


    Compression results:

    QUAD v1.10

    Compression Time = 00:02:22.203

    Compressed Size = 29,731,541 bytes


    QUAD v1.10 -x

    Compression Time = 00:04:06.250

    Compressed Size = 29,152,166 bytes


    QUAD v1.11

    Compression Time = 00:02:19.734

    Compressed Size = 29,649,604 bytes


    QUAD v1.11 -x

    Compression Time = 00:04:12.406

    Compressed Size = 29,110,519 bytes



    Decompression results:

    QUAD v1.10

    Decompression Time = 00:00:27.969
    (Compressed DEFAULT mode)

    Decompression Time = 00:00:27.750
    (Compressed MAX mode)


    QUAD v1.11

    Decompression Time = 00:00:30.594
    (Compressed DEFAULT mode)

    Decompression Time = 00:00:30.344
    (Compressed MAX mode)

  6. #6
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    471
    Thanked 175 Times in 85 Posts
    The alien at Werner's home is a brand new computer, which keeps him busy.. And Matt is most probably busy teaching his class..
    I will test QUAD 1.11 today, and if you want, I could also let CCM run on Matt's large text corpus.
    In my free time I am also gaming, but not the new generation games..
    Does somebody remember the good old X-COM? There's a revival of that game at www.xforce-online.com and it's pretty nice

  7. #7
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 153 Times in 18 Posts
    Nice compression improvement on ENWIK8 in both modes!

    Quote Originally Posted by LovePimple
    QUAD v1.10
    Decompression Time = 00:00:27.969 (Compressed DEFAULT mode)
    Decompression Time = 00:00:27.750 (Compressed MAX mode)
    QUAD v1.11
    Decompression Time = 00:00:30.594 (Compressed DEFAULT mode)
    Decompression Time = 00:00:30.344 (Compressed MAX mode)
    This is strange. I tested QUAD 1.10 and 1.11 decompression speed on the same file (ENWIK8 created with 1.11 -x). And on my system the picture is similar.
    -1.10 takes 12.984s
    -1.11 takes 13.578s
    Have there been any changes to the decompression code? Perhaps this is a compiler-related problem.


    Quote Originally Posted by Stephan Busch
    if you want, I could also let CCM run on Matt's large text corpus.
    Thanks a lot Stephan, but there's no need to - Nania already posted ENWIK9 results in the other thread. It's just that over at Matt's site 1.1.2a is the last tested version - and it's so ...err... dated!?

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    i've just read the 1.10 sources and understood some things, mainly in the LZ part, because that's the area where i worked primarily. it would be great to have more comments, after all

    so, what can be done:

    1) lazy matching: save the match found for the next algorithm step. this should make the algorithm about 1.5 times faster. the best scheme to do it is the one exploited in zip's deflate() function

    2) try MIN_MATCH=2, probably with a limited range of possible indexes (i.e. drop matches with len=2 and, say, index>4)

    3) start checking the next match candidate at the len'th byte, where len is the size of the already found match. this should reduce the number of cache misses. of course, if the match succeeds, you will need to recheck the previous bytes

    4) save 4 bytes of each string in tab. of course, this will double the tab size, but in return you will get about 2.5 times fewer cache misses (see the sketch after this list)

    5) make N and TABSIZE program parameters and add a 3rd match finder that doesn't use lazy matching, to give users a wider choice of speed/memory/compression

    6) make the window sliding and read, say, 1/8th of it at a time
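
    as a rough illustration of 4) - a table entry could carry the first 4 bytes of its string, so most candidates are rejected without touching the data buffer at all (Slot and quick_reject are made-up names, not from QUAD):

    #include <cstdint>
    #include <cstring>

    struct Slot {
        int32_t  pos;    // position of the string in the buffer
        uint32_t head;   // its first 4 bytes, cached right inside the table
    };

    // reject a candidate from the cached bytes alone; only on a hit do we
    // touch buf[slot.pos..] and risk a cache miss on far-away match data
    inline bool quick_reject(const Slot& slot, const uint8_t* buf, int i) {
        uint32_t cur;
        std::memcpy(&cur, buf + i, 4);   // unaligned-safe load of the next 4 bytes
        return slot.head != cur;
    }

    the table doubles in size, as said, but each rejected candidate now costs one table read instead of a table read plus a buffer read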

  9. #9
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 153 Times in 18 Posts
    Quote Originally Posted by Bulat Ziganshin
    1) lazy matching: save the match found for the next algorithm step. this should make the algorithm about 1.5 times faster. the best scheme to do it is the one exploited in zip's deflate() function
    QUAD does not store the results of searches it has already made. In some cases the algorithm does the same searches over and over again. But the question is how common these bad cases are, and whether the additional logic even hurts speed on some files. Additionally, some results may even become invalid because their index can't be seen anymore at a later time (shifted out of the ring buffer). But since this doesn't break compatibility, it's definitely worth a try.

    Quote Originally Posted by Bulat Ziganshin
    3) start checking the next match candidate at the len'th byte, where len is the size of the already found match. this should reduce the number of cache misses. of course, if the match succeeds, you will need to recheck the previous bytes
    I think QUAD already does this.
    Quote Originally Posted by Source
    ...if (buf[p + maxlen] != buf[i + maxlen]) continue;...
    Quote Originally Posted by Bulat Ziganshin
    4) save 4 bytes of each string in tab. of course, this will double the tab size, but in return you will get about 2.5 times fewer cache misses
    This is a great idea! You should try it.

    Quote Originally Posted by Bulat Ziganshin
    2,5 & 6
    These break compatibility. We should respect Ilia's decision.

  10. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    about 1) - it shouldn't create problems if it is done as in zip. because the compression engine spends all its time finding matches, i think it may make the program significantly faster. but, otoh, all data are cached after the first try, so the speed increase may be not so good

    7) prefetch data that will be used later. for example, when starting to check the i'th match, read the first byte of the (i+1)'th match and then discard it. something like this is done in PPMD and increases speed by several percent, according to its docs. newer processors even support a special instruction to prefetch data without otherwise using it
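
    smth like this, inside the candidate loop (gcc/clang __builtin_prefetch; tab/r/maxlen as in QUAD's match loop, purely illustrative):

    // while examining candidate m, hint the cache about candidate m+1
    for (int m = 0; m < TABSIZE; m++) {
        int p  = tab[x][(r[x] - m) & (TABSIZE - 1)];
        int pn = tab[x][(r[x] - m - 1) & (TABSIZE - 1)];
        __builtin_prefetch(&buf[pn]);    // non-binding hint: data arrives while we work on p
        if (buf[p + maxlen] != buf[i + maxlen]) continue;
        // ... full match length check as before ...
    }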


    about weighting matches: it should not be too complex. for each char you can find how many bits it would cost to encode; the same applies to the match length and index. although the PPM encoding will complicate things..
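
    a minimal pricing sketch, assuming the p_* probabilities are read from the models' counters (hypothetical helpers, not QUAD code):

    #include <cmath>

    // cost in bits of a symbol the model currently assigns probability p
    inline double price_bits(double p) { return -std::log2(p); }

    // price of coding a match as (length, index) vs. a run of k literals;
    // the cheaper sequence of prices should win the parsing decision
    double match_price(double p_length, double p_index) {
        return price_bits(p_length) + price_bits(p_index);
    }
    double literal_run_price(const double* p_literal, int k) {
        double bits = 0;
        for (int j = 0; j < k; j++) bits += price_bits(p_literal[j]);
        return bits;
    }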

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Christian
    Have there been any changes to the decompression code? Perhaps this is a compiler-related problem.
    The decompressor is exactly the same. Note that on my machine and with my files the new 1.11 is really faster than 1.10! However, the catch is probably in the r[] variable - in 1.10 r[] is a local variable, in 1.11 r[] is declared as a global variable... But this could also be hardware related...

    Quote Originally Posted by Bulat Ziganshin
    1) lazy matching: save the match found for the next algorithm step. this should make the algorithm about 1.5 times faster. the best scheme to do it is the one exploited in zip's deflate() function
    Not really. I tried many variants and chose the best. For example, 1.11 uses the checkmatch() function which only checks for a larger match - it works faster than an actual best-match-length search - check the sources. Note that MUCH more often we do not discard the current match - so there is no reason to remember the next match... That's it!

    Check the 1.11 sources!

    Again, about storing the matches already found: at each step the table with offsets changes, so to be accurate we must look for matches at each step.
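
    Roughly the idea behind checkmatch() (simplified, not the exact 1.11 code; tab/r/TABSIZE as in the reference sketch above): at the next position we only ask whether any candidate beats the current length, we never compute the best length:

    // returns true if some candidate at position i exceeds curlen
    // (buffer assumed to be safely padded past the block end)
    bool check_match(const uint8_t* buf, int i, int curlen) {
        int x = buf[i - 2] | (buf[i - 1] << 8);               // order-2 context
        for (int m = 0; m < TABSIZE; m++) {
            int p = tab[x][(r[x] - m) & (TABSIZE - 1)];
            if (buf[p + curlen] != buf[i + curlen]) continue; // cheap early-out
            int len = 0;
            while (len <= curlen && buf[p + len] == buf[i + len]) len++;
            if (len > curlen) return true;                    // a longer match exists
        }
        return false;
    }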

    Quote Originally Posted by Bulat Ziganshin
    about weightning matches. it should be not too complex: for each char you can find how many bits it uses for encoding. the same applies to match length and index. although PPM encoding will complicate things..
    I tried this - and it does not work! Perhaps I was wrong somewhere... I tried many things - computing the actual cost of each choice (literal + match, literal + literal + match, match + match - any sequence), plus how far we can jump, etc. The current scheme is the best...

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    And one thing - the actual speed boost depends on data. Waiting for Johan's results...

  13. #13
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 153 Times in 18 Posts
    Quote Originally Posted by encode
    The decompressor is exactly the same. I think the catch is in compression...
    I compressed ENWIK8 with "1.11 -x". Then I decompressed this file with 1.10 and 1.11. In other words, I tested decompression speed on exactly the same file. Maybe it's just a compiler switch.


    Anyway, don't listen to my nagging. This is just a minor thing. The new QUAD's great!

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Recheck the post - I changed it!

  15. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >at each step the table with offsets changes, so to be accurate we must look for matches at each step

    just look at zip's sources.

    overall, when you have some match at the current pos, you start to look for a better match at the next pos. if you don't find one, then the current match is encoded. if you do find such a match, then you encode a literal, and the match found is already the best one at the next pos - no need to repeat the search. i don't know how much faster it will be, but it is definitely faster

    i propose you make a find_match function which has as one of its parameters the length of the match found so far - for a "cold" match this parameter will be set to 0, while for "warm" matches it will be set to the length of the existing match. smth like this:

    (len,  index)  = find_match(word16(i-2), 0)
    if (len >= MIN_MATCH)
        (len1, index1) = find_match(word16(i-1), len)

    of course with all zip-like bells and whistles
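
    in C++ the loop would look roughly like this (find_match as sketched earlier in the thread; emit_literal/emit_match are placeholders; table updates for skipped positions are omitted):

    int i = 2;                                              // first position with an order-2 context
    int idx = 0, len = find_match(buf, i, n, idx);
    while (i < n) {
        int idx1 = 0;
        int len1 = (i + 1 < n) ? find_match(buf, i + 1, n, idx1) : 0;
        if (len >= MIN_MATCH && len >= len1) {
            emit_match(len, idx);                           // keep the current match
            i += len;
            len = (i < n) ? find_match(buf, i, n, idx) : 0; // cold search at the new pos
        } else {
            emit_literal(buf[i]);                           // defer to the (better) match at i+1
            i += 1;
            len = len1; idx = idx1;                         // reuse it - no repeated search
        }
    }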

    one more minor suggestion - make e8e9 not inlined

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Anyway, I hope to release a new test version of QUAD soon - if it turns out to be faster I will release a new version (1.12). Probably I should keep the same behavior as with 1.10:
    + A non-circular table for compression. Multiple accesses to such a table should be faster.
    + A circular table for decompression, but with a local r[] variable. In this case access to r[] will be faster.

    Thank you friends!

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    2 Bulat:
    It's done with 1.11 - look carefully at its sources...

    Quote Originally Posted by Bulat Ziganshin
    one more minor suggestion - make e8e9 not inlined
    Why?

  18. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >It's done with 1.11

    so, just use it in both cases

    >Why?

    to make compiled code smaller

  19. #19
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    to make compiled code smaller
    Probably the inlined function works faster. For me performance is number one - program size doesn't matter anymore!

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    However, I will make a few test versions of QUAD (including with and without inlined e8e9)... Users will test, and the best one will be chosen...

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    e8e9transform() is invoked once per 16 MB block of data.

  22. #22
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >(len,index) = find_match(word16(i-2),0)

    actually, find_match(word16(i-2),MIN_MATCH-1) may be a bit faster because it fetches fewer memory lines

  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    actually, find_match(word16(i-2),MIN_MATCH-1) may be a bit faster because it fetches fewer memory lines
    I tried this - no speed improvement. Before the match search we must check the bounds - e.g. if i = n - 1 then i + 2 is a buffer overrun... Anyway, such an improvement doesn't make much sense...

    By the way, I have carefully read the Deflate and LZMA source code...

  24. #24
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >A non-circular table for compression. Multiple accesses to such a table should be faster.

    it would be better to move r[x] out of the loops. for example:

    for (int m = 0; m < TABSIZE; m++) {
    int p = tab[x][(r[x] - m) & (TABSIZE - 1)];

    should be replaced with

    int* t = &tab[x][0];
    for (int m = r[x], ix = TABSIZE; ix--; m = (m - 1) & (TABSIZE - 1)) {
        int p = t[m];
        ...
        index = (r[x] - m) & (TABSIZE - 1);
    }

  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    I tried this, but

    p = tab[x][m]

    works faster...

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    I think it's due to additional computations and additional access to the r[] array... (cache misses etc.)

  27. #27
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >Deflate and LZMA source code...

    and in deflate lazy matching works, yes? how about using the same implementation?

    >I tried this - no speed improvement. Before the match search we must check the bounds - e.g. if i = n - 1 then i + 2 is a buffer overrun... Anyway, such an improvement doesn't make much sense...

    just allocate +MAX_MATCH bytes at the buffer end and fill the MAX_MATCH bytes after the end of the block read with zeroes. this will greatly simplify all these bounds checks
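
    smth like this (BLOCK_SIZE, MAX_MATCH and the input FILE* are assumed to exist; just an illustration):

    // spare MAX_MATCH bytes at the end let the match loops run past 'n' safely
    uint8_t* buf = new uint8_t[BLOCK_SIZE + MAX_MATCH];
    size_t n = fread(buf, 1, BLOCK_SIZE, in);   // bytes actually read
    memset(buf + n, 0, MAX_MATCH);              // zeroed tail: comparisons past n simply fail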

    this change should lower the number of memory accesses - because it's more probable that buf[p+2]!=buf[j+2] than that buf[p]!=buf[j], where p is the current position and j is the position of some match to try

  28. #28
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >I think it's due to additional computations and additional access to the r[] array... (cache misses etc.)

    this array is 64 KB and it's possible that accessing it throws other useful data out of the cache

  29. #29
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    and in deflate lazy matching works, yes?
    Yes, but remember QUAD is not LZ77 - it behaves differently - for example QUAD outputs many more literals than matches compared to baseline LZ77. Thus tricks which work for LZ77 will not necessarily work for QUAD!

    Quote Originally Posted by Bulat Ziganshin
    this change should lower the number of memory accesses - because it's more probable that buf[p+2]!=buf[j+2] than that buf[p]!=buf[j], where p is the current position and j is the position of some match to try
    Yepp, I also tried this variant - same speed. I just compiled and measured the compression time with both variants...

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Optimal parsing-related ideas - that's what I'm searching for...

