Page 1 of 2 12 LastLast
Results 1 to 30 of 44

Thread: QUAD 1.11x is here! (a testbed version)

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Okay, here is the new testbed version. Please test in carefully for:
    + Decompression speed (compared to the 1.10 and 1.11)
    + Compression speed with both modes (again compared to 1.10 and 1.11)

    Thank you!

    Link:
    quad111x.zip (30 KB)


  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Link #2:
    quad111x2.zip (30 KB)

    QUAD 1.11x2 is also here! This one has not inlined e8e9 - just to see is there a difference - or probably this one works even faster!

  3. #3
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 153 Times in 18 Posts
    Here you go Ilia.

    Compression Speed (ENWIK8 with -x):
    1.10: 88.1s
    1.11: 89.6s
    1.11x: 83.5s

    Compression Speed (ENWIK8 without -x):
    1.10: 46.3s
    1.11: 46.4s
    1.11x: 46.1s

    Decompression Speed (ENWIK8 compressed with 1.11 -x):
    1.10: 13.1s
    1.11: 13.6s
    1.11x: 13.1s

    I think you should keep 1.11x.

    Quote Originally Posted by encode
    QUAD 1.11x2 is also here! This one has not inlined e8e9
    Too late. But I dont think this will be slower/faster. I mean, the filter is used once every 16M.

  4. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Thanks for testing!

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Link #3:
    quad-1.11xHASH.zip

    A very draft version with hashing (memory usage + 4MB). But on some files this one works two times faster!!!

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Link #4:
    quad-1.11HASH2.zip

    Improved version, works even faster with BOTH modes! (+ 8 MB usage)


  7. #7
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    471
    Thanked 175 Times in 85 Posts
    I can confirm an incredible speed boost of quad 1.11 hash2;
    the results will be online in an hour from now

    Cheers

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    my congratulations! now quad is definitely better than lzma:fast on text files! speed improvement over previous version is 1.5-2 times

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Stephan Busch
    the results will be online in an hour from now
    Thank you!

    Remember, this is only a beginning - I just started playing with hashing...

  10. #10
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    Test file: ENWIK8

    QUAD v1.11

    Compression Time = 00:02:19.734


    QUAD v1.11 HASH2

    Compression Time = 00:01:25.953



    QUAD v1.11 -x

    Compression Time = 00:04:12.406


    QUAD v1.11 HASH2 -x

    Compression Time = 00:02:28.125


    This HASH2 version has awesome speed!

  11. #11
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 153 Times in 18 Posts
    Awesome improvement, Ilia!

  12. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Matt Mahoney
    Thank you Matt!

    I can explain what Ive done - its all about memory access.

    During each search for the longest match, we need a random access to the buffer and this leads to the cache misses and unefficient memory use. For example:

    Assume current context is "ab".
    The table says that such context already appears at:
    4
    12524
    675467
    976579
    1065744
    etc.
    This offsets can looks just like random values.

    Okay, so now we need to access to each offset to check the match - but now you can see how far we may jump for each match search, even if we access to the one byte (primary match check).

    To minimize the access to the buffer, we can keep an additional table with followed to context bytes. This can be 2-bytes pair or hash or even 1-byte.

    In this case, at each match search, first we just check this tab for current N-bytes. If its match, we try to acces to the buffer to get the real match.

    So, for each context->offset stored in table, we keep the followed bytes or hash. This table updates in the same way as with tab[] - i.e. we just keep the list of recent items. And thats it!

    Currently Im experimenting to find the best approach. At the first sight, this thing is worth to add. And finally, QUAD became memory asymmetric - for compression we need a slightly larger number of bytes compared to the decompression. The QUAD 1.11HASH2 is symmetric is due to its experimental-draft nature...


  14. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    some additional info:

    optimizing program, you always have some important measure you should minimize. for example, with SQL this may be amount of data fetched or queries issued

    for asm optimization, in old good days main measure was a cpu cycles. but nowadays before optimizing cpu cycles one should first optimize number of cache misses. this is especially important for LZ and most other comporession schemes which utilize many mbytes with rather random access

    one memory read costs ~100 cpu cycles, but cpu reads a whole cache line each time. this cache line is 16 bytes long (and, afair, 32 bytes in 64-bit mode) and of course aligned to 16-byte boundary. so, the question is how much 16-byte blocks your program touch

    with original scheme, your program reads 1.25 blocks for each checked match (even a bit more because checked strings may cross block boundary)

    with a scheme that saves 4 bytes of each match together with match index, you get 0.5 blocks read for each match plus blocks whose match length was >4

    by keeping match index and its hash together you can save a few memory accesses - i.e. you should make one array of structs instead of two separate arrays

    about selection of hash function - it's a field for wide experiments in particular, i suggest you to gather more statistics about typical match lengths. two ideas - first, you can use 3 bytes instead of 4 to store index. second - because your minimal match is 3 byte, you can store reduced hash of these bytes, it should be not so good for following bytes. hash formula may be the following:

    (value*789123456)>>24

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    I have an idea to get the clever usage of the current tab[] - keep the actual offset + 8-bit hash or one followed byte in each element.

    int p = tab[x][m] & 0xffffff;
    int hash = tab[x][m] >> 24;

    or something like that...

    About hash, I like the simple schemes:

    ((c >> X) ^ c) & (N - 1)

    This one works faster. In addition, we can use a modification of CRC32 / CRC16 for hashing.

  16. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    my hashing schemes combines speed and good bit shuffling. with xor, each hash bit depends only on two original bits

    it should be better to keep index and any cached bytes together - this way you got 1 memory access instead of 2

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    it should be better to keep index and any cached bytes together - this way you got 1 memory access instead of 2
    What do you mean?

    Currently, I get the super thing! I keep the offset + 8-bit hash together - memory usage is the same as with 1.10 but compression speed is even faster than 1.11HASH2...

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Also I hash two bytes to 8-bit hash. What do you think, which hash function better?

    I think I'll keep the 8-bit hash. hashing 3-bytes may lead to many hash collisions...

  19. #19
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    >Also I hash two bytes to 8-bit hash. What do you think, which hash function better?

    my

    >I think I'll keep the 8-bit hash. hashing 3-bytes may lead to many hash collisions...

    anyway you need only strings with the same first 3 bytes

    >What do you mean?

    isntead of making *separate* array for hash data, it's better to keep hashes together with indexes, so they will lie in the same cache line

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    isntead of making *separate* array for hash data, its better to keep hashes together with indexes, so they will lie in the same cache line
    Read post below - I already done that...

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Yepp, your hash works better! But probably I must make some experiments with the MAGIC number.

    AFAIK, it must be some large prime number... So I must experiment to get the best number for hashing of 2-bytes...

  22. #22
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    THIS IS CRAZY!!!

    I make the better hash function and I get VERY serious speed boost!!!!!!!!!!!!!!!!!!!!!!

    inline unsigned int gethash(int i) { // DEBUG
    return ((*(reinterpret_cast<unsigned short *>(
    &buf[i])) * 987660757) >> 24);
    }

    The new magic number taken from PAQ1 and it works MUCH better! I can't beleive!


  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    However, the performance is depent on data type. On fp.log anf reaktor.exe this magic number gives fantastic speed improvement, but on enwik8 this number gives no improvement (compared to the Bulat's magic). Anyway, compared to 1.11HASH2 this new version is indeed faster!

    Continue digging...

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    inline unsigned int gethash(int i) { // FIXME
    return ((*(reinterpret_cast<unsigned short *>(
    &buf[i])) * 789123456) >> 24);
    }

    Well, after some tuning and testing Bulat's magic becomes the best! Thank you Bulat! Just one question why 789123456 ?

  25. #25
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Well, after some tuning and testing Bulats magic becomes the best!


    Are you going to release another awesome testbed version before the final release?

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by LovePimple
    Are you going to release another awesome testbed version before the final release?
    I do!

  27. #27
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    just random 32-bit number. actually you should see at it's binary representation and check that there are about sixteen 1's and that there are no (too much) regular patterns like 111111 or 101010. as i said, this operation just shuffles the bits

    don't forget to use entire 3 bytes for hash:

    return ((*(ulong*)(buf+i)&0xffffff)*789123456) >> 24;

  28. #28
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    I do!
    Excellent!!!

  29. #29
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    ? ???? ?? ?????? ???????????? ??????? ?????????? - ?????? ??????? ????? oder-0 ?????????? ???-????????, ?????????? ??????? ????????? ?? ????? ???????? ?????? ?? ???????? ??????? ??????. ??? ?? ?????????, ????? ????????? ???-??????? ?????? ?? ?????? ?????????

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    dont forget to use entire 3 bytes for hash:
    I tried this, and this may lead some tiny speed boost with text files and some slowdown with binary data - due to hash collisions. So, currently I keep just two bytes - since its better in average...

    Quote Originally Posted by Bulat Ziganshin
    ? ???? ?? ?????? ???????????? ??????? ?????????? - ?????? ??????? ????? oder-0 ?????????? ???-????????, ?????????? ??????? ????????? ?? ????? ???????? ?????? ?? ???????? ??????? ??????. ??? ?? ?????????, ????? ????????? ???-??????? ?????? ?? ?????? ?????????
    Will run some tests!


Page 1 of 2 12 LastLast

Similar Threads

  1. PAQ TestBed Set
    By Skymmer in forum Download Area
    Replies: 0
    Last Post: 11th July 2009, 23:08
  2. QUAD 1.12 - the FINAL version is here!
    By encode in forum Forum Archive
    Replies: 17
    Last Post: 10th April 2007, 10:21

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •