Thread: Hutter Prize, 4.17% improvement is here

  1. #61
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    "Also, you can provide your own dictionary ..." -- that was a 3-minute answer, and here's a 30-minute one:

    Why would I need Someone Else's code if I were young and smart and looking for a problem to solve, a challenge to take?
    Doesn't the fact that fast and small and simple SSE is still pretty useful (after all the enormous work performed by cmix and similar)
    tell you that PAQ-like algorithms (including phda and cmix) don't recognize any patterns in the
    model_number / probability_from_previous_models two-dimensional matrix?
    They only have a one-dimensional view of it, first horizontal, then vertical.
    But your algorithm does not have to look at this matrix at all!
    Come on, guys, forget about phda and cmix, invent something radically new!
    Like BWT and ANS were amazingly new solutions to old and boring problems.
    And for super-brief introductions to a couple of problems that look really important,
    search for my last name on youtube.com (2nd half of the talk).
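For readers unfamiliar with the SSE (secondary symbol estimation) stage mentioned above, here is a rough sketch of the general idea as it appears in PAQ-style compressors: a small table, indexed by a context and by the quantized input probability, learns a corrected probability. This is not code from phda or cmix. The class name `SimpleSSE`, the bucket count, the clamp limit and the learning rate are all assumptions chosen for illustration.

```python
import math

def stretch(p):
    """Logit: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic function, the inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))

class SimpleSSE:
    """Hypothetical SSE stage: refines P(bit = 1) given a small context."""

    def __init__(self, n_contexts, n_buckets=33, rate=0.02, limit=8.0):
        self.n_buckets = n_buckets
        self.rate = rate
        self.limit = limit
        step = 2.0 * limit / (n_buckets - 1)
        # Each row starts as (approximately) the identity mapping.
        row = [squash(-limit + i * step) for i in range(n_buckets)]
        self.table = [list(row) for _ in range(n_contexts)]
        self._last = None

    def refine(self, ctx, p):
        """Return the refined probability; must be called before update()."""
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        x = min(max(stretch(p), -self.limit), self.limit)
        pos = (x + self.limit) / (2.0 * self.limit) * (self.n_buckets - 1)
        lo = min(int(pos), self.n_buckets - 2)
        w = pos - lo
        row = self.table[ctx]
        self._last = (ctx, lo, w)
        # Linear interpolation between the two nearest buckets.
        return row[lo] * (1.0 - w) + row[lo + 1] * w

    def update(self, bit):
        """Move the two buckets used by the last refine() toward the coded bit."""
        ctx, lo, w = self._last
        row = self.table[ctx]
        row[lo] += self.rate * (1.0 - w) * (bit - row[lo])
        row[lo + 1] += self.rate * w * (bit - row[lo + 1])
```

In use, `refine(ctx, p)` is called with the mixer's output before coding each bit, and `update(bit)` afterwards. Each table row is a one-dimensional view of the probability for one context, which is the limitation the post is pointing at.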

    This newsgroup is dedicated to image compression:
    http://linkedin.com/groups/Image-Compression-3363256

  2. #62
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    224
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Here are the results for version 1.6:


    enwik9 compressed size: 117039346 bytes
    size of decompression program in .zip: 41911 bytes
    total size (compressed file + decompression program): 117081257 bytes
    compression time: 84713.758 seconds
    decompression time: 88401.702 seconds
    compression memory: 4996672 KiB
    decompression memory: 4993904 KiB


    enwik8 compressed size: 15040647 bytes
    size of decompression program in .zip: 564616 bytes
    total size (compressed file + decompression program): 15605263 bytes
    compression time: 8946.996 seconds
    decompression time: 8904.394 seconds
    compression memory: 3799820 KiB
    decompression memory: 3836272 KiB


    Description of test machine:
    processor: Intel Core i7-7700K
    memory: 32GB DDR4
    OS: Ubuntu 16.04
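The reported totals are simply the compressed size plus the zipped decompressor. A small sketch for re-checking the version 1.6 figures above and converting the memory readings to GiB; the dictionary layout and the function names are illustrative assumptions, the numbers are the ones posted.

```python
# Figures copied from the version 1.6 results above.
results_v16 = {
    "enwik9": {"compressed": 117039346, "decompressor_zip": 41911,
               "reported_total": 117081257, "compress_mem_kib": 4996672},
    "enwik8": {"compressed": 15040647, "decompressor_zip": 564616,
               "reported_total": 15605263, "compress_mem_kib": 3799820},
}

def total_size(entry):
    """Compressed file plus zipped decompression program, in bytes."""
    return entry["compressed"] + entry["decompressor_zip"]

def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)

for name, entry in results_v16.items():
    ok = total_size(entry) == entry["reported_total"]
    print(f"{name}: total {total_size(entry)} bytes (matches report: {ok}), "
          f"compression memory {kib_to_gib(entry['compress_mem_kib']):.2f} GiB")
```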

  3. The Following User Says Thank You to byronknoll For This Useful Post:

    Alexander Rhatushnyak (24th October 2018)

  4. #63
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    The set of enwik-specific transforms
    from the 2017 Hutter Prize winning entry
    goes open-source today.
    Archive is attached, read_me is included.

  5. The Following 9 Users Say Thank You to Alexander Rhatushnyak For This Useful Post:

    byronknoll (3rd January 2019),comp1 (3rd January 2019),Cyan (3rd January 2019),Darek (3rd January 2019),encode (3rd January 2019),Jyrki Alakuijala (3rd January 2019),Mike (3rd January 2019),milky (6th January 2019),xinix (3rd January 2019)

  6. #64
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    458
    Thanks
    143
    Thanked 158 Times in 106 Posts
    What size difference do these transforms make to phda? I'm curious to know how well it does as a generic text compressor vs dedicated to this one corpus.

    I guess the flip-side of this is to test what these transforms do to the performance of e.g. cmix. It's already matching phda9 for size, albeit considerably off for CPU and memory. It'd take a long while to know though!

    Anyway, thanks for making them public.

  7. #65
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Quote Originally Posted by byronknoll
    Here are the results for version 1.6:
    ...
    enwik8 compressed size: 15040647 bytes

    If you compress preprocessed_enwik8 instead of enwik8: 15015895 bytes.
    The difference is much bigger in the compressor that may use only 1 GiB of RAM.


  8. The Following User Says Thank You to Alexander Rhatushnyak For This Useful Post:

    JamesB (4th January 2019)

  9. #66
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    224
    Thanks
    106
    Thanked 106 Times in 65 Posts
    @Alex, thanks for posting these transforms!

    Quote Originally Posted by JamesB
    I guess the flip-side of this is to test what these transforms do to the performance of e.g. cmix. It's already matching phda9 for size, albeit considerably off for CPU and memory. It'd take a long while to know though!

    cmix currently compresses enwik8 to 14965334.
    preprocessed_enwik8 compresses to 14953755.

  10. #67
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    36445248 enwik8.gz --- 103.44%
    35233985 preprocessed_enwik8.gz

    29008758 enwik8.bz2 --- 101.78%
    28502591 preprocessed_enwik8.bz2

    24861205 enwik8.7z --- 101.41%
    24515695 preprocessed_enwik8.7z

    gzip 1.6 -9
    bzip2 1.0.6 -9
    7z 9.20 -t7z -mx=9
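The percentages above are the size of the compressed original relative to the compressed preprocessed file. A short sketch that reproduces them from the posted byte counts and also shows the equivalent reduction relative to the original; the function names are illustrative assumptions.

```python
# (compressed enwik8, compressed preprocessed_enwik8) in bytes, as posted above.
sizes = {
    "gzip 1.6 -9": (36445248, 35233985),
    "bzip2 1.0.6 -9": (29008758, 28502591),
    "7z 9.20 -t7z -mx=9": (24861205, 24515695),
}

def ratio_percent(original, preprocessed):
    """Original size as a percentage of the preprocessed size."""
    return 100.0 * original / preprocessed

def reduction_percent(original, preprocessed):
    """Bytes saved by preprocessing, as a percentage of the original size."""
    return 100.0 * (original - preprocessed) / original

report = {tool: f"{ratio_percent(*pair):.2f}" for tool, pair in sizes.items()}

for tool, (orig, prep) in sizes.items():
    print(f"{tool}: ratio {report[tool]}%, "
          f"reduction {reduction_percent(orig, prep):.2f}%")
```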


    Update:
    Also, with phda9, preprocessing decreases the size of enwik8 before DRT by ~5.1% (as in the read_me),
    but the size after DRT changes more: 60520510 ==> 55852090 bytes, i.e. the result without preprocessing is ~8.36% larger (a ~7.71% reduction).
    Last edited by Alexander Rhatushnyak; 7th January 2019 at 04:36. Reason: Update


  11. #68
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    224
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Using the instructions in enwik8preproc.zip, I extracted the two files from the phda November 2017 release:
    1) preprocessed enwik8
    2) phda dictionary

    I tried testing these with cmix:
    cmix currently compresses enwik8 to 14947555 bytes.
    phda preprocessed enwik8 compresses to 14866787 bytes.
    cmix using the phda dictionary compresses enwik8 to 14878492 bytes.

    The dictionary used by phda is amazing! @Alex, I am hoping to use this dictionary for cmix - let me know if this is not OK. Recently I have been focusing on the DRT, including trying to improve the dictionary. I am learning a lot through trying different experiments, but I think I am a long way from being able to generate a dictionary even close to this level of performance.

  12. #69
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Quote Originally Posted by byronknoll
    I am hoping to use this dictionary for cmix - let me know if this is not OK.

    OK, but if you tried building a dictionary by yourself, that would be even better.

    Quote Originally Posted by byronknoll
    phda preprocessed enwik8 compresses to 14866787 bytes.
    cmix using the phda dictionary compresses enwik8 to 14878492 bytes.

    So the gain from enwik-specific preprocessing is 11705 bytes,
    very close to what was observed earlier:
    cmix currently compresses enwik8 to 14965334.
    preprocessed_enwik8 compresses to 14953755.
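The two gains being compared can be re-derived directly from the quoted byte counts. The variable and function names below are illustrative assumptions.

```python
def gain(without_preprocessing, with_preprocessing):
    """Bytes saved by the enwik-specific preprocessing."""
    return without_preprocessing - with_preprocessing

# cmix with the phda dictionary: plain enwik8 vs. phda-preprocessed enwik8.
gain_with_phda_dictionary = gain(14878492, 14866787)

# Earlier cmix measurement: plain enwik8 vs. preprocessed_enwik8.
gain_earlier = gain(14965334, 14953755)

print(gain_with_phda_dictionary, gain_earlier)
print(f"difference between the two gains: "
      f"{abs(gain_with_phda_dictionary - gain_earlier)} bytes")
```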


  13. The Following User Says Thank You to Alexander Rhatushnyak For This Useful Post:

    byronknoll (23rd January 2019)

  14. #70
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    224
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Quote Originally Posted by Alexander Rhatushnyak
    OK, but if you tried building a dictionary by yourself, that would be even better

    Thanks! Yeah, I will definitely continue trying to build a dictionary and improve DRT.

  15. #71
    Member
    Join Date
    May 2008
    Location
    France
    Posts
    79
    Thanks
    449
    Thanked 22 Times in 17 Posts
    Quote Originally Posted by Alexander Rhatushnyak
    Version 1.6 is here: http://qlic.altervista.org/phda9.zip
    As usual, four executables are barely tested,
    hopefully they will work as expected.
    Compressed size of enwik9 should be 117'039'xxx.
    UPDATE: 117'039'346.

    New version from Alexander (same link)!

  16. #72
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Version 1.7 is here: http://qlic.altervista.org/phda9.zip
    Improvement is below 0.1% on enwik9,
    but bigger on smaller text files,
    e.g. 0.4% on book1 from CC.


  17. The Following 3 Users Say Thank You to Alexander Rhatushnyak For This Useful Post:

    byronknoll (22nd February 2019),Jyrki Alakuijala (19th February 2019),Matt Mahoney (22nd February 2019)

  18. #73
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    224
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Here are the results for version 1.7:


    enwik9 compressed size: 116940874 bytes
    size of decompression program in .zip: 43274 bytes
    total size (compressed file + decompression program): 116984148 bytes
    compression time: 83712.733 seconds
    decompression time: 87596.519 seconds
    compression memory: 4999504 KiB
    decompression memory: 4996880 KiB


    enwik8 compressed size: 15023870 bytes
    size of decompression program in .zip: 565352 bytes
    total size (compressed file + decompression program): 15589222 bytes
    compression time: 8907.486 seconds
    decompression time: 8868.092 seconds
    compression memory: 3802892 KiB
    decompression memory: 3838836 KiB


    Description of test machine:
    processor: Intel Core i7-7700K
    memory: 32GB DDR4
    OS: Ubuntu 18.04

  19. The Following 2 Users Say Thank You to byronknoll For This Useful Post:

    Alexander Rhatushnyak (24th February 2019),Matt Mahoney (22nd February 2019)

  20. #74
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  21. The Following 5 Users Say Thank You to Matt Mahoney For This Useful Post:

    Alexander Rhatushnyak (24th February 2019),avitar (24th February 2019),encode (22nd February 2019),Mike (22nd February 2019),schnaader (22nd February 2019)

  22. #75
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    682
    Thanks
    208
    Thanked 249 Times in 152 Posts
    Quote Originally Posted by Matt Mahoney View Post
    How do we get the large window brotli results on LTCB?

    These results have been available since 2016. https://groups.google.com/forum/m/#!...li/aq9f-x_fSY4

    LTCB reports 223'597'884 bytes

    Actually delivered with large window: 199'118'013 bytes

  23. #76
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Version 1.8 is here: http://qlic.altervista.org/phda9.zip
    Memory usage is higher now, 6 GiB rather than 4.75.
    No other news.
    As usual, barely tested executables.


  24. The Following 2 Users Say Thank You to Alexander Rhatushnyak For This Useful Post:

    byronknoll (5th July 2019),Mike (5th July 2019)

  25. #77
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    224
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Here are the results for version 1.8:


    enwik9 compressed size: 116544849 bytes
    size of decompression program in .zip: 42944 bytes
    total size (compressed file + decompression program): 116587793 bytes
    compression time: 86182.993 seconds
    decompression time: 86305.520 seconds
    compression memory: 6319256 KiB
    decompression memory: 6316396 KiB


    enwik8 compressed size: 15010414 bytes
    size of decompression program in .zip: 558298 bytes
    total size (compressed file + decompression program): 15568712 bytes
    compression time: 9162.225 seconds
    decompression time: 9258.572 seconds
    compression memory: 4800208 KiB
    decompression memory: 4836472 KiB


    Description of test machine:
    processor: Intel Core i7-7700K
    memory: 32GB DDR4
    OS: Ubuntu 18.04
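To put the three sets of results in this thread side by side, the following sketch computes the relative improvement in total size (compressed file + decompression program) between versions. The totals are the ones reported for versions 1.6, 1.7 and 1.8; the dictionary layout and function name are illustrative assumptions.

```python
# Total sizes in bytes, as reported in this thread.
totals = {
    "1.6": {"enwik9": 117081257, "enwik8": 15605263},
    "1.7": {"enwik9": 116984148, "enwik8": 15589222},
    "1.8": {"enwik9": 116587793, "enwik8": 15568712},
}

def improvement_percent(old, new):
    """Size reduction from old to new, as a percentage of old."""
    return 100.0 * (old - new) / old

for corpus in ("enwik9", "enwik8"):
    for old_v, new_v in (("1.6", "1.7"), ("1.7", "1.8"), ("1.6", "1.8")):
        pct = improvement_percent(totals[old_v][corpus], totals[new_v][corpus])
        print(f"{corpus}: {old_v} -> {new_v}: {pct:.3f}% smaller")
```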

  26. The Following 3 Users Say Thank You to byronknoll For This Useful Post:

    Alexander Rhatushnyak (11th July 2019),Darek (9th July 2019),Matt Mahoney (9th July 2019)

  27. #78
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  28. The Following 3 Users Say Thank You to Matt Mahoney For This Useful Post:

    Alexander Rhatushnyak (11th July 2019),Mike (10th July 2019),rarkyan (11th July 2019)
