Thread: Hutter Prize, 4.17% improvement is here

  1. #31
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    46
    Thanks
    202
    Thanked 10 Times in 9 Posts
    Quote Originally Posted by Matt Mahoney View Post
    You are allowed to use all the cores and hyperthreads. My test machine has 2 cores and 4 hyperthreads.
    Hi,
    OK, what time limit will we have for unpacking when using 4 cores:
    4 hours or 8 hours?

  2. #32
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Version 1.2 of phda9 is here: http://qlic.altervista.org/phda9.zip
    Memory usage is a bit lower, speed is approximately the same, plus or minus 5%,
    enwik9 compressed size is expected to be 118'335'xxx bytes.

    You can specify an external dictionary with up to 188240 words.
    Any symbol with ASCII code 32 or higher is allowed in a word.
    As usual, see read_me.txt for more info.
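
    The two limits stated above can be checked before starting a long run. This is a minimal sketch, assuming the external dictionary is a plain-text file with one word per line (the authoritative format is described in read_me.txt, which is not reproduced here). The names `check_dictionary`, `MAX_WORDS` and `MIN_CODE` are hypothetical and not part of phda9.

```python
MAX_WORDS = 188240   # upper limit quoted in the post
MIN_CODE = 32        # lowest character code allowed inside a word

def check_dictionary(lines):
    """Return a list of problems found; an empty list means both limits hold.

    Assumes one word per line (an assumption; see read_me.txt for the real format).
    """
    problems = []
    words = [ln.rstrip("\r\n") for ln in lines]
    words = [w for w in words if w]          # ignore empty lines
    if len(words) > MAX_WORDS:
        problems.append(f"too many words: {len(words)} > {MAX_WORDS}")
    for i, w in enumerate(words, start=1):
        if any(ord(ch) < MIN_CODE for ch in w):
            problems.append(f"word {i} contains a character with code below {MIN_CODE}")
    return problems
```

    For a file on disk, something like `check_dictionary(open("my_dict.txt", encoding="latin-1"))` would apply the same checks; the file name is only an example.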

    This newsgroup is dedicated to image compression:
    http://linkedin.com/groups/Image-Compression-3363256

  3. The Following 3 Users Say Thank You to Alexander Rhatushnyak For This Useful Post:

    byronknoll (21st March 2018),Matt Mahoney (28th March 2018),xinix (19th March 2018)

  4. #33
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    225
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Here are the results for version 1.2:


    enwik9 compressed size: 118335817 bytes
    size of decompression program in .zip: 42745 bytes
    total size (compressed file + decompression program): 118378562 bytes
    compression time: 60726.516 seconds
    decompression time: 61586.611 seconds
    compression memory: 4995112 KiB
    decompression memory: 4992932 KiB


    enwik8 compressed size: 15144786 bytes
    size of decompression program in .zip: 581133 bytes
    total size (compressed file + decompression program): 15725919 bytes
    compression time: 6471.885 seconds
    decompression time: 6498.773 seconds
    compression memory: 3797160 KiB
    decompression memory: 3833516 KiB


    Description of test machine:
    processor: Intel Core i7-7700K
    memory: 32GB DDR4
    OS: Ubuntu 16.04
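
    The reported totals can be re-derived from the two components. A small sketch using only the figures above, plus the fact that enwik9 and enwik8 are 10^9 and 10^8 bytes long; the function name `summarize` is made up for illustration.

```python
ENWIK_SIZES = {"enwik9": 10**9, "enwik8": 10**8}   # original file sizes in bytes

# (compressed bytes, zipped decompressor bytes, reported total) from the post above
V1_2 = {
    "enwik9": (118335817, 42745, 118378562),
    "enwik8": (15144786, 581133, 15725919),
}

def summarize(results):
    """Return {file: (total_bytes, bits_per_input_byte)} and check the reported totals."""
    summary = {}
    for name, (compressed, decompressor, reported_total) in results.items():
        total = compressed + decompressor
        if total != reported_total:
            raise ValueError(f"{name}: reported total {reported_total} != {total}")
        summary[name] = (total, 8 * total / ENWIK_SIZES[name])
    return summary

for name, (total, bpc) in summarize(V1_2).items():
    print(f"{name}: total {total} bytes, {bpc:.4f} bits per input byte")
```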

  5. The Following 2 Users Say Thank You to byronknoll For This Useful Post:

    Alexander Rhatushnyak (21st March 2018),Matt Mahoney (28th March 2018)

  6. #34
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Updated LTCB. Sorry for the delay, but I've been busy lately. http://mattmahoney.net/dc/text.html#1183

  7. The Following 2 Users Say Thank You to Matt Mahoney For This Useful Post:

    Alexander Rhatushnyak (22nd April 2018),encode (22nd April 2018)

  8. #35
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    936
    Thanks
    556
    Thanked 375 Times in 280 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Updated LTCB. Sorry for the delay, but I've been busy lately. http://mattmahoney.net/dc/text.html#1183
    @Matt - could you also update the paq8pxd score? Here is a link with full info: https://encode.su/threads/1464-Paq8p...ll=1#post56169

    @Matt - and my second request (I sent it to you recently):

    There is an error on the LTCB page, in the CMVE row, Total size column:
    129'876'858 - enwik9 compressed size - it's OK
    307'787 - decompressor size, zipped by you - it's OK
    130'301'106 - total size (sum of the two rows above) - it's not OK. It should be 130'184'645, the correct sum of the compressed file and the decompressor. Could you correct it?
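
    The correction requested above is plain addition and can be checked in a few lines. This sketch uses only the three figures quoted in this post; `total_discrepancy` is a hypothetical helper name.

```python
def total_discrepancy(compressed, decompressor, listed_total):
    """Listed total minus the actual sum; 0 means the row is consistent."""
    return listed_total - (compressed + decompressor)

# Figures quoted in the post for the CMVE row
compressed, decompressor, listed_total = 129_876_858, 307_787, 130_301_106

print(compressed + decompressor)                                   # corrected total
print(total_discrepancy(compressed, decompressor, listed_total))   # excess in the listed total
```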

  9. #36
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Version 1.3 is here: http://qlic.altervista.org/phda9.zip
    None of the four executables was tested,
    hopefully they will work as expected.

    Compressed enwik9 size is expected to be 117'617'185 bytes,
    and 119'591'248 bytes for the executable without LSTM.
    Compression time should be ~87400 seconds on Byron's desktop.

  10. The Following 3 Users Say Thank You to Alexander Rhatushnyak For This Useful Post:

    byronknoll (29th April 2018),comp1 (23rd April 2018),Mike (22nd April 2018)

  11. #37
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    225
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Here are the results for version 1.3:


    enwik9 compressed size: 117617185 bytes
    size of decompression program in .zip: 42108 bytes
    total size (compressed file + decompression program): 117659293 bytes
    compression time: 86557.679 seconds
    decompression time: 87375.163 seconds
    compression memory: 4996508 KiB
    decompression memory: 4993916 KiB


    enwik8 compressed size: 15069752 bytes
    size of decompression program in .zip: 557050 bytes
    total size (compressed file + decompression program): 15626802 bytes
    compression time: 9022.053 seconds
    decompression time: 9190.132 seconds
    compression memory: 3799092 KiB
    decompression memory: 3835372 KiB


    Description of test machine:
    processor: Intel Core i7-7700K
    memory: 32GB DDR4
    OS: Ubuntu 16.04

  12. The Following 3 Users Say Thank You to byronknoll For This Useful Post:

    Alexander Rhatushnyak (29th April 2018),Darek (30th April 2018),Matt Mahoney (1st May 2018)

  13. #38
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    936
    Thanks
    556
    Thanked 375 Times in 280 Posts
    Is the enwik8 decompression program really 13 times bigger than the one for enwik9? Why is that?

  14. #39
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Quote Originally Posted by Darek View Post
    Is the enwik8 decompression program really 13 times bigger than the one for enwik9? Why is that?
    The main phda executable contains the default dictionary which is loosely compressed, together with the executable code.

  15. The Following User Says Thank You to Alexander Rhatushnyak For This Useful Post:

    Darek (30th April 2018)

  16. #40
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Thank you Byron!

    Quote Originally Posted by byronknoll View Post
    Here are the results for version 1.2:
    ...
    compression time: 60726.516 seconds
    Quote Originally Posted by byronknoll View Post
    Here are the results for version 1.3:
    ...
    compression time: 86557.679 seconds
    Presumably it would be ~35000 seconds with phda9_no_lstm

  17. #41
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    24
    Thanks
    7
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Alexander Rhatushnyak View Post
    Thank you Byron!



    Presumably it would be ~35000 seconds with phda9_no_lstm
    Could you implement LSTM in paq8pxd?

  18. #42
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    936
    Thanks
    556
    Thanked 375 Times in 280 Posts
    @Matt - could you also add the newest paq8pxd v47 score to LTCB?

  19. #43
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Version 1.4 is here: http://qlic.altervista.org/phda9.zip
    This is mainly a bug fixing release: version 1.3 failed on
    external dictionaries with fewer than 188240 lines with words.

    As before, none of the four executables was tested,
    hopefully they will work as expected.

  20. The Following 2 Users Say Thank You to Alexander Rhatushnyak For This Useful Post:

    Matt Mahoney (20th May 2018),Mike (17th May 2018)

  21. #44
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    225
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Here are the results for version 1.4:

    enwik9 compressed size: 117603125 bytes
    size of decompression program in .zip: 42110 bytes
    total size (compressed file + decompression program): 117645235 bytes
    compression time: 87520.714 seconds
    decompression time: 87909.830 seconds
    compression memory: 4995924 KiB
    decompression memory: 4992944 KiB

    enwik8 compressed size: 15074624 bytes
    size of decompression program in .zip: 557096 bytes
    total size (compressed file + decompression program): 15631720 bytes
    compression time: 9237.343 seconds
    decompression time: 9305.006 seconds
    compression memory: 3799028 KiB
    decompression memory: 3835320 KiB

    Description of test machine:
    processor: Intel Core i7-7700K
    memory: 32GB DDR4
    OS: Ubuntu 16.04

  22. The Following 3 Users Say Thank You to byronknoll For This Useful Post:

    Alexander Rhatushnyak (20th May 2018),Darek (21st May 2018),Matt Mahoney (20th May 2018)

  23. #45
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    936
    Thanks
    556
    Thanked 375 Times in 280 Posts
    @Matt - is something wrong with my submissions to LTCB for paq8pxd_v47? I've requested it a few times.

  24. #46
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Version 1.5 is here: http://qlic.altervista.org/phda9.zip
    As usual, the four executables are barely tested;
    hopefully they will work as expected.
    Compressed size of enwik9 is probably 117223130.

  25. The Following User Says Thank You to Alexander Rhatushnyak For This Useful Post:

    Mike (1st August 2018)

  26. #47
    Member Mohammad's Avatar
    Join Date
    Aug 2018
    Location
    iran
    Posts
    2
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Hi,

    What about the compressor size? Are any size and any compression time allowed?

  27. #48
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    Yes. Decompressor size is taken into account only because otherwise it's too easy to "compress" enwik8 to 0 bytes
    by putting it into the decoder. Such a decoder can even be a fully valid lossless compressor otherwise, just with an "optimization" for this specific file.

    Matt also tried to set up different rules, where the test file is not known in advance, so there's no need to take the decoder size into account,
    but that turned out to be prone to "exploits" (after a while a specific preprocessor was written, which improved compression by a lot).
    http://mattmahoney.net/dc/uiq/

    There can also be questions about decoder distribution (how to count standard libs like MSVCPxx.dll), but afaik you can discuss
    specific terms in relation to that.
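
    The 0-byte trick described above can be sketched as follows. This is only an illustration: `DATA` is a short stand-in for enwik8, zlib stands in for a real compressor, and all function names are hypothetical.

```python
import zlib

DATA = b"stand-in for the contents of enwik8 " * 1000   # the real file is 10**8 bytes

def cheating_compress(data: bytes) -> bytes:
    """Emit an empty archive for the one known file, ordinary zlib output otherwise."""
    return b"" if data == DATA else zlib.compress(data)

def cheating_decompress(archive: bytes) -> bytes:
    """Decoder with the target file baked in: an empty archive decodes to DATA."""
    if archive == b"":
        return DATA
    return zlib.decompress(archive)   # still a valid lossless decoder for other inputs

def contest_size(archive: bytes, decoder_size: int) -> int:
    """Size as scored by the contest: compressed file plus decompression program."""
    return len(archive) + decoder_size

archive = cheating_compress(DATA)
# The decoder must carry DATA, so its size is at least len(DATA) bytes.
print(len(archive), contest_size(archive, decoder_size=len(DATA)))
```

    Counting the decoder size moves the hidden copy of the file back into the score, so the trick gains nothing.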

  28. The Following User Says Thank You to Shelwien For This Useful Post:

    Mohammad (10th August 2018)

  29. #49
    Member Mohammad's Avatar
    Join Date
    Aug 2018
    Location
    iran
    Posts
    2
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Yes. Decompressor size is taken into account only because otherwise it's too easy to "compress" enwik8 to 0 bytes
    by putting it into the decoder. Such a decoder can even be a fully valid lossless compressor otherwise, just with an "optimization" for this specific file.

    Matt also tried to set up different rules, where the test file is not known in advance, so there's no need to take the decoder size into account,
    but that turned out to be prone to "exploits" (after a while a specific preprocessor was written, which improved compression by a lot).
    http://mattmahoney.net/dc/uiq/

    There can also be questions about decoder distribution (how to count standard libs like MSVCPxx.dll), but afaik you can discuss
    specific terms in relation to that.

    Hi,

    Thank you for your answer.

    I have an idea for data compression. I think it gives a better compression ratio, but it may need more time to compress the data.

    Mr. Alexander, let me break your record.

  30. #50
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    225
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Here are the results for version 1.5:

    enwik9 compressed size: 117223130 bytes
    size of decompression program in .zip: 42428 bytes
    total size (compressed file + decompression program): 117265558 bytes
    compression time: 85877.820 seconds
    decompression time: 86365.831 seconds
    compression memory: 4995908 KiB
    decompression memory: 4993488 KiB

    enwik8 compressed size: 15063267 bytes
    size of decompression program in .zip: 557415 bytes
    total size (compressed file + decompression program): 15620682 bytes
    compression time: 9258.027 seconds
    decompression time: 9164.181 seconds
    compression memory: 3799176 KiB
    decompression memory: 3835412 KiB

    Description of test machine:
    processor: Intel Core i7-7700K
    memory: 32GB DDR4
    OS: Ubuntu 16.04
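
    For comparison, the enwik9 results reported in this thread for versions 1.2 to 1.5 can be put side by side. A small sketch using only those reported totals and compression times; `compare_to` is a hypothetical helper name.

```python
# version -> (enwik9 total size in bytes, compression time in seconds), as reported in this thread
RUNS = {
    "1.2": (118378562, 60726.516),
    "1.3": (117659293, 86557.679),
    "1.4": (117645235, 87520.714),
    "1.5": (117265558, 85877.820),
}

def compare_to(baseline, runs):
    """Return {version: (percent smaller than baseline, compression time ratio)}."""
    base_size, base_time = runs[baseline]
    rows = {}
    for version, (size, seconds) in runs.items():
        saved_pct = 100 * (base_size - size) / base_size
        time_ratio = seconds / base_time
        rows[version] = (saved_pct, time_ratio)
    return rows

for version, (saved_pct, time_ratio) in compare_to("1.2", RUNS).items():
    print(f"v{version}: {saved_pct:.3f}% smaller than v1.2, {time_ratio:.2f}x the compression time")
```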

  31. #51
    Member
    Join Date
    May 2008
    Location
    France
    Posts
    79
    Thanks
    457
    Thanked 22 Times in 17 Posts
    Change of rules today:

    Quote Originally Posted by 2017-05-29
    [...]
    The above formula currently amounts to 1€ for every 330 byte improvement, with a minimum improvement of 494'449 bytes.
    [...]
    Rules may change at any time without notice to meet the goals of fairness, accuracy, maximizing public participation, and recognizing existing practice.
    [...]

    Quote Originally Posted by 2018-08-18
    [...]
    The above formula currently amounts to 1€ for every ~300 byte improvement, with a minimum improvement of ~460KB.
    [...]
    Rules may change at any time without notice to meet the goals of fairness, accuracy, fostering progress and public participation, and recognizing existing practice.
    [...]
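
    The pairs of numbers in the two quoted rule versions are consistent with each other under a simple model. The sketch below assumes a prize fund of 50'000€, a minimum improvement of 3%, and a payout of fund × (L − S) / L, where L is the previous record and S the new total size. None of these three assumptions is stated in this thread, and the function names are hypothetical.

```python
PRIZE_FUND_EUR = 50_000    # assumption: not stated in the thread
MIN_IMPROVEMENT = 0.03     # assumption: 3% threshold, not stated in the thread

def bytes_per_euro(baseline_bytes):
    """How many bytes of improvement earn 1 euro for a given previous record."""
    return baseline_bytes / PRIZE_FUND_EUR

def min_improvement_bytes(baseline_bytes):
    """Smallest improvement (in bytes) that qualifies for a payout."""
    return MIN_IMPROVEMENT * baseline_bytes

def award_eur(baseline_bytes, new_bytes):
    """Payout under the assumed formula fund * (L - S) / L."""
    return PRIZE_FUND_EUR * (baseline_bytes - new_bytes) / baseline_bytes

# Baselines implied by the quoted minimum improvements (494'449 bytes and ~460KB)
baseline_2017 = 494_449 / MIN_IMPROVEMENT
baseline_2018 = 460_000 / MIN_IMPROVEMENT

print(round(bytes_per_euro(baseline_2017)))   # compare with the quoted "330 byte"
print(round(bytes_per_euro(baseline_2018)))   # compare with the quoted "~300 byte"
```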

  32. The Following User Says Thank You to Mike For This Useful Post:

    xinix (19th August 2018)

  33. #52
    Member
    Join Date
    Oct 2018
    Location
    NY Metro Area
    Posts
    1
    Thanks
    0
    Thanked 1 Time in 1 Post

    Suggestion for rules?

    If the purpose of the Hutter prize is to improve the state of compression knowledge in the world, why not require that algorithms used must be open source?
    We cannot learn from Alexander Rhatushnyak's code if he only provides executables. And in the current state of the world, if we want to use the algorithm widely, it really must be open sourced so people can generate signed, trusted executables.

  34. The Following User Says Thank You to Dov Kruger For This Useful Post:

    Gotty (6th October 2018)

  35. #53
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    686
    Thanks
    215
    Thanked 214 Times in 131 Posts
    Being open source doesn't mean the algorithm can be widely used - for example, if the person has simultaneously filed a patent, you won't know it until the patent is published ~18 months later.

  36. #54
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    225
    Thanks
    106
    Thanked 106 Times in 65 Posts
    Quote Originally Posted by Dov Kruger View Post
    If the purpose of the Hutter prize is to improve the state of compression knowledge in the world, why not require that algorithms used must be open source?
    There was a recent rule change to the Hutter prize. Source code is now required. From the FAQ:

    Why do you require submission of documented source code?

    A primary goal of this contest is to increase awareness of the relation between compression and (artificial) intelligence, and to foster the development of better compressors. The (ideas and insights behind the) submitted (de)compressors should in turn help to create even better compressors and ultimately in developing smarter AIs. Up until 2017 the source code was not required for participation in the contest, and has also not been released voluntarily. The past submissions are therefore useless to others and the ideas in them may be lost forever. Furthermore this made it difficult for other contestants to beat the (as of 2017) four-time winner Alexander Rhatushnyak. Making the source available should rectify these problems. Therefore, as of 2018, the source code is required, which should help to revive the contest, make it easier to build improved compressors by combining ideas, foster collaboration, and ultimately lead to better AI. Contributors can still copyright their code or patent their ideas, as long as non-commercial use, and in particular use by other future contestants, is not restricted.

  37. #55
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    33
    Thanks
    29
    Thanked 6 Times in 6 Posts
    Does anyone have a download for the compressed file output of enwik8 or enwik9 (preferably both) that they can share as a download link? I'd like to test the compressed output file from the latest release to satisfy a few curiosities, but I don't have the time to run the compression myself, or a computer powerful enough to do it without threatening to shut itself down (one that will still love me in the morning). Thanks in advance to anyone who still has the file available to share after the test.

  38. #56
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > Does anyone have a download for the compressed file output of enwik8 or enwik9 (preferably both) that they can share as a download link?

    It's not available for the most recent entries (no Windows version), but you can try these:
    https://sites.google.com/site/lossle...rch-site&q=zip
    https://encode.su/threads/2858?p=55215&pp=1

    > Making the source available should rectify these problems.

    At this point they'd have to reboot the contest, because I don't think that Alex would ever post sources.

  39. The Following 2 Users Say Thank You to Shelwien For This Useful Post:

    JamesWasil (7th October 2018),xinix (8th October 2018)

  40. #57
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    692
    Thanks
    208
    Thanked 250 Times in 153 Posts
    Quote Originally Posted by Jarek View Post
    Being open source doesn't mean the algorithm can be widely used - for example, if the person has simultaneously filed a patent, you won't know it until the patent is published ~18 months later.
    Wouldn't an Apache license or an IP release reduce these worries to some extent?

  41. #58
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    686
    Thanks
    215
    Thanked 214 Times in 131 Posts
    I don't know, but merely "reducing" the worry of million-dollar lawsuits doesn't seem very comforting.
    The original purpose of the patent system was to stimulate invention; now it can, e.g., give a single company a 20-year monopoly on half of machine learning (which the whole world works on): http://ipkitten.blogspot.com/2018/06...t-filings.html

  42. #59
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Version 1.6 is here: http://qlic.altervista.org/phda9.zip
    As usual, the four executables are barely tested;
    hopefully they will work as expected.

    Compressed size of enwik9 should be 117'039'xxx.

    UPDATE: 117'039'346.
    Last edited by Alexander Rhatushnyak; 22nd October 2018 at 13:00. Reason: Compressed size of enwik9

  43. The Following User Says Thank You to Alexander Rhatushnyak For This Useful Post:

    encode (21st October 2018)

  44. #60
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    235
    Thanks
    38
    Thanked 85 Times in 46 Posts
    Quote Originally Posted by Dov Kruger View Post
    We cannot learn from Alexander Rhatushnyak's code if he only provides executables.
    First of all, you can learn, at the very least, what was achievable in 2018 with ~10000 lines of code by someone who could compress enwik9 just twice or thrice per day, on average, using two or three laptops.
    Also, you can provide your own dictionary (details in read_me.txt if you wish) and learn a lot from that, maybe even improve compression quality.
