Thread: Hutter Prize - Has anyone been able to successfully test Alexander Rhatushnyak's claims?

  1. #1
    Member
    Join Date
    Dec 2020
    Location
    Sofia, Bulgaria
    Posts
    7
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Has anyone been able to successfully test Alexander Rhatushnyak's claims? I can't run his program on my computer, and frankly I doubt that he was able to compress the enwik9 file to the size he claims with the standard compression algorithms available. By an external test, I mean a test done by someone other than the Hutter Prize founders Marcus Hutter, Jim Bowery and Matt Mahoney, who are interested parties.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,138
    Thanks
    320
    Thanked 1,401 Times in 803 Posts
    I verified phda's enwik8 compression last year (on linux) - it matched.
    Didn't actually check enwik9, since it would take a week, but I don't see a reason to suspect it.
    phda doesn't even provide the best ratio currently - just fits the HP constraints.
    See http://mattmahoney.net/dc/text.html - cmix and nncp have better enwik9 results.
    There's nothing especially unique either - just all the useful open-source models and years of manual tuning
    and dictionary optimization.

    Btw, you can download it here: http://qlic.altervista.org/phda9.zip
    But make sure to read the read_me.txt - cmdline options are different for different input files.
    Also don't try decoding it on windows.

  3. #3
    Member
    Join Date
    Dec 2020
    Location
    Sofia, Bulgaria
    Posts
    7
    Thanks
    0
    Thanked 2 Times in 2 Posts
    So no one outside the Hutter Prize has tested Alexander's claims for enwik9?

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,138
    Thanks
    320
    Thanked 1,401 Times in 803 Posts
    Just test it for yourself, or try posting this in https://groups.google.com/g/hutter-prize .
    Some people did test it (like Byron Knoll), but you'd have to wait very long for their reply here.

  5. #5
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    253
    Thanks
    116
    Thanked 127 Times in 73 Posts
    Quote Originally Posted by Shelwien View Post
    Just test it for yourself, or try posting this in https://groups.google.com/g/hutter-prize .
    Some people did test it (like Byron Knoll), but you'd have to wait very long for their reply here.
    You don't need to wait very long.
    Yes, I tested enwik9 on my computer and it is valid: the decompressed file matches enwik9.
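
    For anyone who wants to repeat this kind of check, here is a minimal sketch of the comparison step only: hash the original file and the decompressed output and compare the digests. The function names and file paths are made up for illustration, and SHA-256 is just one reasonable choice; nothing here comes from phda's read_me.txt.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_match(original_path, restored_path):
    """True if both files have identical contents (compared via their digests)."""
    return file_digest(original_path) == file_digest(restored_path)
```

    Usage would look like `files_match("enwik9", "enwik9.restored")`, where the second path is wherever the decompressor wrote its output (a hypothetical name). Reading in chunks keeps memory use flat even for a 1 GB file.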

  6. #6
    Member
    Join Date
    Dec 2020
    Location
    Sofia, Bulgaria
    Posts
    7
    Thanks
    0
    Thanked 2 Times in 2 Posts
    years of manual tuning
    and dictionary optimization.
    What do you mean by "MANUAL tuning and dictionary optimization"? Does this mean that the compressor can use ready-made dictionaries?

    If so, this award is completely absurd. I can calculate the optimal dictionary in advance on a very powerful machine, for example. This is a hidden calculation and I think it is a scam. If you have the option of MANUAL tuning, the situation is not much different, as my manual interventions can be guided by these preliminary calculations too. WTF?

  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,138
    Thanks
    320
    Thanked 1,401 Times in 803 Posts
    > What do you mean by "MANUAL tuning and dictionary optimization"?

    See all of Alex's previous HP entries before phda.
    http://mattmahoney.net/dc/text.html#1440

    > Does this mean that the compressor can use ready-made dictionaries?

    Yes, and it does, but that's why program size is counted as part
    of compressed size for the contest.

    > If so, this award is completely absurd.

    Just read all the rules and pay more attention?

    > I can calculate the optimal dictionary in advance on a very powerful machine, for example.

    Yes you can, and yes this option is already used in HP.
    But the size of compressed dictionary is added to final result (as part of decoder).
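
    To spell out the accounting described here, a minimal sketch: the number that counts is the compressed data plus the decoder, and a built-in dictionary is part of the decoder. The function names are hypothetical and the byte counts in the usage note are invented, not real contest figures.

```python
def contest_size(archive_bytes, decoder_bytes):
    """Size that counts: compressed data plus the decoder that restores it."""
    return archive_bytes + decoder_bytes


def dictionary_pays_off(archive_without, archive_with, dictionary_bytes, decoder_bytes):
    """True if shipping a dictionary inside the decoder lowers the total.

    archive_without / archive_with: compressed data size without / with the dictionary.
    dictionary_bytes: size the (compressed) dictionary adds to the decoder.
    decoder_bytes: decoder size before the dictionary is added.
    """
    before = contest_size(archive_without, decoder_bytes)
    after = contest_size(archive_with, decoder_bytes + dictionary_bytes)
    return after < before
```

    With invented numbers, `dictionary_pays_off(1000, 900, 50, 10)` is true (960 total versus 1010), while `dictionary_pays_off(1000, 980, 50, 10)` is false (1040 versus 1010). A precomputed dictionary only helps if it saves more bytes in the archive than it costs in the decoder.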

    > This is a hidden calculation and I think it is a scam.

    No, the rules are clearly explained.

    The main problem is that Alex invested 15 years of work into this,
    and already included all the useful open-source parts, while not opening
    the sources of his contest entries.

    This obviously makes it very hard to compete, especially for new people,
    but it's not really impossible.

    > If you have the option of MANUAL tuning, the situation is not much different,
    > as my manual interventions can be guided by these preliminary calculations too. WTF?

    If we already had a working AI, there'd be no point in this contest.

    As it is, it's a competition between programmers on the best design of a statistical model for predicting wiki data.
    Yes, it may be possible to detect optimal values of some parameters at runtime, but there's a time limit,
    which is already hard to meet as it is, and any extra optimization algorithms would make the compressor slower.

  8. Thanks:

    kampaster (29th January 2021)

  9. #8
    Member
    Join Date
    Dec 2020
    Location
    Sofia, Bulgaria
    Posts
    7
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Are you all idiots, here?!

    Maybe you want to use an AI that was trained on this specific enwik9 text too?!


    Using dictionaries which are created in advance is a SCAM. If the program does not compress other text files with approximately the same compression ratio as enwik9, the whole Hutter Prize loses all its significance as a means of stimulating compression research. As I said, it is absurd.

    P.S. OK. Now you will see. I don't need to adjust dictionaries manually because I know how to calculate them optimally.
    Last edited by Emil Enchev; 28th January 2021 at 14:30.

  10. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,138
    Thanks
    320
    Thanked 1,401 Times in 803 Posts
    It may look controversial to you, but that's actually the point of this contest for Marcus Hutter.
    He invented this theory about AI development, called AIXI - https://en.wikipedia.org/wiki/AIXI
    And then created this contest to motivate people to prove it with their work.

    So far it hasn't gone entirely in the right direction - some technical problems of PC software
    certainly affect the results too much.
    But it did inspire development of better prediction algorithms too (NNCP etc).
    And for now there are no better rules for an open contest like that - if you think it's easy to scam it, you can simply try.
    But better read the rules first.

  11. Thanks:

    kampaster (29th January 2021)

  12. #10
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    372
    Thanks
    133
    Thanked 57 Times in 40 Posts
    Quote Originally Posted by Emil Enchev View Post
    Are you all idiots, here?!

    Maybe you want to use an AI that was trained on this specific enwik9 text too?!


    Using dictionaries which are created in advance is a SCAM. If the program does not compress other text files with approximately the same compression ratio as enwik9, the whole Hutter Prize loses all its significance as a means of stimulating compression research. As I said, it is absurd.

    P.S. OK. Now you will see. I don't need to adjust dictionaries manually because I know how to calculate them optimally.
    Is there something specifically problematic about the text you highlighted? If so, I'm not following what it is. The lack of any repeated words?

    I understand your point about the ability to manually optimize dictionaries in advance for the specific dataset to be compressed. The resulting compressor won't be as good on other datasets, but it can still be useful for progress in data compression algorithms, and the dictionary size counts against the result, so... Also, compression using static dictionaries is quite common. Brotli uses a static dictionary (and a terrible one) that was based on a set of web files (mostly HTML I think), since it was designed to compress web files. We often know some things in advance about the data we're going to compress (SDCH is another example, where the dictionary was specific to the website using it).

  13. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,138
    Thanks
    320
    Thanked 1,401 Times in 803 Posts
    DRT dictionary is actually more advanced (and AI-like) than all other dictionary preprocessing implementations.

    To make it clear, DRT/WRT (originally known as LIPT) is an LZ78-like idea: replace frequent words with shorter dictionary indices,
    but encode these dictionary indices with a limited charset, so that they look like valid new words to the paq wordmodel etc.

    Unlike most dynamic WRT implementations, the DRT static dictionary used and incrementally improved in Alex's HP entries (and cmix)
    has semantic clustering, so words with similar meaning end up having similar codes, which improves the quality of context-based predictions.

    https://github.com/byronknoll/cmix/t...ter/dictionary

    It's simply too slow for fully automatic optimization, so it had to be generated and incrementally improved over a long time
    using all kinds of heuristics.
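
    As a rough illustration of the word-replacement idea only, here is a toy sketch. It is not DRT or any real WRT implementation: it ranks words purely by frequency (no semantic clustering), it assumes the input contains no uppercase ASCII letters so that uppercase strings can serve as the limited code charset, and all function names are made up for this example.

```python
import re
from collections import Counter

# Assumption for this toy: the input has no uppercase ASCII letters,
# so codes built from this alphabet cannot collide with real text.
CODE_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def index_to_code(index):
    """Map 0, 1, 2, ... to A, B, ..., Z, AA, AB, ... (bijective base 26)."""
    code = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, len(CODE_ALPHABET))
        code = CODE_ALPHABET[rem] + code
    return code


def build_dictionary(text, max_words=1000, min_length=3):
    """Give the most frequent words the smallest indices, hence the shortest codes."""
    words = re.findall(r"[a-z]+", text)
    counts = Counter(w for w in words if len(w) >= min_length)
    ranked = [w for w, _ in counts.most_common(max_words)]
    return {w: index_to_code(i) for i, w in enumerate(ranked)}


def encode(text, word_to_code):
    """Replace dictionary words with their codes; other text passes through."""
    if re.search(r"[A-Z]", text):
        raise ValueError("toy transform assumes input without uppercase ASCII letters")
    return re.sub(r"[a-z]+", lambda m: word_to_code.get(m.group(), m.group()), text)


def decode(text, word_to_code):
    """Invert encode(): every uppercase run in the encoded text is exactly one code."""
    code_to_word = {c: w for w, c in word_to_code.items()}
    return re.sub(r"[A-Z]+", lambda m: code_to_word[m.group()], text)
```

    Because the codes are themselves letter strings, a word-oriented model downstream still sees them as words, which is the point made above. A static dictionary like the one in the HP entries would be shipped with the decoder instead of being rebuilt from the input as `build_dictionary` does here.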

  14. Thanks:

    kampaster (29th January 2021)
