Version 1.2 of phda9 is here: http://qlic.altervista.org/phda9.zip
Memory usage is a bit lower, and speed is approximately the same (plus or minus 5%).
The compressed size of enwik9 is expected to be 118'335'xxx bytes.
You can specify an external dictionary with up to 188240 words.
Any symbol with ASCII code 32 or higher is allowed in a word.
As usual, see read_me.txt for more info.
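In case it is useful: here is a minimal sketch of how one might sanity-check such a dictionary file before feeding it to phda9, assuming one word per line in a file called dictionary.txt (the file name and layout are my assumptions - see read_me.txt for the actual format); the 188240-word limit and the "ASCII 32 or higher" rule are taken from the post above.

# Untested sketch: check an external dictionary against the limits quoted above.
# Assumes one word per line; see read_me.txt for the real format.
MAX_WORDS = 188240  # maximum number of words, per the post above

def check_dictionary(path="dictionary.txt"):
    with open(path, "rb") as f:
        words = f.read().splitlines()
    if len(words) > MAX_WORDS:
        print(f"too many words: {len(words)} > {MAX_WORDS}")
    for i, w in enumerate(words, 1):
        if any(b < 32 for b in w):  # only codes 32 and higher are allowed
            print(f"line {i}: control character (code below 32) not allowed")
    print(f"{len(words)} words checked")

if __name__ == "__main__":
    check_dictionary()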
Here are the results for version 1.2:
enwik9 compressed size: 118335817 bytes
size of decompression program in .zip: 42745 bytes
total size (compressed file + decompression program): 118378562 bytes
compression time: 60726.516 seconds
decompression time: 61586.611 seconds
compression memory: 4995112 KiB
decompression memory: 4992932 KiB
enwik8 compressed size: 15144786 bytes
size of decompression program in .zip: 581133 bytes
total size (compressed file + decompression program): 15725919 bytes
compression time: 6471.885 seconds
decompression time: 6498.773 seconds
compression memory: 3797160 KiB
decompression memory: 3833516 KiB
Description of test machine:
processor: Intel Core i7-7700K
memory: 32GB DDR4
OS: Ubuntu 16.04
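For anyone who wants to double-check listings like this: the total is simply the compressed file plus the zipped decompressor. A small sketch with the enwik9 numbers copied from above (the bits-per-character figure at the end is only a derived convenience, not something reported in this thread):

# Sanity check of the version 1.2 enwik9 figures listed above.
compressed = 118_335_817     # enwik9 compressed size, bytes
decompressor = 42_745        # decompression program in .zip, bytes
total = compressed + decompressor
assert total == 118_378_562  # matches the reported total

# Derived convenience metric: bits per character over the 10^9-byte enwik9.
bpc = total * 8 / 1_000_000_000
print(f"total = {total} bytes, about {bpc:.3f} bits per character")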
Updated LTCB. Sorry for the delay, but I've been busy lately. http://mattmahoney.net/dc/text.html#1183
@Matt - could you also update the paq8pxd score? Here is a link with full info: https://encode.su/threads/1464-Paq8p...ll=1#post56169
@Matt - and my second request (I've sent it to you recently):
There is an error on the LTCB page, in the CMVE row, in the Total size column:
129,876,858 - enwik9 compressed size - it's OK
307,787 - decompressor size zipped by you - it's OK
130,301,106 - Total size (sum of the two rows above) - it's not OK: the correct value is 130,184,645, since 129,876,858 + 307,787 = 130,184,645. Could you correct it?
Version 1.3 is here: http://qlic.altervista.org/phda9.zip
None of the four executables has been tested;
hopefully they will work as expected.
Compressed enwik9 size is expected to be 117'617'185 bytes,
and 119'591'248 bytes for the executable without LSTM.
Compression time should be ~87400 seconds on Byron's desktop.
Here are the results for version 1.3:
enwik9 compressed size: 117617185 bytes
size of decompression program in .zip: 42108 bytes
total size (compressed file + decompression program): 117659293 bytes
compression time: 86557.679 seconds
decompression time: 87375.163 seconds
compression memory: 4996508 KiB
decompression memory: 4993916 KiB
enwik8 compressed size: 15069752 bytes
size of decompression program in .zip: 557050 bytes
total size (compressed file + decompression program): 15626802 bytes
compression time: 9022.053 seconds
decompression time: 9190.132 seconds
compression memory: 3799092 KiB
decompression memory: 3835372 KiB
Description of test machine:
processor: Intel Core i7-7700K
memory: 32GB DDR4
OS: Ubuntu 16.04
Is the enwik8 decompression program really 13 times bigger than the one for enwik9? Why is that?
@Matt - could you also add the newest paq8pxd v47 score to LTCB?
Version 1.4 is here: http://qlic.altervista.org/phda9.zip
This is mainly a bug-fix release: version 1.3 failed on
external dictionaries with fewer than 188240 word lines.
As before, none of the four executables has been tested;
hopefully they will work as expected.
Here are the results for version 1.4:
enwik9 compressed size: 117603125 bytes
size of decompression program in .zip: 42110 bytes
total size (compressed file + decompression program): 117645235 bytes
compression time: 87520.714 seconds
decompression time: 87909.830 seconds
compression memory: 4995924 KiB
decompression memory: 4992944 KiB
enwik8 compressed size: 15074624 bytes
size of decompression program in .zip: 557096 bytes
total size (compressed file + decompression program): 15631720 bytes
compression time: 9237.343 seconds
decompression time: 9305.006 seconds
compression memory: 3799028 KiB
decompression memory: 3835320 KiB
Description of test machine:
processor: Intel Core i7-7700K
memory: 32GB DDR4
OS: Ubuntu 16.04
@Matt - is something wrong with my submissions to LTCB for paq8pxd_v47? I've requested it a few times.
Version 1.5 is here: http://qlic.altervista.org/phda9.zip
As usual, the four executables are barely tested;
hopefully they will work as expected.
Compressed size of enwik9 is probably 117223130.
Hi.
What about the compressor size? Is any size and any compression time allowed?
Yes. The decompressor size is taken into account only because otherwise it would be too easy to "compress" enwik8 to 0 bytes
by embedding it in the decoder. Otherwise it could even be a fully valid lossless compressor, just with an "optimization" for this specific file.
Matt also tried to set up different rules where the test file is not known in advance, so there is no need to count the decoder size,
but that approach turned out to be prone to "exploits" (after a while a specific preprocessor was written that improved compression by a lot).
http://mattmahoney.net/dc/uiq/
There can also be questions about decoder distribution (how to count standard libraries like MSVCPxx.dll), but AFAIK you can discuss
specific terms in relation to that.
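To make the point above concrete, the degenerate case looks roughly like this (purely illustrative; a tiny stand-in string is used where the real cheat would embed the full 100 MB of enwik8):

# Degenerate "compressor": the archive is 0 bytes because the data is baked
# into the decoder itself, so any ranking that ignored decoder size would be
# trivially gamed. EMBEDDED_DATA is a stand-in for the whole of enwik8.
EMBEDDED_DATA = b"<imagine all of enwik8 here>"

def compress(data: bytes) -> bytes:
    assert data == EMBEDDED_DATA  # only "works" for the one known file
    return b""                    # "compressed" to 0 bytes

def decompress(archive: bytes) -> bytes:
    return EMBEDDED_DATA          # the data was in the program all along

# This is why the total size = compressed file + decompression program.
print(len(compress(EMBEDDED_DATA)), "bytes of archive")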
Here are the results for version 1.5:
enwik9 compressed size: 117223130 bytes
size of decompression program in .zip: 42428 bytes
total size (compressed file + decompression program): 117265558 bytes
compression time: 85877.820 seconds
decompression time: 86365.831 seconds
compression memory: 4995908 KiB
decompression memory: 4993488 KiB
enwik8 compressed size: 15063267 bytes
size of decompression program in .zip: 557415 bytes
total size (compressed file + decompression program): 15620682 bytes
compression time: 9258.027 seconds
decompression time: 9164.181 seconds
compression memory: 3799176 KiB
decompression memory: 3835412 KiB
Description of test machine:
processor: Intel Core i7-7700K
memory: 32GB DDR4
OS: Ubuntu 16.04
Change of rules today (the post quoted the rules as of 2017-05-29 and the new rules as of 2018-08-18).
If the purpose of the Hutter prize is to improve the state of compression knowledge in the world, why not require that algorithms used must be open source?
We cannot learn from Alexander Rhatushnyak's code if he only provides executables. And in the current state of the world, if we want to use the algorithm widely, it really must be open sourced so people can generate signed, trusted executables.
Being open source doesn't mean the algorithm can be widely used - for example, if the person has simultaneously filed a patent, you won't know about it until the patent is published ~18 months later.
There was a recent rule change to the Hutter prize. Source code is now required. From the FAQ:
Why do you require submission of documented source code?
A primary goal of this contest is to increase awareness of the relation between compression and (artificial) intelligence, and to foster the development of better compressors. The (ideas and insights behind the) submitted (de)compressors should in turn help to create even better compressors and ultimately in developing smarter AIs. Up until 2017 the source code was not required for participation in the contest, and has also not been released voluntarily. The past submissions are therefore useless to others and the ideas in them may be lost forever. Furthermore this made it difficult for other contestants to beat the (as of 2017) four-time winner Alexander Rhatushnyak. Making the source available should rectify these problems. Therefore, as of 2018, the source code is required, which should help to revive the contest, make it easier to build improved compressors by combining ideas, foster collaboration, and ultimately lead to better AI. Contributors can still copyright their code or patent their ideas, as long as non-commercial use, and in particular use by other future contestants, is not restricted.
Does anyone have a download for the compressed file output of enwik8 or enwik9 (preferably both) that they can share as a download link? I'd like to test the compressed output file from the latest release to satisfy a few curiosities, but I don't have the time (or a computer that will still love me in the morning without threatening to shut itself down) to run the compression myself. Thanks in advance to anyone who still has the file available to share after the test.
> Does anyone have a download for the compressed file output of enwik8 or enwik9 (preferably both) that they can share as a download link?
It's not available for the most recent entries (no Windows version), but you can try these:
https://sites.google.com/site/lossle...rch-site&q=zip
https://encode.su/threads/2858?p=55215&pp=1
> Making the source available should rectify these problems.
At this point they'd have to reboot the contest, because I don't think that Alex would ever post sources.
I don't know, but "reduce" as an answer to the worry about million-dollar lawsuits doesn't seem extremely comforting.
The original purpose of the patent system was to stimulate invention; now it can, for example, give a 20-year monopoly over half of machine learning (which the whole world runs on) to a single company: http://ipkitten.blogspot.com/2018/06...t-filings.html
Version 1.6 is here: http://qlic.altervista.org/phda9.zip
As usual, the four executables are barely tested;
hopefully they will work as expected.
Compressed size of enwik9 should be 117'039'xxx.
UPDATE: 117'039'346.
First of all, you can learn, at the very least, what was achievable in 2018 with ~10000 lines of code by someone who could compress enwik9 just twice or thrice per day, on average, using two or three laptops.
Also, you can provide your own dictionary (details in read_me.txt if you wish) and learn a lot from that, maybe even improve compression quality.