http://cs.fit.edu/~mmahoney/compression/text.html
But it requires 64-bit Linux, 4 GB RAM + 1 GB swap. I only have 2 GB so I can't test it as benchmarked. It is tuned to LTCB and may not work with other files. Size includes a custom dictionary.
Any chance of getting a real paq9 with results better than paq8?
Because this contest is getting really funny: it features durilca, with a
2002 engine + preprocessor, redundant due to speed optimization
but still slow as hell, and paq8... + preprocessor despite its built-in
text modelling (and I'd say there hasn't been much improvement since paq6 in 2003,
for preprocessed text compression at least).
Nobody has tested paq8hp12any with 4 GB of memory. It would be interesting to see...
Great! Nice to see someone is able to squeeze this improvement from an engine that's 6 years old.
> Great! Nice to see someone is able to squeeze this improvement from engine
> that's 6 years old
Actually there just wasn't much development in these years.
(Of course I mean compression quality, not speed or usability).
And also durilca uses external preprocessing, so it should be
possible to feed durilca's preprocessed data to paq8 and see
what happens... wonder if i should try disabling the compression
part... 64bit and linux complicate things though...
(In reply to Shelwien) Tsk, tsk!
(In reply to IsName) It is simpler and faster to run DURILCA with less memory:
./DURILCA e -t2 -m1600 -o10 enwik9
131839692 bytes 3708.41 sec.
./DURILCA e -t2 -m1800 -o10 enwik9
131505803 bytes 3644.65 sec.
> Tsk, tsk!
Too lazy to do it anyway...
Btw, why don't you just say how paq performs on your preprocessed data? I bet you've tested it already.
(In reply to Shelwien) It is slightly worse. Results for enwik7 (I am too impatient for larger files):
PAQ8n: 1786603 bytes
DURILCA: 1777587 bytes
For larger files, DURILCA will use statistics of higher orders and PAQ will not.
PAQ8hp would try to preprocess the already preprocessed file in this test.
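To put the figures quoted above in proportion, here is a small calculation sketch. It uses only the sizes posted in this thread; the variable names are made up for illustration.

```python
# Sizes in bytes, copied from the results posted above.
paq8n_enwik7 = 1786603
durilca_enwik7 = 1777587

# How far PAQ8n is behind DURILCA on the preprocessed enwik7.
gap = paq8n_enwik7 - durilca_enwik7
gap_percent = 100 * gap / durilca_enwik7

# DURILCA on enwik9 at -m1600 versus -m1800.
durilca_m1600 = 131839692
durilca_m1800 = 131505803
mem_gain = durilca_m1600 - durilca_m1800
mem_gain_percent = 100 * mem_gain / durilca_m1600

print(f"PAQ8n is {gap} bytes ({gap_percent:.2f}%) larger than DURILCA on enwik7")
print(f"-m1800 saves {mem_gain} bytes ({mem_gain_percent:.2f}%) over -m1600 on enwik9")
```

So the PAQ8n gap on enwik7 is about half a percent, and the extra 200 MB of model memory buys roughly a quarter of a percent on enwik9.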
(In reply to Dmitry Shkarin) Try paq8o9, it has a new APM.
(In reply to Dmitry Shkarin) paq has a match model, which somewhat reduces the need for high-order statistics.
The bit-based and hash-based approach used in paq makes it inefficient for stationary, low-noise data like text, but it's better for mixed data types like hlp files:
http://www.maximumcompression.com/data/hlp.php
Also, paq does better when memory is constrained.
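For readers unfamiliar with the match model mentioned above, here is a rough byte-level sketch of the idea. It is not paq's actual code: paq predicts bits and feeds the prediction to a mixer with a confidence that depends on the match length, while this toy version only reports the predicted next byte. The function name and the `min_len` parameter are made up for illustration.

```python
def match_model_predictions(data: bytes, min_len: int = 4):
    """Return, for each position, the byte a simple match model would
    predict there, or None when no match is active."""
    table = {}         # last min_len bytes -> position just after their latest occurrence
    match_ptr = None   # points at the byte that followed the matched context earlier
    preds = []
    for i in range(len(data)):
        # Predict: if a match is active, guess the byte that followed it last time.
        preds.append(data[match_ptr] if match_ptr is not None else None)

        # Update: keep following the match while the guesses are right.
        if match_ptr is not None and data[match_ptr] == data[i]:
            match_ptr += 1
        else:
            match_ptr = None

        # Look up and record the current context once enough bytes have been seen.
        if i + 1 >= min_len:
            ctx = data[i + 1 - min_len : i + 1]
            if match_ptr is None:
                match_ptr = table.get(ctx)
            table[ctx] = i + 1
    return preds


preds = match_model_predictions(b"the cat. the cat.")
hits = [p for p in preds if p is not None]
print(bytes(hits))  # bytes predicted after the repeated "the " context
```

Once a long repeat is found, the model keeps predicting from the earlier occurrence for as long as the data agrees, which acts as a cheap stand-in for a very high-order context without storing statistics for every order.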