Can anyone compress enwik8 with ratio < 40% using LZ77 only with a simple hash table?
Given the previous times you've asked roughly the same, you need to specify exactly what you mean by "LZ77 only".
There is a whole range of options from actual LZ77 with a fixed size token for length and offset, over variable length integer encodings like LZ4, to ranges inside bytes like lzo1x, to bit-level encoding like LZSS uses for literal/match, to universal codes like Elias gamma.
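To make the difference between some of these options concrete, here is a minimal Python sketch of two of the integer codings mentioned above: an Elias gamma code (bit level) and an LZ4-style length (a 4-bit field plus extension bytes). The function names are made up for this illustration and are not taken from any of the libraries named; only the coding rules themselves are standard.

```python
from typing import Tuple


def elias_gamma(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for n >= 1")
    binary = bin(n)[2:]  # binary digits of n, leading 1 included
    # (number of binary digits - 1) zeros, followed by the digits themselves
    return "0" * (len(binary) - 1) + binary


def lz4_style_length(n: int) -> Tuple[int, bytes]:
    """Split a length into a 4-bit field and LZ4-style extension bytes.

    Illustrative helper (hypothetical name): values below 15 fit in the
    4-bit field; otherwise the field is 15 and the remainder follows as
    bytes of 255, terminated by a final byte smaller than 255.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 15:
        return n, b""
    rest = n - 15
    extra = bytearray()
    while rest >= 255:
        extra.append(255)
        rest -= 255
    extra.append(rest)
    return 15, bytes(extra)


if __name__ == "__main__":
    for value in (1, 5, 16, 300):
        nibble, extra = lz4_style_length(value)
        print(value, elias_gamma(value), nibble, list(extra))
```

The gamma code costs 2*floor(log2 n) + 1 bits and needs a bit-level reader, while the LZ4-style code stays byte-aligned at the price of a whole extra byte once the 4-bit field overflows. That trade-off is one reason "LZ77 only" needs a precise definition before ratios can be compared.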
PAQ8SK
This is forked from paq8pxv182fix1, with some tweaking of the text model and memory usage increased to up to 7 GB. The result for the xml file is:
paq8pxv182fix1 250750 bytes
paq8sk 249237 bytes
enwik8 is in progress.
Last edited by lz77; 8th April 2020 at 13:44.
LittleBit (https://github.com/kapenga/LittleBit) can compress enwik8 with a static Huffman tree to less than 40%. With a 1.5 MB tree it could compress enwik8 to 26.5 MB, and that's including the tree. With the tree limited to 512 KB it would still be below 40 MB, because the gains beyond a 512 KB tree are small.
I think other methods should be possible too. It should be do-able.
But why would you want this?
Out of sporting and academic interest I wrote a pure LZ77-type compressor (for now it is a prototype program for debugging) that, for example, beats blzpack -1 ... and blzpack -2 ..., and is approaching zstd -1 ... So I want to compare my compressor with others that I may not know about.
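For readers wondering what "LZ77 only with a simple hash table" can look like in practice, here is a minimal Python sketch. It is not the compressor described in this post. The token format (a tag byte, then either a literal run or a length plus 16-bit offset), the constants and the function names are all assumptions chosen for the illustration. It uses a greedy parse, one candidate per hash slot, and no entropy coding.

```python
MIN_MATCH = 4                  # shortest match worth encoding (assumption)
HASH_BITS = 16                 # 65536-slot table, one candidate per slot
MAX_OFFSET = 0xFFFF            # offsets are stored in 2 bytes
MAX_LEN = 255 + MIN_MATCH      # length minus MIN_MATCH is stored in 1 byte


def _hash4(data: bytes, i: int) -> int:
    """Multiplicative hash of the 4 bytes starting at position i."""
    v = int.from_bytes(data[i:i + 4], "little")
    return ((v * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def _emit_literals(out: bytearray, data: bytes, start: int, end: int) -> None:
    """Write data[start:end] as literal runs: tag 0, run length, raw bytes."""
    while start < end:
        run = min(255, end - start)
        out += bytes((0, run)) + data[start:start + run]
        start += run


def compress(data: bytes) -> bytes:
    table = [-1] * (1 << HASH_BITS)   # hash -> last position seen
    out = bytearray()
    i = lit_start = 0
    n = len(data)
    while i + MIN_MATCH <= n:
        h = _hash4(data, i)
        cand = table[h]
        table[h] = i
        if (cand >= 0 and i - cand <= MAX_OFFSET
                and data[cand:cand + MIN_MATCH] == data[i:i + MIN_MATCH]):
            length = MIN_MATCH
            while (i + length < n and length < MAX_LEN
                   and data[cand + length] == data[i + length]):
                length += 1
            _emit_literals(out, data, lit_start, i)
            # Match token: tag 1, length - MIN_MATCH, 16-bit offset
            out += bytes((1, length - MIN_MATCH)) + (i - cand).to_bytes(2, "little")
            i += length               # positions inside the match are not hashed
            lit_start = i
        else:
            i += 1
    _emit_literals(out, data, lit_start, n)
    return bytes(out)


def decompress(blob: bytes) -> bytes:
    out = bytearray()
    p = 0
    while p < len(blob):
        if blob[p] == 0:              # literal run
            run = blob[p + 1]
            out += blob[p + 2:p + 2 + run]
            p += 2 + run
        else:                         # match: copy byte by byte (may overlap)
            length = blob[p + 1] + MIN_MATCH
            start = len(out) - int.from_bytes(blob[p + 2:p + 4], "little")
            for k in range(length):
                out.append(out[start + k])
            p += 4
    return bytes(out)


if __name__ == "__main__":
    sample = b"abcabcabcabc" * 100 + b"xyz"
    packed = compress(sample)
    assert decompress(packed) == sample
    print(len(sample), "->", len(packed))
```

The sketch only shows the structure (hash lookup, verify, extend, emit token). A 64 KB window with fixed 4-byte match tokens should not be expected to get anywhere near 40% on enwik8. Reaching that ratio depends on the window size, the parsing strategy and, above all, how the tokens are coded.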
I'm going to port one of my algorithms to FASM for Win64 to achieve maximum results. I would like to find buyers for my algorithms/sources.
Where can I download Windows binaries of LittleBit to compare with mine?
Kaw (11th April 2020)
It's Java, so you need to install the Java runtime. At https://github.com/kapenga/LittleBit/releases you can find a release you can use, with an example of how to run it.
Here are two of my simple, frugal examples (for now these are prototypes for debugging and improvement), compared with famous leaders in data compression.
Benchmarked on a Topton mini PC from AliExpress: CPU i7-8850H, RAM 16 GB @ 2667 MHz, L1 cache 384 KB, L2 1.5 MB, L3 9 MB.
I booted the PC with 6 cores/12 logical CPUs at 4.18 GHz, with a 2 GB RAM disk and pagefile.sys turned off.
Some command lines:
timer.exe lz4x64_1.9.2.exe -1 -f -l --no-frame-crc enwik9
timer.exe lizard-1.0-win64.exe -30 -f -B7 --no-frame-crc enwik9
timer.exe zstd_1.4.4_win64.exe -1 -f --no-progress enwik9
Code:
enwik9               | compress user/process time | uncompress user/process time | ratio
myex1                | 7.85/8.52                  | 2.45/3.31                    | 0.36454
myex2                | 6.36/7.06                  | 2.02/2.92                    | 0.40813
myex2 on asm         |                            | 1.66/2.63                    |
blzpack -b1000m      | 7.31/8.14                  | 3.81/4.70                    | 0.38093
brotli_1.07_win64 -0 | 3.47/3.83                  | 3.02/3.42                    | 0.37384
lz4_1.9.2_win64      | 1.91/2.34                  | 0.30/0.81                    | 0.50933
lz4_1.9.2_win64 -3   | 11.73/11.968               |                              | 0.3886
lizard_1.0_win64 -22 | 7.61/8.14                  | 0.56/1.14                    | 0.4244
lizard_1.0_win64 -30 | 3.17/3.73                  | 1.06/1.44                    | 0.42084
zstd_1.4.4_win64 -1  | 3.14/3.42                  | 1.05/1.52                    | 0.35752