What compressor shows highest LZ77-like compression of enwik8? What's the compression ratio? The code should not pre-analyze source data. Thanks.
Quark is even better than LZMA: it compresses enwik8 to 22,988,924 bytes, but unfortunately it is closed source.
CSC -m5 compresses enwik8 a bit better than LZMA (from xz-utils) at level 9. Significantly faster compression, too, though decompression is a bit slower. IIRC it's lz77 but I'm not certain.
> What is pre-analysis here?
It's a good question, but the answer seems simple: the unpacker should ONLY copy literals from the compressed data and copy bytes from already decompressed data. If an optimized unpacker needs more than 0.5 sec on 1 core to decompress enwik8, then it's not pure LZ77 compression. Literals such as '<mediawiki xmlns="http://www.' should appear at the beginning of the packed enwik8.
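To make that definition concrete, here is a minimal sketch of such an unpacker. The token layout (a Python list holding literal runs and (offset, length) pairs) is a hypothetical one chosen for readability; it is not the format of any compressor mentioned in this thread.

```python
def lz77_decode(tokens):
    """Decode a stream of literal runs and back-references.

    Hypothetical token layout (an assumption for illustration only):
      - a bytes object is a run of literals taken verbatim from the stream
      - an (offset, length) tuple copies `length` bytes that start
        `offset` bytes back in the output produced so far
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, (bytes, bytearray)):
            # Literal run: copy straight from the compressed data.
            out += tok
        else:
            offset, length = tok
            if offset <= 0 or offset > len(out):
                raise ValueError("match offset points outside decoded data")
            start = len(out) - offset
            # Copy byte by byte so overlapping matches (length > offset)
            # repeat the most recent bytes correctly.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


# The first token is a literal run, so it is visible as-is in the stream.
tokens = [b'<mediawiki xmlns="http://www.', (4, 4), b"abc", (3, 6)]
print(lz77_decode(tokens))
```

There is no entropy decoding step anywhere in the loop, which is why a decoder of this kind can be very fast.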
> CSC -m5 compresses enwik8 a bit better than LZMA
Neither compressor is pure LZ77:
"Introduction for libcsc:
The whole compressor was mostly inspired by LZMA, with some ideas from others or my own mixed.
Based on LZ77 Sliding Window + Range Coder."
> Quark is even better than lzma compresses enwik8 to 22,988,924...
I can't find Quark... I doubt that an LZ77 compressor can achieve a ratio below 32% (about 32 MB) on enwik8.
Last edited by lz77; 12th August 2017 at 11:58.
Attached...
Also, try this:
https://encode.su/threads/2280-LZOMA...ll=1#post46015
Thanks, I tried lzoma.exe, quark.exe, crush.exe (an LZ77-based file compressor by I. Muraviev: sourceforge.net/projects/crush/), and lizard32.exe (by Y. Collet & P. Skibinski).
quark.exe drilled my brain by its interface...
I see that only lizard is an LZ77 compressor: only in the .liz archive did I find literals like '<mediawiki xmlns="http://www.'.
lizard32.exe -29 --no-frame-crc -B6 enwik8 enwik8.liz
produces an enwik8.liz of 37,203,082 bytes. But decompression is slow, ~3 sec...
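The literal check used in this post can be scripted. This is only a sketch: the function names and the 1 MiB window are my own assumptions, and a hit only shows that the first literals are stored unencoded, not that the whole format is pure LZ77.

```python
# Opening bytes of enwik8 that a pure LZ77 archive should store verbatim.
NEEDLE = b'<mediawiki xmlns="http://www.'


def has_raw_literals(data, needle=NEEDLE):
    """Return True if `needle` occurs verbatim in the given bytes."""
    return needle in data


def file_has_raw_literals(path, needle=NEEDLE, window=1 << 20):
    """Check only the first `window` bytes of the archive at `path`.

    `window` is an arbitrary limit (an assumption); the opening literals
    of enwik8 should sit near the start of the packed data.
    """
    with open(path, "rb") as f:
        head = f.read(window)
    return has_raw_literals(head, needle)
```

For example, `file_has_raw_literals("enwik8.liz")` would be expected to return True for the archive above, while the output of a compressor with an entropy coder would normally return False.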
I hope that in the near future I'll compress the hapless enwik8 to ~33,000,000 bytes, and decompression will be very fast.
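For reference, the sizes quoted so far can be converted to ratios with a few lines. The only outside fact used is that enwik8 is exactly 100,000,000 bytes; the dictionary labels are my own.

```python
ENWIK8_SIZE = 100_000_000  # enwik8 is the first 10^8 bytes of an English Wikipedia dump

# Compressed sizes in bytes, as quoted in the posts above.
sizes = {
    "Quark": 22_988_924,
    "lizard -29": 37_203_082,
    "target (~33,000,000)": 33_000_000,
}


def ratio_percent(compressed, original=ENWIK8_SIZE):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


for name, size in sizes.items():
    print(f"{name:22s} {size:>11,d} bytes  {ratio_percent(size):6.2f}%")
```

On this scale, the ~33,000,000 byte target sits just above the 32% mark mentioned earlier in the thread.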
Compressors like quark and plzma have high compression because of strong entropy coders.
These compressors prefer coding literals over matches.
This is why they are slow at decompression and lose some of their speed advantage over BWT.
In general, this is not what you expect from an LZ77 compressor.
You can look at the Compression Benchmark or make your own benchmark with the compressor benchmark TurboBench.
See also: LTCB
"lzturbo -29" compresses enwik8 to 28,788,842 bytes without any entropy coding.
Perhaps I'm misunderstanding what you want. I'm sure you already tried this:
https://encode.su/threads/550-Ultra-...ll=1#post53288