# Thread: Pure LZ compression with simple hash: what's the best ratio?

1. ## Pure LZ compression with simple hash: what's the best ratio?

I'm working on pure LZ compression based on my own ideas, and I'm achieving great results. So I want to know: what's the best compression ratio in this case for both enwik8 and silesia.tar? Assume the hash table has 512K cells. Thanks for your informative answers. (Sorry, I studied German...)

2. Originally Posted by lz77
I'm working on pure LZ compression based on my own ideas, and I'm achieving great results. So I want to know: what's the best compression ratio in this case for both enwik8 and silesia.tar? Assume the hash table has 512K cells. Thanks for your informative answers. (Sorry, I studied German...)
I developed simple command-line pure LZ77 compressors too, back in the 2000s (i.e., lz772/lz773, lzuf2, lzuf5). They're 90's-style C code, and my hash function is also simple. The compression programs had to be original, as they were intended for a beginner's book which became my data compression guide.

Matt Mahoney's Large Text Compression Benchmark is of course the first place to look for the best compression ratio for enwik8.

-- Gerald

3. Originally Posted by lz77
I'm working on pure LZ compression based on my own ideas, and I'm achieving great results. So I want to know: what's the best compression ratio in this case for both enwik8 and silesia.tar? Assume the hash table has 512K cells. Thanks for your informative answers. (Sorry, I studied German...)
Finding the best LZ parse is a non-convex, likely NP-hard problem if entropy coding is used for the literals and LZ symbols. For non-trivial inputs it is likely too expensive to find the answer.

If a static entropy code (including a uniform one, i.e., no entropy coding) is used, it becomes relatively cheap to find the best LZ parse with a shortest-path algorithm.
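The shortest-path idea can be sketched in a few lines. Everything below is illustrative: the flat per-literal and per-match bit costs and the brute-force match finder are assumptions for the sketch, not any particular codec's format.

```python
# Optimal LZ parse as a shortest path on a DAG, assuming *static* costs:
# every literal costs LIT_BITS and every match costs MATCH_BITS regardless
# of length/offset. Match finding is brute force for clarity. The names
# and constants are illustrative only.

MIN_MATCH = 3
LIT_BITS = 9        # assumed flat cost of one literal token
MATCH_BITS = 24     # assumed flat cost of one match token

def matches_at(data, pos):
    """Yield (length, offset) for successively longer matches ending at pos."""
    best = MIN_MATCH - 1
    for src in range(pos):
        n = 0
        while pos + n < len(data) and data[src + n] == data[pos + n]:
            n += 1
        if n > best:
            best = n
            yield n, pos - src

def optimal_parse(data):
    """Return (total_bits, ops); ops are ('lit', byte) or ('match', len, off)."""
    n = len(data)
    cost = [float('inf')] * (n + 1)   # cost[i] = cheapest encoding of data[:i]
    step = [None] * (n + 1)           # back-pointers for reconstruction
    cost[0] = 0
    for i in range(n):
        # relax the literal edge i -> i+1
        if cost[i] + LIT_BITS < cost[i + 1]:
            cost[i + 1] = cost[i] + LIT_BITS
            step[i + 1] = ('lit', data[i])
        # relax match edges i -> i+l for every admissible match length l
        for length, off in matches_at(data, i):
            for l in range(MIN_MATCH, length + 1):
                if cost[i] + MATCH_BITS < cost[i + l]:
                    cost[i + l] = cost[i] + MATCH_BITS
                    step[i + l] = ('match', l, off)
    ops, i = [], n
    while i > 0:
        op = step[i]
        ops.append(op)
        i -= 1 if op[0] == 'lit' else op[1]
    return cost[n], ops[::-1]
```

Because all edge costs are non-negative and every edge points forward, a single left-to-right relaxation pass is already a correct shortest-path computation; with adaptive entropy coding the edge costs would depend on the path taken, which is exactly what makes the general problem hard.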

4. I know about the NP-hard problem, but I mean an existing program for on-the-fly compression, similar to LZ4 without hash chains.
My result as of today is 41.394% for enwik8 and 37.886% for silesia.tar. Lizard shows ~46.2% for enwik8 but compresses about 7 times slower than mine.

My result with two hashes is similar to brotli's, and execution time is also similar. Decompression time is of course better than brotli's. But I need to rewrite my program in assembler under Win64...
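The kind of single-hash, no-chain match finder discussed in this thread (LZ4-like, greedy) can be sketched as follows. The hash multiplier and all names are illustrative assumptions; the 2^19-cell table matches the 512K cells from the opening post.

```python
# Greedy LZ match finding with one hash table and no chains, LZ4-style:
# hash the next 4 bytes, probe a single cell, verify, overwrite. 2**19
# cells = 512K, as in the opening post. Constants/names are illustrative.

HASH_BITS = 19                     # 512K cells
TABLE_SIZE = 1 << HASH_BITS
MIN_MATCH = 4

def hash4(data, pos):
    # multiplicative (Fibonacci-style) hash of 4 little-endian bytes
    v = int.from_bytes(data[pos:pos + 4], 'little')
    return (v * 2654435761) >> (32 - HASH_BITS) & (TABLE_SIZE - 1)

def greedy_parse(data):
    """Return ops: ('lit', byte) or ('match', length, offset)."""
    table = [-1] * TABLE_SIZE
    ops, i, n = [], 0, len(data)
    while i < n:
        if i + MIN_MATCH <= n:
            h = hash4(data, i)
            cand = table[h]            # single candidate, no chain to walk
            table[h] = i               # always overwrite with newest position
            if cand >= 0 and data[cand:cand + MIN_MATCH] == data[i:i + MIN_MATCH]:
                length = MIN_MATCH
                while i + length < n and data[cand + length] == data[i + length]:
                    length += 1
                ops.append(('match', length, i - cand))
                i += length
                continue
        ops.append(('lit', data[i]))
        i += 1
    return ops

def expand(ops):
    """Inverse of greedy_parse, for checking round trips (handles overlap)."""
    out = bytearray()
    for op in ops:
        if op[0] == 'lit':
            out.append(op[1])
        else:
            _, length, off = op
            for _ in range(length):
                out.append(out[-off])
    return bytes(out)
```

The single-cell overwrite is what makes this fast: one probe per position, no chain traversal, at the cost of losing older match candidates on hash collisions.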

5. Originally Posted by lz77
I'm working on pure LZ compression by my ideas, and I achieve great results. So I want to know: what's in this case best compression ratio of both enwik8 and silesia.tar? Let hash table has 512K cells. Thanks for your informative answers. ​ (Sorry, I studied German...)
This question apparently lacks a key detail: the bitstream format, or at least the class of bitstreams it targets. I'll assume that "pure LZ" means "no entropy coding phase" or so.

In the following list, algos are arranged from "fastest/easiest decompression, worst ratio" at the top to the opposite at the bottom.
At a very vague level it can be classified like this:
- Byte-aligned things, targeting decompression speed at any cost. A good example would be LZ4. Due to the nature of its bitstream it generally can't put on a show when it comes to ratios, no matter what: even perfect match finding wouldn't make up for the bitstream's limitations. However, it is FAST to decompress. In LZ4's case that holds even in pure C, without intrinsics, ASM, etc. That's one of the things that make it so interesting.
- Improved versions of that, e.g. boasting rep-matches, smarter number encoding and so on. Unless I got it wrong, something like LZSA1 would be in this spirit. It could be slightly slower/more complicated in some cases (or not) and offers a somewhat improved ratio. However, the inability to go below byte granularity for various things (for the sake of decompression speed) implies tradeoffs that hurt ratio.
- Some hybrid things, e.g. aligning at the nibble level rather than the byte level, using the "range of numbers" trick (described by cbloom) and so on. The point of such designs is that they are faster than "naive" bit-aligned things, yet can boast considerably better ratios than byte-aligned things, since they don't have to waste bits as recklessly as byte-aligned schemes must for the sake of alignment and speed. This results in some rather interesting designs boasting both good decompression speed and a considerably improved ratio. Among open-source things there are fun examples like LZSA2 that sport ideas like that, boasting fairly good ratios while still not being terribly hard or slow to decompress.
- Bit-level things. They are far slower to decompress: many CPUs aren't that great at dealing with single bits, and the extra operations for bit-level I/O hurt decompression considerably. However, the ability to transmit values using as many bits as needed to describe them is obviously good for improving compression ratio. So, say, LZ4 and Crush are both "pure LZ" (in the sense that they lack an entropy coding phase), but they are quite different in ratio vs. decompression speed, and naively comparing them isn't exactly the best idea ever.
- Heavyweight freighters, boasting a full-fledged entropy coder. Obviously they reach impressive ratios; the best designs have even managed to become quite fast.
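The byte-aligned vs. bit-aligned tradeoff in the list above can be made concrete by counting the bits needed to transmit a single match-length value both ways. The two little cost functions below are illustrative sketches (an LZ4-style nibble-plus-extension-bytes scheme vs. an Elias gamma code), not any codec's exact on-disk format.

```python
# Rough cost, in bits, of transmitting one match-length value under two of
# the schemes discussed above. Illustration of the tradeoff only.

def lz4_length_bits(v):
    # LZ4-style: 4 bits in the token; the value 15 signals extension bytes,
    # each carrying up to 255 more, until a byte < 255 terminates
    bits = 4
    if v >= 15:
        v -= 15
        bits += 8 * (v // 255 + 1)
    return bits

def elias_gamma_bits(v):
    # bit-aligned universal code: a value v >= 1 costs 2*floor(log2 v) + 1 bits
    return 2 * (v.bit_length() - 1) + 1

for v in (4, 14, 15, 100, 300):
    print(v, lz4_length_bits(v), elias_gamma_bits(v))
```

Note the crossover: for values below 15 the byte-aligned token is actually the cheaper of the two (4 bits vs. up to 7), but past the extension-byte threshold it pays 12 or 20 bits where the gamma code pays 7 to 17, which is exactly the "wasted bits for the sake of alignment and speed" the list describes.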

This said, there was a funny discussion about what to consider "pure LZ". Some schemes without a full-fledged entropy coder eventually use tricks that aren't that far from entropy coding, at which point at least someone questioned whether e.g. LZOMA is "pure LZ" or not. Still, your question lacks this key decision point and, e.g., a comparison to the closest neighbors in the same league, no?

6. ## Thanks (2):

compgt (8th February 2020),introspec (8th February 2020)

7. On Linux, it can also report memory (heap+stack) usage, as in this benchmark: Static/Dynamic web content compression benchmark.

You can also look at the In Memory Compression Benchmark with enwik8 and silesia.

Compression/decompression speed depends heavily on the implementation.
Some codecs without entropy coding, like lzoma, are useless, because they decompress more slowly than other codecs that do use entropy coding.

8. What is of no use to one could be of use to others. The world isn't as simple as some people think.

LZOMA is nearly the only open-source thing I know that boasts a "simple decompressor" that "takes no memory" and looks viable for "no runtime/OS" environments, etc. (to the extent that introspec considered it on the Speccy). Yet it uses rather advanced tricks to get nice ratios. That makes it interesting to try on microcontrollers and other strange places. The only comparable thing I can imagine is probably LZSA2, though it is a bit too obsessed with 8-bit targets. Furthermore, when you refer to speed you mean huge x86 monsters, with huge caches, out-of-order execution and so on. The tradeoffs can change when that isn't the case, e.g. on low-end "application" ARMs or microcontrollers. One of the most interesting things about LZ4 is that it stays fast on nearly anything able to run code at all. Quite a difference compared to "heavyweight freighters" that make heavy assumptions about the underlying system; when those don't hold, they are suddenly far less exciting. Not to mention I wouldn't ever dare to try something like that on, say, microcontrollers.

Measuring heap usage is cool. But how about a system where I have no heap and only a limited stack, and no malloc() unless I code it myself? That makes it a bit similar to introspec's kind of fun, except that I prefer "modern" HW and tend toward more pragmatic uses than "demos". Or imagine we have to count compressed data + decompressor size together (because it's the total firmware size that matters?) and I don't have 10 megs of data to make up for a huge decompressor, rather more like 10-20 KB. A huge, complicated decompressor could then negate the savings and increase the RAM requirements, etc. (RAM is at a premium on MCUs, for example). Sure, I can "compress" the whole of Wikipedia very efficiently by merely setting up a local copy of it and only transferring the URL of an article. But there's a catch: to decompress Wikipedia from a URL without relying on external resources, you have to download the whole of Wikipedia (you don't know beforehand which article it will be). At that point compression is extremely efficient (size of URL vs. a whole article) but "decompressor size" is an issue. Maybe that's why some contests compare (decompressor + data) sizes.

p.s. I eventually gave TurboBench a try; somehow it proved to be a rather dodgy experience for me... it either failed to build, or lacked half of the codecs, or misbehaved. It also attempts to download half of the internet. As a concrete example, /pysap/docs is about 10 megs of something totally unrelated to compression or benchmarks, and all of that has to be downloaded just to try to compile it. I find such reckless use of disk space and bandwidth strange for compression-related things. But I wouldn't deny that I may have rather strange views and preferences.

9. As usual, my conclusions are based solely on numbers, not on speculation or preferences.
In memory-constrained environments you probably need special compressors or a very small window size.
As you can see in this benchmark, lzoma uses more memory than other entropy-coding-based LZ codecs.
It is also a lot slower and compresses worse than other EC-based codecs.

TurboBench Compression Benchmark : - Mon Feb 10 12:06:22 2020

File cp.html from the Canterbury corpus (bold = pareto). MB = 1,000,000.

Code:
```      C Size  ratio%     C MB/s     D MB/s   Name                           Peak Memory c/d                Stack c/d
6895    28.0       0.83     292.89   brotli 11                    2,425,912           78,776    341,000    324,176
7702    31.3       7.33     572.16   libdeflate 12                8,970,224            7,528    325,856    324,344
7703    31.3       0.40     341.71   zopfli                       2,035,552            7,160    354,520    324,264
7758    31.5      39.55     384.42   brotli 5                     1,502,840           60,824    334,080    324,104
8008    32.5      89.79     559.16   libdeflate 6                 1,101,016            7,528    325,792    324,344
8316    33.8       0.41     208.50   lzoma 9                 28,722,610,112    1,140,858,848    589,972    324,216
8317    33.8       0.62     253.64   lzoma 7                    486,555,584       17,825,808    589,972    324,248
8408    34.2       2.57     251.05   lzoma 1                      3,818,440          139,280    589,972    324,248
10086    41.0       0.55    1366.83   lzsa 9                      71,103,392                0    326,612    323,912
24603   100.0    6016.56    6091.32   memcpy```

In conclusion, lzoma is actually useless:
- It's experimental.
- It hasn't been maintained for 4 years.
- It is not a library, and it is difficult to rewrite as one, because it uses global variables almost exclusively.
- Compression is very slow for large files.
- Decompression is slower than compressors that use entropy coding.
- Allocated memory is very high.

I'm not saying it can't be improved, but in my experience it is better to use lightweight entropy coding like Huffman coding instead of pure bit-I/O coding.
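As a sketch of what such lightweight entropy coding buys over flat bit I/O: the hypothetical helper below builds Huffman code lengths for a byte histogram and compares the coded size against a flat 8 bits per symbol. Names and the sample input are illustrative, not from any codec mentioned here.

```python
# Build Huffman code lengths from symbol frequencies and compare the total
# coded size against flat 8-bit symbols. Sketch only; a real codec would
# turn these lengths into a canonical code.
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """freqs: dict symbol -> count; returns dict symbol -> code length in bits."""
    # heap entries are (frequency, unique tiebreak, {symbol: depth}) so that
    # equal frequencies never force a comparison between dicts
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate: one symbol, 1-bit code
        return {s: 1 for s in freqs}
    tiebreak = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)      # two least frequent subtrees
        fb, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # deepen both by 1
        heapq.heappush(heap, (fa + fb, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

data = b"abracadabra abracadabra"
freqs = Counter(data)
lengths = huffman_code_lengths(freqs)
huff_bits = sum(freqs[s] * lengths[s] for s in freqs)
print(huff_bits, "bits vs", 8 * len(data), "bits flat")
```

The decoder for such a code is still just a table walk over a few hundred entries, which is why it stays "lightweight" next to adaptive range coding.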

Measuring heap usage is cool. But how about a system where I have no heap and only a limited stack, and no malloc() unless I code it myself?
TurboBench is the only compressor benchmark that reports memory usage. Stack usage is also reported.
TurboBench is a general tool, mainly for popular CPUs, and doesn't claim to satisfy every wish.

I eventually gave TurboBench a try; somehow it proved to be a rather dodgy experience for me... it either failed to build, or lacked half of the codecs, or misbehaved.
TurboBench is very simple to use. Just clone the repository as explained in the README and type make. There is no cmake or other hassle-prone build tooling.
You can see from the Travis CI build that it compiles on Intel/AMD, ARM, PowerPC and s390x architectures, on Windows, Linux and macOS, and with both gcc and clang.
Thanks to the submodule architecture, it is easy to update all the codecs to their latest versions.

It also attempts to download half of the internet. As concrete example, /pysap/docs about 10 megs of something totally unrelated to compression or benchmarks
Downloading and building takes ~2 min on an AMD machine (see the Travis CI); that's less time than you spend commenting on encode.su.
Pysap contains a notable piece of compression software from SAP that's used by half the planet's businesses.