Okay, today I wrote a simple LZ77-based compressor. There are no great results to share yet, but the compressor works!
Decompression needs only the buffer plus the memory for the literal/offset models, about 1 MB in total.
Two problems, both differences from ROLZ:
1. We must drop short matches at long distances
2. How to encode the match position
Current solutions:
1. I apply a distance threshold to the shortest matches:
if (maxlen == 3 && pos >= (1 << 12))
    maxlen = 0;
if (maxlen == 4 && pos >= (1 << 14))
    maxlen = 0;
if (maxlen == 5 && pos >= (1 << 16))
    maxlen = 0;
if (maxlen == 6 && pos >= (1 << 18)) /* 18 is assumed from the 12/14/16 pattern */
    maxlen = 0;
Or something like that...
Note that I use a 1 MB buffer, and in this experimental version I encode each buffer separately.
2. For literals I use a bit-oriented arithmetic encoder.
As in LZH, literals and match lengths share one alphabet:
0...255 - literals
256...511 - match lengths from 3 to 258
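To make the shared alphabet concrete, here is a minimal sketch of the symbol mapping. The helper names (literal_symbol, length_symbol, symbol_is_match, symbol_length) are made up for illustration and are not taken from the actual compressor; only the ranges 0...255 and 256...511 come from the layout above.

```c
#include <assert.h>

enum { MIN_MATCH = 3, MAX_MATCH = 258, NUM_LITERALS = 256 };

/* A literal byte is its own symbol: 0..255. */
static int literal_symbol(unsigned char c)
{
    return c;
}

/* A match length 3..258 maps to symbols 256..511. */
static int length_symbol(int len)
{
    assert(len >= MIN_MATCH && len <= MAX_MATCH);
    return NUM_LITERALS + (len - MIN_MATCH);
}

/* Decoder side: symbols 256 and above announce a match. */
static int symbol_is_match(int sym)
{
    return sym >= NUM_LITERALS;
}

/* Decoder side: recover the match length from a match symbol. */
static int symbol_length(int sym)
{
    assert(sym >= NUM_LITERALS && sym < 2 * NUM_LITERALS);
    return sym - NUM_LITERALS + MIN_MATCH;
}
```

With this layout the decoder reads one symbol, and only when symbol_is_match() is true does it go on to read a match position.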
So, how do we encode the position? Note that there are (1 << 20), or 1 MB, of possible values. In this version I use a 20-bit alphabet, i.e. each bit of the match position is encoded separately (as with symbols). I think this is a bad approach, but the idea is interesting...
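A rough sketch of what coding the position bit by bit could look like. This is not the actual code: it assumes one adaptive 12-bit probability per bit index (20 contexts in total), which is only one possible reading of "as with symbols", and it uses a generic carry-less binary arithmetic coder. All names here (Encoder, Decoder, encode_position, decode_position, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { POS_BITS = 20 };

/* Adaptive 12-bit probability that the next bit is 1; starts at 1/2. */
static void model_init(uint16_t *m)
{
    for (int i = 0; i < POS_BITS; i++) m[i] = 2048;
}

static void model_update(uint16_t *p, int bit)
{
    if (bit) *p += (4096 - *p) >> 5;
    else     *p -= *p >> 5;
}

typedef struct { uint32_t x1, x2; uint8_t *out; size_t n; } Encoder;

static void enc_init(Encoder *e, uint8_t *out)
{
    e->x1 = 0; e->x2 = 0xffffffffu; e->out = out; e->n = 0;
}

static void enc_bit(Encoder *e, uint16_t *p, int bit)
{
    /* Split the current range in proportion to P(bit == 1). */
    uint32_t xmid = e->x1 + (uint32_t)(((uint64_t)(e->x2 - e->x1) * *p) >> 12);
    if (bit) e->x2 = xmid; else e->x1 = xmid + 1;
    model_update(p, bit);
    /* Shift out leading bytes that both bounds already agree on. */
    while (((e->x1 ^ e->x2) & 0xff000000u) == 0) {
        e->out[e->n++] = (uint8_t)(e->x2 >> 24);
        e->x1 <<= 8;
        e->x2 = (e->x2 << 8) | 255;
    }
}

/* Write all four bytes of the low bound so the decoder can finish. */
static void enc_flush(Encoder *e)
{
    for (int i = 3; i >= 0; i--) e->out[e->n++] = (uint8_t)(e->x1 >> (8 * i));
}

typedef struct { uint32_t x1, x2, x; const uint8_t *in; size_t n, len; } Decoder;

static uint32_t dec_byte(Decoder *d)
{
    return d->n < d->len ? d->in[d->n++] : 0;
}

static void dec_init(Decoder *d, const uint8_t *in, size_t len)
{
    d->x1 = 0; d->x2 = 0xffffffffu; d->x = 0; d->in = in; d->n = 0; d->len = len;
    for (int i = 0; i < 4; i++) d->x = (d->x << 8) | dec_byte(d);
}

static int dec_bit(Decoder *d, uint16_t *p)
{
    uint32_t xmid = d->x1 + (uint32_t)(((uint64_t)(d->x2 - d->x1) * *p) >> 12);
    int bit = d->x <= xmid;
    if (bit) d->x2 = xmid; else d->x1 = xmid + 1;
    model_update(p, bit);
    while (((d->x1 ^ d->x2) & 0xff000000u) == 0) {
        d->x1 <<= 8;
        d->x2 = (d->x2 << 8) | 255;
        d->x = (d->x << 8) | dec_byte(d);
    }
    return bit;
}

/* Code the 20 position bits from the top bit down, one model per bit index. */
static void encode_position(Encoder *e, uint16_t *m, uint32_t pos)
{
    assert(pos < (1u << POS_BITS));
    for (int i = POS_BITS - 1; i >= 0; i--)
        enc_bit(e, &m[i], (int)((pos >> i) & 1));
}

static uint32_t decode_position(Decoder *d, uint16_t *m)
{
    uint32_t pos = 0;
    for (int i = POS_BITS - 1; i >= 0; i--)
        pos = (pos << 1) | (uint32_t)dec_bit(d, &m[i]);
    return pos;
}
```

With only one context per bit index, the low bits of the position are close to random and cost nearly a full bit each; most of any gain comes from the top bits, which are usually zero because short distances dominate. That may be part of why this approach feels weak.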