Okay, today I wrote a simple LZ77-based compressor. There are no great results to share yet; however, the compressor works!

For decompression we need only the buffer plus the memory for the literal/offset models, about 1 MB of memory in total.

Two problems, both differences from ROLZ:

1. We must drop short matches at long distances

2. We must decide how to encode the match position

Current solutions:

1. I apply distance thresholds to the shortest matches:

    if (maxlen == 3 && pos >= (1 << 12))
        maxlen = 0;
    if (maxlen == 4 && pos >= (1 << 14))
        maxlen = 0;
    if (maxlen == 5 && pos >= (1 << 16))
        maxlen = 0;
    if (maxlen == 6 && pos >= (1 << 18))
        maxlen = 0;

Or something like that...
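The same rule can be folded into one small function. This is only a sketch: `filter_match` and `dist` are names made up for illustration, `dist` stands in for `pos` (assumed here to be the match distance), and the limit for length 6 is assumed to be 1 << 18, continuing the 12, 14, 16 progression.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the match length to keep, or 0 to drop the match.
// dist corresponds to pos in the snippet above (assumed to be the match distance).
// Assumed limits: length 3 -> 1 << 12, 4 -> 1 << 14, 5 -> 1 << 16, 6 -> 1 << 18.
int filter_match(int maxlen, uint32_t dist) {
    assert(maxlen >= 0);
    if (maxlen >= 3 && maxlen <= 6) {
        // Each extra byte of match length allows two more bits of distance.
        uint32_t limit = 1u << (12 + 2 * (maxlen - 3));
        if (dist >= limit)
            return 0;  // too short for such a long distance
    }
    return maxlen;  // lengths of 7 and above are never dropped here
}
```

For example, `filter_match(3, 5000)` drops the match, while `filter_match(4, 5000)` keeps it.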

Note that I use a 1 MB buffer, and in this experimental version I encode each buffer separately.

2. For literals I use a bit-oriented arithmetic encoder.

As in LZH, literals and match lengths form one alphabet:

0...255 - literals

256...511 - match lengths from 3 to 258
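A minimal sketch of that mapping, in case it helps. The function and constant names are mine, not taken from the actual compressor; only the ranges come from the table above.

```cpp
#include <cassert>

const int MIN_MATCH = 3;      // shortest match length (from the table above)
const int MAX_MATCH = 258;    // longest match length (from the table above)
const int NUM_SYMBOLS = 512;  // 256 literals + 256 match lengths

// Literals map to themselves: 0...255.
int literal_to_symbol(unsigned char c) { return c; }

// Match lengths 3...258 map to 256...511.
int length_to_symbol(int len) {
    assert(len >= MIN_MATCH && len <= MAX_MATCH);
    return 256 + (len - MIN_MATCH);
}

// The decoder tells a literal from a match by looking at the symbol value.
bool is_match(int sym) { return sym >= 256; }

int symbol_to_length(int sym) {
    assert(is_match(sym) && sym < NUM_SYMBOLS);
    return sym - 256 + MIN_MATCH;
}
```

With this layout the decoder reads one symbol, and only if `is_match` is true does it go on to read a match position.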

So, how to encode the position? Note that we have (1 << 20), i.e. about one million, possible values. In this version I use a 20-bit alphabet, i.e. each bit of the match position is encoded separately (as with symbols). I think this is a bad approach; however, the idea is interesting...
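To make the bit-by-bit idea concrete, here is a rough sketch with a generic carryless binary arithmetic coder. Everything in it is an assumption for illustration: the coder design, the names, and especially the model, which keeps just one adaptive probability per bit index (20 in total). The real compressor may pick its contexts quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const int POS_BITS = 20;  // 1 MB buffer -> 20-bit positions

// Generic carryless binary arithmetic coder (assumed design, 12-bit probabilities).
struct Encoder {
    uint32_t x1 = 0, x2 = 0xffffffffu;
    std::vector<uint8_t> out;
    void encode(int bit, uint32_t p1) {  // p1 = P(bit == 1), range 1..4095
        uint32_t xmid = x1 + ((x2 - x1) >> 12) * p1;
        if (bit) x2 = xmid; else x1 = xmid + 1;
        while (((x1 ^ x2) & 0xff000000u) == 0) {  // top byte is settled
            out.push_back(uint8_t(x2 >> 24));
            x1 <<= 8;
            x2 = (x2 << 8) | 255;
        }
    }
    void flush() {  // write all four bytes of the low bound
        for (int i = 0; i < 4; ++i) { out.push_back(uint8_t(x1 >> 24)); x1 <<= 8; }
    }
};

struct Decoder {
    uint32_t x1 = 0, x2 = 0xffffffffu, x = 0;
    const std::vector<uint8_t>& in;
    size_t ip = 0;
    explicit Decoder(const std::vector<uint8_t>& data) : in(data) {
        for (int i = 0; i < 4; ++i) x = (x << 8) | next();
    }
    uint32_t next() { return ip < in.size() ? in[ip++] : 0; }
    int decode(uint32_t p1) {
        uint32_t xmid = x1 + ((x2 - x1) >> 12) * p1;
        int bit = x <= xmid;
        if (bit) x2 = xmid; else x1 = xmid + 1;
        while (((x1 ^ x2) & 0xff000000u) == 0) {
            x1 <<= 8;
            x2 = (x2 << 8) | 255;
            x = (x << 8) | next();
        }
        return bit;
    }
};

// Assumed model: one adaptive probability per bit index of the position.
struct PosModel {
    uint16_t p[POS_BITS];
    PosModel() { for (int i = 0; i < POS_BITS; ++i) p[i] = 2048; }
    void update(int i, int bit) {
        if (bit) p[i] += (4096 - p[i]) >> 5;
        else     p[i] -= p[i] >> 5;
    }
};

void encode_pos(Encoder& enc, PosModel& m, uint32_t pos) {
    assert(pos < (1u << POS_BITS));
    for (int i = POS_BITS - 1; i >= 0; --i) {  // most significant bit first
        int bit = (pos >> i) & 1;
        enc.encode(bit, m.p[i]);
        m.update(i, bit);
    }
}

uint32_t decode_pos(Decoder& dec, PosModel& m) {
    uint32_t pos = 0;
    for (int i = POS_BITS - 1; i >= 0; --i) {
        int bit = dec.decode(m.p[i]);
        m.update(i, bit);  // same update as the encoder keeps both in sync
        pos = (pos << 1) | uint32_t(bit);
    }
    return pos;
}

// Encodes all positions, then decodes them back.
std::vector<uint32_t> roundtrip(const std::vector<uint32_t>& positions) {
    Encoder enc;
    PosModel em;
    for (uint32_t pos : positions) encode_pos(enc, em, pos);
    enc.flush();
    Decoder dec(enc.out);
    PosModel dm;
    std::vector<uint32_t> result;
    for (size_t i = 0; i < positions.size(); ++i) result.push_back(decode_pos(dec, dm));
    return result;
}
```

With a model this simple the low bits of the position are close to random, so they cost nearly one bit each. That is probably part of why the approach feels weak, and why the high bits are the ones worth modeling more carefully.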