Let me introduce the initial release of my brand new LZP-based file compressor called SQUID! It was designed for modern CPUs (6+ MB of L3 cache) and with reasonable compression/decompression speed in mind. Honestly, it could be called LZPX2 - it is very similar to the original LZPX with some major improvements.
Initially it was a testbed for my new fast order-1 coder (part of an unreleased PPMX version), but later I decided to release it as a simple LZP, since I liked the results.
So, there are three compression levels:
c0 - Order-1 coder only (string matching disabled)
c1 - LZP with Greedy Parsing
c2 - LZP with Flexible Parsing
The decoder is the same for all compression levels.
Had a bit of free time, so I tested it a bit more. I think decompression can be sped up by reading into a buffer, and the same goes for writing the compression output (see the sketch after the log below)...
Code:
Compressing Abyss.bsm_tmp:
114647933 -> 67046526 in 13.15s
Process Time : 13.109s
Clock Time : 13.187s
Working Set : 71 MB
Pagefile : 69 MB
IO Read : 109 MB (in 4 reads)
IO Write : 63 MB (in 16373 writes)
Decompressing Abyss.bsm_tmp.c0:
67046526 -> 114647933 in 15.46s
Process Time : 14.843s
Clock Time : 15.509s
Working Set : 71 MB
Pagefile : 69 MB
IO Read : 63 MB (in 16369 reads)
IO Write : 109 MB (in 7 writes)
Compressing Abyss.bsm_tmp:
114647933 -> 46416205 in 12.31s
Process Time : 12.312s
Clock Time : 12.337s
Working Set : 71 MB
Pagefile : 69 MB
IO Read : 109 MB (in 4 reads)
IO Write : 44 MB (in 11337 writes)
Decompressing Abyss.bsm_tmp.c1:
46416205 -> 114647933 in 9.80s
Process Time : 9.124s
Clock Time : 9.871s
Working Set : 71 MB
Pagefile : 69 MB
IO Read : 44 MB (in 11333 reads)
IO Write : 109 MB (in 7 writes)
Compressing Abyss.bsm_tmp:
114647933 -> 45796296 in 16.86s
Process Time : 16.842s
Clock Time : 16.891s
Working Set : 71 MB
Pagefile : 69 MB
IO Read : 109 MB (in 4 reads)
IO Write : 43 MB (in 11185 writes)
Decompressing Abyss.bsm_tmp.c2:
45796296 -> 114647933 in 10.41s
Process Time : 9.936s
Clock Time : 10.430s
Working Set : 71 MB
Pagefile : 69 MB
IO Read : 43 MB (in 11181 reads)
IO Write : 109 MB (in 7 writes)
Press any key to continue . . .
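For reference, the log above shows roughly 4 KB per I/O call (e.g. 63 MB in 16373 writes), so batching reads and writes into larger chunks should cut the call count dramatically. Below is a minimal sketch of the buffering idea, assuming the coder currently reads/writes one byte at a time; the names and the 1 MB chunk size are illustrative, not SQUID's actual code.
Code:
#include <cstdio>
#include <cstdlib>

const size_t CHUNK = 1 << 20;  // 1 MB per fread/fwrite

struct BufferedIn {
  FILE* f;
  unsigned char* buf;
  size_t pos = 0, len = 0;

  explicit BufferedIn(FILE* file)
      : f(file), buf((unsigned char*)std::malloc(CHUNK)) {}
  ~BufferedIn() { std::free(buf); }

  int Get() {  // drop-in replacement for getc(f)
    if (pos == len) {  // refill the buffer on underflow
      len = std::fread(buf, 1, CHUNK, f);
      pos = 0;
      if (len == 0) return EOF;  // true end of file
    }
    return buf[pos++];
  }
};

struct BufferedOut {
  FILE* f;
  unsigned char* buf;
  size_t pos = 0;

  explicit BufferedOut(FILE* file)
      : f(file), buf((unsigned char*)std::malloc(CHUNK)) {}
  ~BufferedOut() { Flush(); std::free(buf); }

  void Put(int c) {  // drop-in replacement for putc(c, f)
    buf[pos++] = (unsigned char)c;
    if (pos == CHUNK) Flush();
  }
  void Flush() {
    if (pos) { std::fwrite(buf, 1, pos, f); pos = 0; }
  }
};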
Implemented all LZ variants - LZP/ROLZ/LZRW2/LZ77 - within the same literal/match coder (including decoders).
The LZP does its job, especially in terms of compression speed vs. compression ratio, though at the cost of decompression speed. Decompression suffers mainly because we code a much smaller number of matches compared to, say, LZ77 - i.e. it's not primarily due to the mandatory hash table. LZRW2 has to maintain the same hash table, yet its decompression is notably faster than LZP's, since it finds more matches, and with a match we quickly decode several bytes at once. I'm talking about LZs with an arithmetic backend here.
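To make that trade-off concrete, here is a toy cost model; the literal fractions and average match lengths are made-up numbers for illustration only, not measurements. Each literal costs one modelled decode, while a match costs a couple of decodes regardless of its length:
Code:
#include <cstdio>

int main() {
  // Hypothetical stream statistics (illustration only):
  double lzp_lit  = 0.60, lzp_len  = 16.0;  // LZP: fewer matches, more literals
  double lz77_lit = 0.30, lz77_len = 8.0;   // LZ77: more bytes covered by matches

  // Decodes per output byte: 1 per literal, ~2 per match (flag + len/offset).
  double lzp  = lzp_lit  + (1.0 - lzp_lit)  * 2.0 / lzp_len;
  double lz77 = lz77_lit + (1.0 - lz77_lit) * 2.0 / lz77_len;

  std::printf("decodes per byte: LZP %.2f vs LZ77 %.2f\n", lzp, lz77);
  return 0;
}
Even though LZP's matches are longer here, the higher literal fraction dominates, so it spends more modelled decodes per output byte.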
Anyway, I'd like to keep SQUID as an LZP-based coder. I've made a few major improvements to it - improved literal coding (at last, I now handle the literal-after-match case) and improved parsing.
I have some problems with LZP's Optimal Parsing, though. First of all, I have found that adding strings to the dictionary only at literal/match boundaries is mandatory, since adding strings at each input byte ruins both compression ratio and speed. This limits parsing to Flexible Parsing, I presume. For now I have a few variants - one that provides the maximum count of matches (best on finn_lst/english.dic) and one with more literals and longer matches (better on the ENWIKs, but much worse on finn_lst/english.dic). I'm looking for a better solution.
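For clarity, here is what the boundary-only update looks like in a stripped-down greedy LZP parser; the table size, hash function, and minimum match length are illustrative, not SQUID's actual values:
Code:
#include <cstdint>
#include <cstring>

const int HASH_BITS = 18;
const int MIN_MATCH = 4;

static uint32_t tab[1 << HASH_BITS];  // context hash -> last position seen

static uint32_t Hash(const uint8_t* p) {  // hash of the order-4 context
  uint32_t x;
  std::memcpy(&x, p - 4, 4);
  return (x * 2654435761u) >> (32 - HASH_BITS);
}

// Parse one token at position i (i >= 4); returns bytes consumed:
// the match length, or 1 for a literal. Note that the hash table is
// updated once per token - at the literal/match boundary - not per byte.
static int ParseToken(const uint8_t* buf, uint32_t i, uint32_t n) {
  uint32_t h = Hash(buf + i);
  uint32_t p = tab[h];  // predicted match position (0 = empty slot)
  tab[h] = i;           // the boundary-only dictionary update
  uint32_t len = 0;
  while (p && i + len < n && buf[p + len] == buf[i + len]) ++len;
  if (len >= MIN_MATCH) return (int)len;  // emit a match of length len
  return 1;                               // emit the literal buf[i]
}
Since only the context hash selects the candidate and no offset is ever coded, the predicted position can be arbitrarily far back in the buffer.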
In addition, just like classic PPM, LZP suffers from the optimal model order problem. For some files an order-3 hash table is optimal, for others order-4 or order-5. With SQUID v0.01 I use an order-4 -> order-2 hash table - i.e. if the order-4 context is not confirmed (hash collision), we fall back to order-2. I have obtained nice results with order-8 -> order-4 -> order-2 hash tables, at the cost of compression speed and, sadly and unacceptably, a drop in decompression speed.
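The fallback can be sketched like this - each slot stores the context bytes alongside the position, so a higher-order prediction is trusted only when those bytes confirm there was no collision. The table sizes and layout are illustrative:
Code:
#include <cstdint>
#include <cstring>

struct Slot {
  uint32_t pos;    // last position seen under this context
  uint32_t check;  // the context bytes, stored to confirm the slot
};

static Slot tab4[1 << 20];  // order-4 table (hashed)
static Slot tab2[1 << 16];  // order-2 table (direct-indexed)

// Predict a match position for buf[i] (i >= 4); 0 means "no prediction".
static uint32_t Predict(const uint8_t* buf, uint32_t i) {
  uint32_t c4;
  std::memcpy(&c4, buf + i - 4, 4);  // order-4 context
  uint32_t c2 = c4 >> 16;            // its order-2 suffix

  Slot& s4 = tab4[(c4 * 2654435761u) >> 12];
  Slot& s2 = tab2[c2];

  uint32_t pos = 0;
  if (s4.check == c4) pos = s4.pos;       // order-4 confirmed: use it
  else if (s2.check == c2) pos = s2.pos;  // else fall back to order-2

  s4 = { i, c4 };  // update both orders
  s2 = { i, c2 };
  return pos;
}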
For a single model order I have found that order-3 is good enough - it is fast for both compression and decompression, and with my current parsing it provides 286,xxx,xxx on ENWIK9 (faster than the current SQUID v0.01 c1).
Nope. It could be even slower. As an example, a 32 MB buffer is notably faster than 64 MB on my SSD. But 32 MB is too small IMO - since LZP can act as a dedupe (like SREP), it is able to find matches at any distance within the buffer (given a large enough hash table).
As I already mentioned, I've implemented ALL the LZ variants (LZ77, LZRW, ROLZ, LZP) within the same literal/match encoder, and I was unable to beat my LZP in terms of compression speed vs. ratio! And like I said earlier, it all comes at the cost of decompression speed (LZ77 is at the very least 2X faster at decompression).
If you have implemented all the variants, why not include them all? It's interesting to compare, and they may be useful for different types of data. And of course OSS with a liberal license would be even better - it could serve as a learning bench for the LZ* family.
IMHO, BCM belongs in this pack already - it's a small, comprehensible example of good compression with BWT+CM.
The lack of spare time + I'm too picky regarding my software.
I confess, I had an idea to add ROLZ to SQUID - making it a hybrid: LZP for the fast compression mode and ROLZ for the higher/slower compression modes.
As to OSS:
LZ4X / CRUSH = LZ77
QUAD / BALZ = ROLZ
LZPX / LZPXJ = LZP
Yep, these are a bit old, but the main experiment with SQUID is a PPMd-type order-1 coder (byte-oriented, with frequency sorting and other tricks). A bit-oriented coder a la the one used in BALZ is better in terms of compression, but it is also notably slower...
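Roughly, the frequency-sorting trick looks like this - each context keeps its symbols ordered by frequency, so the most probable bytes sit at the front of the linear scan, and both the symbol search and the cumulative-frequency sum stay cheap. A minimal sketch with illustrative increments (rescaling and total-count bookkeeping omitted), not SQUID's actual model:
Code:
#include <cstdint>
#include <utility>

struct Order1Ctx {
  uint8_t  sym[256];   // symbols, most frequent first
  uint16_t freq[256];  // matching frequencies

  Order1Ctx() {
    for (int i = 0; i < 256; ++i) { sym[i] = (uint8_t)i; freq[i] = 1; }
  }

  // Encode byte c: scan the frequency-sorted list (frequent bytes are
  // found early), hand the interval to the range coder, then bump the
  // frequency and bubble the symbol towards the front.
  void Encode(uint8_t c, uint32_t& cumFreq, uint32_t& symFreq) {
    uint32_t cum = 0;
    int r = 0;
    while (sym[r] != c) cum += freq[r++];
    cumFreq = cum;
    symFreq = freq[r];
    freq[r] += 32;  // update step (rescaling omitted for brevity)
    while (r > 0 && freq[r] > freq[r - 1]) {  // keep the list sorted
      std::swap(sym[r], sym[r - 1]);
      std::swap(freq[r], freq[r - 1]);
      --r;
    }
  }
};

static Order1Ctx model[256];  // one context per previous byte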
I think my name became a bit better known once I published Tornado - a large package covering a wide range of compression techniques - rather than multiple independent small projects.
Indeed, you have developed almost every type of algo possible. But maybe combining them into a single pack would drive "sales"? One thing about your algos is that they all tend to be very small, thus being an ideal teaching basis.
SQUID's decompression speed is 2 times slower than that of lzma. If you want a decent decompression speed for practical purposes, you simply can't do certain things in the decoder. For example, you cannot hash the context for each position and update the ROLZ table, like in the older zlite compressor by RichSelian - that's already slower than lzma.
Christian Martelock got it about right with RAZOR. It's a bit too slow on compression, but otherwise it beats lzma.
I tried my own ROLZ encoder, and it works! Making it work was hard though - I'm not experienced in this kind of stuff...
So I cannot confirm that updating the ROLZ table only at literal/match boundaries improves compression. At least with the naive greedy parsing that I'm using, updating the table for each byte slightly improves compression - but it's much slower at decompression. The next best thing, I thought, was to put just one position somewhere in the middle of the match. So I ran a few experiments, and it seems that inserting at the second-to-last byte of each match works best. It's cheap (it can be executed out of order) and it reclaims more than half of the difference. Here's the gist of the decoder loop:
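(A minimal sketch, assuming a byte-wise ROLZ decoder with an arithmetic backend and a minimum match length of at least 2; the decode calls and table layout are hypothetical stand-ins, not the actual code.)
Code:
#include <cstdint>

const int ROLZ_SLOTS = 16;
static uint32_t rolz_tab[256][ROLZ_SLOTS];  // per-context position lists
static int      rolz_head[256];

// Stand-ins for the arithmetic decoder (assumed API, for illustration):
int     DecodeMatchFlag(int ctx);
int     DecodeMatchIndex(int ctx);
int     DecodeMatchLen(int ctx);
uint8_t DecodeLiteral(int ctx);

static void RolzInsert(int ctx, uint32_t pos) {
  rolz_head[ctx] = (rolz_head[ctx] + 1) % ROLZ_SLOTS;
  rolz_tab[ctx][rolz_head[ctx]] = pos;
}

void Decode(uint8_t* out, uint32_t out_size) {
  uint32_t pos = 1;  // assume out[0] was coded/seeded separately
  while (pos < out_size) {
    int ctx = out[pos - 1];  // order-1 ROLZ context
    if (DecodeMatchFlag(ctx)) {
      int len = DecodeMatchLen(ctx);
      uint32_t src = rolz_tab[ctx][DecodeMatchIndex(ctx)];
      for (int i = 0; i < len; ++i)  // copy the match
        out[pos + i] = out[src + i];
      // The single table update per match, placed at the second-to-last
      // byte: it is off the critical decode path, so the CPU can retire
      // it out of order.
      uint32_t upd = pos + len - 2;
      RolzInsert(out[upd - 1], upd);
      pos += len;
    } else {
      out[pos] = DecodeLiteral(ctx);
      RolzInsert(ctx, pos);  // literals still update per position
      ++pos;
    }
  }
}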