Hey all, I took a year and a half break from doing compression stuff, but I recently started messing around with CM + LZP. Attached is MCM 64 bit.
The current version has 2 modes, -fast and -high.
-fast is 6 context CM + LZP
-high is 8 context CM + LZP (this is the default)
The LZP is very basic: it just recycles the same bit processing as the normal CM. For the LZP bit, the order 0 xor context is 256 + expected char instead of the normal 8 bit order 0 context.
If the match model has an expected char and the LZP bit is a non-match, this info goes through SSE to help reduce the cost of coding the non-match char. When the match model has no match, it just codes a normal byte without processing an LZP bit.
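To make the flow above concrete, here is a rough sketch in Python. It is not the MCM code: the order 3 dictionary standing in for the match model, MIN_LEN, and the event tuples are all made up for illustration, and no actual bit modelling or arithmetic coding is done. It only shows which decisions get coded: an LZP bit in context 256 + expected char when there is an expected char, a literal (with the rejected char passed along, which is what an SSE stage could use) on a non-match, and a plain literal when there is no prediction.

```python
MIN_LEN = 3  # hypothetical context length for the toy match table


def _predict(table, history):
    """Return the byte that last followed the current context, or None."""
    if len(history) < MIN_LEN:
        return None
    return table.get(bytes(history[-MIN_LEN:]))


def _update(table, history, byte):
    """Remember which byte followed the current context, then extend history."""
    if len(history) >= MIN_LEN:
        table[bytes(history[-MIN_LEN:])] = byte
    history.append(byte)


def lzp_encode_events(data):
    """Turn bytes into the list of decisions an LZP + CM coder would code."""
    table, history, events = {}, bytearray(), []
    for byte in data:
        expected = _predict(table, history)
        if expected is not None:
            hit = int(byte == expected)
            # LZP bit, coded in order 0 context 256 + expected char.
            events.append(("lzp_bit", 256 + expected, hit))
            if not hit:
                # Non-match: code the byte, keeping the rejected char around
                # so a later stage (e.g. SSE) can use it.
                events.append(("literal", byte, expected))
        else:
            # No prediction: plain byte, no LZP bit at all.
            events.append(("literal", byte, None))
        _update(table, history, byte)
    return events


def lzp_decode_events(events):
    """Rebuild the original bytes from the event list."""
    table, history = {}, bytearray()
    it = iter(events)
    for kind, a, b in it:
        if kind == "lzp_bit":
            expected = a - 256
            assert expected == _predict(table, history)
            if b:
                byte = expected
            else:
                _, byte, _ = next(it)  # the literal that follows a miss
        else:
            byte = a
        _update(table, history, byte)
    return bytes(history)
```

On repetitive input most bytes collapse to a single LZP bit, e.g. lzp_decode_events(lzp_encode_events(b"abcabcabcabd")) gives back the input, with the last byte coded as a miss followed by a literal.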
Speed should be similar to mcm03, or a bit slower (for now). I wonder how ZCM has so much speed?
Changes:
Random tweaking, added SSE for helping the LZP.
Added -10 and -11 for more memory, which use around 2.8GB / 5.5GB respectively (64 bit versions only).
Huffman disabled.
E8E9 filter (always on lol, works OK for most files unless there are lots of false positives).
64 bit support.
Linux support (./make.bat on Ubuntu should compile).
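For anyone who hasn't seen an E8E9 filter before, here is a minimal sketch of the general idea in Python. This is the textbook form, not necessarily what MCM does (a real filter may add range checks to cut down on false positives), and the function names are made up. x86 CALL (0xE8) and JMP (0xE9) are followed by a 32 bit relative offset; rewriting it as an absolute target makes repeated calls to the same function look identical, which helps the models. On non-executable data every stray E8/E9 byte is a false positive that scrambles the next 4 bytes, which is why an always-on filter can hurt.

```python
import struct


def _e8e9(data, sign):
    """Add (sign=+1) or subtract (sign=-1) the end-of-instruction position
    to the 32 bit operand after every E8/E9 byte."""
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] in (0xE8, 0xE9):
            (operand,) = struct.unpack_from("<I", out, i + 1)
            # Wrap modulo 2^32 so the transform is always reversible.
            operand = (operand + sign * (i + 5)) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, operand)
            i += 5  # skip the operand so both directions scan identically
        else:
            i += 1
    return bytes(out)


def e8e9_encode(data):
    """Relative -> absolute (run before compression)."""
    return _e8e9(data, +1)


def e8e9_decode(data):
    """Absolute -> relative (run after decompression)."""
    return _e8e9(data, -1)
```

Because the opcode byte itself is never changed and the operand bytes are skipped in both directions, the decoder visits exactly the same positions as the encoder, so the round trip is exact even on false positives.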
Sample results with -9:
sfc: 10,382,334
enwik9: 160,663,427
enwik8: 19,505,098
silesia: 41,025,259
Source code is updated on my github:
https://github.com/mathieuchartier/mcm
I think there are probably more ways to improve the LZP, e.g. by using fewer models or more specialized models for the LZP predictor bit.