Hi guys, so I've been writing a compressor hybridizing LZ77 and BWT. It essentially generalizes a bunch of CM-derived models into bytewise-equivalent filters and compression stages, which are brute-forced during compression but known during decoding, so performance is very asymmetric (typically 3-10 MB/s compression and 30-50 MB/s decompression).
Source code: https://github.com/loxxous/Jampack
Exe: https://github.com/loxxous/Jampack/releases
As expected it reaches BWT-level ratios on text, but with much better results on structured and especially embedded data (e.g. videogame DLC patches) compared to something like bsc. But I need to write one more model to generalize the fast-adapting nature of LZ or PAQ for structured inputs like Charles Bloom's lzt24 test file.
Currently it's dedupe + adaptive channel filtering + localized prefix modeling + LZ77 to catch long structured repeats + Burrows-Wheeler and entropy coding. I thought that should be enough stages to handle most data, but it turns out it's still missing the LZ-esque compression on static binary data.
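To make the "brute-forced during compression but known during decoding" idea concrete, here's a minimal sketch of one such stage: a stride-delta channel filter where the encoder tries every stride up to some bound and picks the one minimizing residual magnitude, while the decoder just applies the chosen stride. This is illustrative only, not Jampack's actual code; the names (deltaEncode, bestStride, etc.) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Buffer = std::vector<uint8_t>;

// Byte-wise delta with a fixed stride (channel width); wraps mod 256.
Buffer deltaEncode(const Buffer& in, size_t stride) {
    Buffer out(in.size());
    for (size_t i = 0; i < in.size(); i++)
        out[i] = in[i] - (i >= stride ? in[i - stride] : 0);
    return out;
}

// Inverse transform: only the stride needs to be known at decode time.
Buffer deltaDecode(const Buffer& in, size_t stride) {
    Buffer out(in.size());
    for (size_t i = 0; i < in.size(); i++)
        out[i] = in[i] + (i >= stride ? out[i - stride] : 0);
    return out;
}

// Encoder-side brute force: pick the stride minimizing total residual magnitude.
size_t bestStride(const Buffer& in, size_t maxStride) {
    size_t best = 1;
    uint64_t bestCost = UINT64_MAX;
    for (size_t s = 1; s <= maxStride; s++) {
        uint64_t cost = 0;
        for (size_t i = s; i < in.size(); i++)
            cost += (uint64_t)std::abs((int)in[i] - (int)in[i - s]);
        if (cost < bestCost) { bestCost = cost; best = s; }
    }
    return best;
}
```

The asymmetry falls out naturally: the encoder pays O(maxStride * n) to search, the decoder pays O(n) to undo the single winning transform.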
I've already got a local prefix model to capture nearby matches bijectively and induce a more optimal sort order, but the literals are the tricky part. Any suggestions on how to synthesize fast adaptive modeling while still compressing with a Burrows-Wheeler core?
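For reference, the kind of fast adaptation I mean is the standard PAQ/LZMA-style shift-based counter, which tracks recent statistics within a handful of symbols; the hard part is getting this behavior out of a bytewise, decode-time-known stage rather than a bitwise CM loop. A minimal sketch of the counter itself (illustrative, not tied to any particular codebase):

```cpp
#include <cassert>
#include <cstdint>

// Shift-based adaptive bit model: p moves a fixed fraction (1 >> rate)
// of the remaining gap toward the observed bit each update, so it
// forgets old statistics exponentially fast.
struct BitModel {
    uint16_t p = 2048;                    // P(bit == 1), scaled to [0, 4096)
    void update(int bit, int rate = 4) {
        if (bit) p += (4096 - p) >> rate; // move toward 1
        else     p -= p >> rate;          // move toward 0
    }
};
```

A smaller rate adapts faster (more LZ-like locality) at the cost of noisier estimates; PAQ-style mixers get both by blending several rates.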