> Trying to compress the different raw LZMA codes
> (dist,len,literals...) with different algorithms,
I ended up with 4 distinct streams there: id (0..6), literal,
len (0..271), dist (0..2^32-1).
Besides simple literals and matches there are also rep0..3 and rep0long ids.
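To make the stream split concrete, here's a minimal sketch in Python. The dict-based code representation and the `split_streams` name are made up for illustration (this is not my actual code), and the numeric id values in the example are arbitrary. Each code always contributes its id, plus whichever of literal/len/dist it carries.

```python
def split_streams(codes):
    """Route each field of a parsed LZ code into its own stream.

    `codes` is a list of dicts (hypothetical format): every dict has an
    "id" key, and optionally "literal", "len" and "dist" keys.
    """
    streams = {"id": [], "literal": [], "len": [], "dist": []}
    for code in codes:
        streams["id"].append(code["id"])
        # Only the fields this kind of code actually carries get stored.
        for field in ("literal", "len", "dist"):
            if field in code:
                streams[field].append(code[field])
    return streams


# Example input; the id numbering (0 = literal, 1 = match, ...) is an assumption.
example = [
    {"id": 0, "literal": 0x41},
    {"id": 1, "len": 3, "dist": 10},
    {"id": 2, "len": 5},
]
streams = split_streams(example)
```

Each stream can then be modelled separately with contexts that suit its value range.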
> if I understand correctly - has the big advantage to be well-defined.
But dumping them and "compressing with different algorithms" (i.e. external compressors)
doesn't really work at all (except for literals) - only paq8 handles numbers better
than unaligned sequences of bytes, and even paq8 is pretty bad at it.
> Looks promising indeed.
Not quite:
Code:
icl.exe  SFC       enwik8
996740   12172597  24560059  (original lzma)
991153   12076928  24496752  (lit model with SSE)
972802   12051193  24502170  15.12.2010 (logistic mix-3 in lit model)
969626   12020481  24497848  20.12.2010 (+len model SSE)
(1-969626/996740)*100 = 2.72026807392098
(1-12020481/12172597)*100 = 1.24965937835616
(1-24497848/24560059)*100 = 0.25330150876266
For now I replaced the submodels for literals and ids,
and that's what I've got (with SFC tuning).
However, the important point is that all these streams
were produced with parsing optimization for original lzma codes.
Actually I'm not aiming for recompression, just trying to
design a LZ method with better compression than lzma,
and recompression just happens to be a reliable way to do that.
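For anyone wondering what "logistic mix-3" in the table means: three bit predictions are combined in the stretched (logit) domain with adaptive weights. Below is a generic paq-style sketch in floating point, not the actual code used here. The class name, initial weights and learning rate are all assumptions for illustration.

```python
import math


def stretch(p):
    """Logit: map a probability in (0,1) to the real line."""
    return math.log(p / (1.0 - p))


def squash(x):
    """Inverse of stretch (logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))


class Mixer3:
    """Mixes three predictions of P(bit == 1); parameters are illustrative."""

    def __init__(self, lr=0.02):
        self.w = [1.0 / 3] * 3   # assumed initial weights
        self.lr = lr             # assumed learning rate
        self.x = [0.0] * 3
        self.p = 0.5

    def mix(self, probs):
        # Clamp inputs so stretch() stays finite.
        self.x = [stretch(min(max(p, 1e-6), 1.0 - 1e-6)) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        # Gradient step on coding cost: w_i += lr * (bit - p) * stretch(p_i).
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]
```

Call `mix()` before coding each bit and `update()` with the actual bit afterwards. A real implementation would use fixed-point arithmetic and usually selects the weight set by some context.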
> Have you tried similar things on zLib streams?
> You'd get much better compression there.
Certainly, because of rc vs huffman (while lzma already uses
the same rangecoder and counters that I do).
But deflate also has a very small window, so its recompression
can't be that simple.
> The downside is it only sees the raw codes, not the decompressed data, right?
No. In fact, it's impossible to decode the lzma literals without having the full decoded data,
because of literal masking after matches.
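To illustrate the masking: right after a match, lzma codes the literal using the byte at distance rep0 in the already-decoded data as extra context, bit by bit, until the first mismatching bit. The sketch below lists which probability slot each of the 8 bits would use. The slot layout follows the LZMA reference decoder as far as I remember it, and the function names are made up, so treat it as an illustration only.

```python
def match_byte_at(history, rep0):
    """Byte that a rep0 match would copy next; rep0 == 0 means the previous byte."""
    return history[-(rep0 + 1)]


def literal_contexts(byte, match_byte=None):
    """Return the probability-slot index used for each bit of a literal.

    With match_byte=None the literal is coded plainly (slots 1..255 of a
    binary tree). Otherwise the matching bit of match_byte selects one of
    two extra 256-slot tables until the first bit that differs.
    """
    slots = []
    symbol = 1
    masked = match_byte is not None
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        if masked:
            match_bit = (match_byte >> i) & 1
            slots.append(((1 + match_bit) << 8) + symbol)
            if match_bit != bit:
                masked = False  # mismatch: remaining bits are coded plainly
        else:
            slots.append(symbol)
        symbol = (symbol << 1) | bit
    return slots


history = b"abcabc"
mb = match_byte_at(history, 2)        # byte 3 positions back
print(literal_contexts(0x41))         # plain literal
print(literal_contexts(0x41, mb))     # literal right after a match
```

Since `match_byte_at` reads the decoded history, the literal stream can't be decoded from the raw codes alone.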
> But f.e. Precomp recognizes a PDF stream inside the decompressed XZ
> stream which can be optimized:
I just got an idea about this. What about _partial_ recompression?
I.e. you decode the lzma stream, find the PDF in it, and recompress
only the part of lzma stream which contains the PDF?
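A rough sketch of the first step, i.e. finding which lzma codes cover a given range of decoded bytes. It assumes we already have the number of output bytes produced by each code (1 for a literal, the match length otherwise). The function name and interface are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def codes_covering(code_lengths, start, end):
    """Return (first, last) indices of the codes whose output overlaps
    the decoded byte range [start, end). Assumes 0 <= start < end <= total.
    """
    # ends[i] = decoded offset just past the output of code i
    ends = list(accumulate(code_lengths))
    first = bisect_right(ends, start)     # code that produces byte `start`
    last = bisect_right(ends, end - 1)    # code that produces byte `end - 1`
    return first, last


# Five codes producing 1, 1, 5, 1 and 3 bytes; bytes 3..7 hold the embedded file.
first, last = codes_covering([1, 1, 5, 1, 3], 3, 8)
```

Note that a match can straddle the boundary of the range (here code 2 starts at byte 2, before the range), so the boundary codes would need special handling.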
P.S. Any comments about dllmerge?