Here is an example. I wrote two zpaq models that mix bitwise order 8..16 contexts. They are identical except that one includes a pre/post-processor that reverses the bit order of each byte; zpaq normally models bits MSB first. msb.cfg contexts consist of the current byte and the low bits of the previous byte, with no pre/post-processing. lsb.cfg recursively calls itself to preprocess, because here the pre- and post-processing transforms are identical (which is not always the case). This trick requires zpaq v6.xx (currently v6.06) because older versions have a different command line syntax.
Some results:
lsb: maxcomp.zip 14952164 -> 14903145
msb: maxcomp.zip 14952164 -> 14910108
lsb: a10.jpg 842468 -> 827863
msb: a10.jpg 842468 -> 826429
maxcomp.zip is the maximum compression corpus created with zip -9. a10.jpg is from that data set. Note that zip compresses better LSB first, and jpeg MSB first, because they store Huffman codes in that order.
To test, save the two config files below as msb.cfg and lsb.cfg, then run, e.g.
zpaq -add archive maxcomp.zip -method lsb -solid
-solid mode adds to archive.zpaq without deduplication or incremental update. Without it, adding the same file a second time would be detected as already stored and you would see a compressed size of 0. With -solid, each add appends another full copy, so delete the archive between runs if you want an accurate compressed size.
Code:
(msb.cfg)
comp 0 0 0 0 10
0 cm 9 $1+16
1 cm 10 $1+16
2 cm 11 $1+16
3 cm 12 $1+16
4 cm 13 $1+16
5 cm 14 $1+16
6 cm 15 $1+16
7 cm 16 $1+16
8 cm 17 $1+16
9 mix 0 0 9 $2+24 0
hcomp
a<<= 9 *d=a halt
post 0 end
(lsb.cfg)
comp 0 0 0 0 10
0 cm 9 $1+16 (order 8..16 bit context model)
1 cm 10 $1+16
2 cm 11 $1+16
3 cm 12 $1+16
4 cm 13 $1+16
5 cm 14 $1+16
6 cm 15 $1+16
7 cm 16 $1+16
8 cm 17 $1+16
9 mix 0 0 9 $2+24 0
hcomp
a<<= 9 *d=a halt
(pre- and post-processor reverse bit order)
pcomp zpaq -method lsb -quiet -run pcomp ;
c=a
a> 255 if halt endif
a&= 1 a<<= 7 b=a
a=c a&= 2 a<<= 5 a+=b b=a
a=c a&= 4 a<<= 3 a+=b b=a
a=c a&= 8 a<<= 1 a+=b b=a
a=c a&= 16 a>>= 1 a+=b b=a
a=c a&= 32 a>>= 3 a+=b b=a
a=c a&= 64 a>>= 5 a+=b b=a
a=c a&= 128 a>>= 7 a+=b
out
halt
end
You can pass 1 or 2 arguments to either config file to change the rate of adaptation of the context models and mixers like this:
zpaq -add archive maxcomp.zip -method lsb -10 20 -solid
I just used the defaults which work pretty well, but you might get better compression.
A CM maps a context to a bit prediction and a count, then updates the prediction in proportion to error/count. It takes two arguments: the number of bits of context, and the maximum count/4 before the rate of adaptation stops decreasing. The computed context is added to the current partial byte context, expanded to 9 bits. So the HCOMP code takes the previous byte (passed in A), shifts it left 9 bits, and stores it in H[0] (pointed to by D) as input to the model. The size of H is 1 (given by the first 0 after COMP), so all 10 components read their context from the same shared location. Each component's table size chops off the high bits of the order 1 context.
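The update rule can be sketched roughly like this. This is a simplified floating-point model in Python, not zpaq's actual fixed-point implementation; the function and parameter names are mine:

```python
def cm_update(p, n, y, limit=1024):
    """Sketch of a CM entry update: p is the predicted probability of a
    1 bit, n is a count. After seeing bit y (0 or 1), p moves toward y
    by (y - p)/n, and n grows until it reaches the limit, so the rate
    of adaptation decreases with experience and then levels off."""
    p += (y - p) / n
    if n < limit:
        n += 1
    return p, n
```

The limit argument plays the role of the CM's second parameter: a larger limit means the model keeps adapting more slowly for longer, which suits stationary data; a small limit keeps it responsive to changing statistics.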
A MIX averages a range of predictions by other components using weighted averaging in the logistic domain (log(p/(1-p))), then updates the weights in favor of the best models. This MIX does not use any context to select a weight array, but you can usually get better compression if it does. The 5 arguments to MIX are the number of context bits (0), first component (0), number of components (9), adaptation rate (24), and order 0 context mask (0, but try 255).
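The logistic-domain mixing and weight update can be sketched like this (a simplified floating-point version; zpaq uses fixed-point stretch/squash tables, and the names here are mine):

```python
import math

def stretch(p):
    """Map a probability to the logistic domain: log(p/(1-p))."""
    return math.log(p / (1 - p))

def squash(x):
    """Inverse of stretch: map back to a probability in (0,1)."""
    return 1 / (1 + math.exp(-x))

def mix(preds, weights):
    """Weighted average of the component predictions, done in the
    logistic domain rather than on raw probabilities."""
    return squash(sum(w * stretch(p) for w, p in zip(weights, preds)))

def mix_update(preds, weights, p_mix, y, rate=0.002):
    """Move weights in favor of the models that predicted bit y best:
    each weight changes by rate * error * its model's stretched input."""
    err = y - p_mix
    return [w + rate * err * stretch(p) for w, p in zip(weights, preds)]
```

Mixing in the logistic domain means a confident prediction near 0 or 1 carries much more weight than one near 0.5, which is the main reason this averaging outperforms a linear mix of probabilities.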
A postprocessor, if any, runs in a separate virtual machine. The postprocessor in lsb.cfg receives each decoded byte in A. It uses B and C as temporary registers to reverse the bit order and output the result. At EOF the input is 0xffffffff, which the if-statement ignores. The PCOMP section invokes zpaq with -run so that the config file calls itself as the preprocessor. Normally this would be a separate program that takes an input and output file as its last 2 arguments.
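The bit reversal itself is just eight mask-and-shift steps. Here is the same transform sketched in Python, with each line mirroring one line of the ZPAQL postprocessor (the function name is mine):

```python
def reverse_bits(c):
    """Reverse the bit order of one byte (0..255), following the same
    mask-and-shift sequence as the lsb.cfg postprocessor."""
    b = (c & 1) << 7      # a&= 1   a<<= 7  (bit 0 -> bit 7)
    b += (c & 2) << 5     # a&= 2   a<<= 5  (bit 1 -> bit 6)
    b += (c & 4) << 3     # a&= 4   a<<= 3
    b += (c & 8) << 1     # a&= 8   a<<= 1
    b += (c & 16) >> 1    # a&= 16  a>>= 1
    b += (c & 32) >> 3    # a&= 32  a>>= 3
    b += (c & 64) >> 5    # a&= 64  a>>= 5
    b += (c & 128) >> 7   # a&= 128 a>>= 7  (bit 7 -> bit 0)
    return b
```

The transform is its own inverse, which is exactly why the same config file can serve as both preprocessor and postprocessor.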
Get zpaq at http://mattmahoney.net/dc/zpaq.html