Here are some results for book1bwt with zpaq. 
Code:
(results for zpaq c(this_file) archive book1bwt)
(note: all parameters are tuned)
comp 2 0 0 0 (hh hm ph pm)
(uncomment any one of the lines below to run)
(235069 simple order 0) (1 0 cm 9 3)
(237916 simple order 1) (2 0 const 0 1 cm 17 5)
(220811 static mixer) (3 0 cm 9 3 1 cm 17 5 2 avg 0 1 120)
(219581 adaptive mixer) (3 0 cm 9 3 1 cm 17 5 2 mix 0 0 2 14 0)
(221070 indirect order 0) (1 0 icm 4)
(231883 indirect order 1) (2 0 const 0 1 icm 12)
(215055 indirect chained) (2 0 icm 4 1 isse 12 0)
(214936 ind static mixer) (3 0 icm 4 1 icm 12 2 avg 0 1 160)
(214514 ind adapt mixer) (3 0 icm 4 1 icm 12 2 mix 0 0 2 14 0)
(214772 chained & mixed) (3 0 icm 4 1 isse 12 0 2 mix 0 0 2 14 0)
hcomp
d= 1 a<<= 9 *d=a halt (set order 1 context for model 1)
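(hcomp runs once per byte with that byte in A; the shift by 9 leaves the low bits of the context free for the partly decoded byte, which zpaq combines in automatically)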
post
0
end
Note the const models don't do anything. I put them there so I didn't have to rewrite the hcomp code.
To keep it simple, contexts are direct lookups (equivalently, there are no hash collisions) and there is no mixer context. I tuned the static mixer weights (avg), the adaptive mixer learning rates (mix), and the simple context model's minimum adaptation rate (cm) for maximum compression.
I could reduce the sizes by 20 bytes by using the b command instead of c (to omit the SHA1 checksum).
As you might know, zpaq does all mixing (static or adaptive) in the logistic domain: predictions are stretched (stretch(p) = ln(p/(1-p))), combined, and squashed back at the end (squash(x) = 1/(1+exp(-x))).
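In C++ it looks roughly like this (a floating-point sketch with my own names, not libzpaq code; zpaq itself works in fixed point with lookup tables):
Code:
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic-domain transforms. Floating point keeps the math visible.
double stretch(double p) { return std::log(p / (1 - p)); }
double squash(double x) { return 1 / (1 + std::exp(-x)); }

// A mix (or avg) component: weighted sum of stretched inputs,
// squashed back to a probability. avg uses fixed, hand-tuned weights.
double mix(const std::vector<double>& p, const std::vector<double>& w) {
    double x = 0;
    for (std::size_t i = 0; i < p.size(); ++i)
        x += w[i] * stretch(p[i]);
    return squash(x);
}

// After coding bit y, an adaptive mix moves each weight along the
// gradient that reduces coding cost; pr is the mixed prediction and
// rate is scaled by the learning-rate argument in the mix lines above.
void update(std::vector<double>& w, const std::vector<double>& p,
            double pr, int y, double rate) {
    for (std::size_t i = 0; i < p.size(); ++i)
        w[i] += rate * (y - pr) * stretch(p[i]);
}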
Direct models (cm) map a context to a prediction and use a variable adaptation rate that starts fast and slows with a count. Indirect models (icm) map a context to a bit history and then to a prediction that is adapted at a fixed rate.
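Both updates in the same style of sketch; the count cap, the fixed rate, and the bit-history transition here are illustrative stand-ins, not zpaq's actual tables:
Code:
#include <vector>

// Direct model (cm): each context slot holds a prediction and a count.
// The rate 1/(count+1.5) starts fast and slows until the count hits a
// cap derived from the cm limit argument (the 3 and 5 above), which
// sets the minimum adaptation rate.
struct CM {
    struct Slot { double p = 0.5; int n = 0; };
    std::vector<Slot> t;
    int cap;
    CM(int sizebits, int cap) : t(1 << sizebits), cap(cap) {}
    double predict(int cxt) { return t[cxt].p; }
    void update(int cxt, int y) {
        Slot& s = t[cxt];
        s.p += (y - s.p) / (s.n + 1.5);
        if (s.n < cap) ++s.n;
    }
};

// Crude stand-in for zpaq's bit-history state table.
unsigned char next_state(unsigned char h, int y) { return h * 2 + y; }

// Indirect model (icm): context -> bit history -> prediction.
// The prediction adapts at a fixed rate no matter how old the state is.
struct ICM {
    std::vector<unsigned char> hist;  // context -> bit history
    double p[256];                    // bit history -> prediction
    ICM(int sizebits) : hist(1 << sizebits) {
        for (int i = 0; i < 256; ++i) p[i] = 0.5;
    }
    double predict(int cxt) { return p[hist[cxt]]; }
    void update(int cxt, int y) {
        p[hist[cxt]] += (y - p[hist[cxt]]) * 0.02;  // fixed rate (illustrative)
        hist[cxt] = next_state(hist[cxt], y);
    }
};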
Chaining (isse) maps a context to a bit history, which selects the weights of a 2-input mixer over the previous model's prediction and a fixed constant.
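Sketched the same way (reusing stretch(), squash() and next_state() from above; the initial weights are my reading of the idea, not zpaq's actual initialization):
Code:
// Chained model (isse): context -> bit history -> a private weight
// pair for a 2-input mixer over the previous component's stretched
// prediction and a constant input. Weights near (1, 0) make a fresh
// history pass the previous prediction through unchanged.
struct ISSE {
    std::vector<unsigned char> hist;  // context -> bit history
    double w0[256], w1[256];          // per-history weight pairs
    ISSE(int sizebits) : hist(1 << sizebits) {
        for (int i = 0; i < 256; ++i) { w0[i] = 1; w1[i] = 0; }
    }
    double predict(int cxt, double prev) {
        int h = hist[cxt];
        return squash(w0[h] * stretch(prev) + w1[h]);  // constant input = 1
    }
    void update(int cxt, double prev, double pr, int y, double rate) {
        int h = hist[cxt];
        double err = rate * (y - pr);
        w0[h] += err * stretch(prev);  // same update rule as a mix
        w1[h] += err;
        hist[cxt] = next_state(hist[cxt], y);
    }
};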
http://cs.fit.edu/~mmahoney/compression/#zpaq