For BWT output you need a fast-adapting model, not a stationary one. Here are a few more experiments.

Code:

768,771 book1
768,771 book1.scott_trans
231,899 book1-m1.zpaq
250,286 book1.scott_trans-1.zpaq
252,387 book1.scott_trans.fpaq0f2
215,277 book1-m2.zpaq
237,694 book1.scott_trans-2.zpaq
277,515 book1.scott_trans.fpaq0
262,988 book1.scott_trans-cm1-255.zpaq
248,970 book1.scott_trans.fpaq0p
249,951 book1.scott_trans-cm1-8.zpaq

book1-m1.zpaq is compressed with zpaq -m1, which uses BWT+RLE+ICM0. scott_trans-1 removes the BWT pre/post processing but uses the same model. The config file is:

Code:

comp 1 0 0 0 1
0 icm 5
hcomp
halt
post 0 end

fpaq0f2 also uses an indirect model like this, but with a different bit-history representation.
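The idea of an indirect context model is a two-stage table lookup: context -> bit history, bit history -> adaptive prediction. A rough Python sketch, where the 8-bit shift-register history and the fixed adaptation rate are simplifications I made up (the real zpaq ICM uses an 8-bit bit-history state machine and fixed-point arithmetic):

```python
class ICM:
    """Sketch of an indirect context model (ICM).
    Stage 1: context maps to a bit-history state.
    Stage 2: the history state maps to an adaptive prediction."""
    def __init__(self, context_bits=16, rate=0.02):
        self.history = [0] * (1 << context_bits)  # history state per context
        self.pred = [0.5] * 256                   # prediction per history state
        self.mask = (1 << context_bits) - 1
        self.rate = rate                          # made-up fixed rate

    def predict(self, ctx):
        return self.pred[self.history[ctx & self.mask]]

    def update(self, ctx, bit):
        i = ctx & self.mask
        st = self.history[i]
        # move the prediction for this history state toward the bit seen
        self.pred[st] += (bit - self.pred[st]) * self.rate
        # simplified history update: shift the bit into an 8-bit window
        self.history[i] = ((st << 1) | bit) & 0xFF
```

Because predictions are shared across contexts that reach the same history state, a brand-new context immediately inherits statistics from contexts that behaved similarly, which is what makes the model adapt quickly.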

zpaq -m2 uses BWT+ICM0-ISSE1 (no RLE). Again I replaced the BWT with scott_trans.

Code:

comp 1 0 0 0 2
0 icm 5
1 isse 12 0
hcomp
d= 1 *d=0 hashd halt
post 0 end

It uses an order 0 ICM like before, but then adjusts the prediction using an order 1 ISSE. An ISSE is a 2-input mixer whose inputs are the ICM prediction and a fixed constant; the pair of mixing weights is selected by an order 1 bit history.
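That mixing step can be sketched as follows, assuming a plain bucket index in place of zpaq's bit-history state table and a simple gradient update on the coding error (the rate constant is a made-up value, and the real zpaq implementation works in fixed point):

```python
import math

def stretch(p):            # probability -> logistic domain
    return math.log(p / (1 - p))

def squash(x):             # logistic domain -> probability
    return 1 / (1 + math.exp(-x))

class ISSE:
    """Sketch of an ISSE: a 2-input mixer over stretch(p_in) and a
    constant 1, with the weight pair selected by a bit-history bucket."""
    def __init__(self, buckets=256, rate=0.002):
        # initial weights pass the input prediction through unchanged
        self.w = [[1.0, 0.0] for _ in range(buckets)]
        self.rate = rate

    def mix(self, p_in, hist):
        x = stretch(p_in)
        w = self.w[hist]
        return squash(w[0] * x + w[1])

    def update(self, p_in, hist, bit):
        x = stretch(p_in)
        w = self.w[hist]
        err = (bit - squash(w[0] * x + w[1])) * self.rate
        w[0] += err * x    # weight on the input prediction
        w[1] += err        # weight on the constant input
```

Mixing in the logistic (stretched) domain means the ISSE can both amplify and damp the ICM's confidence, and the constant input acts as a learned per-history bias.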

fpaq0 is a stationary order 0 model. -cm1-255.zpaq is similar:

Code:

comp 1 0 0 0 1
0 cm 9 255
hcomp
halt
post 0 end

zpaq does not have a pure stationary model, but this is pretty close. A CM maps a context directly to a prediction. The context is order 0, so HCOMP computes nothing. But there is an implied bitwise context (the previously coded bits of the current byte), expanded to 9 bits to reduce cache misses. The arguments to the CM are the context size in bits and 1/4 the maximum count. The learning rate is 1/count, where count is limited to 255*4 = 1020, the maximum supported.

fpaq0p is an adaptive direct context model with a fixed learning rate of 1/32. This is equivalent to "cm 9 8" replacing component 0 in the config above (as in cm1-8.zpaq), except that the learning rate is faster until the count reaches 32.
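The CM update rule for both variants can be sketched like this (hypothetical Python, ignoring zpaq's fixed-point representation and the 9-bit bitwise context expansion):

```python
class CM:
    """Sketch of a direct context model: each context holds a
    prediction updated toward each bit with learning rate 1/count,
    where the count is capped at 4 * limit (the second CM argument)."""
    def __init__(self, context_bits=9, limit=255):
        self.p = [0.5] * (1 << context_bits)  # prediction per context
        self.n = [0] * (1 << context_bits)    # update count per context
        self.cap = 4 * limit                  # 1020 for "cm 9 255", 32 for "cm 9 8"

    def predict(self, ctx):
        return self.p[ctx]

    def update(self, ctx, bit):
        if self.n[ctx] < self.cap:
            self.n[ctx] += 1
        self.p[ctx] += (bit - self.p[ctx]) / self.n[ctx]
```

With limit=255 the rate decays toward 1/1020, so the model becomes nearly stationary; with limit=8 it stops decaying at 1/32, matching fpaq0p's fixed rate once the count saturates.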