Tried adding a better arithmetic coder (AC).
Had to use v82s, since there it's possible to use the SSE prediction for coding directly, without rounding it to 12 bits.
Code:
file   size       paq8pxd82s  paq8pxd82sa  // -x7
00s    1,048,576  442         216          // 1M of zero bytes
FFs    1,048,576  443         355          // 1M of FF bytes [*]
dups   1,048,576  557         396          // repeated 100-byte string of random bytes
rand   1,048,576  1,049,202   1,049,198    // 1M chunk of some archive
BOOK1  768,771    182,660     182,646      // Calgary book1 [**]
[*] the different results for 00s and FFs are caused by an asymmetric mapping: paq's probability 0 turns into 1/32768, while paq's 4095 turns into 1-8/32768
[**] could be 4 bytes smaller, but flushing after the header is incompatible with range coder (rc) tail cutting