Well, if you don't like that, then what about parametric coding?
That is, a further generalization of the idea: store the probabilities
during encoding (instead of immediately generating the code), then either
copy the data directly if the model failed to compress it, or just encode it
faster, since the resulting simple encoding loop allows that.
That's how it's implemented in
http://shelwien.googlepages.com/order_test4.rar
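For illustration, a rough sketch of what such a two-pass loop might look like (the toy model, the names, and the raw-vs-coded size check are my own simplifications, not the code from the archive):

```python
import math

class Order0:
    """Toy order-0 bit model with Laplace-smoothed counters."""
    def __init__(self):
        self.n0 = self.n1 = 1
    def predict(self):
        return self.n1 / (self.n0 + self.n1)   # P(bit == 1)
    def update(self, bit):
        if bit:
            self.n1 += 1
        else:
            self.n0 += 1

def encode_block(bits, model):
    """Pass 1: run the model and store the per-bit probabilities.
    Pass 2 (reduced here to a size check): entropy-code the block
    with the stored probabilities, or fall back to a raw copy if
    the model failed to compress it."""
    probs = []
    for bit in bits:
        p1 = model.predict()
        probs.append(p1 if bit else 1.0 - p1)
        model.update(bit)
    # Estimated code length in bits if we actually entropy-code the block.
    coded_bits = sum(-math.log2(p) for p in probs)
    if coded_bits >= len(bits):
        return ("raw", bits)       # model failed: store the data as-is
    return ("coded", probs)        # feed the stored probs to the coder
```

So a skewed block gets coded, while data the model can't handle is just copied, and the decoder never has to run the model at all.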
And the next idea is to divide the stored probability into even more parts.
For a simple example, if you have two statically mixed submodels
: p = p1*(1-w) + p2*w
then instead of storing the values of p, you can store both p1 and p2, plus
the optimal value of w for the block. So we get a long-awaited asymmetric CM
coder with a faster and simpler decoder.
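A minimal sketch of that per-block weight search (a brute-force grid over w; the function names and grid resolution are made up for illustration):

```python
import math

def code_length(bits, pairs, w):
    """Bits needed to code `bits` with the mixed probability
    p = p1*(1-w) + p2*w, where (p1, p2) are the submodels' P(bit=1)."""
    total = 0.0
    for bit, (p1, p2) in zip(bits, pairs):
        p = p1 * (1.0 - w) + p2 * w
        total += -math.log2(p if bit else 1.0 - p)
    return total

def best_weight(bits, pairs, steps=256):
    """Pick the w in [0,1] minimizing the block's code length;
    the encoder stores this single w for the decoder to reuse."""
    return min((k / steps for k in range(steps + 1)),
               key=lambda w: code_length(bits, pairs, w))
```

The encoder pays for the search once per block, while the decoder just computes one multiply-add per symbol with the stored w.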
And of course it's tempting to apply that to all non-recurrent transformations
(and maybe to somewhat unroll the recurrent ones), so ideally you'd just store
the used counter values and optimize all the parameters at the end of the block.
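A sketch of that deferred optimization, assuming a counter-based estimator p(1) = (n1+d)/(n0+n1+2d): the encoder logs the counter values seen at each symbol and picks the offset d that minimizes the block's total code length afterwards (the names, the estimator, and the candidate set are illustrative assumptions):

```python
import math

def block_code_length(history, d):
    """history: list of (bit, n0, n1) -- the counter values seen at
    each symbol.  Uses p(1) = (n1 + d)/(n0 + n1 + 2d) with offset d."""
    total = 0.0
    for bit, n0, n1 in history:
        p1 = (n1 + d) / (n0 + n1 + 2.0 * d)
        total += -math.log2(p1 if bit else 1.0 - p1)
    return total

def optimize_offset(history, candidates=(0.05, 0.1, 0.2, 0.3, 0.5, 1.0)):
    """Try each candidate offset against the logged counters and
    keep the one that would have coded the block in the fewest bits."""
    return min(candidates, key=lambda d: block_code_length(history, d))
```

Since the counters don't depend on d, nothing has to be re-modeled during the search, and only the chosen d needs to be stored for the decoder.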