1. Obviously it only makes sense to mix the same kind of probabilities.

In this case, the LZP model gives a probability for one specific symbol,

and (I suppose) there's a literal model which provides predictions for all of them.

So the LZP model's prediction for symbol A can only be mixed with the literal

probability of symbol A.

And, well, it's actually simple if we ignore the speed optimization.

PA - LZP model's probability of symbol 'A' occurring (match flag prob.)

p[] - literal model's distribution. p[A] = probability of same symbol A.

p = mix(PA,p[A]) -- new refined probability of symbol A

q[] - mixed distribution, used for actual coding

Code:

for( i=0; i<CNUM; i++ ) q[i] = (i!=A) ? p[i]/(1-p[A])*(1-p) : p;
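
For reference, the one-liner above can be written out as a small self-contained function. This is only a sketch: the post doesn't say which mixer is used, so `mix()` here is a plain average as a stand-in, and `lzp_refine` and `CNUM = 256` are illustrative names and values rather than anything from an actual coder.

```c
#include <assert.h>

enum { CNUM = 256 };

/* Stand-in mixer (assumption): plain average of the two estimates.
   A real coder would more likely use an adaptive mixer here. */
static double mix(double a, double b) { return 0.5 * (a + b); }

/* Build q[] from the literal distribution p[] and the LZP probability PA
   of symbol A. Requires p[A] < 1 so that the rescaling is defined. */
static void lzp_refine(const double p[CNUM], int A, double PA, double q[CNUM]) {
    assert(A >= 0 && A < CNUM && p[A] < 1.0);
    double pa = mix(PA, p[A]);                 /* refined probability of A */
    double scale = (1.0 - pa) / (1.0 - p[A]);  /* rescale all other symbols */
    for (int i = 0; i < CNUM; i++)
        q[i] = (i != A) ? p[i] * scale : pa;
}
```

Since the other symbols share the remaining 1-p in the same proportions as before, q[] still sums to 1 whenever p[] does.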

2. Having multiple LZP models in such a case certainly makes it more

complicated, but there's nothing really special.

Basically, we can do the renormalization like above twice for

two symbols given by LZP models.

Or, maybe, even add an extra handler for a case when A==B - mix

3 predictions there (p[A],PA,PB) and only do renormalization once.
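
A rough sketch of both variants with two LZP models, under the same caveats as before: the mixers are plain averages used as stand-ins, and `set_prob`, `lzp_refine2`, `mix2` and `mix3` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

enum { CNUM = 256 };

/* Stand-in mixers (assumption): plain averages. */
static double mix2(double a, double b) { return (a + b) / 2.0; }
static double mix3(double a, double b, double c) { return (a + b + c) / 3.0; }

/* Set d[s] to pnew and rescale all other entries so the sum stays 1. */
static void set_prob(double d[CNUM], int s, double pnew) {
    assert(s >= 0 && s < CNUM && d[s] < 1.0);
    double scale = (1.0 - pnew) / (1.0 - d[s]);
    for (int i = 0; i < CNUM; i++)
        d[i] = (i != s) ? d[i] * scale : pnew;
}

/* A/PA and B/PB are the symbols and probabilities given by two LZP models. */
static void lzp_refine2(const double p[CNUM], int A, double PA,
                        int B, double PB, double q[CNUM]) {
    for (int i = 0; i < CNUM; i++) q[i] = p[i];
    if (A == B) {
        set_prob(q, A, mix3(p[A], PA, PB));  /* one renormalization only */
    } else {
        set_prob(q, A, mix2(PA, q[A]));      /* first renormalization */
        set_prob(q, B, mix2(PB, q[B]));      /* second one, also rescales q[A] */
    }
}
```

Note that in the A!=B branch the second pass rescales q[A] as well, so the result depends on the order in which the two models are applied.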

3. Now, let's consider practical implementations

a) The approach described above can actually be applied directly

if we have p[] in the form of a binary tree:

Code:

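// c = actual symbol, A = symbol given by the LZP model, PA = its probability
// p[ctx] and pr are probabilities of the next bit being 0
pp = 1; // probability of the bit prefix coded so far
for( i=0,ctx=1; i<8; i++ ) {
  if( ctx!=((256+A)>>(8-i)) ) break; // bit prefix mismatch
  bit = (c>>(7-i))&1;
  bitA = (A>>(7-i))&1;
  // PA/pp can exceed 1 once pp<PA, so it may need clamping
  pr = mix(p[ctx],bitA?1-PA/pp:PA/pp);
  bit = rc_Process( pr, bit ); // rc encode/decode
  pp *= bit ? (1-pr) : pr;
  ctx += ctx + bit;
}
for(; i<8; i++ ) { // after a mismatch: plain literal coding
  bit = (c>>(7-i))&1;
  bit = rc_Process( p[ctx], bit ); // rc encode/decode
  ctx += ctx + bit;
}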
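
To sanity-check the two loops without a rangecoder, one can compute the probability they end up assigning to a given byte. The sketch below does that under some assumptions of mine: `mix()` is a plain average, p[ctx] is read as the probability of a 0 bit at tree node ctx, and PA/pp is clamped to [0,1] (the clamp and the names `clamp01` and `symbol_prob` are not part of the loops above).

```c
#include <assert.h>

/* Stand-in mixer (assumption): plain average of two probabilities. */
static double mix(double a, double b) { return 0.5 * (a + b); }

static double clamp01(double x) { return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x); }

/* Probability that the two loops assign to byte c.
   p[ctx], ctx=1..255: literal model's probability of a 0 bit at node ctx.
   A, PA: symbol given by the LZP model and its probability. */
static double symbol_prob(const double p[256], int A, double PA, int c) {
    assert(A >= 0 && A < 256 && c >= 0 && c < 256);
    double pp = 1.0;  /* probability of the bits of c walked so far */
    int ctx = 1, i = 0;
    for (; i < 8; i++) {
        if (ctx != ((256 + A) >> (8 - i))) break;  /* bit prefix mismatch */
        int bit = (c >> (7 - i)) & 1;
        int bitA = (A >> (7 - i)) & 1;
        double pm = clamp01(PA / pp);              /* est. P(next bit matches A) */
        double pr = mix(p[ctx], bitA ? 1.0 - pm : pm);
        pp *= bit ? 1.0 - pr : pr;
        ctx += ctx + bit;
    }
    for (; i < 8; i++) {                           /* plain literal bits */
        int bit = (c >> (7 - i)) & 1;
        pp *= bit ? 1.0 - p[ctx] : p[ctx];
        ctx += ctx + bit;
    }
    return pp;
}
```

Summing `symbol_prob()` over all 256 values of c should give 1, since every tree node splits its probability into pr and 1-pr, and the value for c==A should come out higher than under the literal model alone when PA is high.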

b) It's possible to encode the c==A flag separately

(with a mixed probability), followed by masked literal

coding similar to the loops above.

Well, like lzma does.

c) We can make a LZP model which would output 8 flags per

byte (byte prefix match flags) and estimate their probabilities.

In such a case it would turn into plain binary mixing.

d) Like ppmd, we can use symbol ranking and unary coding.

In such a case, it's really simple to fix up the probabilities

of any specific symbols, but unfortunately the "worst case"

behavior of unary schemes is really bad.
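
To make the unary scheme a bit more concrete, here is a minimal sketch of how a ranked distribution turns into a chain of binary flags. The names (`unary_steps`, `unary_flag_prob`, `order[]`) are hypothetical, and this is a generic unary decomposition, not ppmd's actual code.

```c
#include <assert.h>

enum { CNUM = 256 };

/* Number of binary flags coded for the symbol at position `rank`
   (0 = first in the ranking). The last rank needs no flag of its own,
   since it is implied once all the others are rejected. */
static int unary_steps(int rank) {
    assert(rank >= 0 && rank < CNUM);
    return (rank < CNUM - 1) ? rank + 1 : CNUM - 1;
}

/* Probability of the "it's this symbol" flag at position k, given that all
   earlier ranks were rejected: d[order[k]] / (mass not yet excluded). */
static double unary_flag_prob(const double d[CNUM], const int order[CNUM], int k) {
    assert(k >= 0 && k < CNUM);
    double rest = 1.0;
    for (int j = 0; j < k; j++) rest -= d[order[j]];
    return (rest > 0.0) ? d[order[k]] / rest : 1.0;
}
```

Fixing up a specific symbol then only means mixing the flag probability at that symbol's rank, but a symbol far down the ranking costs one binary coding step per rank before it, which is one way to read the "worst case" remark.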