It's clearly linear if you abstract modeling from mixing, although I prefer the term linear regression.

The models' probabilities are formed by occurrence counters, but there are many other ways to do this, such as linear counters or something else.

I know the following is clear to everyone.

Let p=(1-w)*p1+w*p2 be the convex linear combination of two models.

The cost when retrieving a one-bit is -log(p).

A simple solution to this minimization problem is gradient descent.

So taking the partial derivative of the cost with respect to w gives g=-(p2-p1)/p.

And a step for the weight would be w:=w-L*g=w+L*(p2-p1)/p when retrieving a one-bit, where L is the step size along the gradient direction.
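The linear mixing step above can be sketched in floating point as follows. This is only an illustration, not the fixed-point code I actually use: the name LinearMixer, the default step size of 0.02 and the clamping of w to [0,1] are assumptions made for the example, and the zero-bit case (cost -log(1-p)) is filled in by analogy with the one-bit case.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative sketch: linear mix p=(1-w)*p1+w*p2 trained by gradient descent.
// p1 and p2 are the two models' probabilities that the next bit is one.
struct LinearMixer {
    double w = 0.5;  // weight of model 2, kept in [0,1] (assumption)
    double L;        // step size (0.02 is an arbitrary example value)

    explicit LinearMixer(double step = 0.02) : L(step) {}

    double mix(double p1, double p2) const {
        return (1.0 - w) * p1 + w * p2;
    }

    // One gradient step on the coding cost of the bit that was actually seen.
    void update(double p1, double p2, int bit) {
        assert(p1 > 0.0 && p1 < 1.0 && p2 > 0.0 && p2 < 1.0);
        const double p = mix(p1, p2);
        // cost = -log(p) for a one-bit, -log(1-p) for a zero-bit
        const double g = bit ? -(p2 - p1) / p : (p2 - p1) / (1.0 - p);
        w = std::min(1.0, std::max(0.0, w - L * g));
    }
};
```

Feeding it a stream of one-bits with p1=0.5 and p2=0.9 pushes w towards 1, that is, towards the model that assigns the higher probability to what actually happens.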

But for numerous reasons linear regression is bad for this problem, mainly because the dependent variable (the bit being predicted) is binary.

So logistic regression is a better answer.

Here p=1/(1+exp(-(s1*w1+s2*w2))), where s1=ln(p1/(1-p1)) and s2=ln(p2/(1-p2)) are the stretched inputs. Taking the partial derivatives again gives

w1:=w1+L*s1*(1-p) and w2:=w2+L*s2*(1-p) when retrieving a one-bit, or simply w:=w+L*(s2-s1)*(1-p) if we use only one weight as in linear regression.
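A floating-point sketch of the two-weight logistic update, to make the formulas concrete. The names stretch, squash and LogisticMixer2 and the step size of 0.02 are assumptions for the example; the zero-bit case is added by the same derivation (the error term becomes -p instead of 1-p), so both cases collapse into bit-p.

```cpp
#include <cassert>
#include <cmath>

// stretch(p) = ln(p/(1-p)); squash is its inverse, the logistic function.
inline double stretch(double p) {
    assert(p > 0.0 && p < 1.0);
    return std::log(p / (1.0 - p));
}
inline double squash(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Illustrative sketch: p = squash(w1*s1 + w2*s2), trained by gradient descent
// on the coding cost of the observed bit.
struct LogisticMixer2 {
    double w1 = 0.5, w2 = 0.5;           // initial weights (example values)
    double s1 = 0.0, s2 = 0.0, p = 0.5;  // inputs and output of the last mix()
    double L;                            // step size (0.02 is arbitrary)

    explicit LogisticMixer2(double step = 0.02) : L(step) {}

    double mix(double p1, double p2) {
        s1 = stretch(p1);
        s2 = stretch(p2);
        p = squash(w1 * s1 + w2 * s2);
        return p;
    }

    // err is (1-p) for a one-bit and -p for a zero-bit.
    void update(int bit) {
        const double err = bit - p;
        w1 += L * s1 * err;
        w2 += L * s2 * err;
    }
};
```

Note that an input of exactly 0.5 stretches to 0, so its weight never moves; only models that actually commit to a prediction get their weights adjusted.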

The only problem here is the interpretation of the weights.

But you can arrive at this solution from different starting points.

So my mixer looks like this:

Code:

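#define MIXW2 // use two weights?
// Fixed-point logistic mixer for two predictions.
// domain.Stretch/Squash, cGlobal::min/max, PBITS and PSCALE come from the rest of the codec.
class Mixer2f {
  const int wbits;   // number of fractional bits in a weight
  const int wscale;  // 1.0 in weight units (1<<wbits)
  int w1,w2,s1,s2,pm;
public:
  // both weights start at 0.5
  Mixer2f():wbits(18),wscale(1<<wbits),w1(wscale>>1),w2(wscale>>1),s1(0),s2(0),pm(0) {}
  // rounding right shift: v/2^s rounded to nearest
  int idiv(const int v,const int s)
  {
    return (v+(1<<(s-1)))>>s;
  }
  // mix two probabilities in the stretched domain, remember inputs for update()
  int mix(const int _p1,const int _p2)
  {
    s1=domain.Stretch(_p1);
    s2=domain.Stretch(_p2);
#ifdef MIXW2
    pm=domain.Squash(idiv(s1*w1+s2*w2,wbits));
#else
    // (1-w)*s1+w*s2 written as s1+w*(s2-s1)
    pm=domain.Squash(s1+idiv((s2-s1)*w1,wbits));
#endif
    pm=cGlobal::max(1,cGlobal::min(PSCALE-1,pm));
    return pm;
  }
  // gradient step after the real bit is known
  void update(const int bit,float rate)
  {
    int e=(bit<<PBITS)-pm;       // prediction error, scaled by 2^PBITS
    int r=(int)(wscale*rate);    // step size in weight units
#ifdef MIXW2
    int d1=idiv(e*s1,PBITS);
    int d2=idiv(e*s2,PBITS);
    w1+=idiv(r*d1,PBITS);
    w2+=idiv(r*d2,PBITS);
#else
    int d1=idiv(e*(s1-s2),PBITS);
    w1-=idiv(r*d1,PBITS);        // same as w1+=r*e*(s2-s1)
#endif
    // restrict weights to [-wscale,+wscale]
    w1=cGlobal::max(-wscale,cGlobal::min(wscale,w1));
    w2=cGlobal::max(-wscale,cGlobal::min(wscale,w2));
  }
};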
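
A side note on the idiv helper in the mixer: it is a right shift that rounds to nearest instead of always rounding down, which matters because the weight updates are small and signed. Below is a standalone copy for illustration; the names round_shift and floor_shift are made up for this example, and it assumes that right-shifting a negative int is an arithmetic shift (true on the usual compilers, and guaranteed since C++20).

```cpp
#include <cassert>

// Standalone copy of the rounding shift used as idiv in the mixer:
// divides v by 2^s and rounds to nearest, with ties going towards +infinity.
inline int round_shift(int v, int s) {
    assert(s >= 1 && s < 31);
    return (v + (1 << (s - 1))) >> s;
}

// Plain shift for comparison: always rounds towards -infinity.
inline int floor_shift(int v, int s) { return v >> s; }
```

For example, round_shift(7,2) gives 2 where the plain shift gives 1, and round_shift(-5,1) gives -2 where the plain shift gives -3, so the plain shift would bias every update downwards.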

Results for the above mixer on the Calgary corpus with my previously mentioned LZ codec (18-bit weights):

Code:

two weights, unrestricted:           843.302
two weights from [-wscale,+wscale]:  843.187
two weights from [0,+wscale]:        843.230
one weight from [-wscale,+wscale]:   844.308 (mathematically incorrect)
one weight from [0,+wscale]:         844.254