1. SSE is of little use for mixing different predictions because it was not designed for this purpose. Probability interpolation in SSE is also nearly useless: the compression gain is negligible, while the execution time grows by an order of magnitude. Thus, we need a modeler with the following properties:
a) the possibility to attach a new auxiliary submodel without affecting the old ones;
b) weak dependency on the input probability.

2. Suppose we have some context model that generates a probability estimation $Prob_i^0$ for a symbol, where $i$ is the count of encoded symbols. This estimation is the input of an SSE modeler, which generates the estimation $Prob_i^1 = \frac {Nom_i} {DeNom_i}$. Suppose our auxiliary submodel gives a small correction to the probability estimation $Prob_i^1$; for simplicity of calculation, we can choose this correction as $Prob_i = \frac {Nom_i + \varepsilon Nom_i} {DeNom_i + \varepsilon Nom_i}$. By the weak definition of probability, $\sum_i Nom_i = \sum_i \delta_i DeNom_i$, where $\delta_i = 1$ if the symbol is encoded at the $i$th step and $\delta_i = 0$ on a coding failure. Applying the same condition to the corrected estimation, $\sum_i (Nom_i + \varepsilon Nom_i) = \sum_i \delta_i (DeNom_i + \varepsilon Nom_i)$, and solving for $\varepsilon$, we get:
$$\varepsilon = \frac {\sum_i (\delta_i DeNom_i - Nom_i)} {\sum_i (Nom_i - \delta_i Nom_i)}$$
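As a concrete illustration, fitting $\varepsilon$ on coding history and applying the correction can be sketched as follows (a minimal sketch; the function names and toy data are illustrative, not taken from any particular coder):

```python
# Minimal sketch: fit epsilon on coding history, then correct Prob_i.
# nom[i] / denom[i] is the SSE estimation Prob_i^1; delta[i] is 1 when the
# predicted symbol actually occurred at step i, 0 on a coding failure.
# All names and sample data are illustrative.

def estimate_epsilon(nom, denom, delta):
    # epsilon = sum_i(delta_i*DeNom_i - Nom_i) / sum_i(Nom_i - delta_i*Nom_i)
    num = sum(d * dn - n for n, dn, d in zip(nom, denom, delta))
    den = sum(n - d * n for n, dn, d in zip(nom, denom, delta))
    return num / den

def corrected_prob(nom_i, denom_i, eps):
    # Prob_i = (Nom_i + eps*Nom_i) / (DeNom_i + eps*Nom_i)
    return (nom_i + eps * nom_i) / (denom_i + eps * nom_i)

# Toy history of two coding steps.
nom, denom, delta = [1, 2], [4, 4], [1, 0]
eps = estimate_epsilon(nom, denom, delta)   # 0.5 for this history
p0 = corrected_prob(nom[0], denom[0], eps)  # 1.5 / 4.5 = 1/3
```

With $\varepsilon$ fitted this way, the corrected estimations satisfy the calibration condition on the same history: $\sum_i (Nom_i + \varepsilon Nom_i) = \sum_i \delta_i (DeNom_i + \varepsilon Nom_i)$ (here $4.5 = 4.5$).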
Another good choice for the probability correction is $Prob_i = \frac {Nom_i} {DeNom_i + \varepsilon (DeNom_i - Nom_i)}$.
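This alternative correction can be sketched the same way (illustrative names; $\varepsilon$ is assumed to have been fitted to the coding history beforehand):

```python
def corrected_prob_alt(nom_i, denom_i, eps):
    # Prob_i = Nom_i / (DeNom_i + eps*(DeNom_i - Nom_i))
    # Unlike the first form, this one leaves Nom_i untouched and scales
    # only the "miss mass" DeNom_i - Nom_i in the denominator.
    return nom_i / (denom_i + eps * (denom_i - nom_i))
```

At $\varepsilon = 0$ both correction forms reduce to the uncorrected SSE estimation $Nom_i / DeNom_i$.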