Now I get it: this would avoid the problem of high-order models that have not collected any statistics yet and are initialized with a probability of 1/2. Thanks for the tip! Sorry if I seem a bit slow off the mark, I get a bit confused by the terminology sometimes.

What I propose (or better: what I have seen in LPAQ, IIRC) is to have 5 mixers. Use the first one if in step 1 you collected only 1 prediction. Use the second one if in step 1 you collected 2 predictions. And so on; finally, use the fifth one if in step 1 you collected 5 predictions (i.e. all models had predictions available).
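To make the idea concrete, here is a rough sketch of selecting a mixer by the number of available predictions. It is only an illustration, not LPAQ's actual code: the class name `MixerSet`, the learning rate, the initial weights of 1/k, and the use of `None` for "no prediction yet" are all assumptions of mine. Models without statistics contribute an input of 0 (the stretched value of 1/2), so they do not influence the mix.

```python
import math

def stretch(p):
    """Logit of a probability, clamped to avoid log(0)."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch (logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))

class MixerSet:
    """Hypothetical set of mixers, one per count of available predictions."""

    def __init__(self, n_models=5, lr=0.02):
        self.n_models = n_models
        self.lr = lr  # assumed learning rate, not tuned
        # weights[k - 1] is the weight vector used when k predictions exist
        self.weights = [[1.0 / k] * n_models for k in range(1, n_models + 1)]
        self.selected = None
        self.inputs = [0.0] * n_models
        self.p = 0.5

    def predict(self, preds):
        """preds: list of probabilities, or None where a model has no statistics."""
        count = sum(1 for p in preds if p is not None)
        if count == 0:
            self.selected = None
            self.p = 0.5
            return self.p
        self.selected = count - 1
        # Missing predictions become 0 in the stretched domain (p = 1/2)
        self.inputs = [stretch(p) if p is not None else 0.0 for p in preds]
        w = self.weights[self.selected]
        self.p = squash(sum(wi * xi for wi, xi in zip(w, self.inputs)))
        return self.p

    def update(self, bit):
        """Train only the mixer that produced the last prediction."""
        if self.selected is None:
            return
        err = bit - self.p
        w = self.weights[self.selected]
        for i, xi in enumerate(self.inputs):
            w[i] += self.lr * err * xi
```

Usage would be something like `m = MixerSet(); p = m.predict([0.9, None, None, None, None]); m.update(1)`. Only the first mixer's weights change in that case, so the mixer used when all five models are available is never disturbed by contexts where the high-order models are still empty.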
A match model could be a good solution to discard long matches. I have to study this.