Classical Dynamic Markov Coding can be seen as a binary Prediction by Partial Matching. This is because at any given time you hold reference (call it cursor here) to one state. This state contains probability of next bit being 0 (or 1 in alternative implementations) and also pointers to next states. One cursor * one prediction pointed per cursor = 1 prediction per bit. If we change it and start a new cursor at each input byte boundary we will have multiple cursors traversing the DMC binary tree during modelling. Multiple cursors * one prediction pointed per cursor = multiple predictions per bit. After that you do Context Mixing on them. What do you think? Has anyone tried that?