With fpaq0p-like probability updates the weigths form a geometric sequence. This is generally desirable, but this poses a new problem. Consider such sequence:
0000000100000000100000
After bit 1 the probability of 1 will drastically go up and then next few 0's will be coded with much longes codes than the 0's right before next 1.
My solution is to make smoothed counters by appending a bit history to a probability, ie instead of keeping an int value describing weighted probability of the history up to current point, we have a pair of (recent bit history trimmed to last N-bits, wegithed probability of the history up to the point where the bit history from the first tuple value begins). Additionally there could be a table that maps a bit history from that pair to a pair (weight of delayed probability, probabilty computed from the trimmed bit history). When updating, the current bit will be shifted in the trimmed bit history and the bit that was shifted out of the trimmed bit history will be used to update the delayed probability.
What do you think?
PS:
Direct bit histories could be replaced with other types of states provided that a state corresponds exactly to some (trimmed or not) bit history, so we can update the delayed probability by checking which bits are shifted out from the new state.