# Thread: State transitions in PPMd

1. ## State transitions in PPMd

Is there any clear explaination of state transitions in PPMd? I've read the sources (ppmdrevs) and the paper (PPM: one step to practicality), and it's still not clear.

An example, with N = 2:
abcdefghabc

At the second ab state, if I follow a successor chain, how does PPMd recognize that the second ab is the same state as the first ab (N = 2)? I can use a trie or hash table for the same purpose, but I would prefer to understand this method.

2. If you know how on-line suffix tree construction algorithm works then you know what is suffix pointer - for the record: a pointer to a node representing currently represented substring without the first symbol. IIRC PPMd uses trees that also have suffix pointers, so you can move from order(N) context to order(N - 1) context in O(1) time. Other difference between PPMd and suffix trees is that in PPMd the inner paths are not compressed - it allows for simplifications and space savings at lower model orders, but it hurts at high model orders (e.g. > 10).

PPMd always tries to code symbol at the highest possible order context. When it fails (it doesn't find the symbol in that context) it uses suffix pointer to move to lower order context and retries. It also keeps track of symbols that were present in previous trials, so it removes them from current context to remove possible redundancy.

Because of the highest possible order context approach it isn't always true that after each 'ab' PPMd will use the same context - if the maximum context order is higher than 2 then it can use different context that also ends with 'ab'.

3. Thanks for the explanation, although I'm still not clear on transitions. I understand most other topics in standard PPM (update exclusion, escape history, etc), it's only the state transitions in PPMd that I don't understand.

I'm assuming in my example than the maximum N = 2 (fixed length). Is there any clear description of the structure of how this is done, or any example with the simple string I wrote above?

4. If maximum N = 2 then PPMd doesn't create nodes for longer contexts so after coding symbol at order 2 it follows the suffix pointer to get to order 1 and looks for the recently processed symbol and possibly goes to order 2 again.

abcdefghabc and N = 2

The first symbols ie abcdefgh will be coded at order(-1) I think (ie fixed probabilities).
The second 'a' should be processed at order(0).
After processing the second 'a' PPMd will go to the leaf that corresponds to the first 'a'.
After that PPMd should code 'b' at order(1) with 'a' as an context.
Similarly for 'c'.

Not that leaves have a pointer to buffer (IIRC) so that PPMd doesn't need to have all leaf nodes at depth N. It's a form of compression of 'terminal' paths, ie those that don't branch anymore.
In other words - PPMd only creates nodes when there is some branching at the node or descendant nodes.
Otherwise it uses that pointer to buffer in leaf node and pretends there are single-symbol contexts.

If you're inclined then you can look how Ukkonnen's online suffix tree construction algorithms works. It's tricky to understand but it's not complex in itself (ie there's not a lot of different steps). You can then view the PPMd state transitions like trasitions in Ukkonen's algorithm but change two things - decompress internal (ie non-leaf) paths (decompress only on splitting a leaf node) and limit the effective depth of tree.