A higher-order or context-dependent method, instead of e.g. "tableDecode[state]", uses "tableDecode[history/context][state]" - many smaller coding tables.

Note that, since (ANS) encoding proceeds in the backward direction, the history/context used at each step has to be the one that will be available during the later forward-direction decoding.
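Both points can be sketched with a toy range-ANS (rANS) using one small frequency table per context (here: the previous symbol). The 2-symbol alphabet, table sizes, and function names are made up for illustration; a real codec would use renormalization and a power-of-two total instead of a Python big-integer state.

```python
M = 8  # total frequency per table

# freq[context][symbol]; the context is the previously *decoded* symbol
# (None at the start of the message). Values are illustrative only.
freq = {
    None: {'a': 4, 'b': 4},   # no history yet: uniform
    'a':  {'a': 6, 'b': 2},   # after 'a', another 'a' is likely
    'b':  {'a': 2, 'b': 6},
}

def cumulative(table):
    c, total = {}, 0
    for s in sorted(table):
        c[s] = total
        total += table[s]
    return c

def encode(msg):
    # ANS encodes backwards so that decoding runs forwards:
    # the context of msg[i] is msg[i-1], which the decoder
    # already knows when it reaches position i.
    x = 1
    for i in reversed(range(len(msg))):
        ctx = msg[i - 1] if i > 0 else None
        f, c = freq[ctx], cumulative(freq[ctx])
        s = msg[i]
        x = (x // f[s]) * M + c[s] + (x % f[s])
    return x

def decode(x, n):
    out, ctx = [], None
    for _ in range(n):
        f, c = freq[ctx], cumulative(freq[ctx])
        slot = x % M
        s = next(t for t in f if c[t] <= slot < c[t] + f[t])
        x = f[s] * (x // M) + slot - c[s]
        out.append(s)
        ctx = s  # the decoded symbol becomes the next context
    return ''.join(out)
```

Each decode step is the exact inverse of the corresponding encode step, so `decode(encode(msg), len(msg))` recovers the message.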

The minimal-header problem is indeed interesting: how to find the best description/approximation of the probability distribution, minimizing "header size + imprecision cost".
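This trade-off can be made concrete with a crude sketch (function names and the per-frequency header pricing are assumptions, not anyone's actual format): quantizing each probability to b bits costs b bits of header per symbol, while coding with the quantized distribution q instead of the true p costs an extra KL(p||q) bits per data symbol.

```python
import math

def total_cost(p, n, bits_per_freq):
    # Header + imprecision cost of coding n symbols drawn from p,
    # with each frequency quantized to bits_per_freq bits.
    M = 1 << bits_per_freq
    q = [max(1, round(pi * M)) for pi in p]      # quantized frequencies
    total = sum(q)
    header = len(p) * bits_per_freq              # store each frequency
    # Imprecision: coding with q instead of p costs KL(p||q) bits/symbol.
    kl = sum(pi * math.log2(pi / (qi / total)) for pi, qi in zip(p, q))
    return header + n * kl

def best_precision(p, n, max_bits=12):
    # Sweep the precision to find the best trade-off for a given data size.
    return min(range(1, max_bits + 1), key=lambda b: total_cost(p, n, b))
```

For a skewed distribution and a large n, higher precision pays for itself; for a uniform distribution, 1 bit per frequency already has zero imprecision cost.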

For large data, the header cost becomes negligible (unless using some high-order or complex context model) - static compressors can then be better than adaptive ones, as they use the whole data to construct the optimal model: the gain can be larger than the header cost.
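The comparison can be sketched numerically (a toy model: the header is crudely priced at 8 bits per alphabet symbol, and an add-one/Laplace estimator stands in for an adaptive coder; all names are made up):

```python
import math
from collections import Counter

def static_cost(data, header_bits_per_symbol=8):
    # Static: one model built from the whole data; its description
    # (here crudely priced per alphabet symbol) is the header.
    counts = Counter(data)
    n = len(data)
    header = header_bits_per_symbol * len(counts)
    payload = sum(-c * math.log2(c / n) for c in counts.values())
    return header + payload

def adaptive_cost(data, alphabet):
    # Adaptive: no header, but the model must be learned on the fly,
    # paying extra bits early on (Laplace estimator as a stand-in).
    counts = Counter()
    bits, seen, k = 0.0, 0, len(alphabet)
    for s in data:
        p = (counts[s] + 1) / (seen + k)   # add-one smoothed probability
        bits += -math.log2(p)
        counts[s] += 1
        seen += 1
    return bits
```

The static coder pays its fixed header once and then codes at the empirical entropy, while the adaptive coder's learning overhead is spread across the data - and multiplies with the number of contexts in a context model.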

ps. As I have written in a different thread, for the "reversed entropy coding" problem we can use generalizations of the Kuznetsov-Tsybakov problem to encode in such a way that the decoder doesn't have to know the probabilities used - **no header is needed**.

I don't see how to do this for standard entropy coding, but maybe something like that is possible? The information about the probability distribution seems somehow redundant ...