The header of Huffman coding can be relatively small - it needs to contain just the lengths of the bit sequences corresponding to successive symbols: approximations of their probabilities by powers of 1/2.

However, as the speed of accurate entropy coding of large alphabets has increased several-fold in the last few months (roughly 7x, Range coding vs FSE/tANS), there appears the problem of storing more accurate probability distributions - a tradeoff with compression rate.

So I wanted to start a discussion about how to do it well and fast, to optimize the total size of the file: header + compressed data.

To begin, let us look at the cost of approximation - if we use an entropy coder optimal for a {q_s} distribution to encode a {p_s} distribution, we lose on average the Kullback-Leibler divergence in bits/symbol:

deltaH = D(P||Q) = \sum_s p_s lg(1/q_s) - \sum_s p_s lg(1/p_s) = \sum_s p_s lg(p_s/q_s) ~ 0.72 \sum_s (p_s - q_s)^2 / p_s

where the last formula comes from expanding the logarithm to second order around q_s = p_s.
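A minimal Python sketch checking that the quadratic formula tracks the exact divergence (the function names and the example distributions are mine, chosen just for illustration - the {q_s} here is a Huffman-style powers-of-1/2 approximation):

```python
import math

def kl_divergence(p, q):
    """Exact Kullback-Leibler divergence D(P||Q) in bits/symbol."""
    return sum(ps * math.log2(ps / qs) for ps, qs in zip(p, q))

def kl_quadratic(p, q):
    """Second-order approximation: ~0.72 * sum_s (p_s - q_s)^2 / p_s."""
    c = 1.0 / (2.0 * math.log(2.0))  # = 1/(2 ln 2) ~ 0.7213
    return c * sum((ps - qs) ** 2 / ps for ps, qs in zip(p, q))

p = [0.45, 0.30, 0.15, 0.10]      # true distribution
q = [0.50, 0.25, 0.125, 0.125]    # powers-of-1/2 (Huffman) approximation

print(kl_divergence(p, q))   # exact penalty in bits/symbol
print(kl_quadratic(p, q))    # quadratic approximation of the penalty
```

For distributions this close the two values agree to about three decimal places, confirming the quadratic term is a good proxy for the real rate loss.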

So generally, the smaller the probability, the more accurately it should be given. One approach: divide the set of possible probabilities into growing disjoint ranges, each represented by (approximately) its central point, such that each range has "at most" or "on average" a similar impact on the sum above.
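A sketch of such a quantizer (my own construction, under the assumption that we bound the per-symbol term (p_s - q_s)^2 / p_s by some eps): a range with lower bound b may have width up to 2*sqrt(eps*b), since then the central point q satisfies (p - q)^2 / p <= eps for every p in the range - so the ranges indeed grow with the probability:

```python
import bisect
import math

def build_ranges(eps=1e-4, p_min=1e-6):
    """Partition [p_min, 1] into growing disjoint ranges such that
    representing any p in a range by the range's central point q keeps
    the penalty term (p - q)^2 / p below eps."""
    bounds = [p_min]
    while bounds[-1] < 1.0:
        b = bounds[-1]
        bounds.append(b + 2.0 * math.sqrt(eps * b))  # width grows with b
    centers = [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]
    return bounds, centers

def quantize(p, bounds, centers):
    """Map probability p to (index, central point) of its range."""
    i = min(bisect.bisect_right(bounds, p) - 1, len(centers) - 1)
    return i, centers[i]
```

The decoder only needs eps and p_min to rebuild the same table, so the header stores just the range indices; with eps = 1e-4 the table has on the order of a hundred entries.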

However, there is an issue with the last symbol in this approach - the normalization constraint \sum_s q_s = 1.

So I thought that maybe we should instead start from the normalization: take the length-1 interval and encode the cumulative distribution function - the positions of the m-1 dividers between successive symbols (for an alphabet of size m). There is a fast way to do it nearly optimally, which I was told by James Dow Allen (6th page): divide the interval into, let's say, m buckets, use unary coding to store the number of dividers in each bucket (cost: ~2 bits per divider), then encode succeeding bits of their binary expansions as needed.
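The first (unary) stage could look like this - a minimal sketch under my reading of the bucket method, with hypothetical names; each count of c dividers costs c ones plus a terminating zero, i.e. (m-1) + m bits total, about 2 bits per divider:

```python
def bucket_counts_unary(cdf_points, m):
    """Count how many of the m-1 divider positions (the CDF values,
    each in [0,1)) fall into each of m equal buckets, and unary-code
    the counts: c ones followed by a terminating zero per bucket."""
    counts = [0] * m
    for x in cdf_points:
        counts[min(int(x * m), m - 1)] += 1
    bits = ''.join('1' * c + '0' for c in counts)
    return counts, bits
```

For example, for m = 4 and dividers at 0.2, 0.5, 0.8 this produces counts [1, 0, 1, 1] and the 7-bit string "1001010" - already enough for the decoder to place each divider within a bucket of width 1/m.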

In our case the number of these refinement bits should depend on the sizes of the intervals between the dividers - for example, going layer by layer, we give a succeeding bit only if there is a neighboring element in the same interval at the given approximation level, up to the maximal resolution - such that the decoder knows exactly when to expect a succeeding bit for a given element.
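The layer-by-layer refinement might be sketched as follows (my own interpretation of the rule above, not a tested implementation of the referenced scheme): at each layer, every divider still sharing its interval with a neighbor emits one more bit of its binary expansion (which half of the interval it lies in); a divider alone in its interval stops, and since the decoder tracks the same interval occupancies, it knows when each divider's bits end:

```python
def refinement_bits(points, lo, hi, max_depth, out):
    """Emit refinement bits for the dividers in interval [lo, hi):
    while several dividers share an interval, each emits one bit
    (0 = left half, 1 = right half), then both halves are refined
    recursively; a lone divider needs no further bits."""
    if max_depth == 0 or len(points) <= 1:
        return
    mid = (lo + hi) / 2
    for x in points:                      # one bit per crowded divider
        out.append(0 if x < mid else 1)
    refinement_bits([x for x in points if x < mid], lo, mid,
                    max_depth - 1, out)
    refinement_bits([x for x in points if x >= mid], mid, hi,
                    max_depth - 1, out)
```

This would be run per bucket after the unary stage; max_depth caps the resolution so closely spaced dividers cannot force unbounded refinement.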

Do you know some good algorithms for this optimal header size problem?