In experimenting with CDFs and nibble encoding, I've concluded that they help on more even distributions, but my existing modelling for extreme distributions (which I'm dealing with more frequently) still wins out. I thought I'd explain how that works here (example taken from fqzcomp):

Model:

Code:

template <int NSYM>
struct SIMPLE_MODEL {
    enum { STEP = 8 };
    SIMPLE_MODEL();
    inline void encodeSymbol(RangeCoder *rc, uint16_t sym);
    inline int encodeNearSymbol(RangeCoder *rc, uint16_t sym, int dist);
    inline uint16_t decodeSymbol(RangeCoder *rc);

protected:
    void normalize();
    uint32_t TotFreq;  // Total frequency

    // Array of Symbols approximately sorted by Freq.
    struct SymFreqs {
        uint8_t  Symbol;
        uint16_t Freq;
    } sentinel, F[NSYM+1];
};

The key thing here is that the symbol array F[] is preceded by sentinel. This is initialised with:

Code:

template <int NSYM>
SIMPLE_MODEL<NSYM>::SIMPLE_MODEL() {
    for (int i = 0; i < NSYM; i++) {
        F[i].Symbol = i;
        F[i].Freq   = 1;
    }

    TotFreq         = NSYM;
    sentinel.Symbol = 0;
    sentinel.Freq   = MAX_FREQ; // Always first; simplifies sorting.
    F[NSYM].Freq    = 0;        // terminates normalize() loop. See below.
}

I have no escape symbols here, so everything has a minimum frequency of 1. It's faster to handle it this way, but compression isn't quite as good, especially at higher orders. Instead I typically ensure NSYM is a suitable value so as not to leave lots of unused symbols.
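The normalize() referenced in the constructor isn't shown here. A minimal sketch of the usual approach, given the zero-frequency terminator on F[NSYM] (the halving-with-round-up is my assumption, not necessarily fqzcomp's exact code):

```cpp
#include <cstdint>

// Sketch only: a cut-down model showing how the zero terminator lets
// normalize() scan without any index bookkeeping.
template <int NSYM>
struct MiniModel {
    uint32_t TotFreq;
    struct SymFreqs {
        uint8_t  Symbol;
        uint16_t Freq;
    } sentinel, F[NSYM + 1];

    MiniModel() {
        for (int i = 0; i < NSYM; i++) {
            F[i].Symbol = i;
            F[i].Freq   = 1;
        }
        TotFreq      = NSYM;
        F[NSYM].Freq = 0;   // zero terminator: ends the normalize() loop
    }

    void normalize() {
        TotFreq = 0;
        // The zero Freq on F[NSYM] terminates the scan.
        for (SymFreqs *s = F; s->Freq; s++) {
            s->Freq -= s->Freq >> 1;   // halve, rounding up, so Freq stays >= 1
            TotFreq += s->Freq;
        }
    }
};
```

Halving with round-up means a symbol with a count of 1 keeps its count of 1, which preserves the no-escape-symbol property across rescales.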

The symbol encoding doesn't use a CDF, but instead linearly scans. However it does so on a sorted list with the most likely symbols first, so on extreme distributions this is fast to find:

Code:

template <int NSYM>
inline void SIMPLE_MODEL<NSYM>::encodeSymbol(RangeCoder *rc, uint16_t sym) {
    SymFreqs *s = F;
    uint32_t AccFreq = 0;

    while (s->Symbol != sym)
        AccFreq += s++->Freq;

    rc->Encode(AccFreq, s->Freq, TotFreq);
    s->Freq += STEP;
    TotFreq += STEP;

    if (TotFreq > MAX_FREQ)
        normalize();

    /* Keep approx sorted */
    if (s[0].Freq > s[-1].Freq) {
        SymFreqs t = s[0];
        s[0] = s[-1];
        s[-1] = t;
    }
}

The last part of that just swaps the current symbol with the previous symbol if its frequency is now larger. This is a single step of a bubble sort, distributed across the symbol encodes. The sentinel value we defined earlier stops this from underrunning the array (although it's rather dodgy that it's a separate variable rather than an *actual* element -1 of the array). We could also do this swap only periodically (I used to have a "bub_count" variable for this and do it every 16 symbols), but that's an extra comparison and generally not worth it.
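For completeness, the matching decode is the mirror image: pull a frequency from the range coder and scan until the cumulative count passes it, then do the same update and bubble step. This is a sketch under assumed names (GetFreq/Decode on the range coder, and the MAX_FREQ value), not fqzcomp's exact code; the stub RangeCoder exists purely to make the example self-contained:

```cpp
#include <cstdint>

// Stub standing in for a real range coder; just enough to drive the scan.
struct RangeCoder {
    uint32_t next_freq = 0;                       // value GetFreq() will hand back
    uint32_t cum = 0, freq = 0, tot = 0;          // last Decode() args, for inspection
    uint32_t GetFreq(uint32_t TotFreq) { (void)TotFreq; return next_freq; }
    void Decode(uint32_t c, uint32_t f, uint32_t t) { cum = c; freq = f; tot = t; }
};

template <int NSYM>
struct MiniModel {
    enum { STEP = 8, MAX_FREQ = 65536 - 32 };     // MAX_FREQ value is an assumption
    uint32_t TotFreq;
    struct SymFreqs { uint8_t Symbol; uint16_t Freq; } sentinel, F[NSYM + 1];

    MiniModel() {
        for (int i = 0; i < NSYM; i++) { F[i].Symbol = i; F[i].Freq = 1; }
        TotFreq       = NSYM;
        sentinel.Freq = MAX_FREQ;   // guards the bubble step at F[0]
        F[NSYM].Freq  = 0;          // terminates normalize()
    }

    void normalize() {
        TotFreq = 0;
        for (SymFreqs *s = F; s->Freq; s++)
            TotFreq += (s->Freq -= s->Freq >> 1);
    }

    uint16_t decodeSymbol(RangeCoder *rc) {
        SymFreqs *s = F;
        uint32_t freq = rc->GetFreq(TotFreq);
        uint32_t AccFreq;

        // Same linear scan as the encoder, driven by the decoded frequency.
        for (AccFreq = 0; (AccFreq += s->Freq) <= freq; s++)
            ;
        AccFreq -= s->Freq;

        rc->Decode(AccFreq, s->Freq, TotFreq);
        uint16_t sym = s->Symbol;   // save before the swap below may move it
        s->Freq += STEP;
        TotFreq += STEP;
        if (TotFreq > MAX_FREQ)
            normalize();

        /* Same single bubble-sort step as the encoder */
        if (s[0].Freq > s[-1].Freq) {
            SymFreqs t = s[0];
            s[0] = s[-1];
            s[-1] = t;
        }
        return sym;
    }
};
```

Note the symbol has to be saved before the bubble step, since the swap can move the entry we just decoded from.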

While I've been experimenting with SIMD encoding and decoding of 16-wide CDFs, I just can't get it to run significantly faster than the above on most of the data sets I'm working on. Your mileage may vary, but it's worth considering with adaptive modelling.