Where can I find a good C++ implementation of a binary range coder? What I need, basically, is an API that I can use to encode a stream of bits with varying probabilities. For example, let RC be an instance of a range coder, I would like to have something like this:

You can use the Encoder and Decoder classes from libzpaq. They are public domain. The end of the compressed stream is marked by 4 consecutive 0 bytes, which can never appear elsewhere.

How does the encoder make sure that it never outputs 4 consecutive 0 bytes? Do you add an additional 1 bit after 31 consecutive 0s?

Matt could probably answer this best, but it appears the encoder and decoder check whether low has become 0 after the shift and add one to it when it is. That is already sufficient: the emitted byte is low>>24 (equal to high>>24 inside the loop), so a zero byte means low < 2^24. After such a byte, low is shifted left by 8 and can only grow until the next byte is emitted, so three consecutive zero bytes force low >= 2^24, and a fourth zero byte is impossible.

For instance, here is the encoder code with a comment at the end about not encoding four 0 bytes in a row:

// compress bit y having probability p/64K
void Encoder::encode(int y, int p) {
  assert(out);
  assert(p>=0 && p<65536);
  assert(y==0 || y==1);
  assert(high>low && low>0);
  U32 mid=low+U32(((high-low)*U64(U32(p)))>>16); // split range
  assert(high>mid && mid>=low);
  if (y) high=mid; else low=mid+1; // pick half
  while ((high^low)<0x1000000) { // write identical leading bytes
    out->put(high>>24); // same as low>>24
    high=high<<8|255;
    low=low<<8;
    low+=(low==0); // so we don't code 4 0 bytes in a row
  }
}

btw, replacing "while" in this code with 4 consecutive "if" may improve CPU branch prediction

I assume the "if"s should be nested. Does this help when the probability of the while condition being true is significantly less than 50% for any iteration? I sometimes have code more like this to try to avoid one jump but am not sure it actually helps or is worthwhile:

// compress bit y having probability p/64K
void Encoder::encode(int y, int p) {
  assert(out);
  assert(p>=0 && p<65536);
  assert(y==0 || y==1);
  assert(high>low && low>0);
  U32 mid=low+U32(((high-low)*U64(U32(p)))>>16); // split range
  assert(high>mid && mid>=low);
  if (y) high=mid; else low=mid+1; // pick half
  if ((high^low)<0x1000000) { // write identical leading bytes
    do {
      out->put(high>>24); // same as low>>24
      high=high<<8|255;
      low=low<<8;
      low+=(low==0); // so we don't code 4 0 bytes in a row
    } while ((high^low)<0x1000000);
  }
}

The benefit of "if" is the separation of code paths with different probabilities. If the condition is much less likely than 50% for every "while" iteration, it should make no difference.

20+ years ago, compilers generated better code for "do while" than for "while", so replacing it with "if { do while }" can make some sense. I believe that modern compilers are much smarter, but I haven't checked.

Last edited by Bulat Ziganshin; 7th September 2015 at 03:11.

For completeness, the decoding step of uABS (close to Matt's fpaqc):

// p = probability of a 1 bit, scaled to 2^16 (L_pb = 16)
uint64_t xp = (uint64_t)x * p;
out = ((xp & 0xffff) + p) >> L_pb; // 1 iff frac(x*p/2^16) + p/2^16 carries
xp >>= L_pb;                       // floor(x*p / 2^16)
x = out ? xp : x - xp;
if (x < (1 << 16)) { x = (x << 16) | *ptr++; } //renormalization

of rABS:

// p = frequency of a 0 bit, out of 2^16
uint16_t xfrac = x & 0xffff;    // slot within the 2^16 range
uint32_t x_new = p * (x >> 16);
if (xfrac < p) { x = x_new + xfrac; out = 0; }
else { x -= x_new + p; out = 1; }
if (x < (1 << 16)) { x = (x << 16) | *ptr++; } //renormalization

of tABS:

t = decodingTable[probability][x]; // table precomputed per quantized probability
out = t.symbol ^ AH;               // AH - 1 if probability is above 1/2
x = t.newX | readBits(t.nbBits);   // renormalization folded into the table