Here is implementation of the "range variant of ANS" (rANS) by Fabian "ryg" Giesen: https://github.com/rygorous/ryg_rans

Thanks to being simpler and using single multiplication per decoded symbol (8th page of http://arxiv.org/pdf/1311.2540 ), accordingly to tests on Charles Bloom blog ( http://cbloomrants.blogspot.com/ ), the current version has faster decoding than range coding at cost of slower encoding.

Algorithm: for natural f[s] (quantization of probabilities):

freq[s] = f[s] / 2^n

cumul[s] = sum_{i<s} f[s]

mask = 2^n - 1

symbol[x=0 ... n-1] = s: cumul[s-1]<= x < cumul[s] - which symbol is in x-th position in (0..01..1...) length 2^n range: having f[s] appearances of symbol s

encoding of symbol s requires 1 division modulo f[s]:

C(s,x) = (floor(x/f[s]) << n) + mod(x , f[s]) + cumul[s]

decoding from x has one multiplication and needs symbol[] table or function deciding in which subrange we are:

s = symbol [ x & mask ]

x = f[s] * (x >> n) + (x & mask) - cumul[s]

Here are benchmarks of the current version from Charles blog (arith is range coder):

huf = order-0 static Huffman

fse = Yann's implementation of tANS

rans = ryg's implementation of rANS

arith = arithmetic coder with static power of 2 cumulative frequency total and decode table

For fse, rans, and arith I use a 12-bit table (the default in fse.c)

huf uses a 10-bit table and does not limit code length

Runs are on x64 code, but the implementations are 32 bit. (no 64-bit math used)

Update: ryg's blog about rANS: http://fgiesen.wordpress.com/2014/02/02/rans-notes/Some notes on the four implementations will follow. First the raw results : inName : book1

H = 4.527

arith 12: 768,771 -> 435,378 = 4.531 bpb = 1.766 to 1

arith encode : 0.006 seconds, 69.44 b/kc, rate= 120.08 mb/s

arith decode : 0.011 seconds, 40.35 b/kc, rate= 69.77 mb/s

rans 12: 768,771 -> 435,378 = 4.531 bpb = 1.766 to 1

rans encode : 0.010 seconds, 44.04 b/kc, rate= 76.15 mb/s

rans decode : 0.006 seconds, 80.59 b/kc, rate= 139.36 mb/s

fse : 768,771 -> 435,981 = 4.537 bpb = 1.763 to 1

fse encode : 0.005 seconds, 94.51 b/kc, rate= 163.44 mb/s

fse decode : 0.003 seconds, 166.95 b/kc, rate= 288.67 mb/s

huf : 768,771 -> 438,437 = 4.562 bpb = 1.753 to 1

huf encode : 0.003 seconds, 147.09 b/kc, rate= 254.34 mb/s

huf decode : 0.003 seconds, 163.54 b/kc, rate= 282.82 mb/s

huf decode : 0.003 seconds, 175.21 b/kc, rate= 302.96 mb/s (*1)

inName : pic

H = 1.210

arith 12: 513,216 -> 79,473 = 1.239 bpb = 6.458 to 1

arith encode : 0.003 seconds, 91.91 b/kc, rate= 158.90 mb/s

arith decode : 0.007 seconds, 45.07 b/kc, rate= 77.93 mb/s

rans 12: 513,216 -> 79,474 = 1.239 bpb = 6.458 to 1

rans encode : 0.007 seconds, 45.52 b/kc, rate= 78.72 mb/s

rans decode : 0.003 seconds, 96.50 b/kc, rate= 166.85 mb/s

fse : 513,216 -> 80,112 = 1.249 bpb = 6.406 to 1

fse encode : 0.003 seconds, 93.86 b/kc, rate= 162.29 mb/s

fse decode : 0.002 seconds, 164.42 b/kc, rate= 284.33 mb/s

huf : 513,216 -> 106,691 = 1.663 bpb = 4.810 to 1

huf encode : 0.002 seconds, 162.57 b/kc, rate= 281.10 mb/s

huf decode : 0.002 seconds, 189.66 b/kc, rate= 328.02 mb/s