Yeah, right. Its 2010 and we have C.Bloom writing fast huffman decoders, instead of posting HP entries or something.
I guess the next step would be RLE?.. It really disappoints the hell out of me.
Btw, I know at least two approaches which allow for better
static compression than huffman with potentially similar
decoding speed (as they're still about bitcodes anyway).
1. nonprefix coding - http://encode.su/threads/210-Advance...ull=1#post3582
We can drop huffman's prefix property, and modify the decoder a bit,
by adding some kind of backtracking. Then some codewords are marked as "forbidden",
and activate the backtracking when seen.
Code:
book1 768771
aridemo0 435603 // aridemo "v0" model (adaptive rangecoder), http://compression.ru/sh/aridemo6.rar
huff0000 438374 // o0 huffman
huff0001 435971 // o0 nonprefix
In this implementation the encoder is far from efficient, but instead its
very easy to understand and is still able to demonstrate the effect.
It starts with huffman code and encodes the data.
Then it marks all the bitstrings of length up to M (=some max codelength)
which can be seen in the previous pass output.
Then, if there appears an unused bitstring shorter than some of current symbol
codes, the code gets replaced, and file reencoded, and we go for next iteration.
But even with this dumb approach, there's still compression improvement,
and decoder is still able to decode it more or less sequentially.
2. fritcoding - http://nishi.dreamhosters.com/u/fritcode_v0.rar
The idea is to produce a bitcode with skewed bit probabilities.
For example, if we know that "0" has a probability 0.75, it means
that a high-probable symbol can be encoded better.
For example, here's a fritcode(p0=1/3) for "abracadabra"
Code:
a - <0>
b - <10>
c - <110>
d - <1110>
r - <1111>
So "abracadabra"="010111101100111001011110", with 9 0s and 15 1s
Obviously, this is something like preprocessing, and an actual
entropy coding is required next. For example, we can use a
static huffman for n-bit strings, or even a rangecoder
(which can work in another thread, thus not delaying anything).
Unfortunately, I don't have an unordered fritcoding implementation
which can be directly compared with huffman coding (though its
obviously possible), because the whole idea was invented in attempt
to improve a compressed BWT implementation.
Code:
768771 - book1
461088 - optimal ordered bitcode, http://www.ctxmodel.net/files/bitcode_v1.rar (test4)
453160 - huffman code for book1fr (fritcode split into 8-bit codewords)
This may be not much, but these approaches can be combined, and at least
its something new, unlike huffman coding.