i've seen discussions here about lzhuf/lzari encoding. using my tornado experience, i can clear up a few things...
look at these results:
Code:
>tor -o delphi
ari: 369773 -> 96636 kb (26.1%): 53.393 sec
>tor -o -c3 delphi
huf: 369773 -> 98126 kb (26.5%): 51.552 sec
>tor6++ -o -c3 delphi
huf: 369773 -> 96940 kb (26.2%): 59.697 sec
cnt+=1, huf: 2700/6
>tor -o oo
ari: 368738 -> 84514 kb (22.9%): 43.322 sec
>tor -o -c3 oo
huf: 368738 -> 85501 kb (23.2%): 41.890 sec
>tor6++ -o -c3 oo
huf: 368738 -> 84756 kb (23.0%): 48.509 sec
here, delphi represents binary data and oo represents textual data. the first test for each file uses the ARI encoder, the second the current HUFFMAN coder, and the third the same HUFFMAN coder with settings close to those of the ARI coder. as you can see, while my default HUFFMAN implementation is 1-1.5% worse than ARI, with the same settings the difference is only 0.3%
at the same time, HUFFMAN at these settings becomes very slow, spending a lot of time in tree rebuilding. my implementation is semi-dynamic: both ari and the "improved huffman" rebuild their trees after every 10-20k elements processed. don't forget that the semi-dynamic approach is symmetric and requires doing the same tree rebuilding at the decode stage too, where the overall slowdown may be 2x-3x (decompression time for both files is around 5-10s)
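to make the scheme concrete, here is a minimal sketch of such a semi-dynamic coder loop. it is NOT tornado's actual code - the Coder struct, rebuild() and emit() are hypothetical placeholders, only the rebuild-every-N-symbols logic matters:
Code:
// minimal sketch of the semi-dynamic scheme; rebuild() and emit() are placeholders
#include <cstdint>
#include <cstring>

const int ALPHABET   = 256;    // assumed symbol alphabet size
const int REBUILD_AT = 20000;  // rebuild the code after this many symbols

struct Coder {
    uint32_t count[ALPHABET];  // symbol frequencies seen so far
    uint32_t symbols_seen;

    Coder() : symbols_seen(0) { memset(count, 0, sizeof count); }

    void rebuild()         { /* recompute huffman code lengths (or ari ranges) from count[] */ }
    void emit(int /*sym*/) { /* output the code of sym with the current table */ }

    void encode_symbol(int sym) {
        emit(sym);                        // code with the table built from past data
        ++count[sym];
        if (++symbols_seen % REBUILD_AT == 0)
            rebuild();                    // the decoder performs exactly the same rebuild
                                          // at the same point, so no table is transmitted
    }
};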
this should answer your second question - does STATIC HUFFMAN compress better than the DYNAMIC one? no, i think - it's just much faster at the decoding stage, and that's why it has been used since pkzip 1.x
but there are ways to speed up tree rebuilding, making this (very complex) static approach unnecessary. these methods were described by KADACH in his thesis. in short, we should
1) use merging of sorted lists instead of heap manipulation when building the huffman tree (see the sketch after this list)
2) presort the char freqs using sort<> instead of qsort() - the latter spends too much time calling the comparison routine
3) even better, use a radix sort (afair, 2 "digits" is optimal) to make the sort even faster
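to illustrate (1): once the leaves are sorted by frequency, the tree can be built by merging two queues whose fronts always hold the smallest remaining nodes, so no heap is needed. a minimal sketch (my own node layout and names, not tornado's or kadach's code):
Code:
// sketch of (1): building the huffman tree by merging two sorted queues,
// assuming the leaf frequencies are already sorted ascending (items 2/3)
#include <cstdint>
#include <deque>
#include <vector>

struct Node {
    uint64_t weight;
    int left, right;                       // indices into the pool, -1 for leaves
};

// freqs must be sorted ascending; returns the index of the root in pool
int build_tree(const std::vector<uint64_t>& freqs, std::vector<Node>& pool)
{
    std::deque<int> leaves, internal;      // both queues stay sorted by weight
    for (uint64_t f : freqs) {
        pool.push_back({f, -1, -1});
        leaves.push_back((int)pool.size() - 1);
    }
    // the globally smallest node is always at the front of one of the two queues
    auto pop_min = [&]() {
        bool take_leaf = internal.empty() ||
            (!leaves.empty() && pool[leaves.front()].weight <= pool[internal.front()].weight);
        std::deque<int>& q = take_leaf ? leaves : internal;
        int i = q.front(); q.pop_front(); return i;
    };
    while (leaves.size() + internal.size() > 1) {
        int a = pop_min(), b = pop_min();  // the two smallest remaining nodes
        pool.push_back({pool[a].weight + pool[b].weight, a, b});
        internal.push_back((int)pool.size() - 1);  // new weights appear in non-decreasing
    }                                              // order, so no re-sorting is ever needed
    return pop_min();                      // the last remaining node is the root
}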
this allows building the tree in O(n) time instead of O(n*log(n)). tornado already implements (1); otherwise even its current settings (rebuilding the tree every 50k elements) would probably be too slow compared to ARI. implementing all 3 improvements would allow developing a semi-dynamic LZHUF coder that's only 0.3% worse than LZARI implementations but decompresses about 1.5x faster
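and a sketch of (3): assuming the counts collected over a 10-50k block fit in 16 bits, two 8-bit radix passes give a stable, comparison-free sort of the symbols by frequency, whose output can feed the two-queue builder above (again my own names, not kadach's or tornado's code):
Code:
// sketch of (3): lsd radix sort of the symbol frequencies with two 8-bit "digits"
#include <cstdint>
#include <vector>

// sorts the symbol indices by cnt[] ascending; stable, no comparison calls at all
void radix_sort_by_freq(const std::vector<uint16_t>& cnt, std::vector<int>& order)
{
    const size_t n = cnt.size();
    order.resize(n);
    for (size_t i = 0; i < n; ++i) order[i] = (int)i;

    std::vector<int> tmp(n);
    for (int shift = 0; shift < 16; shift += 8) {     // two 8-bit digit passes
        size_t bucket[257] = {0};
        for (size_t i = 0; i < n; ++i)                // histogram of digit values
            ++bucket[((cnt[order[i]] >> shift) & 0xFF) + 1];
        for (int d = 0; d < 256; ++d)                 // prefix sums -> bucket start offsets
            bucket[d + 1] += bucket[d];
        for (size_t i = 0; i < n; ++i)                // stable scatter into buckets
            tmp[bucket[(cnt[order[i]] >> shift) & 0xFF]++] = order[i];
        order.swap(tmp);
    }
}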
this remains true for all simple, ar002-derived coders (lzh, squeeze, deflate, rar, ace, cabarc, tornado). only lzma-like beasts get real improvements from using arithmetic coders
ps: see a bit more info in my post http://www.encode.su/forums/index.ph...ic=408#msg4265