
Thread: Charles Bloom's new Huffman optimisation

  1. #1
    Join Date
    Feb 2010
    Thanked 36 Times in 12 Posts

    Charles Bloom's new Huffman optimisation

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Kharkov, Ukraine
    Thanked 1,018 Times in 540 Posts
    Yeah, right. It's 2010 and we have C.Bloom writing fast Huffman decoders, instead of posting HP entries or something.
    I guess the next step would be RLE?.. It really disappoints the hell out of me.

    Btw, I know at least two approaches which allow for better
    static compression than Huffman with potentially similar
    decoding speed (as they're still about bitcodes anyway).

    1. nonprefix coding -

    We can drop Huffman's prefix property and modify the decoder a bit
    by adding some kind of backtracking: some codewords are marked as "forbidden",
    and trigger the backtracking when seen.

    book1     768771
    aridemo0  435603 // aridemo "v0" model (adaptive rangecoder),
    huff0000  438374 // o0 huffman
    huff0001  435971 // o0 nonprefix
    In this implementation the encoder is far from efficient, but it's
    very easy to understand and still demonstrates the effect.
    It starts with huffman code and encodes the data.
    Then it marks all the bitstrings of length up to M (=some max codelength)
    which can be seen in the previous pass output.
    Then, if there's an unused bitstring shorter than one of the current symbol
    codes, that code gets replaced, the file is re-encoded, and we go for the next iteration.

    But even with this dumb approach there's still a compression improvement,
    and the decoder is still able to decode the stream more or less sequentially.
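    One pass of the iteration described above can be sketched roughly like this (a hypothetical illustration, not the actual huff0001 encoder): collect every bitstring up to length M that occurs anywhere in the previous pass output, then look for a short bitstring that never occurs; such a string can be assigned as a codeword, since a false match of it during decoding can always be undone by backtracking.

    ```python
    # Sketch of one nonprefix-encoder pass (hypothetical names, not Shelwien's code).
    from itertools import product

    def used_substrings(bits, M):
        """All bitstrings of length 1..M occurring anywhere in the stream."""
        seen = set()
        for L in range(1, M + 1):
            for i in range(len(bits) - L + 1):
                seen.add(bits[i:i + L])
        return seen

    def find_unused(bits, max_len):
        """Shortest bitstring (length <= max_len) absent from the stream, or None."""
        seen = used_substrings(bits, max_len)
        for L in range(1, max_len + 1):
            for tup in product("01", repeat=L):
                cand = "".join(tup)
                if cand not in seen:
                    return cand
        return None

    # Toy stream: "11" never occurs in it, so a symbol currently holding
    # a 3-bit or longer code could be reassigned the 2-bit string "11".
    stream = "000100010001"
    print(find_unused(stream, 2))  # -> 11
    ```

    The real encoder then re-encodes the file with the updated code table and repeats until no such shorter unused string appears.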

    2. fritcoding -

    The idea is to produce a bitcode with skewed bit probabilities.
    For example, if we know that "0" has a probability of 0.75, it means
    that a high-probability symbol can be encoded better.

    For example, here's a fritcode(p0=1/3) for "abracadabra"
    a - <0>
    b - <10>
    c - <110>
    d - <1110>
    r - <1111>
    So "abracadabra" = "010111101100111001011110", with 9 zeros and 15 ones.
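    The example is easy to verify mechanically; this little check just applies the code table from the post:

    ```python
    # The fritcode table given above for "abracadabra".
    FRITCODE = {"a": "0", "b": "10", "c": "110", "d": "1110", "r": "1111"}

    encoded = "".join(FRITCODE[ch] for ch in "abracadabra")
    print(encoded)                         # 010111101100111001011110
    print(encoded.count("0"),              # 9 zeros,
          encoded.count("1"))              # 15 ones: the bit stream is skewed
    ```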

    Obviously, this is something like a preprocessing step, and actual
    entropy coding is required next. For example, we can use a
    static Huffman code over n-bit strings, or even a rangecoder
    (which can run in another thread, thus not delaying anything).
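    To see why the skew pays off, one can compute the order-0 Shannon cost of the 24-bit example stream under its own bit statistics; this is the size an ideal back-end coder would approach (a rough sketch, not a measurement from the post):

    ```python
    import math

    bits = "010111101100111001011110"   # fritcode output from the example
    n0, n1 = bits.count("0"), bits.count("1")
    p0 = n0 / (n0 + n1)                 # observed bit skew, ~0.375
    # Ideal cost if a back-end coder (rangecoder, or Huffman over n-bit
    # chunks) exploits the skewed bit probability:
    cost = -(n0 * math.log2(p0) + n1 * math.log2(1 - p0))
    print(round(cost, 2))               # ~22.91 bits for the 24-bit stream
    ```

    On such a tiny input the saving is marginal, but on real data (book1 above) the same effect is what takes the fritcode + byte-Huffman combination below the plain Huffman size.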

    Unfortunately, I don't have an unordered fritcoding implementation
    which can be directly compared with Huffman coding (though it's
    obviously possible), because the whole idea was invented in an attempt
    to improve a compressed BWT implementation.

    768771 - book1
    461088 - optimal ordered bitcode, (test4) 
    453160 - huffman code for book1fr (fritcode split into 8-bit codewords)
    This may not be much, but these approaches can be combined, and at least
    it's something new, unlike Huffman coding.

  3. Thanks:

    RamiroCruzo (16th April 2018)

  4. #3
    Join Date
    Feb 2010
    Thanked 36 Times in 12 Posts
    Well yeah, if you read the rest of his site/blog you get a lot more depth.

    There's even some discussion right near the top that's quite interesting.

    Just keep reading, and you'll find more.

