
Thread: Dynamic, canonical, and static Huffman

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts

    Dynamic, canonical, and static Huffman

    Hi all – I have some questions about the different forms of Huffman coding, and where they're used, and I figured many of you would be able to fill in the blanks. Thanks for your help.

    Static Huffman: Does this mean 1) a tree generated from a single pass over all the data, or 2) some sort of predefined table independent of any given data, like defined in a spec? I'm seeing different accounts from different sources. For example, the HPACK header compression spec for HTTP/2 has a predefined static Huffman table, with codes specified for each ASCII character (starting with five-bit codes). Conversely, I thought static Huffman in deflate / gzip was based on a single pass over the data. If deflate or gzip have predefined static Huffman tables (for English text?), I've never seen them.

    Dynamic/Adaptive Huffman: What's the definition? How dynamic are we talking about? It's used in deflate and gzip right? Is it dynamic per block? (Strangely, the Wikipedia article says that it's rarely used, but I can't think of a codec more popular than deflate...)

    Canonical Huffman: Where is this used?

    By the way, do you think it would be feasible to implement adaptive arithmetic coding with less CPU and memory overhead than deflate? The Wikipedia article also said this about adaptive Huffman: "...the cost of updating the tree makes it slower than optimized adaptive arithmetic coding, which is more flexible and has better compression." Do you agree that adaptive arithmetic coding should be faster and with better ratios? What about the anticipated CPU and memory overhead?

    Thanks.

  2. #2
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
Probably nearly all Huffman coding in use is static and canonical (?) - which reduces the size of the headers.
    Adaptive Huffman is interesting for theoretical considerations, but probably more costly than adaptive AC - it doesn't seem to make sense in practice (?)

But adaptive AC/ANS is quite popular for exploiting non-stationarity, e.g. LZNA, RAZOR and lolz use adaptive rANS, reaching ~100MB/s/core for a 4-bit alphabet (the additional memory cost is negligible).
    https://sites.google.com/site/powturbo/entropy-coder

  3. Thanks:

    Bulat Ziganshin (3rd July 2020)

  4. #3
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
When I was writing The Data Compression Guide, I included Knuth's Adaptive Huffman Coding based on my reading of Vitter's paper. My implementations of Knuth's and Vitter's algorithms are correct, adhering exactly to Vitter's description. Incidentally, Knuth's algorithm is the one used in the early program "compact". You can optimize my sample code or re-implement these algorithms.

    https://sites.google.com/site/datacompressionguide/fgk

    https://sites.google.com/site/dataco...onguide/vitter

Yes, adaptive Huffman coding can be done per block to avoid constant updates to the dynamic Huffman tree. There are compressors that do this, as listed in the comp.compression FAQ.

    Canonical Huffman coding and length-limited codes are studied extensively by A. Moffat.
    Last edited by compgt; 2nd July 2020 at 20:25.

  5. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
Quote Originally Posted by SolidComp View Post
    Static Huffman: Does this mean 1) a tree generated from a single pass over all the data, or 2) some sort of predefined table independent of any given data, like defined in a spec?
    Even when they are called Huffman codes, predefined tables like HPACK's are not Huffman codes. They are static prefix codes. Static prefix coding that is not adapted to the particular data was already in use in the 1840s, when the Morse, Gerke and ITU codes were introduced.

Huffman coding is an improvement over such static prefix coding: it follows the process introduced in 1952 http://compression.ru/download/artic...ancy-codes.pdf

A Huffman code is an optimal prefix code -- when the cost of transmitting the code itself is not counted.
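
    For reference, a minimal sketch of that 1952 process in C++ (an illustration, not zlib's implementation): repeatedly merge the two least frequent subtrees, and the depth of each leaf becomes that symbol's code length. Real codecs like deflate additionally limit the maximum length, which is why their codes are, strictly speaking, only near-Huffman.

    #include <cstdint>
    #include <queue>
    #include <utility>
    #include <vector>

    // Sketch of the Huffman process: build the tree by merging the two least
    // frequent subtrees, then read code lengths off the leaf depths.
    struct Node { uint64_t f; int sym, left, right; };     // sym >= 0 only for leaves

    std::vector<int> huffman_lengths(const std::vector<uint64_t>& freq) {
        std::vector<Node> t;
        auto cmp = [&](int a, int b) { return t[a].f > t[b].f; };   // min-heap by frequency
        std::priority_queue<int, std::vector<int>, decltype(cmp)> pq(cmp);
        for (int s = 0; s < (int)freq.size(); s++)
            if (freq[s]) { t.push_back({freq[s], s, -1, -1}); pq.push((int)t.size() - 1); }
        while (pq.size() > 1) {                             // merge the two rarest subtrees
            int a = pq.top(); pq.pop();
            int b = pq.top(); pq.pop();
            t.push_back({t[a].f + t[b].f, -1, a, b});
            pq.push((int)t.size() - 1);
        }
        std::vector<int> len(freq.size(), 0);
        std::vector<std::pair<int,int>> stk;                // (node, depth)
        if (!t.empty()) stk.push_back({(int)t.size() - 1, 0});
        while (!stk.empty()) {
            auto [n, d] = stk.back(); stk.pop_back();
            if (t[n].sym >= 0) len[t[n].sym] = d ? d : 1;   // a lone symbol still gets 1 bit
            else { stk.push_back({t[n].left, d + 1}); stk.push_back({t[n].right, d + 1}); }
        }
        return len;
    }

    For deflate's literal/length alphabet you would feed the symbol frequencies into something like this and then cap the lengths at 15 bits.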

    Quote Originally Posted by SolidComp View Post
    Dynamic/Adaptive Huffman: What's the definition? How dynamic are we talking about? It's used in deflate and gzip right? Is it dynamic per block? (Strangely, the Wikipedia article says that it's rarely used, but I can't think of a codec more popular than deflate...)
Dynamic Huffman in deflate vocabulary means a normal, per-block Huffman code. Deflate does not have actual dynamic (adaptive) Huffman codes, despite those normal Huffman codes being called "dynamic".

A real adaptive (or dynamic) Huffman code rebuilds a more accurate code as the data is being transmitted. Overall this gives (roughly) the same density as sending the code separately, but the first symbols of the stream can be received with less latency. The cost on the decompression side is quite big, so this method is usually avoided. There are some saner variations of adaptive prefix coding that are slightly cheaper but don't follow the Huffman process for rebuilding the code; these, too, tend to be too complex and slow for practical use.

    Quote Originally Posted by SolidComp View Post
    Canonical Huffman: Where is this used?
In a canonical prefix code -- including the canonical Huffman code -- the codes are assigned in lexicographic order, sorted by code length. You only communicate the lengths; the codes themselves are implicit.
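
    A minimal sketch of how the implicit codes are recovered from the lengths alone (essentially the procedure described in RFC 1951, section 3.2.2): count how many codes exist of each length, derive the first code of each length, then hand out consecutive values within a length.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Canonical code assignment: given only the per-symbol code lengths, the
    // encoder and decoder reconstruct identical codes, so the header never
    // needs to store the codes themselves.
    std::vector<uint32_t> canonical_codes(const std::vector<int>& len) {
        int max_len = 0;
        for (int l : len) max_len = std::max(max_len, l);
        std::vector<uint32_t> count(max_len + 1, 0), next(max_len + 1, 0);
        for (int l : len) if (l) count[l]++;              // how many codes of each length
        uint32_t code = 0;
        for (int l = 1; l <= max_len; l++) {              // smallest code of each length
            code = (code + count[l - 1]) << 1;
            next[l] = code;
        }
        std::vector<uint32_t> codes(len.size(), 0);
        for (int s = 0; s < (int)len.size(); s++)         // consecutive codes within a length
            if (len[s]) codes[s] = next[len[s]]++;
        return codes;
    }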

    Quote Originally Posted by SolidComp View Post
    By the way, do you think it would be feasible to implement adaptive arithmetic coding with less CPU and memory overhead than deflate? The Wikipedia article also said this about adaptive Huffman: "...the cost of updating the tree makes it slower than optimized adaptive arithmetic coding, which is more flexible and has better compression." Do you agree that adaptive arithmetic coding should be faster and with better ratios? What about the anticipated CPU and memory overhead?
Going from prefix coding to arithmetic coding alone is not a big improvement. One usually needs context as well, typically together with adaptive coding.

    Adaptation is based on context: often the probabilities of a few symbols are changed depending on the context. Traditionally, prefix coding has not been combined with the concept of context.

    I believe that the first practical implementations of contextful prefix coding were in WebP lossless and Brotli. In JPEG XL we use that context-based method for ANS, too.
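
    Roughly speaking, "contextful" prefix coding just means the code table used for a symbol is selected by its context instead of being fixed for the whole block. A toy sketch of the idea (not the actual WebP/Brotli context modelling; context_id, the three tables and put_bits are all assumptions here):

    #include <cstdint>
    #include <vector>

    // Toy context model: pick one of three static code tables based on the
    // previous byte. Brotli and WebP lossless use more elaborate context
    // functions, but the principle is the same: table index = f(context),
    // and each table itself stays static within the block.
    struct CodeTable { std::vector<uint32_t> code; std::vector<int> len; };   // 256 entries each

    static int context_id(uint8_t prev) {
        if (prev >= '0' && prev <= '9') return 0;   // after a digit
        if (prev == ' ' || prev == '\n') return 1;  // after whitespace
        return 2;                                   // everything else
    }

    // put_bits is any bit writer (assumed); tables[] holds one code per context class.
    void encode_byte(uint8_t prev, uint8_t cur, const CodeTable tables[3],
                     void (*put_bits)(uint32_t code, int nbits)) {
        const CodeTable& t = tables[context_id(prev)];
        put_bits(t.code[cur], t.len[cur]);
    }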

  6. Thanks (2):

anormal (17th September 2020), JamesWasil (2nd July 2020)

  7. #5
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
I don't know when the first prefix codes were used - definitely before Huffman - but Morse code is not prefix-free; instead it uses three types of gaps for separation (this redundancy is very helpful for synchronization and error correction).

Regarding context-dependence and adaptivity, they are separate concepts:
    - a fixed Markov process is an example of the former; an ARMA model uses a few previous values as context, lossless image compression uses already decoded neighboring pixels as context, etc.
    - adaptation usually refers to evolving the model/distribution to handle non-stationarity, e.g. independent variables with an evolving probability distribution in adaptive prefix/AC/ANS, or a Gaussian/Laplace distribution with evolving parameters (e.g. https://arxiv.org/pdf/2003.02149 ) - see the minimal sketch below.
    We can combine both, e.g. a Markov process with online-optimized parameters, adaptive ARMA, LSTM, etc.
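
    For concreteness, the usual way adaptivity is implemented in binary AC/ANS coders is a per-context probability that decays toward recent data. A minimal sketch (the shift `rate` controls adaptation speed; the values here are illustrative, not taken from any particular codec):

    #include <cstdint>

    // One adaptive binary model: p1 estimates P(bit = 1) in units of 1/65536.
    // Combining context with adaptivity just means keeping one such counter per
    // context (e.g. indexed by previous bits or neighboring pixels) and updating
    // only the counter selected by the current context.
    struct AdaptiveBit {
        uint16_t p1 = 1 << 15;                  // start at probability 1/2
        void update(int bit, int rate = 5) {    // exponential forgetting
            if (bit) p1 += (65535 - p1) >> rate;
            else     p1 -= p1 >> rate;
        }
    };

    An arithmetic coder would encode each bit with the current p1/65536 and then call update(), so the model tracks non-stationary statistics.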

  8. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    I don't see a point in being too strict about definitions here.
For example, the zlib implementation limits code lengths to 15 bits,
    so strictly speaking that's not really a Huffman code either.

    So,
1) deflate only uses static Huffman coding, either with a predefined table or with a table sent in the block header.
    A predefined code is still a valid Huffman code for some other data.
    2) Adaptive coding is based on the idea of data stationarity - that the statistics of the
    known symbols s[0]..s[i-1] also apply to s[i].
    In most cases adaptive coders adjust their codes after processing each symbol,
    but it's also possible to do batch updates.

    > do you think it would be feasible to implement adaptive arithmetic coding
    > with less CPU and memory overhead than deflate?

This depends on the implementation.
    Compared to zlib - most likely yes, since zlib is not very speed-optimized.

    Also, there's nothing good about zlib's memory usage either, so that part is not a problem.

But a speed-optimized deflate implementation has a decoding speed of >1GB/s,
    which is not feasible for truly adaptive AC/ANS.
    It might be technically possible with batch updates using _large_ batches (at which point it's better to just encode the stats),
    or by only applying AC to rarely used special symbols (so most of the code would remain Huffman).
    However, LZ is not always applicable (it might not find matches in some data), so
    always beating deflate would also require adaptive AC at >1GB/s, which is not realistic on current hardware.

    And encoding speed is hard to compare, since there are many different algorithms
    used to encode to the same deflate format.
    I'd say that it should be possible to make an LZ+AC/ANS coder which would be
    both faster and stronger than all zlib levels (zstd likely already qualifies).

  9. #7
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Thanks everyone. A couple of points:

What's this idea that deflate only uses static Huffman? One of the reasons SLZ is so much faster than zlib is supposed to be that Willy only uses static Huffman. This was supposed to be a difference from standard zlib operation (for gzipping). Willy talks about this in his write-up. Is he mistaken about zlib? http://www.libslz.org/

Eugene, deflate doesn't have a predefined Huffman or prefix table, does it? A predefined table is different from a static one. When I say predefined, I mean a table that says here's the code for the letter A, here's the code for the letter B, etc. - like the HPACK table, which covers all of ASCII. I've never seen a predefined table in a deflate spec. Did I miss it?

    What are the assumptions of Huffman optimality? Was it something about the distribution of the data? Does that mean that for some data distributions, arithmetic coding has no advantage over Huffman?

  10. #8
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Oh, and does deflate use canonical Huffman coding?

  11. #9
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    So when you say there's nothing good about zlib memory usage, you mean it uses too much?

    It uses about 100 MB RSS, while SLZ uses only 10 MB. SLZ is my baseline for most purposes. It would be interesting if a LZ+AC/ANS solution could be made that used 10 or fewer MB, and very little CPU, and was fast.

    I think maybe brotli and Zstd in their low modes, maybe -1 or so, could approach those metrics, but I'm not sure.

If we preprocess and sort data into blocks that group, say, all the numeric data in one block, all the text in another, hard random data in another, and then generate Huffman trees for each block separately, that's still static Huffman, right? It should help, because you could get shorter codes on average if a block holds just numbers, or just letters, and so forth, depending on the overhead of the tables.

  12. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > Is he mistaken about zlib? http://www.libslz.org/

    No, you are.
    What he says is "Instead, SLZ uses static huffman trees" and
    "the dynamic huffman trees would definitely shorten the output stream".
    I don't see anything about dynamic _coding_ there.

    > Eugene, deflate doesn't have a predefined Huffman or prefix table, does it?

    https://en.wikipedia.org/wiki/DEFLATE#Stream_format
    "01: A static Huffman compressed block, using a pre-agreed Huffman tree defined in the RFC"

    > What are the assumptions of Huffman optimality?

The Huffman algorithm generates a binary prefix code
    which compresses a given string to the shortest possible bitstring among all prefix codes.

    > Does that mean that for some data distributions, arithmetic coding has no
    > advantage over Huffman?

Static AC has the same code length in the case where the probability of every
    symbol c is equal to 1/2^l[c], i.e. all probabilities are powers of two.
    Adaptive AC is basically always stronger on real data, but it's possible
    to generate artificial samples where a Huffman code would win.
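
    To make the 1/2^l[c] condition concrete, a tiny check (the two distributions are chosen for illustration; lengths {1,2,2} are the Huffman lengths in both cases):

    #include <cmath>
    #include <cstdio>

    // Compare Shannon entropy with the Huffman average code length for two
    // 3-symbol sources. With power-of-two probabilities Huffman hits the
    // entropy exactly; with skewed probabilities it overshoots.
    int main() {
        double dist[2][3] = { {0.50, 0.25, 0.25},     // powers of two: Huffman is exact
                              {0.90, 0.05, 0.05} };   // skewed: Huffman overshoots
        int len[3] = {1, 2, 2};
        for (auto& p : dist) {
            double H = 0, avg = 0;
            for (int s = 0; s < 3; s++) {
                H   -= p[s] * std::log2(p[s]);
                avg += p[s] * len[s];
            }
            std::printf("entropy %.3f bits/symbol, Huffman average %.3f bits/symbol\n", H, avg);
        }
        return 0;
    }

    The first distribution prints 1.500 bits for both; the second prints roughly 0.569 vs 1.100, which is the kind of gap where AC/ANS pays off.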

    > Oh, and does deflate use canonical Huffman coding?

Yes, if you accept a length-limited prefix code as a Huffman code.

    > So when you say there's nothing good about zlib memory usage, you mean it uses too much?

It uses a hash table and a hash-chain array.
    So there are all kinds of ways to use less memory - from using just a hash table alone,
    to a BWT suffix array, to direct string search.
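
    For illustration, a sketch of the cheapest of those options - a head-only hash table that remembers a single candidate per bucket, with no chain array. The table size and hash constant are arbitrary choices here, not zlib's or anyone else's exact parameters:

    #include <cstdint>
    #include <cstring>

    // Head-only matchfinder: one uint32_t per hash bucket (128 KB total here),
    // instead of zlib's hash heads plus a chain entry for every input position.
    // Weaker matches, much less memory. Position 0 is sacrificed as the "empty"
    // marker; call find() only while at least 4 bytes remain in the buffer.
    struct HeadOnlyMF {
        static const int BITS = 15;                    // 2^15 buckets
        uint32_t head[1 << BITS] = {0};

        static uint32_t hash4(const uint8_t* p) {
            uint32_t v; std::memcpy(&v, p, 4);
            return (v * 2654435761u) >> (32 - BITS);   // multiplicative hash of 4 bytes
        }
        // Returns the match length against the remembered candidate (0 if none)
        // and records 'pos' as the new candidate for this bucket.
        int find(const uint8_t* buf, uint32_t pos, uint32_t end, uint32_t* dist) {
            uint32_t h = hash4(buf + pos), cand = head[h];
            head[h] = pos;
            if (!cand || cand >= pos) return 0;
            uint32_t len = 0;
            while (pos + len < end && buf[cand + len] == buf[pos + len]) len++;
            *dist = pos - cand;
            return (int)len;
        }
    };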

    > It uses about 100 MB RSS, while SLZ uses only 10 MB. SLZ is my baseline for
    > most purposes. It would be interesting if a LZ+AC/ANS solution could be
    > made that used 10 or fewer MB, and very little CPU, and was fast.

You have very strange methods of measurement.
    If some executable uses 100MB of RAM, it doesn't mean that the algorithm there
    requires that much memory - it could just as well be some i/o buffer for files.

    Also, what do you mean by "uses little CPU"?
    There's throughput, latency, number of threads, SIMD width, cache dependencies, IPC, core load...
    it's hard to understand what you want to minimize.

For example, SLZ has low latency and memory usage, but it's not necessarily the fastest
    deflate implementation for a single long stream.

    > I think maybe brotli and Zstd in their low modes,
    > maybe -1 or so, could approach those metrics, but I'm not sure.

For zstd you'd probably have to tweak more detailed parameters, like wlog/hlog.
    I don't think its defaults save memory to _that_ extent.

    > then generated Huffman trees for each block separately, that's still
    > static Huffman, right?

Yes. If the code stays the same within a block, then it's static.

  13. #11
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    103
    Thanks
    96
    Thanked 18 Times in 17 Posts
    Quote Originally Posted by SolidComp View Post
    Thanks everyone. A couple of points:
    What are the assumptions of Huffman optimality? Was it something about the distribution of the data? Does that mean that for some data distributions, arithmetic coding has no advantage over Huffman?
The advantage that arithmetic coding has over Huffman and other prefix implementations is that it can spend a fractional number of bits on a symbol, rather than a whole bit.

    You're going to get more out of arithmetic coding whenever the symbol probabilities are not powers of 2, because you gain the difference between the ideal fractional length and the whole-bit length.

    For example, if you measure the symbol probabilities of a file and see that a symbol only needs 1.67 bits, arithmetic coding will represent it with 1.67 bits or very close to it, whereas Huffman and other whole-bit methods still need at least 2 bits to represent the same probability.

    Whenever this occurs, you save the difference (in this example, 0.33 of a bit) for each and every occurrence of that symbol.

    These savings add up, and the larger your data source is, the more significant these fractional-bit savings become over Huffman and other methods.

    So basically, Huffman is only optimal when the symbol probabilities are powers of 2. If they're not, then arithmetic coding will give you more.

    If you're combining contexts, you'll usually get even more out of arithmetic coding, because the context-conditioned probabilities can also be represented with fractional bits, saving on that difference too.
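
    To put numbers on that (an illustrative calculation, not from any specific file): a symbol with probability p ≈ 0.314 ideally costs -log2(0.314) ≈ 1.67 bits, but a prefix code must give it a whole number of bits, so it gets a 2-bit (or longer) codeword. Each occurrence then wastes about 0.33 bit, and if that symbol appears a million times in the file, that is roughly 330,000 bits, or about 40 KB, that an arithmetic coder would recover.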

  14. #12
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Shelwien View Post
    > Is he mistaken about zlib? http://www.libslz.org/
    No, you are.
    What he says is "Instead, SLZ uses static huffman trees" and
    "the dynamic huffman trees would definitely shorten the output stream".
    I don't see anything about dynamic _coding_ there.
    I don't follow. There's a difference between Huffman trees and Huffman coding in this context? What's the difference?

    > It uses about 100 MB RSS, while SLZ uses only 10 MB. SLZ is my baseline for
    > most purposes. It would be interesting if a LZ+AC/ANS solution could be
    > made that used 10 or fewer MB, and very little CPU, and was fast.

    You have very strange methods of measurement.
    If some executable uses 100MB of RAM, it doesn't mean that algorithm there
    requires that much memory - it could be just as well some i/o buffer for files.


Those are just the stats from Willy's benchmarks. How else do you measure memory use but by measuring memory use during program execution?

    Also what do you mean by "uses little CPU"?
    There's throughput, latency, number of threads, SIMD width, cache dependencies, IPC, core load...
    its hard to understand what you want to minimize.

    For example, SLZ has low latency and memory usage, but its not necessarily fastest
    deflate implementation for a single long stream.
    I like software that is engineered for performance and low overhead, like SLZ. I want it to be 1% CPU or so. SIMD can be one way to get there, so long as it doesn't throttle down core frequencies like using AVX-512 does.

    What is faster than SLZ? It's much faster than the ridiculous Cloudflare zlib patch that no one can actually build anyway (typical Cloudflare). It's faster than libdeflate. What else is there?

  15. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    first, we need to extend the vocabulary:

static code: a single encoding for the entire file; the encoding tables are stored in the compressed file header
    block-static: the file is split into blocks, and each block has its own encoding tables stored with the block

    dynamic: the encoding tables are computed on-the-fly from previous data and updated after every byte
    block-dynamic: the same, but the encoding tables are updated only e.g. once every 1000 bytes

so,
    - the first LZ+entropy coder was lzari, with dynamic AC
    - the second one was lzhuf aka lharc 1.x, with dynamic Huffman
    - then pkzip 1.x got Implode with a static Huffman coder. It employed canonical Huffman coding, so the compressed file header stored only a 4-bit prefix code length for each of the 256 chars (and more for LZ codewords).
    - then ar002 aka lha 2.x further improved this idea and employed block-static Huffman. It also added secondary encoding tables used to encode the code lengths in the block header (instead of the fixed 4-bit fields)
    - pkzip 2.x added deflate, which uses just the same scheme as ar002 (in this aspect; there were a lot of other changes)

Since then, block-static Huffman has become the de-facto standard for fast LZ77-based codecs. It's used in RAR2 (which is based on deflate), cabarc (LZX), brotli (which added some order-1 modelling) and zstd (which combines block-static Huffman with block-static ANS).

Static/block-static codes require two passes over the data - first you compute frequencies, then you build the tables and encode the data. You can avoid the first pass by using fixed tables (or use more complex tricks such as building the tables from the first 1% of the data). The deflate block header specifies whether the block uses custom encoding tables stored in the block header (DYNAMIC) or fixed ones defined in the spec (STATIC). So this field in the block header has its own vocabulary.

Tornado implements both block-dynamic Huffman and block-dynamic AC. Dynamic/block-dynamic codes use only one pass over the data, and block-dynamic coding is about as fast as the second pass of the *-static schemes.
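
    To illustrate the block-dynamic variant, a skeleton of the update loop (a sketch of the general scheme, not Tornado's actual code; build_canonical() stands for any length-limited canonical code builder and is assumed here, while canonical_codes() refers to the canonical sketch earlier in the thread):

    #include <cstdint>
    #include <vector>

    // One-pass block-dynamic coding: block N is encoded with tables built from
    // the counts of blocks 0..N-1, so no table is stored in the stream - the
    // decoder rebuilds identical tables from already-decoded data.
    struct BlockDynamicCoder {
        static const int BLOCK = 1000;               // update period, as in the post above
        std::vector<uint64_t> freq;                  // symbol counts (kept nonzero)
        std::vector<int>      len;                   // current code lengths
        std::vector<uint32_t> codes;                 // current codes
        int pending = 0;

        BlockDynamicCoder() : freq(256, 1), len(256, 8), codes(256) {
            for (int i = 0; i < 256; i++) codes[i] = i;   // start with a flat 8-bit code
        }

        template<class BitWriter>
        void encode(uint8_t s, BitWriter&& put_bits) {
            put_bits(codes[s], len[s]);              // emit with the current table
            freq[s]++;
            if (++pending == BLOCK) {                // rebuild once per block
                // len   = build_canonical(freq, 15);   // assumed helper: lengths limited to 15 bits
                // codes = canonical_codes(len);        // see the canonical sketch earlier
                pending = 0;
            }
        }
    };

    The decoder runs the same counting and rebuilding logic on the symbols it has already decoded, which is why no tables need to be transmitted.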

  16. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > I don't follow. There's a difference between Huffman trees
    > and Huffman coding in this context? What's the difference?

Dynamic and static coding are different algorithms.
    Dynamic coding is more complex and slower, but doesn't require multiple passes.

    "How the Huffman tree is generated", "how the tree is encoded" and "how the data is encoded using the tree"
    are three separate questions; only the last one is about whether the coding is static or dynamic.

    > Those are just the stats from Willie's benchmarks.

    Well, the only table there with 100M zlib is for
    "HAProxy running on a single core of a core i5-3320M, with 500 concurrent users, the default 16kB buffers"

    so 100M is used by 500 instances of zlib, thus 200kb per instance, which is reasonable.

As to CPU usage, it's actually limited by the output bandwidth there.
    "gigabit Ethernet link (hence the 976 Mbps of IP traffic)"

    So I guess your "CPU" is supposed to be measured as
    (976000000/8)*100/(encoding_speed_in_bytes/compression_ratio),
    so 256MB/s corresponds to 100% and 512MB/s to 50% (at SLZ's CR 2.1).

This is not something obvious and is only defined for a specific use case,
    so don't expect anybody to understand you without an explicit definition of the term.
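
    Plugging in the numbers to check that definition (the 2.1 ratio is from the libslz page, and the 753 MB/s figure appears later in this post; the rest is just the arithmetic above):

    #include <cstdio>

    // "CPU %" as defined above: the fraction of one core needed to keep the
    // gigabit link saturated with compressed output, given the encoder's
    // input-side speed and its compression ratio.
    int main() {
        double link_bytes = 976e6 / 8;                // ~122 MB/s of compressed output
        double ratio      = 2.1;                      // SLZ's compression ratio in that test
        double input_rate = link_bytes * ratio;       // ~256 MB/s of input saturates the link
        double speeds[]   = {256e6, 512e6, 753e6};    // encoder input speeds, bytes/s
        for (double speed : speeds)
            std::printf("%4.0f MB/s encoder -> %5.1f%% of a core\n",
                        speed / 1e6, 100.0 * input_rate / speed);
        return 0;
    }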

    > How else do you measure memory use but by measuring memory use during program execution?

Sure, it's okay in the libslz case, because he compares stats of the same program (haproxy),
    just with different compression modules.
    Comparing different programs like that is much less precise, because the stats don't show
    which algorithm uses the memory.

    Well, anyway, in this case the best approach would be to use internal measurement
    (by replacing the standard memory manager).

    For example, this:
for( int i=0; i<1000000; i++ ) new char[1]; // PeakWorkingSetSize: 34,775,040
    and this:
for( int i=0; i<1; i++ ) new char[1000000]; // PeakWorkingSetSize: 2,777,088

both allocate exactly 1,000,000 bytes of useful memory.
    But the memory usage of the corresponding executable is roughly 12x higher in the first case (34.8MB vs 2.8MB).
    Why? Because of the overhead of the specific default memory manager,
    which can be replaced with something custom-made, and then the two examples would
    suddenly be equal in memory usage.

    > I like software that is engineered for performance and low overhead, like SLZ.
    > I want it to be 1% CPU or so.

So in SLZ's terms that's about 25GB/s. SLZ itself only reaches 753MB/s (in some special cases),
    so good luck with that. Maybe sleeping for 10-20 years would help :)

    > It's faster than libdeflate. What else is there?

    Well, zstd https://github.com/facebook/zstd#benchmarks
    or LZ4: https://github.com/lz4/lz4#benchmarks
    https://sites.google.com/site/powtur...sion-benchmark
    https://sites.google.com/site/powturbo/entropy-coder

    Thing is, normally decoding is a more frequent operation, so codec developers mostly work on decoding speed optimization.

  17. #15
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    You said SLZ wasn't the fastest deflate implementation, now you're talking about Zstd and LZ4. I was asking what other deflate implementations are faster.

On memory managers, do you mean the OS, or something application-specific?

  18. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > You said SLZ wasn't the fastest deflate implementation, now you're talking about Zstd and LZ4.

Well, they can be modified to write to the deflate format.
    The slow part in LZ encoding is matchfinding, not Huffman coding.

    > I was asking what other deflate implementations are faster.

    Based on this, intel gzip is faster: https://sites.google.com/site/powtur...eb-compression
    Also there's hardware for deflate encoding: https://en.wikipedia.org/wiki/DEFLATE#Hardware_encoders

    > On memory managers, you mean OS, or something application specific?

Layers are added by the OS, the standard library, and sometimes the app itself too.

    SLZ is designed for a special use case where it has to compress 100s of potentially infinite streams
    in parallel on cheap hardware.
However, it's not a good solution for other compression tasks, not even storage or filesystems.
