
Thread: Benchmarking Entropy Coders

  #1 by dnd (Worldwide)

    Entropy Coder Benchmark

    Last edited by dnd; 31st October 2015 at 00:21.


  #2 by Jarek (Kraków, Poland)
    Thanks, such a comparison was really missing.
    It's strange that Huffman gets the same rates as ANS here, even though ANS additionally uses fractional bits.

    You can also add tANS for a binary alphabet (tABS) to compare with binary AC for varying probabilities (like Matt's fpaqb, but with optimizations) - for this, TurboANS would be ~427/8 ≈ 53.4 MB/s (a bit more, since you generate the coding tables only once).
    Last edited by Jarek; 5th March 2014 at 16:20.

  #3 by Cyan (France)
    Is the benchmark program open source?

    [Edit] It is apparently neither open source nor verifiable.
    As a consequence, I'm not keen to see FSE described using such bogus numbers, which are far off the mark.

    As published on https://github.com/Cyan4973/FiniteStateEntropy,
    the speed of single-threaded FSE on a Core i5 @ 2.96GHz is:
    entropy only: compression 430 MB/s, decompression 520 MB/s
    full code (including header & table construction): compression 300 MB/s, decompression 420 MB/s
    and since the code & data fit into L1 cache, it ramps up pretty well with clock speed.

    Therefore, an i7-2600K at 4.5GHz should see better speed than that, not worse.

    Anyone who wants a better understanding of its performance should test it independently.
    After all, the code is freely available, open source, and a benchmark module is included.
    Last edited by Cyan; 25th July 2015 at 15:01.

  #4 by dnd (Worldwide)
    Originally Posted by Jarek:
        It's strange that Huffman gets the same rates as ANS here, even though ANS additionally uses fractional bits.
    This is valid for this symbol distribution. Additionally, the headers for Huffman are in general smaller.
    Originally Posted by Jarek:
        You can also add tANS for a binary alphabet (tABS) to compare with binary AC for varying probabilities (like Matt's fpaqb, but with optimizations) - for this, TurboANS would be ~427/8 ≈ 53.4 MB/s (a bit more, since you generate the coding tables only once).
    These are in-memory benchmarks; fpaqb is file based, so it would have to be modified. You can't deduce the speed simply by dividing by 8.

    @Cyan: the benchmark is actually not open source. You can modify fsbench for this purpose.

  #5 by Jarek (Kraków, Poland)
    Originally Posted by dnd:
        Additionally, the headers for Huffman are in general smaller.
    Huffman's probability quantization is really bad - see fig. 3 in http://arxiv.org/pdf/1008.3597v1.pdf ... and accurate quantization still has room for improvement (thread).
    It means that with an accurate entropy coder you can get the same rate using a header a few times smaller than Huffman's, or get a better compression rate for the same header size.

    ... and you should have exactly the same header issue for the arithmetic coding you show ...

    Originally Posted by dnd:
        These are in-memory benchmarks; fpaqb is file based, so it would have to be modified. You can't deduce the speed simply by dividing by 8.
    It is tANS for a binary alphabet ... the difference is that you switch between multiple tables and read single bits instead of bytes ... but you generate the coding tables only once.

  #6 by dnd (Worldwide)
    Originally Posted by Jarek:
        ... and you should have exactly the same header issue for the arithmetic coding you show ...
    All the arithmetic coders (except FastAC, which is quasi-adaptive) are fully adaptive (1 or 8 model updates per byte). No header is required.
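    As a concrete illustration of "fully adaptive, no header": the sketch below (not the benchmark's code) is an order-0 adaptive bit model that updates its probability estimate after every bit, i.e. 8 updates per byte, and only accumulates the ideal code length; a real bitwise coder (fpaq0-style) would feed the same probability into an arithmetic encoder.
    Code:
    #include <math.h>
    #include <stdio.h>

    /* Order-0 adaptive bit model, 8 model updates per byte.
       Illustrative only: instead of driving an arithmetic encoder,
       it just sums the ideal code length -log2(P(bit)). */
    int main(void) {
        const unsigned char msg[] = "example input with a skewed bit distribution";
        unsigned p = 1u << 15;      /* P(bit=1) in 16-bit fixed point, start at 1/2 */
        const int rate = 5;         /* adaptation speed */
        double bits = 0.0;          /* ideal output size in bits, no header needed */

        for (size_t i = 0; i < sizeof(msg) - 1; i++)
            for (int k = 7; k >= 0; k--) {
                int bit = (msg[i] >> k) & 1;
                double pr = bit ? p / 65536.0 : 1.0 - p / 65536.0;
                bits += -log2(pr);                   /* cost of coding this bit */
                if (bit) p += (65536 - p) >> rate;   /* move estimate toward 1 */
                else     p -= p >> rate;             /* move estimate toward 0 */
            }

        printf("ideal size: %.1f bits (%.3f bits per byte)\n",
               bits, bits / (sizeof(msg) - 1));
        return 0;
    }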

  #7 by Jarek (Kraków, Poland)
    In that case you should compare with block-adaptive ANS, maybe providing a header for the first block only (~141 B in FSE) ... but that makes such a comparison more blurry.

    If you want to make a really clear and fair comparison of entropy coders only, you should provide them with the same probability distribution and measure pure data processing speed and rate, comparing against the optimal rate: the Shannon entropy.
    Preparation should be treated separately - you can report the time for Huffman or ANS table preparation as a separate parameter in the benchmark.

  #8 by dnd (Worldwide)
    The goal of this benchmark is to compare different, up-to-date implementations, primarily for speed, under common distributions.
    I don't want to speculate about what is possible before it has been implemented.
    For example, this benchmark demonstrates for the first time that binary arithmetic coding (8 operations per byte) can be faster than bytewise arithmetic coding (1 operation per byte).

  #9 by encode (Moscow, Russia)
    Please include PPMd var.J (-o2 ... -o4)

  #10 by biject.bwts (Texas)
    You also have to realize that, at least for the adaptive arithmetic coders, there will be a difference in compression for enwik9 depending on whether a KT or Laplace update is used for the adaptation.

    Also, when comparing entropy encoders, they may not really be true entropy encoders. Try compressing various permutations of enwik9: if the arithmetic coders are basic entropy coders, they should compress each permutation to the same length, give or take a byte. I tested several so-called basic entropy coders years ago and found that most were designed to look good on files like enwik9, but they failed to compress permutations of enwik9 to the same size - in fact they got far worse compression - meaning they are not true basic entropy compressors.

    arb255 is a true bijective arithmetic compressor; on Matt's benchmark it gets 644,561,595. I don't remember whether the KT or Laplace update rate was used for the cells; that would make a difference in the compressed size.

    UPDATE: I just ran the current arb255; it gets 644,561,590, which is smaller, and it uses KT for the update. But all permutations of enwik9 will compress to this same length, give or take ONE byte.

    However, unless you test permutations of the input files, you can't be sure the code actually is a real entropy coder.

    An easy way to generate some permutations of the input test file is to do a few BWTS or UNBWTS passes on it and then run your tests.
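    For anyone who wants to try this, here is a minimal sketch of such a permutation test; it uses a Fisher-Yates byte shuffle instead of BWTS (a shuffle also preserves the order-0 symbol counts exactly), and compressed_size() is a hypothetical placeholder for whichever coder is being tested.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical placeholder for the coder under test:
       returns the compressed size in bytes. */
    extern size_t compressed_size(const unsigned char *in, size_t n);

    /* Fisher-Yates shuffle: permutes the bytes without changing their counts. */
    static void shuffle(unsigned char *buf, size_t n, unsigned seed) {
        srand(seed);
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)(rand() / (RAND_MAX + 1.0) * (i + 1));
            unsigned char t = buf[i]; buf[i] = buf[j]; buf[j] = t;
        }
    }

    /* A pure order-0 entropy coder should give almost identical sizes
       for the original and the permuted input. */
    void permutation_test(const unsigned char *in, size_t n) {
        unsigned char *tmp = malloc(n);
        if (!tmp) return;
        memcpy(tmp, in, n);
        size_t base = compressed_size(in, n);
        shuffle(tmp, n, 12345);
        size_t perm = compressed_size(tmp, n);
        printf("original: %zu bytes, permuted: %zu bytes\n", base, perm);
        free(tmp);
    }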
    Last edited by biject.bwts; 5th March 2014 at 19:58. Reason: value for current arb255

  #11 by dnd (Worldwide)
    @encode: sorry, but the benchmark is about pure entropy coders.

  #12 by biject.bwts (Texas)
    The reason that arb255 looks worse in size is that it is most likely the only pure entropy coder mentioned so far. The other coders will do worse size-wise if you test with permutations. The reason is that, by using limited high/low bit lengths in the state registers, they lose information about the true state, so they are really adjusting the compression by exploiting local effects that change when the file undergoes certain permutations. I realize that for many applications one does not really want a true entropy coder, but when one does, you have to realize these are not true entropy coders. One more thing: arb255, though a true coder, is really just a test bed; it is very, very slow.

  #13 by Jarek (Kraków, Poland)
    dnd, comparing ANS with a header for each block against adaptive AC without headers is not fair (I think you use FastAC also with headers).
    If you want to "compare pure entropy coders", do it by giving them the same distribution - then you can compare with the maximum achievable rate: the Shannon entropy.
    Right now you are comparing simple compressors based on them.
    Also, binary coders (AC, tABS) and large-alphabet coders (Huffman, range coding, ANS) are separate categories - the second one has the additional cost of tables: memory and preparation time, which should be listed separately.
    Last edited by Jarek; 5th March 2014 at 20:55.

  #14 by dnd (Worldwide)
    Originally Posted by Jarek:
        dnd, comparing ANS with a header for each block against adaptive AC without headers is not fair (I think you use FastAC also with headers).
    For your information, I have added a column with the compressed size excluding the header where one is required (xANS and TurboHF). The time for storing/reading the header is negligible.


  #15 by Jarek (Kraków, Poland)
    Thank you, but it is still not a benchmark of entropy coders, only of simple data compressors based on them - if you really want to compare entropy coders, they cannot differ in the most important parameter: the probability distribution provided to them.
    Here I have the feeling that each of them uses a different probability distribution, especially the first four, for which we see a big jump thanks to being bit-wise.
    E.g. your byte TurboAC is without a header (adaptive), while TurboANS has a header ... I don't know whether the 0.5% difference between them comes from using a better probability distribution or from ANS inaccuracy ...
    Last edited by Jarek; 5th March 2014 at 22:43.

  #16 by dnd (Worldwide)
    I have included fpaqc.
    The distribution in enwik9 is not uniform. The adaptive coders are model based and get better compression than block-based coders like ANS.

  #17 (Member, San Diego)
    Jarek --

    The compression rate here is meaningless, because pretty much all they're doing is squeezing the alphabet down to what's used in text. Huffman looks a lot better with less skew in the alphabet, i.e. less compressible data. It makes no sense to directly entropy-compress enwik9, and the differences between coders are probably largely overhead.

    dnd -- you should test skewed data as well. Skew in the data exposes different code paths and can affect run times. Plus, it's more realistic.
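    One simple way to generate such skewed test data - a sketch assuming a geometric symbol distribution rather than any particular real-world source:
    Code:
    #include <math.h>
    #include <stdlib.h>

    /* Fill a buffer with bytes drawn from a geometric (heavily skewed)
       distribution: P(symbol = k) ~ (1-p)^k * p, clamped to the byte range.
       Example use: fill_geometric(buf, n, 0.1, 42); */
    void fill_geometric(unsigned char *buf, size_t n, double p, unsigned seed) {
        srand(seed);
        for (size_t i = 0; i < n; i++) {
            double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* uniform in (0,1) */
            int k = (int)(log(u) / log(1.0 - p));            /* inverse-CDF sample */
            buf[i] = (unsigned char)(k > 255 ? 255 : k);
        }
    }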

  #18 by JamesB (Cambridge, UK)
    I assume rANS_Ryg is the 32-bit version rather than rANS64. Looking at the code, I would expect the 64-bit one to be very similar to Charles Bloom's version. (I couldn't benchmark both myself, alas.) I'm also interested in how you are encoding and storing the header data. How this is done is obviously quite key to speed and size. It's built into FSE, as it is a full compressor, but not into ryg_rANS, as that is purely the entropy-encoding step. This means you're either not comparing like with like, or you're comparing unrevealed, closed-source parts, which makes it hard to judge.

    How quasi-adaptive is FastAC? It seems to be a quick encoder, but a bit poor at decoding (most likely due to divisions). Much quicker than TurboAC byte, in fact. That appears to imply that the fully adaptive part of the coding is the major bottleneck there, which surprises me.

    Also, how many runs did you do of each? The speeds are high enough that you may need multiple runs so that CPU frequency scaling doesn't play a significant role in the benchmarking.
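    A common way to handle that - a sketch of a best-of-N timing loop, not the benchmark's actual harness; encode_block() is a hypothetical placeholder for the coder under test:
    Code:
    #include <time.h>

    /* Hypothetical placeholder: encode n input bytes into out. */
    extern size_t encode_block(const unsigned char *in, size_t n, unsigned char *out);

    /* Run the coder several times on the in-memory buffer and report
       the throughput of the fastest run, so warm-up and CPU frequency
       scaling do not dominate the measurement. */
    double bench_mbps(const unsigned char *in, size_t n, unsigned char *out, int runs) {
        double best = 1e30;
        for (int r = 0; r < runs; r++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            encode_block(in, n, out);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
            if (s < best) best = s;
        }
        return n / best / 1e6;   /* MB/s of the fastest run */
    }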

  #19 by dnd (Worldwide)
    Originally Posted by JamesB:
        I assume rANS_Ryg is the 32-bit version rather than rANS64.
    Yes, but I am also including Ryg's SIMD version. EDIT: I have now included the 64-bit version as well.
    Originally Posted by JamesB:
        I'm also interested in how you are encoding and storing the header data.
    Only FSE provides its own header functions. To complete the tests, I've used the same header functions as TurboANS for the Ryg and CBloom ANS coders. There is a column without the header size included.
    The time for storing/reading the header is negligible. The header functions use gamma coding (bitio) with zero RLE (sketched below). I've posted an advanced version in http://encode.su/threads/1883-Reduci...ll=1#post36916
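    For illustration, a frequency-table header of that general kind could look like the sketch below (Elias gamma codes for non-zero counts, with runs of zero counts collapsed into a gamma-coded run length); this is one plausible variant, not necessarily TurboANS's actual header layout.
    Code:
    #include <stdint.h>

    /* bw->buf must be zero-initialized and large enough; put_bit only ORs 1s in. */
    typedef struct { uint8_t *buf; uint64_t bitpos; } bitwriter;

    static void put_bit(bitwriter *bw, int b) {
        if (b) bw->buf[bw->bitpos >> 3] |= (uint8_t)(1u << (7 - (bw->bitpos & 7)));
        bw->bitpos++;
    }

    /* Elias gamma code for v >= 1: floor(log2(v)) zero bits, then v in binary. */
    static void put_gamma(bitwriter *bw, uint32_t v) {
        int len = 0;
        while ((v >> len) > 1) len++;                 /* len = floor(log2(v)) */
        for (int i = 0; i < len; i++) put_bit(bw, 0); /* unary length prefix */
        for (int i = len; i >= 0; i--) put_bit(bw, (v >> i) & 1);
    }

    /* Header: for each alphabet position, either flag 1 + gamma(count),
       or flag 0 + gamma(length of a run of zero counts). */
    void write_freq_header(bitwriter *bw, const uint32_t freq[256]) {
        for (int s = 0; s < 256; ) {
            if (freq[s] == 0) {
                int run = 0;
                while (s + run < 256 && freq[s + run] == 0) run++;
                put_bit(bw, 0);
                put_gamma(bw, (uint32_t)run);
                s += run;
            } else {
                put_bit(bw, 1);
                put_gamma(bw, freq[s]);
                s++;
            }
        }
    }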
    Originally Posted by JamesB:
        How quasi-adaptive is FastAC?
    Well, you can look at the source code; it is block adaptive with a periodic rescale (I think ~2k). TurboAC updates the model (like Subbotin's coder) after each symbol encoded/decoded.

    Last edited by dnd; 7th March 2014 at 00:05.

  #20 by dnd (Worldwide)
    New enwik9bwt (skewed distribution) benchmarks!
    Last edited by dnd; 9th March 2014 at 01:52.

  #21 by biject.bwts (Texas)
    Originally Posted by dnd:
        New enwik9bwt (skewed distribution) benchmarks!
    How did you create enwik9bwt?

    I ask because there is more than one way to compute a BWT, and they lead to different files.

    Secondly, if you're testing simple entropy encoders, they should give almost the same compression if you're using a pure, simple entropy encoder. If you use a simple one that only handles a small state space, you get the behavior your chart shows for the given file when it is BWTed or BWTSed, since these are almost identical transforms. If you used a BWT and suppressed the required index field, most would do slightly better with it than with BWTS.

    If you tested enwik9 after applying UNBWTS to it, you would see far worse compression, even though a true simple entropy compressor would compress the files to about the same size.
    Last edited by biject.bwts; 9th March 2014 at 20:22. Reason: added to question

  #22

  #23 by Jarek (Kraków, Poland)
    dnd, your "TurboAC byte" is probably a range coder used in some adaptive setting - why don't you add for comparison the same adaptive setting for "TurboANS" instead of headers? (sure a bit slower, but with this 2% gain).
    Also your "TurboANS" in binary setting should easily leave your "TurboAC bit" far behind ...

    Could you maybe provide some binaries for TurboANS (SIMD tANS?) to confirm speed you claim?

    ps. 64 bit rANS should practically touches the Shannon limit - again, you don't compare just entropy coders here, but rather estimators, header types etc ...
    Last edited by Jarek; 10th March 2014 at 00:05.

  #24 by biject.bwts (Texas)
    Originally Posted by Jarek:
        dnd, your "TurboAC byte" is probably a range coder used in some adaptive setting - why don't you add, for comparison, the same adaptive setting for "TurboANS" instead of headers? (Sure, a bit slower, but with this 2% gain.)
        Also, your "TurboANS" in a binary setting should easily leave your "TurboAC bit" far behind ...

        Could you maybe provide some binaries for TurboANS (SIMD tANS?) to confirm the speed you claim?

        PS. 64-bit rANS should practically touch the Shannon limit - again, you are not comparing just entropy coders here, but rather estimators, header types, etc ...
    Pray tell, what do you think the Shannon limit is for this file? Shouldn't the BWT version and the non-BWT version compress to the same size? Yes, with 64-bit math a pure, simple data compressor should be able to get close; however, most don't really implement a real 64-bit coder, since it does not compress the standard tests as well as the lower-precision so-called simple entropy compressors.

  #25 by biject.bwts (Texas)
    Look, for simple files the basic entropy is the negative sum of the log base 2 of the probabilities of occurrence.

    Suppose one happened to have a file where each symbol is random; then -log2 of 1/256 is 8, the symbols happen to be 8 bits long, and the entropy of the file equals the length of the file, in the case of a long random file, since the probability of occurrence is 1/256.

    If one knows how often a symbol occurs, then the more often it occurs, the smaller the number of bits needed to represent it; if a symbol occurs rarely, the number of bits needed to encode it becomes greater than 8.

    This entropy is based only on the number of occurrences of each symbol. A good entropy compressor will compress roughly the same regardless of the order of occurrences, since it is based only on the symbol counts.

    Could someone (not me), for those who trust me, calculate the length of enwik9 if reduced to its basic entropy length?
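    A minimal sketch of that calculation - the order-0 byte entropy computed from the symbol counts alone:
    Code:
    #include <math.h>
    #include <stdio.h>

    /* Order-0 entropy of a file: sum over symbols of count[s] * log2(total/count[s]). */
    int main(int argc, char **argv) {
        if (argc < 2) return 1;
        FILE *f = fopen(argv[1], "rb");
        if (!f) return 1;
        unsigned long long count[256] = {0}, total = 0;
        unsigned char buf[1 << 16];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
            for (size_t i = 0; i < n; i++) count[buf[i]]++;
            total += n;
        }
        fclose(f);
        double bits = 0.0;
        for (int s = 0; s < 256; s++)
            if (count[s])
                bits += count[s] * log2((double)total / count[s]);
        printf("len = %llu bytes, entropy8 = %.1f bytes, %.6f bits per byte\n",
               total, bits / 8.0, bits / total);
        return 0;
    }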

  #26 by Jarek (Kraków, Poland)
    biject, I don't know what you mean by the entropy of a file (Kolmogorov complexity? Rissanen bound?) ... by saying that "64-bit rANS nearly touches the Shannon limit", I meant that its inaccuracy is negligible (ΔH roughly (2^32)^2 times lower than for 32-bit rANS).
    But this assumes knowing the real probability distribution - my main objection to this "benchmark of entropy coders" is that the coders are fed different probability distributions - so it is rather a test of estimators and header handling.

  #27 by Piotrek (Kraków, Poland)
    All of the encoders are more or less online, and on enwik9 they score similarly (regarding size), so I think the speed comparison is roughly valid, i.e. the ordering (by speed) of the compressors shouldn't change much with other probability distributions.

    I don't think one could do a fully fair comparison, as different coders have different trade-offs - some are good for instant updates (e.g. the bitwise ones) and others have different optimal rescaling periods. Rescaling too often wastes space on headers and time on recomputing Huffman codes or FSE tables. Recomputing the codes/tables too rarely costs compression ratio, as the coder adapts more slowly. Fixed probabilities over a 1 GB block aren't a realistic setting either, so that also wouldn't be very informative, unless we can be sure the adaptation cost in real-life scenarios is negligible.
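    For concreteness, the rescaling referred to here is typically just a periodic halving of the symbol counts; a minimal sketch, with an example period:
    Code:
    #include <stdint.h>

    #define RESCALE 8192            /* rescaling period (example value) */

    /* Adaptive order-0 byte model: counts are halved every RESCALE symbols
       so the model keeps tracking local statistics.  Choosing RESCALE is
       the speed/ratio trade-off discussed above. */
    typedef struct { uint32_t count[256]; uint32_t total; uint32_t seen; } model;

    static void model_init(model *m) {
        for (int s = 0; s < 256; s++) m->count[s] = 1;   /* avoid zero probabilities */
        m->total = 256;
        m->seen = 0;
    }

    static void model_update(model *m, uint8_t sym) {
        m->count[sym] += 32;                             /* reward the observed symbol */
        m->total += 32;
        if (++m->seen == RESCALE) {                      /* periodic rescale */
            m->total = 0;
            for (int s = 0; s < 256; s++) {
                m->count[s] = (m->count[s] + 1) >> 1;    /* halve, keep counts >= 1 */
                m->total += m->count[s];
            }
            m->seen = 0;
        }
    }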

  #28 by Jarek (Kraków, Poland)
    Piotrek, entropy coders are simple mechanisms for translating a symbol sequence into a bit sequence. The performance of a pure entropy coder can be quantified in three aspects: speed, accuracy and memory needs (tables). Accuracy means working close to the probabilities provided to it.
    Practically all of its practical performance aspects can be found by benchmarking:
    - decoding/encoding processing speed - the pure hot loops for decoding/encoding,
    - preparation time - for large alphabets only - the time required to prepare the tables, e.g. for a 256-symbol alphabet,
    - accuracy - find ΔH for various given probability distributions: the number of bits used minus sum_i lg(1/p_i) over the whole (artificially generated) sequence; distributions from different families can be used, such as binomial, Gaussian or two-sided geometric (a sketch of this measurement follows after this post),
    - memory needed for the tables (a trade-off with accuracy and preparation time).

    Finally, I see three separate categories for entropy-coding benchmarks:
    - binary (AC, tABS) - test speed and accuracy. For tABS the accuracy depends on the tables used, but they can be stored in the coder - they don't affect speed (trade-off of accuracy vs memory usage),
    - processing speed for a large alphabet (Huffman, range coding, tANS, rANS) - the coders are given their coding tables and we benchmark speed only,
    - table preparation for a large alphabet - a benchmark of the trade-off between generation time, memory needs and the accuracy of the preparation only (e.g. the tANS symbol spread). It can be divided into different categories depending on memory use.

    In contrast to the benchmarks in the first post here, such more thorough tests would make it possible to estimate behavior in a given situation.
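    A minimal sketch of the accuracy measurement proposed above; encode_with_dist() is a hypothetical placeholder for the coder under test, which is given the fixed distribution p and returns the number of bits it produced:
    Code:
    #include <math.h>
    #include <stdio.h>

    /* Hypothetical coder under test: encodes n symbols using the fixed
       distribution p[256] and returns the number of output bits. */
    extern double encode_with_dist(const unsigned char *seq, size_t n, const double p[256]);

    /* deltaH: bits actually produced minus the ideal sum_i lg(1/p[x_i]). */
    double delta_h_bits(const unsigned char *seq, size_t n, const double p[256]) {
        double ideal = 0.0;
        for (size_t i = 0; i < n; i++)
            ideal += -log2(p[seq[i]]);           /* Shannon cost of each symbol */
        double used = encode_with_dist(seq, n, p);
        return used - ideal;                     /* > 0: coder inaccuracy, in bits */
    }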

  #29 by Cyan (France)
    I would feel better if such tests were performed by a third-party, using verifiable tools.

  #30 by JamesB (Cambridge, UK)
    Originally Posted by biject.bwts:
        This entropy is based only on the number of occurrences of each symbol. A good entropy compressor will compress roughly the same regardless of the order of occurrences, since it is based only on the symbol counts.

        Could someone (not me), for those who trust me, calculate the length of enwik9 if reduced to its basic entropy length?
    In its purest form:

    Code:
    jkb@seq3a[work/compress...] ./entropy16 < ~/scratch/data/enwik9
    len = 1000000000 bytes, entropy16 = 565716294.871233, 4.525730 bits per byte
    len = 1000000000 bytes, entropy8  = 644561309.084184, 5.156490 bits per byte
    
    jkb@seq3a[work/compress...] ./smoke -b1g ~/scratch/data/enwik9
    /nfs/users/nfs_j/jkb/scratch/data/enwik9 | min % | avg % | max % | incompressible 1GB blocks
    -----------------------------------------|------:|------:|------:|----------------------------
                                Byte entropy | 64.46 | 64.46 | 64.46 | 0 of 1
                                Word entropy | 56.57 | 56.57 | 56.57 | 0 of 1
                        Order-1 byte entropy | 48.69 | 48.69 | 48.69 | 0 of 1
                          DWord hash entropy | 85.56 | 85.56 | 85.56 | 0 of 1
                              DWord coverage |  0.34 |  0.34 |  0.34 | 0 of 1
                       2-pass DWord coverage |  0.02 |  0.02 |  0.02 | 0 of 1
                       2-pass QWord coverage |  1.62 |  1.62 |  1.62 | 0 of 1
    That is, we both agree on the 8-bit and 16-bit entropy.

    However, the entropy of the entire file may not be the same as the entropy of the file when taken in smaller chunks, say 1 MB at a time:

    Code:
    ./smoke -b1m ~/scratch/data/enwik9
    /nfs/users/nfs_j/jkb/scratch/data/enwik9 | min % | avg % | max % | incompressible 1MB blocks
    -----------------------------------------|------:|------:|------:|----------------------------
                                Byte entropy | 61.59 | 64.10 | 68.07 | 0 of 954
                                Word entropy | 52.17 | 55.71 | 58.05 | 0 of 954
                        Order-1 byte entropy | 41.06 | 47.33 | 48.79 | 0 of 954
                          DWord hash entropy | 67.84 | 82.71 | 85.49 | 0 of 954
                              DWord coverage |  2.54 |  9.75 | 13.44 | 0 of 954
                       2-pass DWord coverage |  2.34 |  8.96 | 11.32 | 0 of 954
                       2-pass QWord coverage |  8.03 | 45.64 | 56.70 | 0 of 954

    dnd said he used 14 KB blocks, which is better for adapting to local changes in entropy, but puts a significant strain on the header encoding efficiency (both speed and size):

    Code:
    ./smoke -b14k ~/scratch/data/enwik9
    /nfs/users/nfs_j/jkb/scratch/data/enwik9 | min % | avg % | max % | incompressible 14KB blocks
    -----------------------------------------|------:|------:|------:|----------------------------
                                Byte entropy |  0.00 | 62.92 | 75.50 | 0 of 69755
                                Word entropy |  0.00 | 52.96 | 58.75 | 0 of 69755
                        Order-1 byte entropy |  0.00 | 43.01 | 47.70 | 0 of 69755
                          DWord hash entropy |  0.00 | 71.63 | 77.55 | 0 of 69755
                              DWord coverage |  0.00 | 42.44 | 64.81 | 0 of 69755
                       2-pass DWord coverage |  0.01 | 40.33 | 60.82 | 0 of 69755
                       2-pass QWord coverage |  0.01 | 70.61 | 99.27 | 5 of 69755:  57372, 57375, 57377, 57379, 57381
    Hence even this purest definition of entropy has nuances. The main entropy encoders listed fit within the expected ballpark, though.
