NEWEST BENCHMARKS in https://sites.google.com/site/powturbo/entropy-coder
NEWEST BENCHMARKS in https://sites.google.com/site/powturbo/entropy-coder
ne0n (7th March 2014)
Thanks, there was really missing some comparison.
It's strange that Huffman gets the same rates here, as ANS additionally uses fractional bits.
You can also add tANS for binary alphabet (tABS) to compare with binary AC for varying probabilities (like Matt's fpaqb but with optimizations) - for this TurboANS it would be ~427/8~53.4 MB/s (a bit more as you generate coding tables only once).
Last edited by Jarek; 5th March 2014 at 16:20.
Is the benchmark program open sourced ?
[Edit] It is apparently not open sourced nor verifiable.
As a consequence, I'm not keen to see FSE described using such bogus numbers, which are far off the mark.
As published on https://github.com/Cyan4973/FiniteStateEntropy,
the speed of single-threaded FSE on a Core i5 @ 2.96GHz is :
entropy only : compression 430 MB/s , decompression 520 MS/s
full code (including header & table construction) : compression 300 MB/s, decompression 420 MB/s
and since the code & data fits into L1 cache, it ramps up pretty weel with clock speed.
Therefore, an I7-2600k at 4.5GHz should experience better speed than that, not worse.
People willing to have a better understanding of its performance should better test it independently.
After all, the code is freely available, open sourced, and there is a benchmark module included.
Last edited by Cyan; 25th July 2015 at 15:01.
This is valid for this symbol distribution. Additionally the headers for huffman are in general smaller.
These are in memory benchmark, fpab is file based, it must be modified. You can't deduce the speed simply by dividing by 8.
@Cyan: the benchmark is actually not open source. You can modify fsbench for this purpose.
Huffman's probability quantization is really bad- see fig. 3 in http://arxiv.org/pdf/1008.3597v1.pdf ... and accurate has still room for improvements (thread).
It means that for accurate entropy coder you can get the same rate using a few times smaller header than Huffman, or get better compression rate for the same header size.
... and you should have exactly the same header issue for arithmetic coding you show...
It is using tANS for binary alphabet ... the difference is that you switch between multiple tables, read single bits not bytes ... but you generate coding tables only once.These are in memory benchmark, fpab is file based, it must be modified. You can't deduce the speed simply by dividing by 8.
In such case, you should compare with block-adaptive ANS, maybe provide header for the first block only (~141B in FSE) ... but it makes such comparison more blurry.
If you want to make a really clear and fair comparison of entropy coders only, you should provide them with the same probability distribution and measure pure data processing speed and rate, comparing with the optimum rate: Shannon entropy.
Preparation should be treated separately - you can provide time for Huffman or ANS table preparation as a separate parameter in benchmark.
The goal of this benchmark is to compare different and the latest implementations primarily for speed comparisons under common distributions.
I don't want to speculate about what is possible, before implementing.
For ex. this benchmark demonstrate for the first time that binary arithmetic coding (8x per byte) can be faster than bytewise arithmetic coding (1x per byte).
Please include PPMd var.J (-o2 ... -o4)
You also have to realize that at least for the adaptive arithmetic there will be a difference in compression for enwik9 based on the use of Kt or Laplace update when doing adaptive.
Also if when comparing entropy encoders they may not really be real entropy encoders. Try compressing various permutations
of enwik9 if the arithmetic are basic entropy coders they should compress each permutation to the same length give or take a byte.
I tested several so called basic entropy coders years ago and found most designed to look good for files like enwik9 but they failed to compress to same amount when permutations of enwik9 used in fact they got far worse compressions meaning they are not true basic
entropy compressors
arb255 is a true bijective arithmetic compressor on matts benchmark it gets 644,561,595 I don't remember if the KT or Laplace update rate used for the cells that would make a difference in the compressed size.
UPDATE I just ran current arb255 it get 644,561,590 it is smaller used KT for the update
but all permutations of enwik9 will compress to this same length give or take ONE byte.
However unless you test permutations of the input files you can't be sure the code actually is a real entropy coder.
an easy way to generate some permutations of the input test file is to do a few BWTS or UNBWTS on the test file then run your tests.
Last edited by biject.bwts; 5th March 2014 at 19:58. Reason: value for current arb255
The reason that ARB255 looks worse in size is that its most likely the only pure entropy coder mentioned so far. The other coders will do worse size wise if you test with permutations. The reason is that by the use of limited high and low bit lengths in the state registers they loose information about the true state so there are really adjusting the compression by using local effects that are different when the file undergoes certain permutations. I realize that for many applications one really does not want a true entropy coder but when one does you have to realize they are not true entropy coders. Another thing arb255 though a true coder is really just a test bed its very very slow.
dnd, comparing ANS with header for each block, with adaptive AC without headers is not fair (I think you use FastAC also with headers).
If you want to "compare pure entropy coders" do it by giving them the same distribution - and you can compare with the maximal available rate: Shannon entropy.
Now you are comparing simple compressors based on them.
Also binary coders (AC, tABS) and large alphabet (Huffman, range coding, ANS) are separate categories - the second one has additional cost of tables: memory cost and time for preparation, which should be listed separately.
Last edited by Jarek; 5th March 2014 at 20:55.
Jarek (5th March 2014)
Thank you, but still it is not a benchmark of entropy coders, only of simple data compressors based on them - if you really want to compare entropy coders, they cannot differ by the most important parameter: probability distribution provided to them.
Here I have a feeling that each of them uses a different probability distribution, especially the first four, for which we can see a big jump thanks to being bit-wise.
E.g. your byte TurboAC is without header (adaptive), while TurboANS has header ... I don't know if 0.5% difference between them is thanks to using better probability distribution or inaccuracy of ANS ...
Last edited by Jarek; 5th March 2014 at 22:43.
Jarek --
The compression rate here is meaningless, because pretty much all they're doing is squeezing the alphabet down to what's used in text. Huffman looks a lot better with less skew in the alphabet, I.e. less compressible data. It makes no sense to directly entropy compress enwik9, and the differences between coders are probably largely overhead.
dnd -- you should test skewed data as well. Skew in the data exposes different code paths and can affect run times. Plus, it's more realistic.
I assume rANS_Ryg is the 32-bit version rather than rANS64. The 64 bit one I would expect to be very similar to Charles Bloom's version, looking at the code. (I couldn't benchmark both myself alas.) I'm also interested in how you are encoding and storing the header data. How this is done is obviously quite key to speed and size. It's built in to FSE as it is a full compressor, but not to ryg_rANS as it's purely the entropy encoding step. This means you're either not comparing like with like, or you're comparing unrevealed and closed source parts which makes it hard to judge.
How Quasi-adaptive is FastAC? It seems to be a quick encoder, but a bit poor on decoding (mostly likely due to divisions). Much quicker than TurboAC byte infact. That appears to imply that the fully adaptive part of coding is the major bottleneck there, which suprises me.
Also, how many runs did you do of each? The speed is high enough that you may need multiple runs to settle the CPU scaling down so it doesn't play a significant factor in the benchmarking.
Yes, but i am including Rygs SIMD version. EDIT:I have included the 64 bits version now.
Only FSE is providing its own header functions. To complete the tests i've included the same header functions as TurboANS for Ryg and CBloom ANSs. There is a column without header size included.I'm also interested in how you are encoding and storing the header data.
The timing for storing/reading the header are negligible. The header functions are gamma coding (bitio) with zero RLE. I've posted an advanced version in http://encode.su/threads/1883-Reduci...ll=1#post36916
well you can look at the source code, it is block adaptive with periodic rescale (i think ~2k). TurboAC is updating the model (like subotin) after each symbol encoding/decoding.How Quasi-adaptive is FastAC?
how did you create enwik9bwt ??
I ask this since more than one way to get a bwt and they lead to different files.
Secondly if your testing simple entropy encoders they should lead to almost the
same compression if your using a pure simple entropy encoder. If using a simple
one that only handles a small state space you get the activity your chart shows for
the given file when BWTed or BWTSed since these are almost identical transforms.
If you used a BWT and suppressed the required index field most would do slightly
better with it than BWTS.
If you bothered to test enwik9 after you did a UNBWTS you would see far worse
compression even though a simple true entropy compressor would compress the files
about the same.
Last edited by biject.bwts; 9th March 2014 at 20:22. Reason: added to question
bwt demo in https://code.google.com/p/libdivsufsort/
dnd, your "TurboAC byte" is probably a range coder used in some adaptive setting - why don't you add for comparison the same adaptive setting for "TurboANS" instead of headers? (sure a bit slower, but with this 2% gain).
Also your "TurboANS" in binary setting should easily leave your "TurboAC bit" far behind ...
Could you maybe provide some binaries for TurboANS (SIMD tANS?) to confirm speed you claim?
ps. 64 bit rANS should practically touches the Shannon limit - again, you don't compare just entropy coders here, but rather estimators, header types etc ...
Last edited by Jarek; 10th March 2014 at 00:05.
Pray tell what do you think the Shannon limit is for this file. Shouldn't the BWT version and non BWT version compress the same? Yes 64 bit math in a pure simple data compressor should be able to get close however most don't really implement
a real 64 bit coder since it does not compress standard test as well as a lower precision so called simple entropy compressors.
Look for simple files basic entropy is the negative summation of log base 2 of the probabilities of occurrence.
Suppose one happened to have a file where each symbol random then -log of 1/256 is 8 the symbols happen to be
8 bits long the the entropy of the file is the length of each file in the case for a long random file since probability
of occurrence is 1/256 .
If one knows how often a symbol occurs the more often the smaller amount of bits needed to represent it.
if the the symbol occurs rarely the the number of bits to encode it become greater then 8
This entropy is based only on the number of occurrences of each symbol. A good entropy compressor will
compress roughly the same regardless of order of occurrences since based only on number of symbols.
Could someone not me, for who trusts me calculate the length of enwik9 if reduced to its basic entropy length
biject, I don't know what do you mean by entropy of a file (Kolmogorov complexity? Rissanen bound?) ... by saying that " 64bit rANS nearly touches Shannon limit", I've meant that its inaccuracy is negligible (\deltaH approximately (2^32)^2 times lower than for 32 bit).
But it assumes knowing the real probability distribution - my main objection to this "benchmark of entropy coders" is that they are fed with different probability distributions - so it is rather test of estimators and header handling.
All of the encoders are more or less online and on enwik9 they score similarly (regarding size) so I think speed comparison is roughly valid, ie the order (on speed) of compressors shouldn't change much with other probability distributions.
I don't think one could do a fair comparison as different coders have different tradeoffs - some are good for instant updates (eg the bitwise ones) and other have different optimal rescaling periods. Too often rescaling can cause waste of space on headers and waste of time on recalculating Huffman codes or FSE tables. Too rare recomputation of codes/ tables can cause loss of compression ratio as the coders adapts slower. Fixed probabilities over a 1 GB block isn't a realistic setting so that also wouldn't be very informative, unless we can assure the adaptation cost in real life scenarios is negligible.
Piotrek, entropy coders are simple mechanisms to translate symbol sequence into bit sequence. Performance of pure entropy coder can be easily quantified in three aspects: speed, accuracy and memory need (tables). Accuracy means working close to provided them probabilities.
Practically all practical performance aspects can be found from benchmarking:
- decoding/encoding processing speed - pure hot loops for decoding/encoding,
- preparation time - for large alphabets only - time required to prepare tables e.g. for 256 size alphabet,
- accuracy - find deltaH for various given probability distributions: used number of bits minus sum_i lg(1/p_i) over the whole sequence (artificially generated). There can be used probability distributions from different families, like binomial, Gaussian, two-directional geometric distribution.
- memory need for tables (trade-off with accuracy and preparation time),
Finally I see three separate categories for entropy coding benchmarks:
- binary (AC, tABS) - test speed and accuracy. For tABS accuracy depends on used tables, but they can be stored in coder - they don't affect speed (trade-off accuracy vs memory usage),
- speed of processing large alphabet (Huffman, Range coding, tANS, rANS) - they are given coding tables and we benchmark speed only,
- table preparation for large alphabet - benchmark of trade-off between generation time, memory need and accuracy of preparation only (e.g. tANS symbol spread). It can be divided into different categories depending on memory use.
In opposite to benchmarks from the first post here, such more throughout tests would allow to estimate behavior in given situation.
I would feel better if such tests were performed by a third-party, using verifiable tools.
In that purest form:
Ie we both agree on 8-bit and 16-bit entropy.Code:jkb@seq3a[work/compress...] ./entropy16 < ~/scratch/data/enwik9 len = 1000000000 bytes, entropy16 = 565716294.871233, 4.525730 bits per byte len = 1000000000 bytes, entropy8 = 644561309.084184, 5.156490 bits per byte jkb@seq3a[work/compress...] ./smoke -b1g ~/scratch/data/enwik9 /nfs/users/nfs_j/jkb/scratch/data/enwik9 | min % | avg % | max % | incompressible 1GB blocks -----------------------------------------|------:|------:|------:|---------------------------- Byte entropy | 64.46 | 64.46 | 64.46 | 0 of 1 Word entropy | 56.57 | 56.57 | 56.57 | 0 of 1 Order-1 byte entropy | 48.69 | 48.69 | 48.69 | 0 of 1 DWord hash entropy | 85.56 | 85.56 | 85.56 | 0 of 1 DWord coverage | 0.34 | 0.34 | 0.34 | 0 of 1 2-pass DWord coverage | 0.02 | 0.02 | 0.02 | 0 of 1 2-pass QWord coverage | 1.62 | 1.62 | 1.62 | 0 of 1
However the entropy of the entire file may not be the same as the entropy of the file when taken in smaller chunks, say 1Mb at a time:
Code:./smoke -b1m ~/scratch/data/enwik9 /nfs/users/nfs_j/jkb/scratch/data/enwik9 | min % | avg % | max % | incompressible 1MB blocks -----------------------------------------|------:|------:|------:|---------------------------- Byte entropy | 61.59 | 64.10 | 68.07 | 0 of 954 Word entropy | 52.17 | 55.71 | 58.05 | 0 of 954 Order-1 byte entropy | 41.06 | 47.33 | 48.79 | 0 of 954 DWord hash entropy | 67.84 | 82.71 | 85.49 | 0 of 954 DWord coverage | 2.54 | 9.75 | 13.44 | 0 of 954 2-pass DWord coverage | 2.34 | 8.96 | 11.32 | 0 of 954 2-pass QWord coverage | 8.03 | 45.64 | 56.70 | 0 of 954
DND said he used 14k, which is better for adapting to local changes to entropy, but puts a significant strain on the header encoding efficiency (both speed and size):
Hence even this purest definition of entropy has nuances. The main entropy encoders listed fit within the expected ballpark though.Code:./smoke -b14k ~/scratch/data/enwik9 /nfs/users/nfs_j/jkb/scratch/data/enwik9 | min % | avg % | max % | incompressible 14KB blocks -----------------------------------------|------:|------:|------:|---------------------------- Byte entropy | 0.00 | 62.92 | 75.50 | 0 of 69755 Word entropy | 0.00 | 52.96 | 58.75 | 0 of 69755 Order-1 byte entropy | 0.00 | 43.01 | 47.70 | 0 of 69755 DWord hash entropy | 0.00 | 71.63 | 77.55 | 0 of 69755 DWord coverage | 0.00 | 42.44 | 64.81 | 0 of 69755 2-pass DWord coverage | 0.01 | 40.33 | 60.82 | 0 of 69755 2-pass QWord coverage | 0.01 | 70.61 | 99.27 | 5 of 69755: 57372, 57375, 57377, 57379, 57381
Jarek (10th March 2014)