Good work m^2. The bug is fixed in "dev". Thanks.
Quote: "Good work m^2. The bug is fixed in "dev". Thanks."
Your gratitude gave me a lot of positive energy. Thank you for that.
LZ5 v1.3.3 beta can be downloaded at:
https://github.com/inikep/lz5/archive/dev.zip
Changes:
- added: new levels from 11 to 16 (maximum compression ratio)
- added: a new parser: LZ5HC_optimal_price
- fixed: buffer-overflow during decompression (thanks to m^2)
Please help me with fuzzing. Here is a quick comparison:
Code:
Compressor name          Compression Decompress. Compr. size  Ratio
crush 1.0 level 0             14 MB/s    195 MB/s    50419812  48.08
crush 1.0 level 1           4.09 MB/s    211 MB/s    48195021  45.96
crush 1.0 level 2           0.55 MB/s    214 MB/s    47105187  44.92
lz5hc r132 level 10           12 MB/s    670 MB/s    48109030  45.88
lz5hc r132 level 11         6.60 MB/s    592 MB/s    47639520  45.43
lz5hc r132 level 12         3.22 MB/s    670 MB/s    47461368  45.26
lz5hc r132 level 13         1.09 MB/s    642 MB/s    47382059  45.19
lz5hc v1.3.3 level 10         12 MB/s    677 MB/s    48109030  45.88
lz5hc v1.3.3 level 11       8.76 MB/s    692 MB/s    47438817  45.24
lz5hc v1.3.3 level 12       7.02 MB/s    651 MB/s    47063261  44.88
lz5hc v1.3.3 level 13       4.75 MB/s    645 MB/s    46718698  44.55
lz5hc v1.3.3 level 14       3.84 MB/s    671 MB/s    46484969  44.33
lz5hc v1.3.3 level 15       1.93 MB/s    712 MB/s    46227364  44.09
lz5hc v1.3.3 level 16       0.80 MB/s    709 MB/s    46125742  43.99
Last edited by inikep; 18th December 2015 at 21:06.
m^2 (19th December 2015)
Are there any changes in decompression?
I have not finished fuzzing it yet.
There is a small change in the decompression.
However, if decompression in r132 works correctly, it will also work correctly for v1.3.3.
Note: line numbers may differ in fullbench.c as I needed to modify it.
Code:
katmacadapc% ./fullbench-notrap data_c/000-000
*** LZ5 speed analyzer v1.3.3 64-bits, by Yann Collet (Dec 28 2015) ***
Compression functions :
../lib/lz5.c:871:25: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:50:14: note: nonnull attribute specified here
SUMMARY: AddressSanitizer: undefined-behavior ../lib/lz5.c:871:25 in
ERROR ! LZ5_saveDict() = 0 !!
fullbench-notrap: fullbench.c:773: int fullSpeedBench(char **, int): Assertion `0' failed.
zsh: abort      ./fullbench-notrap data_c/000-000
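For context, here is what this class of sanitizer report means in general (a generic illustration, not the actual LZ5_saveDict body): memcpy's arguments carry the nonnull attribute, so passing a NULL dictionary pointer is undefined behavior even when the size is 0. The usual fix is to guard the call:
Code:
#include <string.h>

/* Generic pattern: skip the copy when there is nothing to copy, so a
 * NULL source with size 0 never reaches memcpy (whose args are nonnull). */
static void safe_copy(void* dst, const void* src, size_t size)
{
    if (size > 0)
        memcpy(dst, src, size);
}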
inikep (4th January 2016)
Thanks. I'll take care of it when I get back from vacation.
@m^2: Please check https://github.com/inikep/lz5/tree/dev to see whether the bugs inherited from LZ4 are removed.
Btw, I've given the -dev version of LZ5 a try. On compressing the Linux kernel it beats crush any day. Yet on other data it may or may not beat it; it really depends on the nature of the data. In some cases crush happens to be just too strong and wins. However, overall it's quite an improvement. This world has clearly gone crazy: a byte-aligned LZ is beating a bit-aligned one, and another bit-aligned LZ is showing LZ+entropy schemes who the real king of LZ compression is.
NB: I've found the table defining the levels and attempted to "turbocharge" lvl 16 a little bit; however, it seems I got only negligible gains.
What are "technical" limits on various members of
...while compressor still would generate valid output and (unmodified) decoder can still parse it?Code:U32 windowLog; /* largest match distance : impact decompression buffer size */ U32 contentLog; /* full search segment : larger == more compression, slower, more memory (useless for fast) */ U32 hashLog; /* dispatch table : larger == more memory, faster*/ U32 hashLog3; /* dispatch table : larger == more memory, faster*/ U32 searchNum; /* nb of searches : larger == more compression, slower*/ U32 searchLength; /* size of matches : larger == faster decompression */ U32 sufficientLength; /* used only by optimal parser: size of matches which is acceptable: larger == more compression, slower */ LZ5HC_strategy strategy;
E.g. could I increase the dictionary? And what does "impact decompression buffer size" mean? I thought this data format "requires no memory for decompression". Have I seriously missed something?
The limits:
Code:
{ 10, 10, 10,  0,     0, 4,     0, LZ5HC_fast },          // min values
{ 24, 24, 28, 24, 1<<24, 7, 1<<24, LZ5HC_optimal_price }, // max values
In my experiments windowLog=22 gives the best ratio (on average). It means that the compressor finds matches with offsets up to 1<<22. Therefore, during decompression LZ5 requires access to up to 4 MB (22 bits) of recently decoded bytes. LZ4 uses only a 64 KB dictionary (16 bits).
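To make the "decompression buffer" point concrete, here is a minimal sketch of a generic LZ77-style match copy (illustrative only, not LZ5's actual decoder): every match is served from earlier output, so the last 1<<windowLog bytes of output must stay addressable.
Code:
#include <stddef.h>
#include <stdint.h>

/* With windowLog = 22 the offset may be as large as 1<<22, i.e. the copy
 * source can sit 4 MB back in the already-decoded output. */
static void copy_match(uint8_t* dst, size_t dstPos, size_t offset, size_t len)
{
    const uint8_t* src = dst + dstPos - offset;  /* back-reference into earlier output */
    size_t i;
    for (i = 0; i < len; i++)                    /* byte-by-byte copy also handles overlap */
        dst[dstPos + i] = src[i];
}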
Last edited by inikep; 4th January 2016 at 22:36.
xcrh (4th January 2016)
Thanks for the definitive answer!
Quote: "In my experiments windowLog=22 gives the best ratio (on average)."
It seems it depends on what you compress (and on how large & redundant it happens to be). I've played with crush a bit some time ago and got the idea that a too-large dictionary can hurt. Though I guess it depends on the amount of data and its nature.
It can look like this:
Code:
lz5hc r132 level 6      6.22 MB/s    244 MB/s       97972  42.62
lz5hc r132 level 7      5.71 MB/s    242 MB/s       97992  42.62
lz5hc r132 level 9      2.16 MB/s    238 MB/s       98088  42.67
lz5hc r132 level 12     1.77 MB/s    228 MB/s       99986  43.49
lz5hc r132 level 16     1.56 MB/s    228 MB/s       99932  43.47
...so lz5hc lvl 6 was the best; lower and higher levels were worse on this small file. I've used LZ5 from your -dev branch (I put it into lzbench myself).
On the other hand, increasing the parameters and making the dict larger makes sense in some cases.
E.g. an ubuntu x86-64 Linux kernel, a ~22 MB chunk of code & data which is redundant (large areas of zeros, etc.):
Ok, "ultimate" setup is very slow (but still faster than ratio-inclined things like LZOMA) and also hogs memory (not surprising). Getting rid of ~120Kb isn't bad, granted decompression speed stays, but hardly epic. And of course it makes no effect on small files. Btw, crush 2 got outperformed in all cases, though maybe it is a good idea to try various dict sizes for crush to see what it can. I wonder how bad my settings are, if we assume I want to get rid a lot of zeros, etc but preferrably without waiting for a week.Code://LZ5HC 16, stock from -dev: lz5hc r132 level 16 0.37 MB/s 179 MB/s 7130425 31.27 //"supercharged" level 16, dictionary not touched, matches tweaked - { 22, 22, 23, 16, 8192, 3,8192,LZ5HC_optimal_price } lz5hc r132 level 16 0.03 MB/s 177 MB/s 7122521 31.24 //"ultimate" level 16 - also increased dicionary & hash tables - { 24, 24, 27, 24, 8192, 3,8192,LZ5HC_optimal_price }, lz5hc r132 level 16 0.03 MB/s 171 MB/s 7007389 30.74 //zlib & crush - for reference purposes: zlib 1.2.8 level 9 3.21 MB/s 85 MB/s 6781199 29.74 zlib 1.2.8 level 6 7.83 MB/s 85 MB/s 6793262 29.80 crush 1.0 level 2 0.25 MB/s 78 MB/s 7267494 31.88
Let's take a look at competitors in more or less the same league...
1) lz4: rocks in terms of decompression speed, but its ratio is nowhere close.
Code:
lz4hc r131 level 16     1.31 MB/s    520 MB/s     8732628  38.30
lzo1z 2.09 level 999    2.53 MB/s    148 MB/s     7543941  33.09
lzo1x 2.09 level 999    2.49 MB/s    153 MB/s     7933488  34.80
lzo1y 2.09 level 999    2.51 MB/s    153 MB/s     7953888  34.89
lzo2a 2.09 level 999    6.72 MB/s    124 MB/s     8079096  35.44
lzo1f 2.09 level 999    5.57 MB/s    157 MB/s     8265099  36.25
lzo1b 2.09 level 999    3.38 MB/s    177 MB/s     8556961  37.53
lzo1c 2.09 level 999    5.92 MB/s    187 MB/s     8612912  37.78
2) LZO got outperformed quite a bit: LZ5 achieved a better ratio AND better decompression speed. LZO only reaches comparable decompression speed when its ratio is crap, and if someone is ok with such a ratio, LZ4 would decompress much faster. Strangely, the built-in stub uses lzo1x, while 1z is clearly the better tradeoff.
Quote: "It means that the compressor finds matches with offsets up to 1<<22. Therefore, during decompression LZ5 requires access to up to 4 MB (22 bits) of recently decoded bytes. LZ4 uses only a 64 KB dictionary (16 bits)."
Ok, I got it: it means you can't flush and discard these data. Not a big deal if one decompresses a src block into a dst block and keeps both source and destination in memory; in that sense, no extra memory is needed. But if one doesn't need these data in memory and wants to flush them, that's not an option, and that's what this note really means. Thanks for the explanation.
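A hedged sketch of that streaming consequence: a consumer can flush everything except the most recent window, since only those bytes can still be referenced by future matches. WINDOW and flushable_bytes are illustrative names, not LZ5's streaming API.
Code:
#include <stddef.h>

#define WINDOW ((size_t)1 << 22)  /* 4 MB of history for windowLog = 22 */

/* How much decoded output may safely leave memory: everything except
 * the last WINDOW bytes, which future matches may still reference. */
static size_t flushable_bytes(size_t totalDecoded, size_t alreadyFlushed)
{
    size_t retained = totalDecoded - alreadyFlushed;
    return (retained > WINDOW) ? retained - WINDOW : 0;
}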
inikep (5th January 2016)
LZ5 v1.3.3 (final) can be downloaded at:
https://github.com/inikep/lz5/releases
Changes from r132:
- added: new levels from 11 to 18 (maximum compression ratio)
- added: a new parser: LZ5HC_optimal_price
- fixed: buffer-overflow during decompression (thanks to m^2)
Here is a quick comparison:
Code:
Compressor name          Compression Decompress. Compr. size  Ratio
crush 1.0 level 0             14 MB/s    195 MB/s    50419812  48.08
crush 1.0 level 1           4.09 MB/s    211 MB/s    48195021  45.96
crush 1.0 level 2           0.55 MB/s    214 MB/s    47105187  44.92
lz5hc r132 level 10           12 MB/s    670 MB/s    48109030  45.88
lz5hc r132 level 11         6.60 MB/s    592 MB/s    47639520  45.43
lz5hc r132 level 12         3.22 MB/s    670 MB/s    47461368  45.26
lz5hc r132 level 13         1.09 MB/s    642 MB/s    47382059  45.19
lz5hc v1.3.3 level 10         12 MB/s    677 MB/s    48109030  45.88
lz5hc v1.3.3 level 11       8.76 MB/s    692 MB/s    47438817  45.24
lz5hc v1.3.3 level 12       7.02 MB/s    651 MB/s    47063261  44.88
lz5hc v1.3.3 level 13       4.75 MB/s    645 MB/s    46718698  44.55
lz5hc v1.3.3 level 14       3.84 MB/s    671 MB/s    46484969  44.33
lz5hc v1.3.3 level 15       1.93 MB/s    712 MB/s    46227364  44.09
lz5hc v1.3.3 level 16       0.80 MB/s    709 MB/s    46125742  43.99
lz5hc v1.3.3 level 17       0.39 MB/s    679 MB/s    46050114  43.92
lz5hc v1.3.3 level 18       0.16 MB/s    541 MB/s    46008853  43.88
comp1 (5th January 2016), Stephan Busch (8th January 2016), xcrh (8th January 2016)
I tested lz5 on LTCB and Silesia.
http://mattmahoney.net/dc/text.html#3196
http://mattmahoney.net/dc/silesia.html
I've fallen behind on testing other compressors. If you're in a hurry, you can test yourself and send me results.
inikep (6th January 2016)
inikep, I have a request for you. When you find bugs inherited from LZ4 (IIRC there are 2 already), please report them back to Yann.
I'll test your changes in the following days.
I reported bugs in LZ4 at https://github.com/Cyan4973/lz4/issues/178
Okay, thanks to the new match finder in -dev, it seems I've got a new "high score" while compressing the kernel using LZ5...
I've used a custom lvl 15: { 24, 24, 27, 24, 1<<11, 4, 1<<11, 1, LZ5HC_optimal_price_bt }, // level 15 - which also happens to be "faster" (compared to the previous similar setup).
Code:
Compressor name         Compression Decompress. Compr. size  Ratio
memcpy                       32 MB/s     61 MB/s    22799232 100.00
lz5hc r132 level 15        0.59 MB/s    176 MB/s     6978174  30.61
zlib 1.2.8 level 1           20 MB/s     86 MB/s     7268620  31.88
zlib 1.2.8 level 3           16 MB/s     89 MB/s     7062079  30.98
zlib 1.2.8 level 6         7.80 MB/s     85 MB/s     6793262  29.80
zlib 1.2.8 level 9         3.21 MB/s     86 MB/s     6781199  29.74
Some observations:
1) Setting match searching above 1<<10 has quite a small effect, and above 1<<11 the gains are very small, while compression speed drops a lot.
2) On this relatively large file, the large window pays for itself, noticeably improving the ratio.
3) On my laptop, which runs all sorts of crap, hashLog=28 probably ran out of memory, because I got a not-so-inspiring 100% ratio. After trying various hashLogs, I got the impression it doesn't hurt speed much with my settings, and hashLog=27 is quite ok for my "ultimate" level; it doesn't run out of memory (see the back-of-envelope sketch after the table below).
4) I've been curious whether I could get below the 7 000 000 bytes mark. Hmm, ok, now I can!
5) On some other file (uncompressed PC-generated graphics from some game; a very highly compressible PNG at the "stored" level, a bit above 2 MB in size) only my custom levels with the increased window managed to beat Crush 2 :P. Crush did very well here. Speeds are exaggerated due to the extreme ratios.
Here lvl 14 is also custom: { 23, 23, 23, 23, 1<<12, 4, 1<<12, 1, LZ5HC_optimal_price_bt }, // level 14 - smaller dict, but match finding is probably overaggressive: its speed fell even below that of lvl 15, while lvl 15 did a bit better and ran twice as fast.
Code:
$ ./lzbench -elz5hc,3,6,8,9,10,12,13,14,15/zlib,1,3,6,9/crush,2 tanktop-female.png
lzbench 0.9.1 (64-bit Linux)   Assembled by P.Skibinski
Compressor name         Compression Decompress. Compr. size  Ratio
memcpy                     1830 MB/s   1830 MB/s     2216456 100.00
lz5hc r132 level 3           99 MB/s   1019 MB/s       51325   2.32
lz5hc r132 level 6           19 MB/s   1048 MB/s       40738   1.84
lz5hc r132 level 8           18 MB/s   1054 MB/s       40657   1.83
lz5hc r132 level 9           13 MB/s   1079 MB/s       39140   1.77
lz5hc r132 level 10          12 MB/s   1098 MB/s       38454   1.73
lz5hc r132 level 12          17 MB/s   1019 MB/s       40031   1.81
lz5hc r132 level 13        7.34 MB/s   1054 MB/s       40796   1.84
lz5hc r132 level 14        0.03 MB/s   1092 MB/s       36428   1.64
lz5hc r132 level 15        0.06 MB/s   1087 MB/s       36362   1.64
zlib 1.2.8 level 1           89 MB/s    298 MB/s       46346   2.09
zlib 1.2.8 level 3           87 MB/s    303 MB/s       44820   2.02
zlib 1.2.8 level 6           43 MB/s    185 MB/s       33968   1.53
zlib 1.2.8 level 9         9.85 MB/s    191 MB/s       32081   1.45
crush 1.0 level 2          1.31 MB/s    309 MB/s       36698   1.66
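The back-of-envelope sketch for observation 3 above, assuming 4-byte hash table entries (the actual entry size in LZ5 may differ):
Code:
#include <stdint.h>
#include <stdio.h>

/* A dispatch table with 1<<hashLog entries of 4 bytes each:
 * hashLog=27 -> 512 MiB, hashLog=28 -> 1 GiB, which can easily
 * exhaust memory on a laptop running all sorts of other things. */
int main(void)
{
    unsigned hashLog;
    for (hashLog = 26; hashLog <= 28; hashLog++)
        printf("hashLog=%u -> %llu MiB\n", hashLog,
               ((unsigned long long)sizeof(uint32_t) << hashLog) >> 20);
    return 0;
}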
Bonus:
Compressing lzbench itself, using the mentioned 14/15 with increased windows: ...what about beating ... zlib 9!
Zlib had a disadvantage - a small window size, while the statically linked file is ~8 MB. On the other hand, it had Huffman coding on its side. A fancy detail is also the fact that LZ5's decompression speed scored ~3x faster than zlib's. That's what I call a "nice shot" when it comes to compressing something and getting an interesting result.
Code:
lz5hc r132 level 13     2.19 MB/s    263 MB/s     1978115  23.99
lz5hc r132 level 14     1.38 MB/s    265 MB/s     1959714  23.77
lz5hc r132 level 15     0.91 MB/s    265 MB/s     1959808  23.77
zlib 1.2.8 level 1        25 MB/s    101 MB/s     2192223  26.59
zlib 1.2.8 level 3        19 MB/s    105 MB/s     2107582  25.56
zlib 1.2.8 level 6      9.76 MB/s     98 MB/s     1982685  24.05
zlib 1.2.8 level 9      3.71 MB/s     98 MB/s     1970716  23.90
crush 1.0 level 2       0.42 MB/s     96 MB/s     1975334  23.96
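For context on the window remark: deflate's windowBits parameter is capped at 15, i.e. a 32 KB history, so on an ~8 MB binary any repetition farther back than 32 KB is invisible to zlib. A minimal illustration using the real zlib API:
Code:
#include <stdio.h>
#include <zlib.h>

int main(void)
{
    z_stream strm = {0};
    /* 15 is the maximum windowBits deflate accepts: a 1<<15 = 32 KB window. */
    int rc = deflateInit2(&strm, 9, Z_DEFLATED, 15, 9, Z_DEFAULT_STRATEGY);
    if (rc == Z_OK) deflateEnd(&strm);
    printf("max deflate window: %d bytes\n", 1 << 15);
    return 0;
}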
Now LZ5 overall looks quite interesting! It beats the highest LZO levels while being a bit faster to decompress, and its compression ratio can resemble UCL's while decompressing 2x-3x faster. Whatever, LZ5 seems to be an interesting tradeoff when one somewhat cares about ratio but needs high decompression speed: not as blazingly fast as LZ4, but a bit above LZO.
Last edited by xcrh; 8th January 2016 at 06:38.
inikep (8th January 2016)
m^2: did you finish fuzzing? I added your results to CompFuzz, and I'm wondering if I should mark LZ5 as "OK" in the results table.
No, I'm not done. I share my CPU between ZSTD and LZ5 (the latter gets ca. 80%) and both have been running for months already.
It's hard to predict how long a session will take, but I expect I need another 1-5 weeks, unless there are new commits that add more work.
Nevertheless I think you can mark it OK because:
* it has received a lot of fuzzing already and
* there are no known issues
I think that you can get even better results with content-aware compression (a compressor designed especially for a given kind of data). This is the future of lossless data compression, because currently in general-purpose compression people are working mainly on speed improvements.
The latest dev compiles with many warnings:
One even looks like a bug (lz5frame.c:898):
Code:
afl-clang-fast 1.94b by <lszekeres@google.com>
In file included from lz5.c:41:
./lz5common.h:184:30: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5HC_DEBUG(fmt, args...) ;//printf(fmt, ##args)
./lz5common.h:185:33: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5_LOG_PARSER(fmt, args...) ;//printf(fmt, ##args)
./lz5common.h:186:32: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5_LOG_PRICE(fmt, args...) ;//printf(fmt, ##args)
./lz5common.h:187:33: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5_LOG_ENCODE(fmt, args...) ;//printf(fmt, ##args)
In file included from lz5.c:41:
In file included from ./lz5common.h:140:
./mem.h:305:17: warning: unused function 'MEM_highbit' [-Wunused-function]
static unsigned MEM_highbit(U32 val)
./mem.h:420:13: warning: unused function 'MEM_wildcopy' [-Wunused-function]
static void MEM_wildcopy(void* dst, const void* src, size_t length)
In file included from lz5.c:41:
./lz5common.h:150:15: warning: unused function 'LZ5HC_hash3Ptr' [-Wunused-function]
static size_t LZ5HC_hash3Ptr(const void* ptr, U32 h) { return LZ5HC_hash3(MEM_read32(ptr), h); }
./lz5common.h:168:15: warning: unused function 'LZ5HC_hashPtr' [-Wunused-function]
static size_t LZ5HC_hashPtr(const void* p, U32 hBits, U32 mls)
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 3840 locations (non-hardened mode, ratio 100%).
8 warnings generated.
In file included from lz5hc.c:42:
./lz5common.h:184:30: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5HC_DEBUG(fmt, args...) ;//printf(fmt, ##args)
./lz5common.h:185:33: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5_LOG_PARSER(fmt, args...) ;//printf(fmt, ##args)
./lz5common.h:186:32: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5_LOG_PRICE(fmt, args...) ;//printf(fmt, ##args)
./lz5common.h:187:33: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
#define LZ5_LOG_ENCODE(fmt, args...) ;//printf(fmt, ##args)
lz5hc.c:102:102: warning: unused parameter 'iLimit' [-Wunused-parameter]
FORCE_INLINE void LZ5HC_BinTree_Insert (LZ5HC_Data_Structure* ctx, const BYTE* ip, const BYTE* const iLimit)
lz5hc.c:859:32: warning: cast from 'const char *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
ctx->inputBuffer = (BYTE*) source;
lz5hc.c:1288:32: warning: cast from 'const char *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
ctx->inputBuffer = (BYTE*) source;
lz5hc.c:1343:30: warning: cast from 'const unsigned char *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
best_pos = (uint8_t*)ip;
lz5hc.c:1345:26: warning: cast from 'const unsigned char *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
off0 = (uint8_t*)ip - ref;
lz5hc.c:1348:30: warning: cast from 'const unsigned char *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
for (pos = (uint8_t*)ip + ml; pos >= start2; pos--)
lz5hc.c:1433:32: warning: cast from 'const char *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
ctx->inputBuffer = (BYTE*) source;
lz5hc.c:1564:32: warning: cast from 'const char *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
ctx->inputBuffer = (BYTE*) source;
In file included from lz5hc.c:42:
In file included from ./lz5common.h:140:
./mem.h:305:17: warning: unused function 'MEM_highbit' [-Wunused-function]
static unsigned MEM_highbit(U32 val)
./mem.h:420:13: warning: unused function 'MEM_wildcopy' [-Wunused-function]
static void MEM_wildcopy(void* dst, const void* src, size_t length)
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 3205 locations (non-hardened mode, ratio 100%).
14 warnings generated.
lz5frame.c:898:74: warning: comparison of constant 4194304 with expression of type 'LZ5F_blockMode_t' is always false [-Wtautological-constant-out-of-range-compare]
bufferNeeded = dctxPtr->maxBlockSize + ((dctxPtr->frameInfo.blockMode==LZ5F_DICT_SIZE) * 2 * LZ5F_DICT_SIZE);
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 642 locations (non-hardened mode, ratio 100%).
1 warning generated.
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 179 locations (non-hardened mode, ratio 100%).
creating versioned links
make[1]: Leaving directory `/home/madamczyk/afl-1.94b/lz5/lib'
../../afl-clang-fast -DFORTIFY_SOURCE=2 -fstack-protector-all -fsanitize=undefined,address -fsanitize-trap=array-bounds,bool,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound -fno-sanitize=alignment -g -I../lib -DXXH_NAMESPACE=LZ5_ -DLZ5_VERSION=\"v1.4\" -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math ../lib/lz5.c -c -o ../lib/lz5.o
afl-clang-fast 1.94b by <lszekeres@google.com>
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 3840 locations (non-hardened mode, ratio 100%).
../../afl-clang-fast -DFORTIFY_SOURCE=2 -fstack-protector-all -fsanitize=undefined,address -fsanitize-trap=array-bounds,bool,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound -fno-sanitize=alignment -g -I../lib -DXXH_NAMESPACE=LZ5_ -DLZ5_VERSION=\"v1.4\" -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math ../lib/lz5hc.c -c -o ../lib/lz5hc.o
afl-clang-fast 1.94b by <lszekeres@google.com>
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 3205 locations (non-hardened mode, ratio 100%).
../../afl-clang-fast -DFORTIFY_SOURCE=2 -fstack-protector-all -fsanitize=undefined,address -fsanitize-trap=array-bounds,bool,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound -fno-sanitize=alignment -g -I../lib -DXXH_NAMESPACE=LZ5_ -DLZ5_VERSION=\"v1.4\" -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math ../lib/lz5frame.c -c -o ../lib/lz5frame.o
afl-clang-fast 1.94b by <lszekeres@google.com>
../lib/lz5frame.c:898:74: warning: comparison of constant 4194304 with expression of type 'LZ5F_blockMode_t' is always false [-Wtautological-constant-out-of-range-compare]
bufferNeeded = dctxPtr->maxBlockSize + ((dctxPtr->frameInfo.blockMode==LZ5F_DICT_SIZE) * 2 * LZ5F_DICT_SIZE);
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 642 locations (non-hardened mode, ratio 100%).
1 warning generated.
../../afl-clang-fast -DFORTIFY_SOURCE=2 -fstack-protector-all -fsanitize=undefined,address -fsanitize-trap=array-bounds,bool,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound -fno-sanitize=alignment -g -I../lib -DXXH_NAMESPACE=LZ5_ -DLZ5_VERSION=\"v1.4\" -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math ../lib/xxhash.c -c -o ../lib/xxhash.o
afl-clang-fast 1.94b by <lszekeres@google.com>
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 179 locations (non-hardened mode, ratio 100%).
../../afl-clang-fast -DFORTIFY_SOURCE=2 -fstack-protector-all -fsanitize=undefined,address -fsanitize-trap=array-bounds,bool,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound -fno-sanitize=alignment -g -I../lib -DXXH_NAMESPACE=LZ5_ -DLZ5_VERSION=\"v1.4\" -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math fullbench.c -c -o fullbench.o
afl-clang-fast 1.94b by <lszekeres@google.com>
afl-llvm-pass 1.94b by <lszekeres@google.com>
[+] Instrumented 353 locations (non-hardened mode, ratio 100%).
../../afl-clang-fast -DFORTIFY_SOURCE=2 -fstack-protector-all -fsanitize=undefined,address -fsanitize-trap=array-bounds,bool,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound -fno-sanitize=alignment -g -I../lib -DXXH_NAMESPACE=LZ5_ -DLZ5_VERSION=\"v1.4\" -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math ../lib/lz5.o ../lib/lz5hc.o ../lib/lz5frame.o ../lib/xxhash.o fullbench.o -o fullbench
afl-clang-fast 1.94b by <lszekeres@google.com>
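About the lz5frame.c:898 warning: since an LZ5F_blockMode_t can never equal 4194304, the `* 2 * LZ5F_DICT_SIZE` term is always zero, so the extra dictionary space is never reserved. A self-contained guess at the intent, with mock types (the enum name is assumed by analogy with LZ4's LZ4F_blockLinked and is unverified):
Code:
#include <stddef.h>

/* Mock definitions for illustration only; real ones live in lz5frame.h/.c. */
typedef enum { LZ5F_blockLinked = 0, LZ5F_blockIndependent = 1 } LZ5F_blockMode_t;
#define LZ5F_DICT_SIZE (1 << 22)

static size_t buffer_needed(size_t maxBlockSize, LZ5F_blockMode_t blockMode)
{
    /* The shipped code compared blockMode against LZ5F_DICT_SIZE (4194304),
     * which is always false; presumably a blockMode value was intended: */
    return maxBlockSize + ((blockMode == LZ5F_blockLinked) * 2 * LZ5F_DICT_SIZE);
}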
inikep (11th January 2016)
Code:
katmacadapc% ../../../fullbench -c10 -d9 -i1 id:000000*
*** LZ5 speed analyzer v1.4 64-bits, by Yann Collet (Jan 11 2016) ***
- 1 iterations -
LZ5_compressHC
LZ5F_decompress
=================================================================
==27488==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x631000024800 at pc 0x000000516683 bp 0x7fff477a0020 sp 0x7fff477a0018
WRITE of size 8 at 0x631000024800 thread T0
    #0 0x516682 (/home/madamczyk/afl-1.94b/lz5/fullbench+0x516682)
    #1 0x565e8d (/home/madamczyk/afl-1.94b/lz5/fullbench+0x565e8d)
    #2 0x571d12 (/home/madamczyk/afl-1.94b/lz5/fullbench+0x571d12)
    #3 0x56feff (/home/madamczyk/afl-1.94b/lz5/fullbench+0x56feff)
    #4 0x572a96 (/home/madamczyk/afl-1.94b/lz5/fullbench+0x572a96)
    #5 0x7f090c344eac (/lib/x86_64-linux-gnu/libc.so.6+0x1eeac)
    #6 0x41b578 (/home/madamczyk/afl-1.94b/lz5/fullbench+0x41b578)
0x631000024800 is located 0 bytes to the right of 65536-byte region [0x631000014800,0x631000024800)
allocated by thread T0 here:
    #0 0x4b8952 (/home/madamczyk/afl-1.94b/lz5/fullbench+0x4b8952)
    #1 0x56849c (/home/madamczyk/afl-1.94b/lz5/fullbench+0x56849c)
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/madamczyk/afl-1.94b/lz5/fullbench+0x516682)
Shadow bytes around the buggy address:
  0x0c627fffc8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c627fffc8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c627fffc8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c627fffc8e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c627fffc8f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c627fffc900:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c627fffc910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c627fffc920: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c627fffc930: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c627fffc940: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c627fffc950: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==27488==ABORTING
@m^2: I could not reproduce the bug in fullbench using gcc.
The code doesn't compile with my gcc, probably because the default standard used is GNU89.
Code:
cc -O0 -I../lib -DXXH_NAMESPACE=LZ5_ -DLZ5_VERSION=\"v1.4\" -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math ../lib/lz5.c -c -o ../lib/lz5.o
In file included from ../lib/lz5.c:41:0:
../lib/lz5common.h:209:1: error: unknown type name ‘uint32_t’
../lib/lz5common.h:209:39: error: unknown type name ‘uint32_t’
../lib/lz5common.h:209:56: error: unknown type name ‘uint32_t’
../lib/lz5common.h:209:73: error: unknown type name ‘uint32_t’
../lib/lz5common.h:214:37: error: unknown type name ‘uint32_t’
../lib/lz5common.h:214:56: error: unknown type name ‘uint32_t’
../lib/lz5common.h:214:78: error: unknown type name ‘uint32_t’
../lib/lz5common.h:214:92: error: unknown type name ‘uint32_t’
../lib/lz5common.h:214:109: error: unknown type name ‘uint32_t’
../lib/lz5common.h:220:40: error: unknown type name ‘uint32_t’
../lib/lz5common.h:220:59: error: unknown type name ‘uint32_t’
../lib/lz5common.h:220:81: error: unknown type name ‘uint32_t’
../lib/lz5common.h:220:95: error: unknown type name ‘uint32_t’
../lib/lz5common.h:220:126: error: unknown type name ‘uint32_t’
make: *** [../lib/lz5.o] Error 1
mv: cannot stat `fullbench': No such file or directory
The previous crashes were due to an issue that I introduced in fullbench.c, sorry.
LZ4 and LZ5 require -std=c99.
Then you can clean this up:
Code:
/****************************************************************
*  Basic Types
*****************************************************************/
#if defined (__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)   /* C99 */)
# include <stdint.h>
  typedef  uint8_t BYTE;
  typedef uint16_t U16;
  typedef  int16_t S16;
  typedef uint32_t U32;
  typedef  int32_t S32;
  typedef uint64_t U64;
  typedef  int64_t S64;
#else
  typedef unsigned char       BYTE;
  typedef unsigned short      U16;
  typedef   signed short      S16;
  typedef unsigned int        U32;
  typedef   signed int        S32;
  typedef unsigned long long  U64;
  typedef   signed long long  S64;
#endif
And also make sure that -std=c99 is passed to the compiler regardless of custom CFLAGS. In lib it is; in programs it is not.
I'm pretty sure the ratio can be improved this way; e.g. E8E9 and its equivalent for ARM (I don't remember its name, but it exists) can improve compression, but they require extra actions during decompression. And at these speeds, extra actions in the decompressor tend to cost quite some speed. This happens to be especially true on ARM.
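For the curious, a minimal sketch of the x86 E8/E9 idea (in xz terms, these branch filters are called BCJ; the ARM one rewrites BL offsets similarly). This is a generic illustration, not LZ5 code: rewriting relative CALL/JMP targets as absolute addresses makes repeated calls to the same function byte-identical, which the LZ stage can then match. Decoding needs the inverse pass, which is exactly the speed cost mentioned above.
Code:
#include <stddef.h>
#include <stdint.h>

/* Forward E8/E9 transform: rel32 -> abs32 (assumes little-endian fields). */
static void e8e9_encode(uint8_t* buf, size_t size)
{
    size_t i;
    for (i = 0; i + 5 <= size; i++) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {          /* CALL / JMP rel32 */
            uint32_t rel = (uint32_t)buf[i+1]
                         | ((uint32_t)buf[i+2] << 8)
                         | ((uint32_t)buf[i+3] << 16)
                         | ((uint32_t)buf[i+4] << 24);
            uint32_t abs = rel + (uint32_t)(i + 5);      /* target = rel + next-insn pos */
            buf[i+1] = (uint8_t)(abs);
            buf[i+2] = (uint8_t)(abs >> 8);
            buf[i+3] = (uint8_t)(abs >> 16);
            buf[i+4] = (uint8_t)(abs >> 24);
            i += 4;                                      /* skip rewritten displacement */
        }
    }
}
/* The decoder runs the inverse (abs - (i + 5)) before returning the data. */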
To give some idea about speeds: on an ARM board where I rolled some "dev" environment and did some experiments, xz decompresses at about 10 MB/s, lzma at about 15 MB/s, zlib at ~50 MB/s, zstd around 60, LZ5 around 100, and LZ4 about 130; several other "fast LZs" score between LZ4 and LZ5 or so, with ratios usually close to that of LZ4. So LZ4 is good for speed, and LZ5 is good for a better ratio without a major loss of speed. At the very most, LZ4 fast 17 goes about 170 MB/s, but its ratio is really bad. Blosclz can reach a whopping 400+ MB/s, exceeding memcpy, just as they advertise, but its ratio is very low and it only makes sense for mem-to-mem transfers. As you can see, any entropy coding immediately kills speed by about 2x.
LZ5 v1.4 beta can be downloaded at:
https://github.com/inikep/lz5/archive/dev.zip
Changes:
- improved: levels from 13 to 15 (maximum compression ratio)
- added: a new parser: LZ5HC_optimal_price_bt
Please help me with fuzzing. Here is a quick comparison:
Code:
Compressor name          Compression Decompress. Compr. size  Ratio
crush 1.0 level 0             14 MB/s    195 MB/s    50419812  48.08
crush 1.0 level 1           4.09 MB/s    211 MB/s    48195021  45.96
crush 1.0 level 2           0.55 MB/s    214 MB/s    47105187  44.92
lz5hc v1.3.3 level 13       4.75 MB/s    645 MB/s    46718698  44.55
lz5hc v1.3.3 level 14       3.84 MB/s    671 MB/s    46484969  44.33
lz5hc v1.3.3 level 15       1.93 MB/s    712 MB/s    46227364  44.09
lz5hc v1.3.3 level 16       0.80 MB/s    709 MB/s    46125742  43.99
lz5hc v1.4 level 13         5.38 MB/s    710 MB/s    46383307  44.23
lz5hc v1.4 level 14         4.12 MB/s    669 MB/s    45843096  43.72
lz5hc v1.4 level 15         2.16 MB/s    619 MB/s    45767126  43.65
Cyan (26th January 2016)