
Thread: Compression library advice

#1 bloom256 (Member, sbp)

Hi everybody! Can you recommend a good compression library? My goal is to compress small chunks of binary data: 8 KB pages. The library must be portable (ARM/x86), because I'm going to use it on various mobile platforms. A good compression ratio is important. Compression speed is not important and can be slow, but decompression speed is critical and must be as fast as possible. I'm looking for a free, non-GPL library, but commercial products are acceptable too. The library must also be well tested and production ready. I've tried LZ4 and Snappy, but they can't compress my data enough. LZO (lzo1x_999) is perfect but GPL. Are there any non-GPL analogs? And has anybody tried LZO Professional?

#2 Lucas (Member, United Kingdom)
If high compression and fast decoding is what you're looking for, then I'd recommend LZHAM.

    https://github.com/richgel999/lzham_codec

#3 Jyrki Alakuijala (Member, Switzerland)
    Brotli would likely work well. Zlib with zopfli for compression is not a bad choice either.
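As a starting point, here is a minimal round-trip sketch for one 8 KB page, assuming brotli's one-shot C API (brotli/encode.h and brotli/decode.h; link with -lbrotlienc -lbrotlidec). This API postdates the thread, so check it against your brotli version:

/* Compress one 8 KB page at quality 11, then decompress and verify. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <brotli/encode.h>
#include <brotli/decode.h>

int main(void) {
    uint8_t page[8192];
    memset(page, 'x', sizeof(page));         /* stand-in for a real page */

    uint8_t enc[8192 + 1024];                /* comfortably above the worst case */
    size_t enc_size = sizeof(enc);
    if (!BrotliEncoderCompress(11, 22, BROTLI_MODE_GENERIC,
                               sizeof(page), page, &enc_size, enc)) {
        fprintf(stderr, "compress failed\n");
        return 1;
    }

    uint8_t dec[8192];
    size_t dec_size = sizeof(dec);
    if (BrotliDecoderDecompress(enc_size, enc, &dec_size, dec)
            != BROTLI_DECODER_RESULT_SUCCESS || dec_size != sizeof(page)) {
        fprintf(stderr, "decompress failed\n");
        return 1;
    }
    printf("8192 -> %zu bytes\n", enc_size);
    return 0;
}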

#4 jibz (Member, Denmark)
    You could use the Squash benchmark to help you pick candidates; choose an input file that resembles your data, and a suitable machine, and check the compression ratio vs. decompression speed chart.

    Do you need both the compression and decompression to be portable?
    Last edited by jibz; 20th October 2015 at 19:54.

#5 RichSelian (Member, Shenzhen, China)
Quote (Lucas): If high compression and fast decoding is what you're looking for, then I'd recommend LZHAM. https://github.com/richgel999/lzham_codec

Brotli does not work well on non-English text data.

#6 Member (France)
> LZO (lzo1x_999) is perfect but GPL

Did you try LZ4 HC (high compression mode) at its strongest level (16)?
It's supposed to be within striking distance of lzo1x_999.
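A minimal round-trip sketch for one 8 KB page, assuming the lz4 library's HC API (lz4.h and lz4hc.h; link with -llz4). Note that the accepted maximum HC level has varied across lz4 releases:

/* Compress one 8 KB page with LZ4 HC, then decompress and verify. */
#include <stdio.h>
#include <string.h>
#include <lz4.h>
#include <lz4hc.h>

int main(void) {
    char page[8192];
    memset(page, 'x', sizeof(page));              /* stand-in for a real page */

    char enc[LZ4_COMPRESSBOUND(8192)];            /* worst-case compressed size */
    int enc_size = LZ4_compress_HC(page, enc, sizeof(page), sizeof(enc), 16);
    if (enc_size <= 0) { fprintf(stderr, "compress failed\n"); return 1; }

    char dec[8192];
    int dec_size = LZ4_decompress_safe(enc, dec, enc_size, sizeof(dec));
    if (dec_size != (int)sizeof(page)) { fprintf(stderr, "decompress failed\n"); return 1; }

    printf("8192 -> %d bytes\n", enc_size);
    return 0;
}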


#7 Member (Japan)
Tree and GLZA have good compression and decompression speed.

#8 Jyrki Alakuijala (Member, Switzerland)
Quote (RichSelian): Brotli does not work well on non-English text data.

On the Squash benchmark, the compression ratios on 'sum' (a 37 kB SPARC executable) are:

    LZMA: 4.07
    LZHAM: 3.48
    brotli: 3.75
    zlib: 2.98
    lzo: 2.75
    lz4: 2.34
    wflz: 1.96

What kind of results do you get with brotli on small binaries (~8 kB)?

#9 Member (Hangzhou, China)
Have you tried bc-zip?
It lets you specify a target decompression time and finds the optimal compression ratio under that constraint.

#10 RichSelian (Member, Shenzhen, China)
In my opinion, for an 8 KB block the difference in compression ratio between using PAQ and snappy is not that big. You should choose the fastest one with the smallest file header.

#11 dnd (Member, Worldwide)
You can test almost all popular and notable compressors with TurboBench for Windows and Linux.

1 - If you have only one large file: set the block size with the option "-b8K".

2 - If you have the blocks as separate files: benchmark all the individual files with a single command.
Ex. "turbobench -ezlib,9 *.dat" or
"turbobench -r datdir"

3 - If you have fixed-length (8K) blocks as separate files: concatenate all the binary files
into a single block, then use the option "-b8K" as above.

4 - If you have variable-length blocks as separate files: use multiblock mode to benchmark small files:

a - Concatenate the small files into a single multiblock file (max. 100 MB):
"turbobench -Mmyfile *.dat"
b - Benchmark the "myfile" file using the "-m" option:
"turbobench -m -ezlib,9 myfile"

Each block will be compressed separately, but you get more accurate timings.


Do not expect too much in terms of compression ratio when compressing small blocks with general-purpose compressors.
Maybe you can follow my post "Efficient random access".
    Last edited by dnd; 23rd October 2015 at 14:15.

#12 Jyrki Alakuijala (Member, Switzerland)
brotli 5 gives 20 % denser output than lz4, and 25 % denser than snappy. It is highly likely that there are further relative space savings (possibly 10 % more) with brotli 9 or brotli 11, but these are not reported with this benchmarking tool.

turbobench -b9K /usr/bin/vim.basic (-b4K and -b8K fail on this data, possibly in the shrinker round-trip test). Columns are input size, compressed size, ratio %, compression speed (MB/s), decompression speed (MB/s), and codec:

    2191736 1185277 54.1 38.18 221.20 brotli 5 x
    2191736 1193475 54.5 22.84 213.96 zlib 9 x
    2191736 1195286 54.5 38.77 213.69 zlib 6 x
    2191736 1242519 56.7 68.89 203.49 zlib 1 x
    2191736 1258808 57.4 32.67 576.33 lzturbo 32 x
    2191736 1284295 58.6 86.45 232.29 brotli 1 x
    2191736 1311825 59.9 116.39 583.98 lzturbo 31 x
    2191736 1323336 60.4 93.10 577.46 lzturbo 30.1 x
    2191736 1323655 60.4 116.16 577.41 lzturbo 30 x
    2191736 1376945 62.8 225.48 474.46 zstd x
    2191736 1421182 64.8 124.84 1584.08 lzturbo 22 x
    2191736 1488588 67.9 280.48 283.31 gipfeli x
    2191736 1492577 68.1 273.87 1922.85 lzturbo 21 x
    2191736 1495865 68.3 101.91 2723.21 lz4 9 x
    2191736 1514332 69.1 243.14 1843.86 lzturbo 20.1 x
    2191736 1531653 69.9 495.24 1954.98 lzturbo 20 x
    2191736 1535550 70.1 373.48 1400.47 shrinker x
    2191736 1536444 70.1 157.24 3941.83 lzturbo 12 x
    2191736 1563135 71.3 210.68 3772.52 lzturbo 11 x
    2191736 1592684 72.7 442.69 1551.72 snappy x
    2191736 1592684 72.7 517.25 1362.09 snappy-c x
    2191736 1595465 72.8 477.57 3672.22 lzturbo 10.1 x
    2191736 1609708 73.4 575.26 3722.34 lzturbo 10 x
    2191736 1633460 74.5 604.77 2943.42 lz4 1 x
    2191736 1877482 85.7 138.00 149.28 density 2 x


#13 dnd (Member, Worldwide)
Quote (Jyrki Alakuijala): brotli 5 gives 20 % denser output than lz4, and 25 % denser than snappy. It is highly likely that there are further relative space savings (possibly 10 % more) with brotli 9 or brotli 11, but these are not reported with this benchmarking tool.

Hi Jyrki,
of course you can test every brotli compression level with the option "-e", and almost every popular compressor too.
The default list corresponds to the group option "-efast".


You can use, for example:
"turbobench -efast/brotli,9/brotli,11" (test all fast compressors + brotli,9 + brotli,11)
"turbobench -b8K -ebrotli,11/lzma,9/lzham,4/lzturbo,39/lzturbo,49"
"turbobench -elzturbo,19 -i0" (timing test for decompression)

The window size -24 is used for brotli 11 and -22 for all other levels.

Type "turbobench -g1" to get a list and "turbobench -h" for help.
    Last edited by dnd; 23rd October 2015 at 15:51.

#14 Jyrki Alakuijala (Member, Switzerland)
With turbobench -efast/brotli,9/brotli,11, brotli 9 and 11 take the top two places for 9 kB block density. brotli 11 is 11 % denser than lzturbo 32, 18 % denser than zstd, 25 % denser than lz4 9, and 30 % denser than snappy.

    2191736 1121872 51.2 0.66 160.62 brotli 11 x
    2191736 1181621 53.9 4.46 220.37 brotli 9 x
    2191736 1185277 54.1 38.24 221.79 brotli 5 x

With turbobench -b9K -ebrotli,11/lzma,9/lzham,4/lzturbo,39/lzturbo,49, lzma 9 is the compression density winner:

    2191736 1112650 50.8 1.34 43.86 lzma 9 x
    2191736 1121872 51.2 0.67 160.40 brotli 11 x
    2191736 1150598 52.5 2.59 23.44 lzturbo 49 x
    2191736 1183504 54.0 1.35 60.02 lzham 4 x
    2191736 1218955 55.6 3.81 537.70 lzturbo 39 x
    Last edited by Jyrki Alakuijala; 23rd October 2015 at 21:41.

#15 Jyrki Alakuijala (Member, Switzerland)
lzturbo 39 looks good on the decoding performance vs. compression density compromise. I am guessing it is based on SSE/AVX vectorized decoding. If so, is it possible to decode quickly on ARM/NEON?

#16 dnd (Member, Worldwide)
lzturbo can be configured to run with or without SSE/AVX.
When SIMD is used, only the baseline SSE2 set is required, which has equivalents on ARM CPUs, although the version included in TurboBench uses only minimal SIMD. Until now, the decoding speed of "lzturbo 3x" has been reached only by byte-oriented coders. Look for example at this benchmark, where "lzturbo 3x" decodes faster than snappy.

#17 bloom256 (Member, sbp)
Thanks everybody! I don't get notifications from this thread, which is why I'm so late with my answer, sorry. I've tried lz4hc (level 16); its compression ratio is not good enough. QuickLZ is better, but not as good as LZO. zstd's compression ratio is great, but its decompression speed is slow. Where can I get the lzturbo source code?

#18 bloom256 (Member, sbp)
I've tried LZHAM too; it's very slow at decompression.

#19 Member (Japan)
lzturbo is closed source.
By the way, have you tried xeloz?
    http://encode.su/threads/1979-xeloz?p=40309&viewfull=1#post40309

#20 Jyrki Alakuijala (Member, Switzerland)
Quote (bloom256): zstd's compression ratio is great, but its decompression speed is slow.

This is very interesting. In my experience zstd works well only for streaming large data; it is not intended for tiny data like 8 kB blocks. I'd like to understand how our experiences on this can differ. If I look at the three smallest files on https://quixdb.github.io/squash-benchmark/ , zstd is not on the Pareto-optimal front in decompression speed/density, and is in particular worse in compression density than zlib.

Would you consider sharing your benchmark so that we can better see what you are trying to achieve and how you got your results?

    What kind of results did you get with zopfli and brotli?


#21 Member (France)
Quote (bloom256): QuickLZ is better, but not as good as LZO. zstd's compression ratio is great, but its decompression speed is slow.

This statement is indeed interesting.

Using inikep's lzbench to run some tests with 8 KB blocks on the tested machine (an Intel Core i7), the decompression speeds of QuickLZ and zstd are basically on par.
(zstd decompression is faster at larger sizes, but this test was done specifically with 8 KB blocks.)

Were your results significantly different in your tests?
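For reference, a command along these lines should reproduce such a test, though lzbench's option syntax may differ between versions: "lzbench -b8 -equicklz/zstd file.bin", where "-b8" sets 8 KB blocks and "-e" selects the codecs to compare.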

#22 jibz (Member, Denmark)
Just wanted to note that it looks like lzbench includes any initialization cost (such as allocation of temporary memory) in the timing of every 8 KB block.

This may or may not be what you want, depending on whether your app processes only one block, or keeps running and processes multiple blocks.
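To make the distinction concrete, here is a minimal sketch using zstd's explicit-context API as an example (zstd.h; link with -lzstd). It illustrates the general amortization point, not lzbench's internals: creating the context once and reusing it across blocks avoids paying the allocation cost per block.

/* Reuse one compression context across many 8 KB blocks. */
#include <stdio.h>
#include <string.h>
#include <zstd.h>

int main(void) {
    char page[8192];
    memset(page, 'x', sizeof(page));           /* stand-in for real pages */

    char enc[ZSTD_COMPRESSBOUND(8192)];        /* worst-case output size */

    /* One-shot ZSTD_compress() allocates a fresh context on every call;
       a long-running app can create the context once instead. */
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    if (!cctx) return 1;

    for (int i = 0; i < 1000; i++) {           /* many 8 KB blocks */
        size_t n = ZSTD_compressCCtx(cctx, enc, sizeof(enc),
                                     page, sizeof(page), 19);
        if (ZSTD_isError(n)) { fprintf(stderr, "compress error\n"); break; }
    }

    ZSTD_freeCCtx(cctx);
    return 0;
}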

#23 Programmer (PL)
Quote (jibz): Just wanted to note that it looks like lzbench includes any initialization cost (such as allocation of temporary memory) in the timing of every 8 KB block.

It concerns only brieflz, lzo, lzrw, quicklz, and wflz, and will be fixed in the next release.

#24 Member (Filling a much-needed gap in the literature)
Quote (bloom256): Hi everybody! Can you recommend a good compression library? My goal is to compress small chunks of binary data: 8 KB pages.

Can you tell us what kind of "pages" you're talking about, and in what sense(s) the data is "binary"? Are these pages of executable "binaries" (code plus literal data and maybe link tables), or actual data (arrays of ints/floats, images, sequence data, or whatever)? Do they include text? Are they 8 KB file blocks, or 8 KB pages of a frozen virtual-memory image, or something else?

#25 Matt Mahoney (Expert, Melbourne, Florida, USA)
    Depends on what kind of interface you want. zlib is in C and works on arrays that you have to allocate. libzpaq is in C++ and works on I/O streams where you write your own get() and put() or read() and write() functions. zlib is fast, but libzpaq has options for better but slower compression.
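A minimal sketch of the array-based zlib interface described above (zlib.h; link with -lz): the caller allocates both buffers and zlib fills in the sizes.

/* Compress one 8 KB page with zlib at level 9, then decompress and verify. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void) {
    Bytef page[8192];
    memset(page, 'x', sizeof(page));          /* stand-in for a real page */

    Bytef enc[8192 + 512];                    /* above compressBound(8192) */
    uLongf enc_len = sizeof(enc);
    if (compress2(enc, &enc_len, page, sizeof(page), 9) != Z_OK) {
        fprintf(stderr, "compress failed\n");
        return 1;
    }

    Bytef dec[8192];
    uLongf dec_len = sizeof(dec);
    if (uncompress(dec, &dec_len, enc, enc_len) != Z_OK || dec_len != sizeof(page)) {
        fprintf(stderr, "decompress failed\n");
        return 1;
    }
    printf("8192 -> %lu bytes\n", (unsigned long)enc_len);
    return 0;
}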

