
Thread: gzip vs zlib; benchmarking considerations

  1. #1
    Member SolidComp
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts

    gzip vs zlib; benchmarking considerations

    Hi all – The Large Text Compression Benchmark led me down a winding road on tracking down the canonical gzip and zlib applications. The LTCB uses an ancient version of gzip for Windows, which is almost certainly slower than current zlib for generating gzip files, and probably slower than GNU gzip. The gzip for Windows in the LTCB was released in 2006 or possibly earlier. Using that old build is therefore misleading with respect to how gzip or zlib actually performs compared to other codecs.

    If you want to include gzip or zlib in your benchmarks, these are the relevant websites:

    GNU gzip:

    http://savannah.gnu.org/projects/gzip/
    https://www.gnu.org/software/gzip/
    http://www.gzip.org/

    zlib:


    https://www.zlib.net/
    https://github.com/madler/zlib

    As you know, zlib is generally used to generate gzip files, typically on web servers. zlib can also generate a zlib file format, but no one seems interested in it (is it another DEFLATE wrapper?), since browsers consume gzip.

    zlib is a library used for streaming applications, like the above-mentioned web servers (nginx, Apache, IIS, H2O).

    GNU gzip is an end-user application included in most Linux and BSD distributions for compressing files as needed. (Is it called by other applications?) The project has three different websites, as listed above.

    It might be worth including both in benchmarks (the latest versions).
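    A minimal sketch of the versioning point above, assuming Python's standard zlib binding is an acceptable stand-in for whichever build is under test: record the library version next to every timing, so a result can never be mistaken for "gzip in general". The `bench` helper and the synthetic sample are hypothetical; a real run would use the benchmark's own corpus.

```python
import time
import zlib

def bench(data, level, wbits=31, repeats=3):
    """Time one-shot gzip-format compression and tag the result with the zlib version."""
    best = float("inf")
    out = b""
    for _ in range(repeats):
        comp = zlib.compressobj(level, zlib.DEFLATED, wbits)  # wbits=31 selects the gzip wrapper
        start = time.perf_counter()
        out = comp.compress(data) + comp.flush()
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading on coarse clocks
    return {
        "zlib_runtime": zlib.ZLIB_RUNTIME_VERSION,  # version actually loaded at run time
        "level": level,
        "ratio": len(data) / len(out),
        "mb_per_s": len(data) / best / 1e6,
    }

# Synthetic, highly repetitive sample; NOT representative of a real text corpus.
sample = b"The quick brown fox jumps over the lazy dog. " * 20000

results = [bench(sample, level) for level in (1, 6, 9)]
for row in results:
    print(row)
```

    The numbers this prints only describe the zlib build linked into the local Python, which is exactly why the version string is reported alongside them.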

  2. Thanks:

    introspec (14th August 2019)

  3. #2
    Member jibz
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    124
    Thanks
    106
    Thanked 71 Times in 51 Posts
    Quote Originally Posted by SolidComp View Post
    As you know, zlib is generally used to generate gzip files, typically on web servers. zlib can also generate a zlib file format, but no one seems interested in it (is it another DEFLATE wrapper?), since browsers consume gzip.
    I believe PNG files store their deflate compressed data in zlib format.
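    On the "is it another DEFLATE wrapper?" question: yes. The zlib format (RFC 1950) is a 2-byte header plus an Adler-32 trailer around a DEFLATE stream (RFC 1951), while gzip (RFC 1952) uses a header of at least 10 bytes plus a CRC-32 and length trailer. A small sketch using Python's zlib binding, where the `wbits` argument selects the wrapper; the `deflate` helper and the payload are made up for illustration.

```python
import zlib

def deflate(data, wbits, level=6):
    """Compress data once; wbits picks the container around the DEFLATE stream."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()

payload = b"gzip vs zlib; benchmarking considerations\n" * 50

raw_stream = deflate(payload, -15)   # negative wbits: raw DEFLATE, no header or trailer
zlib_stream = deflate(payload, 15)   # 8..15: zlib wrapper (the format PNG uses)
gzip_stream = deflate(payload, 31)   # 16 + 15: gzip wrapper (what browsers consume)

for name, blob in (("raw", raw_stream), ("zlib", zlib_stream), ("gzip", gzip_stream)):
    print(name, len(blob), "bytes, starts with", blob[:2].hex())

# Each stream decodes back to the same payload when the matching wbits is given.
decoded = [
    zlib.decompress(raw_stream, -15),
    zlib.decompress(zlib_stream, 15),
    zlib.decompress(gzip_stream, 31),
]
```

    The compressed payload is the same kind of DEFLATE data in all three cases; only the framing and the checksum differ.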

  4. #3
    Member JamesB
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    506
    Thanks
    186
    Thanked 177 Times in 120 Posts
    You may also want to consider libdeflate or zlib-ng:

    https://github.com/ebiggers/libdeflate
    https://github.com/zlib-ng/zlib-ng

    There are Cloudflare and Intel optimised versions of zlib too.

    Edit: for block-based formats libdeflate is super. I added it to htslib for the "BAM" sequence alignment format. It beat both the Cloudflare and Intel versions and trounced the original zlib, but bizarrely all Linux distributions still seem to ship the slow code base. libdeflate, however, isn't a drop-in replacement (unlike the others). https://github.com/samtools/htslib/pull/581

  5. #4
    Member Jyrki Alakuijala
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    846
    Thanks
    242
    Thanked 309 Times in 184 Posts
    Streaming and non-streaming implementations/formats should not be mixed in benchmarking: properly streaming implementations are often about 2x slower, but they lead to better overall system performance (web pages load faster, intermediate results are shown earlier, and subsequent queries are issued earlier, hiding overall latency).

    For gzip's main use (web) a non-streaming implementation would likely be highly harmful for system performance.
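    To make the distinction concrete, here is a rough sketch of what "streaming" means at the API level, using Python's zlib binding. After each chunk the compressor is flushed with `Z_SYNC_FLUSH`, so the receiver can decode that chunk immediately instead of waiting for the end of the stream. The `stream_gzip` helper and the chunk contents are hypothetical, and the sketch says nothing about the 2x speed figure; it only shows the behaviour being traded for.

```python
import zlib

def stream_gzip(chunks, level=6):
    """Yield one decodable gzip fragment per input chunk, then the stream trailer."""
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # 31 = gzip wrapper
    for chunk in chunks:
        # Z_SYNC_FLUSH emits everything buffered so far, byte-aligned.
        yield comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
    yield comp.flush()  # final flush (Z_FINISH) writes the CRC-32 and length

chunks = [b"<head>...</head>\n" * 20, b"<body>first part</body>\n" * 20, b"<footer/>\n" * 20]

decoder = zlib.decompressobj(31)
received = []
for fragment in stream_gzip(chunks):
    piece = decoder.decompress(fragment)
    if piece:
        received.append(piece)  # available as soon as its fragment arrives

# One-shot compression of the same data, for a size comparison.
one_shot = zlib.compressobj(6, zlib.DEFLATED, 31)
whole = one_shot.compress(b"".join(chunks)) + one_shot.flush()
streamed_size = sum(len(f) for f in stream_gzip(chunks))
print("one-shot:", len(whole), "bytes; streamed with sync flushes:", streamed_size, "bytes")
```

    A one-shot API such as libdeflate's needs the whole buffer up front, so nothing can be sent until compression of the full input has finished.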

  6. #5
    Member SolidComp
    Quote Originally Posted by JamesB View Post
    You may also want to consider libdeflate or zlib-ng:

    https://github.com/ebiggers/libdeflate
    https://github.com/zlib-ng/zlib-ng

    There are Cloudflare and Intel optimised versions of zlib too.

    Edit: for block-based formats libdeflate is super. I added it to htslib for the "BAM" sequence alignment format. It beat both the Cloudflare and Intel versions and trounced the original zlib, but bizarrely all Linux distributions still seem to ship the slow code base. libdeflate, however, isn't a drop-in replacement (unlike the others). https://github.com/samtools/htslib/pull/581
    Yes, libdeflate is very good. The issue with those projects is that none are drop-in replacements for zlib, so they have very limited use. libdeflate doesn't do streaming. zlib-ng was declared by its contributors as not production ready last time I checked. Cloudflare doesn't document their forks and provides no instructions, so they usually don't work. The Intel fork didn't work either when the Centminmod project tried to use it (Cloudflare wouldn't build either).

  7. #6
    Member SolidComp
    Quote Originally Posted by SolidComp View Post
    Yes, libdeflate is very good. The issue with those projects is that none are drop-in replacements for zlib, so they have very limited use. libdeflate doesn't do streaming. zlib-ng was declared by its contributors as not production ready last time I checked. Cloudflare doesn't document their forks and provides no instructions, so they usually don't work. The Intel fork didn't work either when the Centminmod project tried to use it (Cloudflare wouldn't build either).
    Well, it looks like he ultimately got Cloudflare's fork working: https://community.centminmod.com/thr...nx-zlib.14084/

  8. #7
    Member SolidComp
    Quote Originally Posted by Jyrki Alakuijala View Post
    Streaming and non-streaming implementations/formats should not be mixed in benchmarking: properly streaming implementations are often about 2x slower, but they lead to better overall system performance (web pages load faster, intermediate results are shown earlier, and subsequent queries are issued earlier, hiding overall latency).

    For gzip's main use (web) a non-streaming implementation would likely be highly harmful for system performance.
    That's a good point. I think a good use for libdeflate and zopfli is precompressing static files, such as the output of static site generators like Jekyll, SVG images, etc.
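    A sketch of that precompression step, with one loud assumption: Python's standard `gzip` module at level 9 stands in for libdeflate or zopfli, since neither ships with the standard library. The `precompress` helper, the suffix list, and the throwaway demo site are all hypothetical.

```python
import gzip
import tempfile
from pathlib import Path

def precompress(root, suffixes=(".html", ".css", ".js", ".svg")):
    """Write a .gz sibling next to each matching file under root; return the paths written."""
    written = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        raw = path.read_bytes()
        # mtime=0 keeps the output byte-identical across rebuilds.
        packed = gzip.compress(raw, compresslevel=9, mtime=0)
        if len(packed) < len(raw):  # skip files that do not shrink
            target = path.with_name(path.name + ".gz")
            target.write_bytes(packed)
            written.append(target)
    return written

# Demo on a temporary directory standing in for a generated site.
with tempfile.TemporaryDirectory() as site:
    page = Path(site) / "index.html"
    page.write_bytes(b"<p>hello</p>\n" * 200)
    outputs = precompress(site)
    names = [p.name for p in outputs]
    original = page.read_bytes()
    restored = gzip.decompress(outputs[0].read_bytes())

print(names)
```

    Because the work happens once at build time, compression speed matters far less here than ratio, which is why the slower, non-streaming encoders fit this use.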
