
Thread: gzip vs zlib; benchmarking considerations

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    238
    Thanks
    95
    Thanked 47 Times in 31 Posts

    gzip vs zlib; benchmarking considerations

    Hi all – The Large Text Compression Benchmark (LTCB) led me down a winding road trying to track down the canonical gzip and zlib applications. The LTCB uses an ancient build of gzip for Windows, released in 2006 or possibly earlier, which is almost certainly slower than current zlib at generating gzip files, and probably slower than current GNU gzip. Using that old build is therefore misleading about how gzip or zlib actually performs compared to other codecs.

    If you want to include gzip or zlib in your benchmarks, these are the relevant websites:

    GNU gzip:

    http://savannah.gnu.org/projects/gzip/
    https://www.gnu.org/software/gzip/
    http://www.gzip.org/

    zlib:


    https://www.zlib.net/
    https://github.com/madler/zlib

    As you know, zlib is generally used to generate gzip files, typically on web servers. zlib can also generate a zlib file format, but no one seems interested in it (is it another DEFLATE wrapper?), since browsers consume gzip.

    zlib is a library used for streaming applications, like the above-mentioned web servers (nginx, Apache, IIS, H2O).

    GNU gzip is an end-user application included in most Linux and BSD distributions for compressing files as needed. (Is it called by other applications?) The project has three different websites, as listed above.

    It might be worth including both in benchmarks (the latest versions).
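
    A minimal sketch of how a benchmark entry could record which zlib build it actually measured, assuming Python's standard zlib module stands in for the library under test. The function name bench_gzip and the sample data are hypothetical; zlib.ZLIB_RUNTIME_VERSION reports the version of the zlib that the interpreter loaded.

```python
import time
import zlib


def bench_gzip(data: bytes, level: int, repeats: int = 3) -> dict:
    """Time gzip-format compression of `data` and record the zlib version used."""
    best = float("inf")
    out = b""
    for _ in range(repeats):
        # wbits=31 selects a 32 KiB window plus a gzip header and trailer.
        comp = zlib.compressobj(level, zlib.DEFLATED, 31)
        start = time.perf_counter()
        out = comp.compress(data) + comp.flush()
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, time.perf_counter() - start)
    return {
        "zlib_version": zlib.ZLIB_RUNTIME_VERSION,
        "level": level,
        "in_bytes": len(data),
        "out_bytes": len(out),
        "seconds": best,
    }


if __name__ == "__main__":
    # Hypothetical sample input; a real benchmark would read its own corpus.
    sample = b"The quick brown fox jumps over the lazy dog. " * 20000
    for level in (1, 6, 9):
        print(bench_gzip(sample, level))
```

    Publishing the version string next to the timings makes it obvious when a result comes from an old build.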

  2. Thanks:

    introspec (14th August 2019)

  3. #2
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    116
    Thanks
    96
    Thanked 69 Times in 49 Posts
    Quote Originally Posted by SolidComp View Post
    As you know, zlib is generally used to generate gzip files, typically on web servers. zlib can also generate a zlib file format, but no one seems interested in it (is it another DEFLATE wrapper?), since browsers consume gzip.
    I believe PNG files store their deflate compressed data in zlib format.
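
    To make the wrapper question concrete, here is a small sketch using Python's standard zlib module, which exposes the wrapper choice through the wbits argument. The helper name deflate_with_wrapper and the sample data are hypothetical. The zlib format is specified in RFC 1950 and the gzip format in RFC 1952; both wrap a DEFLATE stream.

```python
import zlib


def deflate_with_wrapper(data: bytes, wbits: int, level: int = 6) -> bytes:
    """Compress `data` with DEFLATE; `wbits` selects the container format."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


data = b"hello, hello, hello, compression world\n" * 100

raw = deflate_with_wrapper(data, -15)  # raw DEFLATE, no header or trailer
zl = deflate_with_wrapper(data, 15)    # zlib format: 2-byte header, Adler-32 trailer
gz = deflate_with_wrapper(data, 31)    # gzip format: 10-byte header, CRC-32 and length trailer

print("gzip magic:", gz[:2].hex())     # 1f8b
print("zlib header:", zl[:2].hex())
print("sizes (raw, zlib, gzip):", len(raw), len(zl), len(gz))

# Each container decodes back to the same bytes with the matching wbits.
assert zlib.decompress(raw, -15) == data
assert zlib.decompress(zl, 15) == data
assert zlib.decompress(gz, 31) == data
```

    The size difference between the three outputs is only the header and trailer, so for benchmarking purposes the choice of wrapper matters far less than the choice of implementation.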

  4. #3
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    462
    Thanks
    147
    Thanked 158 Times in 106 Posts
    You may also want to consider libdeflate or zlib-ng:

    https://github.com/ebiggers/libdeflate
    https://github.com/zlib-ng/zlib-ng

    There are CloudFlare and Intel optimised versions of zlib too.

    Edit: for block-based formats libdeflate is super. I added it to htslib for the "BAM" sequence alignment format. It beat both the Cloudflare and Intel versions and trounced the original zlib, but bizarrely all Linux distributions still seem to ship with the slow code base. Libdeflate, however, isn't a drop-in replacement (unlike the others). https://github.com/samtools/htslib/pull/581

  5. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    702
    Thanks
    210
    Thanked 267 Times in 157 Posts
    Streaming and non-streaming implementations/formats should not be mixed in benchmarking: properly streaming implementations are often about 2x slower, but they lead to better overall system performance (web pages load faster, intermediate results are shown earlier, and subsequent queries are issued earlier, hiding overall latency).

    For gzip's main use (web) a non-streaming implementation would likely be highly harmful for system performance.
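
    A rough sketch of what "streaming" means in practice, assuming Python's standard zlib module and a hypothetical generator named stream_gzip. Each input chunk is followed by a Z_SYNC_FLUSH so the receiver can decode it immediately; a one-shot compressor cannot emit anything until it has seen all of the input.

```python
import zlib


def stream_gzip(chunks):
    """Yield gzip output after every input chunk instead of waiting for the end."""
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)  # wbits=31 selects gzip framing
    for chunk in chunks:
        # Z_SYNC_FLUSH pushes out everything consumed so far, at some cost in ratio.
        yield comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
    # The default flush mode is Z_FINISH: final block plus the gzip trailer.
    yield comp.flush()


# Hypothetical page delivered in three parts.
parts = [
    b"<html><head><title>demo</title></head>",
    b"<body>first results</body>",
    b"</html>",
]
pieces = list(stream_gzip(parts))

# The receiver can decode the first piece before the later ones exist.
decomp = zlib.decompressobj(31)
first = decomp.decompress(pieces[0])
rest = b"".join(decomp.decompress(p) for p in pieces[1:])

# One-shot compression of the same data, for a size comparison.
one_shot_comp = zlib.compressobj(6, zlib.DEFLATED, 31)
one_shot = one_shot_comp.compress(b"".join(parts)) + one_shot_comp.flush()

print("first piece decodes to:", first)
print("streamed bytes:", sum(len(p) for p in pieces), "one-shot bytes:", len(one_shot))
```

    The flushes add bytes compared with the one-shot call, which is one reason the two kinds of implementation should be measured separately.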

  6. #5
    Member SolidComp's Avatar
    Quote Originally Posted by JamesB View Post
    You may also want to consider libdeflate or zlib-ng:

    https://github.com/ebiggers/libdeflate
    https://github.com/zlib-ng/zlib-ng

    There are CloudFlare and Intel optimised versions of zlib too.

    Edit: for block-based formats libdeflate is super. I added it to htslib for the "BAM" sequence alignment format. It beat both the Cloudflare and Intel versions and trounced the original zlib, but bizarrely all Linux distributions still seem to ship with the slow code base. Libdeflate, however, isn't a drop-in replacement (unlike the others). https://github.com/samtools/htslib/pull/581
    Yes, libdeflate is very good. The issue with those projects is that none are drop-in replacements for zlib, so they have very limited use. libdeflate doesn't do streaming. zlib-ng was declared by its contributors as not production ready last time I checked. Cloudflare doesn't document their forks and provides no instructions, so they usually don't work. The Intel fork didn't work either when the Centminmod project tried to use it (Cloudflare wouldn't build either).

  7. #6
    Member SolidComp's Avatar
    Quote Originally Posted by SolidComp View Post
    Yes, libdeflate is very good. The issue with those projects is that none are drop-in replacements for zlib, so they have very limited use. libdeflate doesn't do streaming. zlib-ng was declared by its contributors as not production ready last time I checked. Cloudflare doesn't document their forks and provides no instructions, so they usually don't work. The Intel fork didn't work either when the Centminmod project tried to use it (Cloudflare wouldn't build either).
    Well, it looks like he ultimately got Cloudflare's fork working: https://community.centminmod.com/thr...nx-zlib.14084/

  8. #7
    Member SolidComp's Avatar
    Quote Originally Posted by Jyrki Alakuijala View Post
    Streaming and non-streaming implementations/formats should not be mixed in benchmarking: properly streaming implementations are often about 2x slower, but they lead to better overall system performance (web pages load faster, intermediate results are shown earlier, and subsequent queries are issued earlier, hiding overall latency).

    For gzip's main use (web) a non-streaming implementation would likely be highly harmful for system performance.
    That's a good point. I think a good use for libdeflate and zopfli is precompressing static files, such as the output of static site generators like Jekyll, SVG images, etc.
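
    A minimal sketch of that precompression step, with loud assumptions: Python's standard gzip module stands in for libdeflate or zopfli (neither ships with Python), the function name precompress_tree and the suffix list are hypothetical, and the mtime argument of gzip.compress needs Python 3.8 or later.

```python
import gzip
from pathlib import Path

# Hypothetical set of file types worth precompressing.
STATIC_SUFFIXES = {".html", ".css", ".js", ".svg"}


def precompress_tree(root: str, level: int = 9) -> list:
    """Write a .gz sibling next to each static file under `root`."""
    written = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in STATIC_SUFFIXES:
            continue
        data = path.read_bytes()
        # mtime=0 keeps the output reproducible from one build to the next.
        packed = gzip.compress(data, compresslevel=level, mtime=0)
        # Keep the compressed copy only when it is actually smaller.
        if len(packed) < len(data):
            target = path.with_name(path.name + ".gz")
            target.write_bytes(packed)
            written.append(target)
    return written
```

    Since the work happens once at build time, compression speed matters little here and the slowest, densest setting is affordable. A server that supports precompressed files (for example nginx with gzip_static enabled) can then send index.html.gz directly instead of compressing on every request.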
