Thread: Better compression/performance with constrained input data

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts

    Hi all – How much would it help LZ-Huffman style compression to know in advance what kind of data you're compressing? I don't think I've seen this discussed. For example, suppose you know that your input is always JSON, or even minified JSON, minified according to known rules.

    Or minified HTML, or really anything where you know in advance that the input data has certain characteristics. For example, you might know the longest possible match length.

    I've been thinking mostly about how best to compress very small JSON payloads, like those for credit card payments and other financial messages. In raw form they might be 2 KB long, and 1 KB when minified. It's an interesting case because there are no long matches, mostly just 3 to 9 bytes, as long as no keys or values repeat in a given payload.

    If you didn't look for matches longer than, say, 16 bytes, could you fly through it faster? Deflate's maximum match length is 258 bytes, if I remember right.
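To make the question concrete, here is a minimal sketch of a greedy LZ77-style parser with a configurable cap on match length. It is a toy for illustration only, not how deflate, SLZ, or brotli search for matches: the function names, the brute-force search, and the sample payload are all hypothetical. The defaults of 3 and 258 mirror deflate's minimum and maximum match lengths.

```python
def greedy_parse(data: bytes, max_match: int = 258, min_match: int = 3,
                 window: int = 32768):
    """Greedy LZ77 parse (toy, brute force).

    Returns a list of tokens: an int for a literal byte,
    or a (length, distance) tuple for a match.
    """
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        # Never extend a match past the cap or past the end of the input.
        limit = min(max_match, len(data) - pos)
        for cand in range(max(0, pos - window), pos):
            length = 0
            while length < limit and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
                if best_len == limit:
                    break  # the cap is reached, so stop searching early
        if best_len >= min_match:
            tokens.append((best_len, best_dist))
            pos += best_len
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def decode(tokens) -> bytes:
    """Rebuild the input from the token list produced by greedy_parse."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            length, dist = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy handles overlapping matches
        else:
            out.append(tok)
    return bytes(out)


# Hypothetical minified payment-style payload, made up for this sketch.
sample = b'{"amount":"10.00","currency":"USD","tax_amount":"1.00","tax_currency":"USD"}'
capped = greedy_parse(sample, max_match=16)
uncapped = greedy_parse(sample)
print(len(capped), "tokens with a 16-byte cap;", len(uncapped), "tokens without")
```

A lower cap bounds the inner comparison loop and lets the search stop as soon as any candidate reaches the cap. Whether that translates into a measurable speedup in a real compressor depends on its match finder, so treat this as a way to experiment rather than as evidence.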

    What other characteristics or constraints on input data could help optimize a compressor? I'm incredibly impressed with SLZ lately, and I wonder if it could be optimized further by knowing in advance what kind of data it was compressing (like small JSON payloads), and maybe the ratios could be improved with little overhead penalty. Brotli does very well on these JSON payloads – the best so far – and I wonder if something like brotli 11 could be optimized to have less overhead if it's only small JSON files.

    Charles Bloom has an interesting article here on LZ optimal parsing.

  2. #2
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    506
    Thanks
    186
    Thanked 177 Times in 120 Posts
    You can get far better performance for sure. You *may* also be able to get better compression.

    For example, I tweaked libdeflate at one stage to raise the minimum match length, as on my particular data this gave better compression ratios at fast levels. However, libdeflate -12 still won out, as its optimal parsing did a better job.

    This is similar to zlib, which has Z_DEFAULT_STRATEGY and Z_FILTERED, with the latter being suggested for more binary or processed data. It favours fewer, longer matches, and on such data it does give better ratios.
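The two zlib strategies mentioned above are also exposed by Python's `zlib` module, so they are easy to compare on your own data. The sketch below is only a way to run that comparison: the helper name `deflate_with` and the sample payload are hypothetical, and which strategy wins depends entirely on the input.

```python
import zlib


def deflate_with(data: bytes, strategy: int, level: int = 6) -> bytes:
    """Compress data as a raw deflate stream using the given zlib strategy."""
    comp = zlib.compressobj(
        level,
        zlib.DEFLATED,
        -zlib.MAX_WBITS,      # negative wbits: raw deflate, no zlib header or checksum
        zlib.DEF_MEM_LEVEL,
        strategy,
    )
    return comp.compress(data) + comp.flush()


# Hypothetical minified JSON payload, made up for this sketch.
payload = b'{"amount":"10.00","currency":"USD","tax_amount":"1.00","tax_currency":"USD"}'

default_out = deflate_with(payload, zlib.Z_DEFAULT_STRATEGY)
filtered_out = deflate_with(payload, zlib.Z_FILTERED)

print("input:             ", len(payload), "bytes")
print("Z_DEFAULT_STRATEGY:", len(default_out), "bytes")
print("Z_FILTERED:        ", len(filtered_out), "bytes")
```

Both outputs are valid deflate streams and decompress with `zlib.decompress(out, -zlib.MAX_WBITS)`. The strategy only changes how the compressor chooses between literals and matches.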

  3. Thanks:

    SolidComp (9th June 2020)

  4. #3
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote: Originally Posted by JamesB
    You can get far better performance for sure. You *may* also be able to get better compression.

    For example, I tweaked libdeflate at one stage to raise the minimum match length, as on my particular data this gave better compression ratios at fast levels. However, libdeflate -12 still won out, as its optimal parsing did a better job.

    This is similar to zlib, which has Z_DEFAULT_STRATEGY and Z_FILTERED, with the latter being suggested for more binary or processed data. It favours fewer, longer matches, and on such data it does give better ratios.

    Interesting. So the way deflate works is not exhaustive with respect to matches / repeated strings? Is zopfli exhaustive?
