Results 1 to 5 of 5

Thread: multi-pass compression

  1. #1
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts

    multi-pass compression

    An unexpected and very interesting input
    from an LZ4 compression user
    which experienced some very nice gains by using multi-pass compression :

    https://groups.google.com/d/msg/lz4c...k/AVMOPri0O3gJ

    My current guess is that the byte-oriented compressed structure of LZ4 makes it favorable to such scenario.
    Seems like something to investigate...

  2. #2
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    Its a very cool reminder!

    Reminds me of FLZP

    FLZP and other byte-oriented non-entropy compressors can be used as preprocessors e.g. http://mattmahoney.net/dc/text.html#4975

    Discussion on reddit
    Last edited by willvarfar; 3rd July 2012 at 14:55.

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    In case of LZ4 such a compression improvement is likely due to limits of match length and distance.
    After every pass more matches fit in the window.

    However. its much more interesting with lzma, which indirectly supports sparse matches (via rep-distances).
    For lzma, if we'd split matches and literals into separate streams, it would be possible to capture the file structure -
    like in webserver log files etc.

  4. #4
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    The highly repetitive nature of the LZ4 stream where the tokens are byte aligned. Would make it an easy target for a follow on entropy compressor that uses the underlying structure of the 5 fields to compress each separately. I have a feeling that for most files if repeated passes of LZ4 make for a smaller compressed file then. Just a first pass of LZ4 followed by a smart arithmetic coder that uses as a model the 5 fields of each LZ4 sequence would likely compress to a smaller size than the repeated LZ4 passes.

  5. #5
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts
    Yes, i fully agree with the comments in this thread.

    However, i think we should not confuse the objective :
    the initial contribution was not stating that LZ4 achieved best compression, it certainly does not, even with multiple pass, compared to state of the art.

    The main idea was that using LZ4 with multiple compression passes was a cheap way to grab some impressive compression gains on log file compression.

    In any case, it seems that it can be used as a pre-processor, re-using the word of willvarfar.
    Indeed, on top of shortening the length between similar patterns, it also, quite simply, reduce the amount of data to process by the next round.
    It's still possible to use one of the state-of-the-art compressors for the last round (which i would even recommend).
    The SotA compressor will complete its job much faster than when dealing directly with the initial raw data, just because there is so much less to process.

    Even if the final compression ratio is "not as strong" as using the SotA algorithm directly on raw data, for such a use case (log file compression), just getting a "pretty similar" compression ratio is good enough. The speed gain is likely to mean useful time saving in real work environments.

    And that's the core point of this proposition, imho.

    Edit : thanks willvarfar for your initiative on reddit. It seems to attract quite some attention, and useful comments with it.

Similar Threads

  1. PAQ multi thread
    By frede_sch in forum Data Compression
    Replies: 12
    Last Post: 1st November 2011, 01:29
  2. Multi-way QuickSort
    By Piotr Tarsa in forum The Off-Topic Lounge
    Replies: 7
    Last Post: 9th May 2011, 18:59
  3. Multi-threaded compression
    By Cyan in forum Data Compression
    Replies: 34
    Last Post: 16th January 2011, 17:32
  4. Multi-Volume Archives
    By osmanturan in forum Data Compression
    Replies: 12
    Last Post: 13th June 2009, 01:46
  5. Multi-threading motivation
    By Trixter in forum Data Compression
    Replies: 1
    Last Post: 10th September 2008, 05:18

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •