
Thread: LZ4: LZ4_DISTANCE_MAX

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    95
    Thanks
    35
    Thanked 8 Times in 8 Posts

    LZ4: LZ4_DISTANCE_MAX

    README.md: `LZ4_DISTANCE_MAX` : control the maximum offset that the compressor will allow. Set to 65535 by default, which is the maximum value supported by lz4 format.
    Reducing maximum distance will reduce opportunities for LZ4 to find matches,
    hence will produce a worse compression ratio.

    The above is true for high compression modes, i.e., levels above 3, but the opposite is true for compression levels 1 and 2.
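
    (For anyone who wants to reproduce this: LZ4_DISTANCE_MAX is a compile-time macro, so comparing window sizes means rebuilding lz4.c with the macro overridden. Below is a minimal sketch of the harness I have in mind; only LZ4_compress_default() and LZ4_compressBound() come from lz4.h, the file handling is my own.)

    Code:
    /* distance_test.c -- minimal sketch; build against the lz4-1.9.1 sources, e.g.
     *   cc -O2 -DLZ4_DISTANCE_MAX=32767 distance_test.c lz4.c -o distance_test   */
    #include <stdio.h>
    #include <stdlib.h>
    #include "lz4.h"

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (f == NULL) { perror("fopen"); return 1; }
        fseek(f, 0, SEEK_END);
        long srcSize = ftell(f);                 /* enwik8 still fits in an int */
        fseek(f, 0, SEEK_SET);

        char *src = malloc((size_t)srcSize);
        char *dst = malloc((size_t)LZ4_compressBound((int)srcSize));
        if (src == NULL || dst == NULL) { fprintf(stderr, "out of memory\n"); return 1; }
        if (fread(src, 1, (size_t)srcSize, f) != (size_t)srcSize) { perror("fread"); return 1; }
        fclose(f);

        int cSize = LZ4_compress_default(src, dst, (int)srcSize,
                                         LZ4_compressBound((int)srcSize));
        if (cSize > 0)
            printf("%ld -> %d (%.3f)\n", srcSize, cSize, (double)srcSize / cSize);
        free(src); free(dst);
        return 0;
    }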

    Here is a test result using the default value (65535):

    <TestData> lz4-v1.9.1 -b1 enwik8
    1#enwik8 : 100000000 -> 57262281 (1.746), 325.6 MB/s, 2461.0 MB/s

    and the result using a smaller value (32767):

    <TestData> lz4-1.9.1-32 -b1 enwik8
    1#enwik8 : 100000000 -> 53005796 (1.887), 239.3 MB/s, 2301.1 MB/s

    Is there anything unusual in the LZ4_compress_generic() implementation? Could anyone shed some light? Thanks in advance.

  2. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    I don't get the same result.

    When testing with LZ4_DISTANCE_MAX == 32767,
    I get 57548126 for enwik8,
    i.e., slightly worse than with the 64 KB distance.

    In order to get the same compressed size as you, I first need to increase the memory usage by quite a bit (from 16 KB to 256 KB),
    which is the actual reason for the compression ratio improvement (and compression speed decrease).
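
    (For reference, that memory knob is the LZ4_MEMORY_USAGE macro in lz4.h, if I recall the name correctly: the value is an exponent, so the hash table takes roughly 2^N bytes. A tiny sketch of the sizes involved, nothing more:)

    Code:
    #include <stdio.h>

    /* Sketch only: LZ4_MEMORY_USAGE is an exponent; the compression state /
     * hash table occupies roughly 2^N bytes.  A bigger table keeps more
     * candidates, so more matches are found: better ratio, slower speed. */
    static void show(int memory_usage)
    {
        size_t bytes = (size_t)1 << memory_usage;
        printf("LZ4_MEMORY_USAGE=%d -> ~%zu KB hash table\n",
               memory_usage, bytes >> 10);
    }

    int main(void)
    {
        show(14);   /* default build         -> 16 KB  */
        show(18);   /* the build tested here -> 256 KB */
        return 0;
    }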

    The impact of MAX_DISTANCE is less dramatic than for the high compression modes because, by the very nature of the fast mode, it doesn't have much time to search,
    so most searches will end up testing candidates at rather short distances anyway.
    Still, reducing the max distance should, on average, correspond to some loss of ratio, even if a small one.
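
    (To make the last point concrete: the fast path keeps a single candidate per hash cell and simply drops it when it lies too far back; with a small, constantly overwritten table, surviving candidates are usually recent anyway. A paraphrase of the idea, not the actual lz4.c code:)

    Code:
    typedef unsigned int U32;

    #ifndef LZ4_DISTANCE_MAX
    #define LZ4_DISTANCE_MAX 65535   /* default; the build discussed above used 32767 */
    #endif

    /* Paraphrased from the idea in LZ4_compress_generic(), not verbatim:
     * the lone candidate read from the hash table is rejected when it sits
     * more than LZ4_DISTANCE_MAX bytes behind the current position. */
    static int candidate_in_range(U32 currentIndex, U32 matchIndex)
    {
        return (currentIndex - matchIndex) <= LZ4_DISTANCE_MAX;
    }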

  3. Thanks:

    smjohn1 (26th May 2020)

  4. #3
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    95
    Thanks
    35
    Thanked 8 Times in 8 Posts
    You are right. I checked the code again, and the memory usage level was indeed 18 instead of 14. So that was the reason, which makes sense.

    On the other hand, a smaller LZ4_DISTANCE_MAX results in a (slight) decrease in compression speed. Is that because literal processing (memory copy) is slower than match processing?

  5. #4
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    In fast mode, finding more matches corresponds to effectively skipping more data and searching less, so it tends to be faster indeed.
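
    (A toy scanner to illustrate -- this is not lz4.c, just the control flow: when the candidate from the hash table matches, the loop jumps over the whole match, so none of those bytes get hashed or searched again, and none of them have to be emitted as literals; when nothing matches, it crawls forward byte by byte. The toy only counts how many positions get searched.)

    Code:
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MIN_MATCH 4
    #define MAX_DIST  65535                 /* stand-in for LZ4_DISTANCE_MAX */
    #define HASH_BITS 12

    static uint32_t hash4(const uint8_t *p)
    {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - HASH_BITS);
    }

    /* Returns how many positions were actually searched;
     * fewer searches = a faster pass over the input. */
    size_t toy_scan(const uint8_t *src, size_t size)
    {
        static size_t table[1u << HASH_BITS];    /* one candidate per bucket */
        size_t searched = 0, pos = 0;
        memset(table, 0, sizeof table);

        while (pos + MIN_MATCH <= size) {
            uint32_t h    = hash4(src + pos);
            size_t   cand = table[h];
            table[h] = pos;
            searched++;

            if (cand < pos && pos - cand <= MAX_DIST
                && memcmp(src + cand, src + pos, MIN_MATCH) == 0) {
                size_t len = MIN_MATCH;
                while (pos + len < size && src[cand + len] == src[pos + len]) len++;
                pos += len;   /* match: skip it entirely, nothing inside is searched */
            } else {
                pos += 1;     /* literal: the next byte must be searched too
                                 (real LZ4 grows this step: "acceleration") */
            }
        }
        return searched;
    }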

  6. #5
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    95
    Thanks
    35
    Thanked 8 Times in 8 Posts
    OK, that makes sense too. So reducing LZ4_DISTANCE_MAX doesn't necessarily increase compression speed. The default might already be a sweet spot in terms of compression speed.

