Thread: lz4 speed sudden changes

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts

    lz4 speed sudden changes

    Not sure if people here have noticed, but lz4's compression and decompression speeds change dramatically at a block size of 64 KB. I cannot figure out the reason(s). Any thoughts?

    Here are some simple tests on an Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz, running Red Hat Enterprise Linux Server release 6.10 (kernel 2.6.32). Granted, it is quite an old machine, so I am not sure whether newer machines will show the same results:

    vigeland -19- <~> lz4 -b1 enwik8
    1#enwik8 : 100000000 -> 57262281 (1.746), 327.9 MB/s ,3000.0 MB/s

    vigeland -20- <~> lz4 -b1 -B7 enwik8
    using blocks of size 4096 KB
    1#enwik8 : 100000000 -> 57285873 (1.746), 333.3 MB/s ,3000.0 MB/s

    vigeland -21- <~> lz4 -b1 -B6 enwik8
    using blocks of size 1024 KB
    1#enwik8 : 100000000 -> 57358020 (1.743), 333.3 MB/s ,3000.0 MB/s

    vigeland -22- <~> lz4 -b1 -B5 enwik8
    using blocks of size 256 KB
    1#enwik8 : 100000000 -> 57645644 (1.735), 330.6 MB/s ,2916.7 MB/s

    vigeland -23- <~> lz4 -b1 -B4 enwik8
    using blocks of size 64 KB
    1#enwik8 : 100000000 -> 57001007 (1.754), 285.7 MB/s ,2307.7 MB/s

    As the results above show, the compression ratio doesn't change much across block sizes, but both compression and decompression speeds drop considerably at the 64 KB block size.

    Any convincing explanations?
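
    In case it helps anyone reproduce this outside the command-line tool, below is a rough sketch that does the same thing through the C API: read a file, compress it in independent blocks of a chosen size, and time it. The file name and block size are just placeholders, and clock() is a crude timer, so treat the numbers as indicative only.

    /* blocktest.c -- rough sketch: compress a file in independent blocks of a
     * chosen size through the LZ4 C API and time it.
     * Build (assuming liblz4 and its headers are installed): cc -O2 blocktest.c -llz4
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <lz4.h>

    int main(int argc, char** argv)
    {
        const char* path = (argc > 1) ? argv[1] : "enwik8";     /* placeholder */
        int blockSize    = (argc > 2) ? atoi(argv[2]) : 64 * 1024;

        FILE* f = fopen(path, "rb");
        if (!f) { perror(path); return 1; }
        fseek(f, 0, SEEK_END);
        long srcSize = ftell(f);
        fseek(f, 0, SEEK_SET);
        char* src = malloc((size_t)srcSize);
        char* dst = malloc((size_t)LZ4_compressBound(blockSize));
        if (!src || !dst || fread(src, 1, (size_t)srcSize, f) != (size_t)srcSize) return 1;
        fclose(f);

        long compressed = 0;
        clock_t start = clock();
        for (long pos = 0; pos < srcSize; pos += blockSize) {
            int chunk = (srcSize - pos < blockSize) ? (int)(srcSize - pos) : blockSize;
            int c = LZ4_compress_default(src + pos, dst, chunk,
                                         LZ4_compressBound(blockSize));
            if (c <= 0) { fprintf(stderr, "compression error\n"); return 1; }
            compressed += c;
        }
        double sec = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("%ld -> %ld (%.3f), %.1f MB/s\n", srcSize, compressed,
               (double)srcSize / (double)compressed, (double)srcSize / sec / 1e6);
        free(src); free(dst);
        return 0;
    }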

  2. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts
    `lz4` (fast mode) will consider smaller match lengths when presented with a "small" input.
    Small being defined as <= 64 KB.

    This helps compression ratio, but at the expense of some speed.

    Exact differences vary depending on file content,
    with `enwik` being a rather extreme example, as it contains a lot of overlapping tiny matches.
    Most other data sources (binary, structured schema) are different, with typically fewer but larger matches.
    In these other cases, the difference is less pronounced.
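
    One way to see this on your own data: the raw block format is documented (one token per sequence, then literals, a 2-byte match offset and optional length continuation bytes), so a small walker over the compressed blocks can report how many sequences were emitted and how long the matches are on average. Here is a rough sketch of such a walker, written against the documented format rather than taken from the lz4 sources; call it on each compressed block, e.g. inside the loop of the snippet in the first post.

    #include <stddef.h>

    /* Reads a 4-bit length field; a value of 15 means continuation bytes follow. */
    static size_t read_len(const unsigned char** p, size_t base)
    {
        size_t len = base;
        if (base == 15) {
            unsigned char b;
            do { b = *(*p)++; len += b; } while (b == 255);
        }
        return len;
    }

    /* Counts match sequences in one raw LZ4 block and sums their match lengths. */
    static void block_stats(const unsigned char* blk, size_t blkSize,
                            size_t* sequences, size_t* matchBytes)
    {
        const unsigned char* p = blk;
        const unsigned char* const end = blk + blkSize;
        while (p < end) {
            unsigned char token = *p++;
            size_t litLen = read_len(&p, token >> 4);
            p += litLen;                      /* skip the literal bytes */
            if (p >= end) break;              /* last sequence carries literals only */
            p += 2;                           /* 2-byte little-endian match offset */
            *matchBytes += read_len(&p, (size_t)(token & 0x0F)) + 4;   /* MINMATCH = 4 */
            (*sequences)++;
        }
    }

    On enwik8, the 64 KB blocks should show noticeably more sequences and a shorter average match length than the 256 KB ones.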

  3. Thanks:

    smjohn1 (16th July 2020)

  4. #3
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    OK, but this should only affect compression speed, not decompression speed, as decompression doesn't care whether matches are small or not. Correct?

    Also, where exactly are the lines that "consider smaller match lengths when presented with a 'small' input" in v1.9.2? I'd like to see what happens if those lines are removed or changed.

    Thanks.

    Quote Originally Posted by Cyan View Post
    `lz4` (fast mode) will consider smaller match lengths when presented with a "small" input.
    Small being defined as <= 64 KB.

    This helps compression ratio, but at the expense of some speed.

    Exact differences vary depending on file content,
    with `enwik` being a rather extreme example, as it contains a lot of overlapping tiny matches.
    Most other data sources (binary, structured schema) are different, with typically fewer but larger matches.
    In these other cases, the difference is less pronounced.

  5. #4
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts
    Quote Originally Posted by smjohn1 View Post
    decompression doesn't care whether matches are small or not. Correct?
    Nope, incorrect

    Quote Originally Posted by smjohn1 View Post
    where exactly are the lines of "consider smaller match lengths when presented with a "small" input" in v1.9.2?
    https://github.com/lz4/lz4/blob/dev/lib/lz4.c#L657
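
    To see why the decoder is affected too: in the block format, every sequence costs a token read, a length decode, a 2-byte offset read and a (possibly overlapping) match copy, and with many short matches that fixed work is spread over fewer output bytes. Below is a simplified decoder written from the documented block format; it is not the actual lz4 decoder (which is far more optimized, e.g. with wide copies that are most effective on long matches and long literal runs), just enough to make the per-sequence work visible. The round-trip test string in main() is arbitrary.

    /* toy_decode.c -- simplified decoder for the documented LZ4 block format.
     * Build (assuming liblz4 is installed): cc -O2 toy_decode.c -llz4
     */
    #include <stdio.h>
    #include <string.h>
    #include <lz4.h>

    static size_t toy_decode(const unsigned char* src, size_t srcSize,
                             unsigned char* dst, size_t dstCapacity)
    {
        const unsigned char* ip = src;
        const unsigned char* const iend = src + srcSize;
        unsigned char* op = dst;
        unsigned char* const oend = dst + dstCapacity;

        while (ip < iend) {
            unsigned char token = *ip++;                     /* per-sequence cost */

            size_t litLen = token >> 4;                      /* literal run length */
            if (litLen == 15) { unsigned char b; do { b = *ip++; litLen += b; } while (b == 255); }
            if (litLen > (size_t)(oend - op) || litLen > (size_t)(iend - ip)) return 0;
            memcpy(op, ip, litLen); op += litLen; ip += litLen;
            if (ip >= iend) break;                           /* last sequence: literals only */

            size_t offset = ip[0] | ((size_t)ip[1] << 8);    /* per-sequence cost */
            ip += 2;

            size_t matchLen = token & 0x0F;                  /* match length, MINMATCH = 4 */
            if (matchLen == 15) { unsigned char b; do { b = *ip++; matchLen += b; } while (b == 255); }
            matchLen += 4;

            if (offset == 0 || offset > (size_t)(op - dst) || matchLen > (size_t)(oend - op)) return 0;
            const unsigned char* match = op - offset;
            while (matchLen--) *op++ = *match++;             /* overlap-safe byte copy */
        }
        return (size_t)(op - dst);
    }

    int main(void)
    {
        const char text[] = "abcabcabcabcabc -- the quick brown fox jumps over the lazy dog dog dog";
        char compressed[256];
        unsigned char restored[256];

        int c = LZ4_compress_default(text, compressed, (int)sizeof(text), (int)sizeof(compressed));
        if (c <= 0) return 1;
        size_t d = toy_decode((const unsigned char*)compressed, (size_t)c,
                              restored, sizeof(restored));
        printf("round trip %s (%d compressed bytes -> %zu decoded bytes)\n",
               (d == sizeof(text) && memcmp(text, restored, d) == 0) ? "OK" : "FAILED", c, d);
        return 0;
    }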

  6. #5
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Cyan View Post
    Nope, incorrect
    Is it because recovering smaller (shorter) matches is slower than recovering longer matches or copying literals?

    Thanks. For enwik8 the compression ratio improves as well, from 1.754 to 1.775, besides the decompression speed. I will check with more files.
