
Thread: Modifying LZ4 offset

  1. #1
    Member (USA; joined Feb 2016; 98 posts)

    Modifying LZ4 offset

    I am trying a new encoding method for LZ4's match offset, which varies between 1 and 3 bytes while staying byte-aligned. The hope, of course, is that most offsets need only 1 byte, giving a better compression ratio and perhaps better compression speed as well.

    The implementation is a simple modification of the LZ4 v1.9.2 package: in LZ4_compress_generic(), the two occurrences of

    LZ4_writeLE16(op, (U16)offset); op+=2;

    and
    LZ4_writeLE16(op, (U16)(ip - match)); op+=2;

    are replaced with the new encoding method (just ignore the high-compression part),

    and in LZ4_decompress_generic(), the two occurrences of

    offset = LZ4_readLE16(ip); ip+=2;

    are replaced with the new offset decoding rules (a bit complicated).


    I thought that would do.

    But while some files decompress successfully, others fail with errors such as

    Invalid Checksum :

    Decoding error at pos 152033 (block 0, sub 1, pos 152033)

    and some with "Error 66 : Decompression error : ERROR_decompressionFailed" ( on enwik

    So obviously somewhere else needs to be modified as well.

    Could anyone shed some light on where to make the changes?

    Thanks in advance.

  2. #2
    Member (France; joined Sep 2008; 889 posts)
    A more complete source code package will probably prove helpful.
    There are too many ways such a modification could introduce flaws, so it requires paying attention to implementation details.

    Good news is, you have tests to quickly check the outcome.
    Last edited by Cyan; 5th June 2020 at 10:39.

  3. #3
    Member (Greece; joined Apr 2015; 107 posts)
    I once made an LZ4 variant with a 32 KB window and an additional, smaller 2-byte match sequence (instead of the normal 3 bytes). While it had better compression, decompression was about half as fast due to a single unpredictable branch. Your method will likely pay the same performance penalty (unpredictable branches are expensive).

  4. #4
    Bulat Ziganshin, Programmer (Uzbekistan; joined Mar 2007; 4,572 posts)
    Quote Originally Posted by algorithm View Post
    I once made an LZ4 variant with a 32 KB window and an additional, smaller 2-byte match sequence (instead of the normal 3 bytes). While it had better compression, decompression was about half as fast due to a single unpredictable branch. Your method will likely pay the same performance penalty (unpredictable branches are expensive).
    Did you try to use CMOV and other branchless approaches?

  5. #5
    Member (USA; joined Feb 2016; 98 posts)
    Here is a simpler version: the window size is reduced to 32 KB (I cannot upload lz4.h; why is the .h extension not allowed?), and the offset uses a 1+7 format (1 byte for smaller offsets, 2 bytes for the rest).

    So the only changes are on lines 1011 and 1016-1018 (for encoding) and on lines 1769 and 1969 (for decoding).

    It works for pic, news, and trans from the Calgary corpus.

    Do any other places need to be taken care of? (Just for the fast mode.)
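For concreteness, a 1+7 offset format like the one described could look like the following sketch. The helper names (writeVarOffset / readVarOffset) are hypothetical, and the exact flag-bit placement is my assumption; the attached lz4.c may do it differently. Here the low bit of the first byte selects the form: 0 means a 1-byte offset (up to 127), 1 means a 2-byte offset (up to 32767, enough for a 32 KB window).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical byte-aligned variable-length offset codec.
 * First-byte low bit = 0 : 7-bit offset (1..127) in one byte.
 * First-byte low bit = 1 : 15-bit offset (up to 32767) in two bytes. */

static size_t writeVarOffset(uint8_t *op, uint32_t offset)
{
    if (offset < 128) {                       /* fits in 7 bits: one byte, flag 0 */
        op[0] = (uint8_t)(offset << 1);
        return 1;
    }
    op[0] = (uint8_t)((offset << 1) | 1);     /* low 7 bits of offset + flag 1 */
    op[1] = (uint8_t)(offset >> 7);           /* high 8 bits */
    return 2;
}

static size_t readVarOffset(const uint8_t *ip, uint32_t *offset)
{
    uint32_t b0 = ip[0];
    if ((b0 & 1) == 0) {                      /* 1-byte form */
        *offset = b0 >> 1;
        return 1;
    }
    *offset = (b0 >> 1) | ((uint32_t)ip[1] << 7);   /* 2-byte form */
    return 2;
}
```

In LZ4_compress_generic() the fixed `LZ4_writeLE16(op, (U16)offset); op+=2;` would then become something like `op += writeVarOffset(op, offset);`, and the decoder side advances `ip` by the returned length instead of always by 2. The output-bound checks that assume a fixed 2-byte offset would need the same adjustment, which is one plausible source of the decoding errors above.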

    Quote Originally Posted by Cyan View Post
    A more complete source code package will probably prove helpful.
    There are too many ways such a modification could introduce flaws, so it requires paying attention to implementation details.

    Good news is, you have tests to quickly check the outcome.
    Attached Files
    • File Type: c lz4.c (99.5 KB, 7 views)

  6. #6
    Member (USA; joined Feb 2016; 98 posts)
    You are definitely welcome to change my modifications (attached above) to see if there are good improvements.

    Quote Originally Posted by Bulat Ziganshin View Post
    Did you try to use CMOV and other branchless approaches?

  7. #7
    Member (USA; joined Feb 2016; 98 posts)
    True. My tests of the simpler version (see the attached lz4.c above) show about 2/3 of the original decompression speed. But in many applications decompression speed is already far higher than that of the other components, so even at 2/3 it wouldn't be a problem. Thus there is value in improving compression ratio and speed, if the improvement is significant. My simple version, of course, only shows about a 1% to 2% gain in compression ratio, which is not significant for most applications. That is why I think it might be worthwhile to explore further gains.

    Quote Originally Posted by algorithm View Post
    I once made an LZ4 variant with a 32 KB window and an additional, smaller 2-byte match sequence (instead of the normal 3 bytes). While it had better compression, decompression was about half as fast due to a single unpredictable branch. Your method will likely pay the same performance penalty (unpredictable branches are expensive).

  8. #8
    Member (Greece; joined Apr 2015; 107 posts)
    Quote Originally Posted by Bulat Ziganshin View Post
    Did you try to use CMOV and other branchless approaches?
    I designed one using XORs and two's complements to emulate CMOV, but never implemented it.
    The code is from 5 years ago, though.
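The XOR/two's-complement trick mentioned here can be sketched as a generic branchless select (this is not the poster's actual code, and it assumes an offset format where the low bit of the first byte is a 1-vs-2-byte flag): subtracting a 0/1 flag from zero yields an all-zeros or all-ones mask, which then picks one of two precomputed values without a branch.

```c
#include <stdint.h>

/* Branchless select: returns a if flag == 1, b if flag == 0.
 * (uint32_t)0 - flag is 0xFFFFFFFF when flag == 1 (two's complement)
 * and 0x00000000 when flag == 0; XOR-masking then picks a or b. */
static uint32_t select_u32(uint32_t flag, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - flag;
    return b ^ ((a ^ b) & mask);
}

/* Applied to a 1-or-2-byte offset: always read both bytes, compute both
 * interpretations, and keep one without branching. Note this reads ip[1]
 * unconditionally, so a real decoder must guarantee that byte is in
 * bounds. *adv receives how far to advance the input pointer. */
static uint32_t decodeOffsetBranchless(const uint8_t *ip, uint32_t *adv)
{
    uint32_t b0   = ip[0];
    uint32_t flag = b0 & 1;                              /* 0: 1 byte, 1: 2 bytes */
    uint32_t off1 = b0 >> 1;                             /* 1-byte interpretation */
    uint32_t off2 = (b0 >> 1) | ((uint32_t)ip[1] << 7);  /* 2-byte interpretation */
    *adv = 1 + flag;
    return select_u32(flag, off2, off1);
}
```

On modern compilers a plain ternary often compiles to CMOV anyway, so it is worth checking the generated assembly before hand-rolling the mask version.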

  9. #9
    Member (Greece; joined Apr 2015; 107 posts)
    @smjohn1 You only have a variable-length offset, and your smaller form only covers distances up to 127, so the compression ratio will not improve significantly. In my case, the smaller version had a 2-bit literal length (LL), a 2-bit match length (ML), and an 11-bit offset. Also, for fast modes, a 64 KB window in LZ4 is too big: the 4K hash table is the bottleneck for compression ratio.

    LZ4 is optimized for a 32 KB cache size. I think the compression ratio could improve if there were multiple versions for different cache sizes; some mobile phones even have a 128 KB L1. But I don't expect a big improvement.
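As one concrete reading of those bit counts, such a short-form sequence fits in two bytes: 2 + 2 + 11 = 15 bits, leaving one spare bit that could serve as the short-vs-normal flag. The field order and the flag placement below are my guesses, not the actual layout used:

```c
#include <stdint.h>

/* Hypothetical 2-byte short sequence: 2-bit literal length, 2-bit match
 * length, 11-bit offset, plus a low flag bit marking the short form.
 * Valid ranges: ll in 0..3, ml in 0..3, offset in 0..2047. */
static uint16_t packShortSeq(uint32_t ll, uint32_t ml, uint32_t offset)
{
    return (uint16_t)((ll << 14) | (ml << 12) | (offset << 1) | 1);
}

static void unpackShortSeq(uint16_t seq,
                           uint32_t *ll, uint32_t *ml, uint32_t *offset)
{
    *ll     = seq >> 14;
    *ml     = (seq >> 12) & 3;
    *offset = (seq >> 1) & 0x7FF;
}
```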

