Thread: LZPM v0.17 is here! (LZ77+ARI+Optimal Parsing)

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts

    Cool LZPM v0.17 is here! (LZ77+ARI+Optimal Parsing)

    Hi all!

    Please welcome my (new) LZ77 compressor!
    It's basically a baseline LZ77 - no buffered (rep) offsets, no literal masking, no exe-filters or other bells and whistles. On some files it can be very efficient, on others not so much. On the ENWIKs it's not the best performer, but on files like fp.log or english.dic LZPM is pretty good (for pure LZ77).
    Window size = block size = 128 MB.
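
    Just to illustrate what "baseline" means here - a rough sketch only, not LZPM's actual token or bitstream layout:
    Code:
    #include <cstdint>

    // Illustrative token for a plain LZ77 parse: each step is either a raw
    // literal byte or a (length, distance) match with an absolute distance.
    // There is no rep0/rep1/... offset history to reuse and no literal masking.
    struct Token {
        bool     is_match;   // false -> 'literal' is valid
        uint8_t  literal;
        uint32_t length;     // valid when is_match
        uint32_t distance;   // absolute distance into the 128 MB window
    };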

    Enjoy new release!
    Attached Files

  2. Thanks (10):

    comp1 (31st March 2020),cZar (9th May 2020),hexagone (31st March 2020),hmdou (5th April 2020),jibz (31st March 2020),Lucas (13th April 2020),Nania Francesco (1st April 2020),Sergey3695 (31st March 2020),Simorq (4th April 2020),SolidComp (1st April 2020)

  3. #2
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    What is it for? Is this a reference implementation or a design exercise?

  4. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    I've done some experiments with LZ77 offset (distance) context coding and decided to share the results. Plus, it's my first decent and publicly available "LZ77+ARI+Optimal Parsing" compressor. The previous program was LZPM v0.16 (back in 2008) - not a proper or competitive implementation. For me, as a data compression author, such a compressor is a must-have.

    Quick results on "fp.log":
    Code:
    Original              -> 20,617,071 bytes
    NLZM v1.02 -window:27 -> 858,733 bytes
    Tornado v0.6 -16      -> 807,692 bytes
    ZSTD v1.4.4 -19       -> 805,203 bytes
    LZMA v19.00 -fb273    -> 707,646 bytes
    MComp v2.00 -mlzx     -> 696,338 bytes
    LZPM v0.17            -> 685,952 bytes

  5. Thanks (7):

    BLACKFIRE69 (23rd May 2020),Bulat Ziganshin (21st April 2020),hunman (1st April 2020),JamesB (2nd April 2020),load (13th April 2020),Simorq (4th April 2020),SolidComp (3rd April 2020)

  6. #4
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    176
    Thanks
    28
    Thanked 74 Times in 44 Posts
    Very impressive work, Ilya!
    It's nice to see progress in the LZ space.

  7. Thanks:

    encode (13th April 2020)

  8. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    BTW, work on the program is in progress - I've already improved the decompression speed (branchless RC operations for unencoded bits (p=0.5)).
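
    For the curious, a minimal sketch of the branchless direct-bit idea (illustrative only - the encoder layout and the byte-output path are assumptions, not LZPM's actual code):
    Code:
    #include <cstdint>

    struct RangeEncoder {
        uint64_t low   = 0;
        uint32_t range = 0xFFFFFFFFu;

        // Encode one bit with fixed p = 0.5: halve the range and move the interval
        // up by 'range' when the bit is set - no branch on the bit value and no
        // adaptive probability update.
        void encode_direct_bit(uint32_t bit) {
            range >>= 1;
            low += (uint64_t)bit * range;
            while (range < (1u << 24)) {          // renormalize
                // emit_byte(uint8_t(low >> 32)); // carry/output handling omitted here
                low = (low << 8) & 0xFFFFFFFFFFull;
                range <<= 8;
            }
        }
    };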

  9. Thanks:

    danlock (13th April 2020)

  10. #6
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    Quote Originally Posted by encode View Post
    I've done some experiments with LZ77 offset (distance) context coding and decided to share the results. Plus, it's my first decent and publicly available "LZ77+ARI+Optimal Parsing" compressor. The previous program was LZPM v0.16 (back in 2008) - not a proper or competitive implementation. For me, as a data compression author, such a compressor is a must-have.

    Quick results on "fp.log":
    Code:
    Original              -> 20,617,071 bytes
    NLZM v1.02 -window:27 -> 858,733 bytes
    Tornado v0.6 -16      -> 807,692 bytes
    ZSTD v1.4.4 -19       -> 805,203 bytes
    LZMA v19.00 -fb273    -> 707,646 bytes
    MComp v2.00 -mlzx     -> 696,338 bytes
    LZPM v0.17            -> 685,952 bytes
    How does it compare with brotli --large_window 25 --quality 11 ?

  11. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    How can I find the official Win32/Win64 binary?

  12. Thanks:

    hunman (14th April 2020)

  13. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts

  14. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    How does it compare with brotli --large_window 25 --quality 11 ?
    Code:
    brotli --large_window=25 --quality=11 fp.log result -> 742,328 bytes

  15. #10
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    lzpm v0.17 (compressed size, compression time / decompression time):
    25,450,555 bytes, 88.156 sec. / 1.398 sec., enwik8
    221,642,311 bytes, 1,004.171 sec. / 11.919 sec., enwik9
    2,176,296,216 bytes, 9,667.514 sec. / 116.401 sec., enwik10

  16. Thanks:

    encode (15th April 2020)

  17. #11
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    176
    Thanks
    28
    Thanked 74 Times in 44 Posts
    If I understand lzpm correctly, these are purely context modelling improvements, right?
    I'll take a wild guess: hierarchical context modelling of the raw offset bits?

  18. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    Well, it's much simpler - LZPM makes use of a greater number of position slots vs. the standard 6-bit value (64 slots). This leads to more precise coding of the higher bits of a distance (offset). However, on regular text files it gains no advantage - that's why LZPM compresses the ENWIKs poorly.
    The context is simply the match length - a pretty standard and generic thing. This helps catch the correlation between match length and match distance.
    Now I'm experimenting with the coding of the lower bits of a distance. All in all, the distance value can be huge - so we use a "divide and conquer" approach here.
    In the LZPM format the distance (window size) is restricted to ~4 GB, so I'm thinking of adding a "Block/Window Size" option. The memory usage for compression is (9*BlockSize) + 64 MB + the arith model; for decoding it's BlockSize + the arith model only.
    As a note - I cannot use complex CM stuff here, as it would kill the decompression speed. I'm trying to stay on par with LZMA in complexity. However, LZPM has a higher-precision RC/arith model - try compressing A10.jpg...
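
    To give a rough idea of the position-slot scheme (a hypothetical illustration with made-up constants, not the actual LZPM format): the slot is derived from the distance's bit length plus a few bits right below the leading one, and the slot statistics are then conditioned on the match length.
    Code:
    #include <cstdint>

    static inline uint32_t bit_length(uint32_t x) {   // index of the highest set bit + 1
        uint32_t n = 0;
        while (x) { ++n; x >>= 1; }
        return n;
    }

    // K bits below the leading one -> 2^K slots per bit length.
    // K = 1 reproduces the usual 64-slot scheme; a larger K (assumed K = 3 here)
    // gives finer slots, i.e. more precise coding of the high distance bits.
    static inline uint32_t pos_slot(uint32_t dist, uint32_t K = 3) {
        if (dist < (1u << K)) return dist;            // tiny distances map to themselves
        uint32_t n = bit_length(dist);                // n >= K + 1
        return ((n - K) << K) | ((dist >> (n - 1 - K)) & ((1u << K) - 1));
    }

    // Hypothetical model lookup: slot statistics conditioned on the match length;
    // the remaining low bits of the distance are coded (almost) directly.
    //   prob = slot_model[min(match_len, MAX_LEN_CTX - 1)][pos_slot(dist)];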


  19. Thanks:

    Lucas (16th April 2020)

  20. #13
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    176
    Thanks
    28
    Thanked 74 Times in 44 Posts
    That sounds similar to what I had in mind.
    Thank you for sharing!
    Btw, in my LZ endeavours I found very large gains (30-40% higher compression) on binary data by implementing LZMA's matchByte model. It doesn't even need to use statistics to give gains, which should keep the speed up.
    Good luck with lzpm! I like where it's heading.
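
    For readers who haven't seen it, a rough sketch of that matchByte idea (LZMA-style matched-literal coding; the Probs/RangeEncoder types here are placeholders, not code from LZMA or lzpm): the literal right after a match is coded bit by bit against the byte found one match distance back, and the coder switches to plain literal contexts once the prediction breaks.
    Code:
    #include <cstdint>

    // Encode the literal that immediately follows a match. 'match_byte' is the byte
    // at the same position one match distance back; its bits steer the context until
    // the first disagreement, after which plain literal coding takes over.
    template <class Probs, class RangeEncoder>
    void encode_matched_literal(RangeEncoder& rc, Probs& p,
                                uint8_t symbol, uint8_t match_byte) {
        uint32_t ctx = 1;                              // running prefix of coded bits
        int i = 7;
        for (; i >= 0; --i) {
            uint32_t match_bit = (match_byte >> i) & 1;
            uint32_t bit       = (symbol     >> i) & 1;
            rc.encode_bit(p[((1 + match_bit) << 8) + ctx], bit);  // context includes match_bit
            ctx = (ctx << 1) | bit;
            if (match_bit != bit) { --i; break; }      // prediction broke - fall back
        }
        for (; i >= 0; --i) {                          // plain literal coding for the rest
            uint32_t bit = (symbol >> i) & 1;
            rc.encode_bit(p[ctx], bit);
            ctx = (ctx << 1) | bit;
        }
    }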

  21. Thanks:

    encode (16th April 2020)

  22. #14
    Member
    Join Date
    Apr 2020
    Location
    Earth
    Posts
    1
    Thanks
    0
    Thanked 1 Time in 1 Post
    Code:
    enwik8
    Compressing 'enwik8':
    100,000,000 -> 25,450,555 in 140.435s
    
    Decompressing 'enwik8.lzpm':
    25,450,555 -> 100,000,000 in 2.764s
    Code:
    acrord32.exe
    Compressing 'acrord32.exe':
    3,870,784 -> 1,576,824 in 2.726s
    
    Decompressing 'acrord32.exe.lzpm':
    1,576,824 -> 3,870,784 in 0.119s
    Code:
    english.dic
    Compressing 'english.dic':
    4,067,439 -> 778,491 in 2.024s
    
    Decompressing 'english.dic.lzpm':
    778,491 -> 4,067,439 in 0.072s
    Last edited by turpro; 17th April 2020 at 05:09.

  23. Thanks:

    encode (17th April 2020)

