
Thread: LZPM 0.04 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Link:
    lzpm004.zip (26 KB)

    Enjoy!


  2. #2
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Thanks!

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Awesome!!! Thanks Ilia!

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    LZPM 0.03:

    A10.jpg > 832,328
    AcroRd32.exe > 1,634,410
    english.dic > 1,015,867
    FlashMX.pdf > 3,768,999
    FP.LOG > 800,097
    MSO97.DLL > 2,012,471
    ohs.doc > 836,869
    rafale.bmp > 1,103,015
    vcfiu.hlp > 725,726
    world95.txt > 617,266

    Total = 13,347,048 bytes


    ENWIK8 > 29,248,641 bytes


    LZPM 0.04:

    A10.jpg > 832,328
    AcroRd32.exe > 1,634,410
    english.dic > 1,015,867
    FlashMX.pdf > 3,768,999
    FP.LOG > 799,963 ***
    MSO97.DLL > 2,012,471
    ohs.doc > 836,869
    rafale.bmp > 1,103,015
    vcfiu.hlp > 725,726
    world95.txt > 617,266

    Total = 13,346,914 bytes


    ENWIK8 > 29,297,905 bytes

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Thank you!

  6. #6
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Another quick test...

    Test Machine = AMD Sempron 2400+

    Test File = "lzpmtest.tar" (574,381,056 bytes) contains exe, dll, txt, html, ini, bmp, wav, etc.

    Results:

    LZPM 0.03:

    Ratio = 42.2%
    Compression Time = 00:10:40.625
    Compressed Size = 242,253,866 bytes


    LZPM 0.04:

    Ratio = 42.19%
    Compression Time = 00:09:32.985
    Compressed Size = 242,320,379 bytes

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    0.04 is a little larger and slower than 0.02 on enwik8/9.
    http://cs.fit.edu/~mmahoney/compression/text.html#2545

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    The goal of 0.04 is to introduce the brand new circular hash chains. If anything, a match finder based on hash chains is slower on text files, while being much faster on other files such as binaries and compressed data. In addition, in some cases hash chains can find better matches (world95.txt, etc.). Note that LZPM is not intended to provide the best performance on ENWIK8/9.

    Starting with the next version (0.05) I'll add improved lazy matching (as proposed by Malcolm Taylor), which gives better compression, especially on text, at the cost of some compression time. Anyway, it's worth it. I'll post some numbers later.
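    For illustration, here is a minimal sketch of a hash-chain match finder with a circular chain array. This is my own rough take on the general technique, not LZPM's actual code, and the window size, hash width and chain-walk limit are made-up values:

    #include <stdint.h>
    #include <string.h>

    #define HASH_BITS 16
    #define WIN_BITS  20                     /* assumed 1 MB window */
    #define WIN_SIZE  (1u << WIN_BITS)
    #define WIN_MASK  (WIN_SIZE - 1)
    #define MAX_CHAIN 64                     /* assumed chain-walk limit */

    static uint32_t head[1u << HASH_BITS];   /* hash -> most recent position */
    static uint32_t chain[WIN_SIZE];         /* circular: slot (pos & WIN_MASK) -> previous position with the same hash */

    static uint32_t hash3(const uint8_t *p)  /* order-3 hash of the upcoming bytes */
    {
        return ((p[0] << 12) ^ (p[1] << 6) ^ p[2]) & ((1u << HASH_BITS) - 1);
    }

    static void mf_init(void)                /* call once before encoding */
    {
        memset(head, 0xFF, sizeof head);     /* 0xFFFFFFFF marks an empty chain */
    }

    /* Find the longest match for buf[pos..end) among earlier positions;
       assumes at least 3 bytes remain at pos. Returns the length and the
       match start via *mpos, and links pos into its chain. */
    static uint32_t mf_find(const uint8_t *buf, uint32_t pos, uint32_t end, uint32_t *mpos)
    {
        uint32_t h = hash3(buf + pos);
        uint32_t cur = head[h];
        uint32_t best = 0;
        int depth = MAX_CHAIN;

        while (cur != 0xFFFFFFFFu && pos - cur < WIN_SIZE && depth--) {
            uint32_t len = 0;
            while (pos + len < end && buf[cur + len] == buf[pos + len])
                ++len;
            if (len > best) { best = len; *mpos = cur; }
            cur = chain[cur & WIN_MASK];     /* follow the circular chain */
        }

        chain[pos & WIN_MASK] = head[h];     /* link the current position in */
        head[h] = pos;
        return best;
    }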


  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    OK, that explains it. Also, I just noticed that 0.03 uses more memory, so it has the top spot now.
    http://cs.fit.edu/~mmahoney/compression/text.html#2544

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Thanks for testing!

    LZPM benefits much more from optimized parsing than QUAD does. However, speed is also heavily affected - simply compare QUAD in default mode with LZPM 0.04; both use the same parsing scheme. So, I guess, adding Flexible Parsing to LZPM is impractical. Anyway, we can use something in between lazy matching and flexible parsing - an improved lazy matching.

    Check out how parsing affects the compression:

    world95.txt (2,988,578 bytes):
    greedy: 658,452 bytes
    lazy1: 617,266 bytes (current)
    lazy2: 603,251 bytes

    calgary.tar (3,152,896 bytes):
    greedy: 929,462 bytes
    lazy1: 899,738 bytes (current)
    lazy2: 891,079 bytes

    A nice gain, isn't it?

    And with the new improved parsing (lazy2) LZPM compresses:
    ENWIK8 to 28,896,680 bytes
    ENWIK9 to 251,111,835 bytes

    With the lazy2 scheme, speed is not really affected. However, in some cases compression can be slightly worse. Currently I'm thinking about implementation details (optimizations, etc.) and about the theory: for example, why in some cases the new scheme gives less compression, and why on some files (Canterbury corpus) lazy matching gives better results than Flexible Parsing, etc.


  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,501
    Thanks
    741
    Thanked 664 Times in 358 Posts
    I'm interested in using this scheme in Tornado. Can you explain the details?

    Quote Originally Posted by encode
    ...and about the theory: for example, why in some cases the new scheme gives less compression, and why on some files (Canterbury corpus) lazy matching gives better results than Flexible Parsing, etc.
    Both schemes don't make strong guarantees. They are just heuristics, after all.

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    I'm interested in using this scheme in Tornado. Can you explain the details?
    It's all simple. With QUAD/LZPM 0.04 I use lazy matching with a one-byte lookahead. With the new LZPM 0.05 I'm experimenting with a two-byte lookahead. No more, no less. As Malcolm said, he often uses lazy matching with a two-byte lookahead.


  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,501
    Thanks
    741
    Thanked 664 Times in 358 Posts
    I don't understand what you mean here. Can you show this as pseudocode?

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Hm, okay.

    curmatch = getmatch(pos); // get current match

    if (getmatch(pos + 1) > curmatch) // one byte lookahead
        curmatch = 0; // discard current match

    // ...

    if ((getmatch(pos + 1) > curmatch) || (getmatch(pos + 2) > curmatch)) // two bytes lookahead
        curmatch = 0; // discard current match
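    Read as a complete loop, that one/two-byte lookahead might look roughly like this (my sketch, not LZPM's actual code; getmatch(p) is assumed to return the best match length at p, distance handling is omitted, and emit_literal()/emit_match() are hypothetical helpers):

    pos = 0;
    while (pos < end) {
        curmatch = getmatch(pos);                 // best match at the current position

        if (curmatch >= MINMATCH &&
            (getmatch(pos + 1) > curmatch ||      // two-byte lookahead: is a longer match
             getmatch(pos + 2) > curmatch))       // waiting one or two bytes ahead?
            curmatch = 0;                         // yes -> discard it, code a literal now

        if (curmatch >= MINMATCH) {
            emit_match(curmatch /*, distance */); // code the match and skip over it
            pos += curmatch;
        } else {
            emit_literal(buf[pos]);               // code one literal and retry at pos + 1
            pos += 1;
        }
    }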


  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Malcolm Taylor (comp.compression)
    Lazy matching is just a great heuristic to get better compression
    without sacrificing too much time. Usually you look at the next one or
    two bytes and check for matches. If these matches are longer by some
    delta (possibly one byte) then you take them instead. This works for lz77.
    ...
    > By "take them instead" Im assuming you output a literal, then move to
    > the next position and start the process over again, yes?


    Or perhaps a small match. If you look two ahead (as often happens) then
    it might be that you can output a length 2 match and then your really
    good match.
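    In other words, the two-byte gap does not have to become two literals. A rough sketch of that variant, under the same assumptions as the loop above plus a format that allows length-2 matches (getmatch2(), which also returns the distance, is a hypothetical helper):

    len0 = getmatch2(pos,     &dist0);        // match at the current position
    len2 = getmatch2(pos + 2, &dist2);        // match two bytes ahead

    if (len2 > len0) {
        if (len0 >= 2)                        // assumes MINMATCH <= 2
            emit_match(2, dist0);             // a length-2 match covers the gap
        else {
            emit_literal(buf[pos]);           // otherwise two literals do
            emit_literal(buf[pos + 1]);
        }
        emit_match(len2, dist2);              // then the really good match
        pos += 2 + len2;
    }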

  16. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,501
    Thanks
    741
    Thanked 664 Times in 358 Posts
    Thanks, now it's clear. It needs more time than usual lazy matching, but may be useful for the highest modes.

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    but it needs more time than usual lazy matching
    Of course...

  18. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,501
    Thanks
    741
    Thanked 664 Times in 358 Posts
    But why did you say "With the lazy2 scheme, speed is not really affected"?

  19. #19
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    But why did you say "With the lazy2 scheme, speed is not really affected"?
    Because, in my implementation at least, speed is indeed not affected that much. The impact depends on the data, but overall compression becomes only slightly slower. In other words, the compression gain is worth it.


  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Okay, some testing results. This time I'm going to check the difference between flexible parsing and the new lazy matching - to see how much air the new approach leaves inside the compressed files.

    fp.log:
    fp: 682,006 bytes
    lazy2: 797,806 bytes

    world95.txt:
    fp: 590,190 bytes
    lazy2: 603,251 bytes

    acrord32.exe:
    fp: 1,625,916 bytes
    lazy2: 1,630,322 bytes

    calgary.tar:
    fp: 888,055 bytes
    lazy2: 891,079 bytes

    All in all, the new scheme works pretty well. It must be said that I used a simplified version of Flexible Parsing here, so slightly higher compression is still possible. Anyway, as you can see, the new lazy matching leaves not too much air while being MUCH faster than FP.


  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    More results with FP:
    ENWIK8: 28,567,778 bytes
    ENWIK9: 247,950,599 bytes

    However, check out results with canterbury.tar:
    fp: 545,035 bytes
    lazy2: 540,426 bytes

    Here, FP loses.

    Another thing: I have an idea to change LZPM's name to LZCOMP (lzcomp 0.05), as an easier-to-read name.


  22. #22
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Results are looking promising!

    Quote Originally Posted by encode
    Another thing: I have an idea to change LZPM's name to LZCOMP (lzcomp 0.05), as an easier-to-read name.
    I prefer the name "LZPM".

  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by LovePimple
    I prefer the name "LZPM".
    Maybe LZPM is indeed better. LZPM stands for LZ77-PM.

    Quote Originally Posted by LovePimple
    Results are looking promising!
    Note that lzpm 0.05 will NOT use FP. I'm just experimenting/testing...

  24. #24
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Note that lzpm 0.05 will NOT use FP. I'm just experimenting/testing...
    I understand!

  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    By the way, the release is coming soon! This weekend I'm expecting the MC update. Even if that doesn't happen, I'll release a new version next week!


  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    lzpm 0.05 at sfc
    A10.jpg: 832,282 bytes
    acrord32.exe: 1,630,322 bytes (e8e9: 1,474,521 bytes)
    english.dic: 1,031,885 bytes
    FlashMX.pdf: 3,766,309 bytes
    fp.log: 797,806 bytes
    mso97.dll: 2,010,427 bytes (e8e9: 1,906,772 bytes)
    ohs.doc: 836,570 bytes
    rafale.bmp: 1,094,750 bytes
    vcfiu.hlp: 715,795 bytes
    world95.txt: 603,251 bytes

    Total: 13,319,397 bytes

  27. #27
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    LZPM 0.04 has been tested at MC! Very cool!

    I like its performance - look at the MFC results:

    67.34%, comp: 118 sec, dec: 27 sec.

    In other words, it outperforms QUAD and PIM, and, to be honest, LZPM is my favorite now!

    The next version will have a bit higher compression while keeping the same decompression speed (and compatibility - by the way, all versions of LZPM since 0.02 are fully compatible!), so it should be pretty cool! I think I can make LZPM a bit slower in exchange for somewhat higher (especially text) compression.

    Keep in mind that LZPM has no filters - not even an E8/E9 transform. QUAD and especially PIM have preprocessing...
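    For context, this is the kind of E8/E9 call/jump transform such preprocessors commonly apply (a generic sketch on my part, explicitly not part of LZPM): relative x86 CALL/JMP offsets are rewritten as absolute targets, so repeated calls to the same function produce repeated byte patterns that LZ can match.

    #include <stdint.h>
    #include <stddef.h>

    static void e8e9_encode(uint8_t *buf, size_t n)
    {
        for (size_t i = 0; i + 5 <= n; ) {
            if (buf[i] == 0xE8 || buf[i] == 0xE9) {            /* CALL rel32 / JMP rel32 */
                uint32_t rel = buf[i+1] | (buf[i+2] << 8) |
                               (buf[i+3] << 16) | ((uint32_t)buf[i+4] << 24);
                uint32_t tgt = rel + (uint32_t)(i + 5);        /* relative -> absolute (mod 2^32) */
                buf[i+1] = (uint8_t)tgt;
                buf[i+2] = (uint8_t)(tgt >> 8);
                buf[i+3] = (uint8_t)(tgt >> 16);
                buf[i+4] = (uint8_t)(tgt >> 24);
                i += 5;                                        /* skip the rewritten offset */
            } else {
                ++i;
            }
        }
    }
    /* The decoder runs the same scan and subtracts (i + 5) again to restore
       the original relative offsets. */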

    Anyway, the main selling point is fast decompression - the fastest of all my software. Also, I have no ideas for improving the decompressor, which is why I've kept compatibility for so long.


  28. #28
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Next version will have a bit higher compression, keeping same decompression speed (and compatibility
    Will this higher compression be available with the release of version 0.05?

  29. #29
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by LovePimple
    Will this higher compression be available with the release of version 0.05?
    Yes, as I posted before, LZPM 0.05 will have "lazy2" parsing.

    But that's not all that I'm planning for LZPM. Probably I will add a binary tree match finder or improved hash chains, although the current match finder is pretty good.

    Anyway, the main target is a parsing scheme with a cost function. I will try to implement FP with pricing, or even true optimal parsing with or without pricing. There is also a variant like Uwe did with his brilliant ALZ (AFAIK) - some sort of lazy matching with a price function, for better decisions based on the real literal/match cost.
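    For what it is worth, one way such a priced lazy decision could look (purely a sketch under my own assumptions: literal_price()/match_price() are hypothetical helpers returning estimated coded bits, find_match() is the match finder, and the per-byte normalization is just one possible heuristic, not ALZ's or LZPM's actual rule):

    uint32_t dist0, dist1;
    int len0 = find_match(pos,     &dist0);       /* best match at the current position */
    int len1 = find_match(pos + 1, &dist1);       /* best match one byte later          */

    if (len0 >= MINMATCH) {
        uint64_t take  = match_price(len0, dist0);                        /* covers len0 bytes   */
        uint64_t defer = literal_price(buf[pos]) +                        /* literal now ...     */
                         (len1 >= MINMATCH ? match_price(len1, dist1)     /* ... then that match */
                                           : literal_price(buf[pos + 1]));
        uint64_t defer_bytes = 1 + (len1 >= MINMATCH ? (uint64_t)len1 : 1);

        /* compare estimated bits per byte covered; cross-multiply to avoid division */
        if (defer * (uint64_t)len0 < take * defer_bytes)
            len0 = 0;                             /* emitting a literal now is cheaper */
    }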

    To be continued...


  30. #30
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    The next release should be quite a powerhouse!
