
Thread: lzpm 0.10 is here!

  1. #1
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    LZPM 0.10 is the first compressor ever with modern flexible parsing. Through QUAD and the first versions of LZPM, I finally worked out how to properly use a thing like FP with modern algorithms. I'm planning to take a small break from LZPM development, so at some point 0.10 is final. Enjoy anyway!

    http://www.encode.su/lzpm/
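
    For the curious, here is a minimal, self-contained sketch of the flexible-parsing idea in its simplest 1-byte-lookahead (lazy matching) form. This is NOT LZPM's code - the brute-force matcher is purely illustrative (LZPM's real matcher uses hash chains):

        // Flexible parsing, 1-byte lookahead: before taking the match at the
        // current position, check whether deferring by one literal uncovers
        // a longer match at the next position.
        #include <cstdio>
        #include <string>

        static const int MINMATCH = 3;

        // Longest match of buf[pos..] against any earlier start (brute force).
        static int longest_match(const std::string& buf, size_t pos) {
            int best = 0;
            for (size_t i = 0; i < pos; ++i) {
                int len = 0;
                while (pos + len < buf.size() && buf[i + len] == buf[pos + len])
                    ++len;
                if (len > best) best = len;
            }
            return best;
        }

        int main() {
            std::string buf = "abcbcdefgxabcdefg";
            size_t pos = 10;                         // parsing has reached 'a'
            int here = longest_match(buf, pos);      // 3-byte match "abc"
            int next = longest_match(buf, pos + 1);  // 6-byte match "bcdefg"
            if (here < MINMATCH || next > here)      // deferring wins here
                std::printf("emit literal '%c', take the match at %zu\n",
                            buf[pos], pos + 1);
            else
                std::printf("take the %d-byte match at %zu\n", here, pos);
        }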


  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Ilia!

  3. #3
    Tester

    Join Date
    May 2008
    Location
    St-Petersburg, Russia
    Posts
    182
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Yeah! Thanks!

  4. #4
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  5. #5
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Thank you!

  6. #6
    Member Vacon
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    Quote Originally Posted by encode
    so at some point 0.10 is final. Enjoy anyway!
    Thank you Ilia!
    Still closed source?

    Best regards!

  7. #7
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Vacon
    Still closed source?
    It will be open source ONLY after I finish working on it. I frequently take small breaks; it's a matter of tradition. Right now I just want to see the results on various benchmarks and read the users' feedback.

    As you can see, I started from version 0.01 instead of 1.00: I planned LZPM as a long-term project (1 to 2 years). Recently, Malcolm described to me almost ALL the details of his ROLZ2 algorithm. Many of the described ideas can be added to LZPM, so in some sense this is only a beginning.

    Don't know what's next anyway. Let's wait and see...

  8. #8
    Member Vacon
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    Quote Originally Posted by encode
    It will be open source ONLY after I finish working on it.
    Ok, but isn't the power of open source the result of several people working on the software?
    True, one (here: you!) wants to impress others by releasing code that's nearly done, but you could get good support or new ideas (-> positive effects).

    Quote Originally Posted by encode
    ...Many of the described ideas can be added to LZPM, so in some sense this is only a beginning.
    Similar to my statement above. Only difference: additional ideas on top of Malcolm's (although he surely has very good ones!).
    Your (impressive) program -> your decision. I'll respect it; I only wanted to give my two cents.
    Edit: Another reason why I asked: at first glance I didn't find a hint about the license type - neither in the readme, nor when starting the program. Maybe you should add something like: "Copyrighted freeware".

    Best regards!

  9. #9
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Vacon
    Ok, but isn't the power of open source the result of several people working on the software?
    Yes, that's true. For example, some people really helped me with QUAD (mainly with string-searching optimizations, especially Uwe). Mostly, though, such help is small hints about things I already knew but just hadn't implemented yet. Only Malcolm and I have ROLZ-based programs. Two things can still be improved: string searching (using a binary tree, as Malcolm suggested) and parsing.


  10. #10
    Member Vacon
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    thanks for making that clear to me.

    Best regards!

  11. #11
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Just invented a new idea for LZPM which gives some compression and a great speed improvement.

    With hash chains, LZPM searches for a 1+MINMATCH = 4-byte string. This is much faster than searching for a 3- or even 2-byte string. In the latest LZPM I limited the distance for MINMATCH-long matches to a tiny value - an index range of 0...3. With the getmatch() function during parsing, we initially search for strings starting from 4 bytes. The idea: eliminate the index output for strings of MINMATCH length - i.e., if len == 3, then index = 0. In other words, the smallest matches will come only from the most recent position.

    This gives some compression improvement, since we encode tiny matches more efficiently.

    So here we can do the simplest check at the most recent position for the MINMATCH. After that, we start searching for strings using hash chains, but now we search for 1+4 = 5-byte and longer strings. This gives a serious speedup, since 5-byte strings are far less frequent than 4-byte ones.
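
    Roughly, in code (a simplified sketch - the real getmatch() surely differs; buffer bounds and context bookkeeping are folded into parameters here):

        #include <cstring>

        static const int MINMATCH = 3;

        struct Match { int len; int index; };  // index 0 = most recent position

        // Stage 1: LZP-style check at the single most recent position; a
        //          3-byte match there costs no index bits (index is 0).
        // Stage 2: hash-chain walk, accepting only 5-byte and longer
        //          matches, which are far rarer, so the walk is cheap.
        //          (4-byte matches are not searched in this variant.)
        Match getmatch(const unsigned char* buf, int pos, int max_len,
                       int recent_pos,             // last position, this context
                       const int* chain, int n) {  // hash-chain candidates
            Match m = {0, 0};
            if (max_len >= MINMATCH &&
                !std::memcmp(buf + recent_pos, buf + pos, MINMATCH))
                m.len = MINMATCH;                  // m.index stays 0
            for (int i = 0; i < n; ++i) {
                int len = 0;
                while (len < max_len && buf[chain[i] + len] == buf[pos + len])
                    ++len;
                if (len >= MINMATCH + 2 && len > m.len) {
                    m.len = len;
                    m.index = i;
                }
            }
            return m;
        }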

    Continue digging...


  12. #12
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Tested the idea.
    Well, on text files this gives nearly a 1.5X speedup. In some cases it gives no speedup. But the MINMATCH trick works anyway. Now LZPM's algorithm structure looks like:

    order-1 arithmetic for literals
    LZP for match lengths of three
    ROLZ for match lengths of four and longer

    In addition, this LZP trick may improve decompression speed - no need to decode a 12-bit match index.
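
    As a toy sketch of the decode loop for that structure (a plain bit vector stands in for the real arithmetic decoder, and the length code is simplified):

        #include <cstdint>
        #include <vector>

        static const int MINMATCH = 3;

        struct Bits {                        // toy bit source, not a real coder
            std::vector<int> v;
            size_t i = 0;
            int get() { return v[i++]; }
            int get_n(int n) { int x = 0; while (n--) x = (x << 1) | get(); return x; }
        };

        // One step: literal / LZP length-3 match / ROLZ match of length 4+.
        // rolz_table maps a 12-bit index to a buffer position; lzp_pos is
        // the position implied by the current context.
        void decode_token(Bits& in, std::vector<uint8_t>& out,
                          const std::vector<int>& rolz_table, int lzp_pos) {
            if (in.get()) {                           // literal flag
                out.push_back((uint8_t)in.get_n(8));  // order-1 model elided
                return;
            }
            int len = MINMATCH + in.get_n(4);         // toy length code
            int src = (len == MINMATCH)
                        ? lzp_pos                     // LZP: no index decoded
                        : rolz_table[in.get_n(12)];   // ROLZ: 12-bit index
            for (int k = 0; k < len; ++k) out.push_back(out[src + k]);
        }

    The len == MINMATCH branch is exactly where the decompression-speed win comes from - no 12-bit index is read for the shortest matches.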

    Some results for LZPM 0.11 (damn, I keep improving it):

    world95.txt: 575,157 bytes
    english.dic: 947,397 bytes

    acrord32.exe: 1,477,841 bytes
    mso97.dll: 1,888,074 bytes

    bible.txt: 905,853 bytes
    book1: 265,187 bytes

    PariahInterface.utx: 5,518,514 bytes

    ENWIK8: 27,829,639 bytes
    ENWIK9: 241,537,102 bytes

    A small, but stable and cool improvement...

  13. #13
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Yeah, the decompression became a little bit faster...

  14. #14
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    What about FP.log?

  15. #15
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by LovePimple
    What about FP.log?
    New LZPM compresses it to 631,001 bytes; compression time is about the same, even with the initial 5-byte string search.

    Note that the initial 5-byte string search is currently not added to LZPM - it was just tested. I must perform some additional tests and decide whether to include it. Like I said, the 3-byte LZP is already added.


  16. #16
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Note that the initial 5-byte string search is currently not added to LZPM - it was just tested. I must perform some additional tests and decide whether to include it. Like I said, the 3-byte LZP is already added.
    OK!

  17. #17
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Additional plans for LZPM 0.11 (911):

    + Add NINE compression levels (see the sketch after this list):
    1 - fastest (greedy parsing)
    2 - normal (FP with 1-byte lookahead)
    3 - normal (FP with 2-byte lookahead)
    ...
    8 - normal (FP with 7-byte lookahead)
    9 - max (FP with full search)

    + Reduce memory usage for compression to 181 MB with hash chains. If I change the main search algorithm, I may reduce memory usage further.

    + Add an 8K ROLZ model instead of 4K. LZPM will then need 24 MB for decompression.
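
    The nine levels would essentially just set the parser's lookahead depth - roughly like this (a hypothetical mapping, not the real option handling):

        struct ParserConfig {
            int  lookahead;     // bytes of flexible-parsing lookahead
            bool full_search;   // level 9: FP with no lookahead cap
        };

        ParserConfig config_for_level(int level) {  // level in 1..9
            if (level <= 1) return {0, false};      // 1: greedy parsing
            if (level >= 9) return {0, true};       // 9: max, full search
            return {level - 1, false};              // 2..8: (level-1)-byte lookahead
        }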


  18. #18
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    It's looking good so far!

  19. #19
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    At least with different compression levels the user can choose between compression and speed. With max compression LZPM becomes notably slower, while in many cases compression is only slightly improved. The max level means max - no compromise - so we can spend the extra time.

    The current 280 MB is too heavy. With my ROLZ implementation, hash chains eat more memory than the classic LZ77 ones, and the current LZPM uses too large a hash - we can use a smaller value. With classic hash chains, the hash size does not affect compression: the algorithm just produces longer hash chains (I'm talking, of course, about hash chains which are not limited in length). With my algo, hash chains are limited by the match index, so a smaller hash can affect compression; however, this is a tiny effect. Reducing memory to 181 MB (nearly 100 MB shaved off), we will get about the same compression. With an 8K ROLZ index the effect should be even smaller.
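
    The chain-capping point, as a toy sketch (HASH_BITS and the depth cap are the numbers discussed here; the actual data layout in LZPM surely differs):

        #include <cstdint>
        #include <cstring>
        #include <vector>

        static const int HASH_BITS = 22;
        static const int MAX_DEPTH = 4096;  // stand-in for the 12-bit index cap

        static uint32_t hash4(const uint8_t* p) {    // toy 4-byte hash
            uint32_t x;
            std::memcpy(&x, p, 4);
            return (x * 2654435761u) >> (32 - HASH_BITS);
        }

        struct Chains {
            std::vector<int> head, prev;             // -1 = end of chain
            explicit Chains(size_t n)
                : head(size_t(1) << HASH_BITS, -1), prev(n, -1) {}

            void insert(const uint8_t* buf, int pos) {
                uint32_t h = hash4(buf + pos);
                prev[pos] = head[h];                 // link to the previous one
                head[h]   = pos;                     // newest becomes the head
            }

            // Walk at most MAX_DEPTH candidates. With an UNcapped walk the
            // head-table size cannot change what is found - collisions only
            // make chains longer. With the cap, collisions from a smaller
            // hash eat depth slots and can push real candidates past the
            // limit, hence the tiny compression loss described above.
            template <class Visit>
            void candidates(const uint8_t* buf, int pos, Visit visit) const {
                int p = head[hash4(buf + pos)];
                for (int d = 0; d < MAX_DEPTH && p >= 0; ++d, p = prev[p])
                    visit(p);
            }
        };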


  20. #20
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    I really like the idea of having user-selectable super-fast to super-high compression in one handy compression tool.

  21. #21
    Tester
    Nania Francesco
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I wanted to know if a method could be implemented in which it is possible to insert a predictor for the match length! Is such a thing possible?

  22. #22
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    It's already done in LZPM and QUAD. Match lengths are coded with a context. This approach gives some compression gain compared to plain (order-0) match length coding. Some early implementations of LZP used delta coding for match lengths - i.e., we encode the difference between the last match length seen in the current context and the current one.
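
    As a toy illustration of length coding with a context (smoothed counts stand in for the real adaptive arithmetic models, and the bucketing here is made up):

        #include <array>

        static const int NLEN = 16;  // toy alphabet: lengths coded 0..15

        // The previous match length, bucketed, selects the frequency table
        // used to code the current length - an order-1 model over lengths.
        struct LengthModel {
            std::array<std::array<int, NLEN>, 4> freq{};
            int prev = 0;

            static int bucket(int len) {
                return len < 2 ? 0 : len < 4 ? 1 : len < 8 ? 2 : 3;
            }

            // Probability the coder would assign to 'len' in this context.
            double prob(int len) const {
                const auto& f = freq[bucket(prev)];
                int total = NLEN;                    // +1 per symbol (smoothing)
                for (int c : f) total += c;
                return double(f[len] + 1) / total;
            }

            void update(int len) {                   // len must be in 0..NLEN-1
                ++freq[bucket(prev)][len];
                prev = len;                          // context for the next one
            }
        };

    The delta-coding variant mentioned above would instead remember the last match length per context and code the difference from it.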


  23. #23
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Well, LZPM, like all LZ coders, has one main parameter - the dictionary size. Currently LZPM uses a 16 MB dictionary; why not use 32 MB, since I've already increased the index size? Will post some results later.

  24. #24
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Will post some results later.
    I look forward to it!

  25. #25
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Okay, the results. Description of the two versions:
    LZPM 16m: LZPM 0.11 with a 16 MB buffer and 22-bit hash; memory use: 176 MB for compression and 24 MB for decompression
    LZPM 32m: LZPM 0.11 with a 32 MB buffer and 23-bit hash; memory use: 352 MB for compression and 40 MB for decompression
    From both versions I removed the EXE filter. Both versions were tested with the "9" switch (max compression).

    ENWIK8
    LZPM 16m: 27,397,798 bytes
    LZPM 32m: 27,094,740 bytes

    ENWIK9
    LZPM 16m: 237,966,696 bytes
    LZPM 32m: 235,126,638 bytes

    fp.log
    LZPM 16m: 628,707 bytes
    LZPM 32m: 615,286 bytes

    gimp-2.0.0.tar
    LZPM 16m: 9,939,384 bytes
    LZPM 32m: 9,836,951 bytes

    PariahInterface.utx
    LZPM 16m: 5,456,465 bytes
    LZPM 32m: 5,446,342 bytes

    pak0.pak
    LZPM 16m: 89,665,008 bytes
    LZPM 32m: 89,363,492 bytes

    Suggestions and opinions are welcomed!

  26. #26
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    LZPM 32m is interesting!

  27. #27
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Also, please say what kind of LZPM you want to see:
    - "I want LZPM with 1 GB buffer"
    - "I want the smiley illuminating on LZPM's banner"
    - "I hate LZPM"
    - "...<censored>..."
    and so on. Give me your ideas straight - I just need to decide.

    Probably I should:
    + Remove the EXE transformer for slightly faster decompression and slightly better compression of non-executable data
    + Use a 32 MB buffer with an 8K ROLZ index

    LZPM with such settings is SLOW. However, you have nine compression levels to choose from - i.e., nobody forces you to compress at the "9" (max) level.

    Just keep LZPM clean and powerhouse-like.

  28. #28
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    LZPM with such settings is SLOW. However, you have nine compression levels to choose from - i.e., nobody forces you to compress at the "9" (max) level.
    Exactly! Slow speed doesn't worry me as long as there are faster levels to choose from when needed.


    Quote Originally Posted by encode
    Probably I should:
    + Remove the EXE transformer for slightly faster decompression and slightly better compression of non-executable data
    + Use a 32 MB buffer with an 8K ROLZ index
    I'm not sure about these without seeing some results. For the moment I don't like the idea of losing the EXE transformer.

  29. #29
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Results for LZPM with 64 MB buffer and 16K index:

    ENWIK8: 26,498,178 bytes


  30. #30
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    It's looking good!

