Thread: LZPM 0.09 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts

  2. #2
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    got it!
    Thank you Ilia

    Best regards!

  3. #3
    Tester

    Join Date
    May 2008
    Location
    St-Petersburg, Russia
    Posts
    182
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Thanks!

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Fantastic! Thanks Ilia!

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    This version achieves about the same compression on ENWIKs as TC 5.1 dev 7, which uses my old variant of ROLZ and an order-2 CM encoder for ROLZ-output coding!
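
    For those unfamiliar with the term, an order-2 CM encoder predicts each bit using the previous two bytes as context and feeds that prediction to an arithmetic coder, adapting the probability after every bit. Below is a minimal C++ sketch of the idea (illustrative only - not TC's or LZPM's actual code; the table layout and update rate are invented):

    #include <cstdint>
    #include <vector>

    // Minimal order-2 bit model: the previous two bytes select a set of
    // probabilities, one per node of the bit tree for the current byte.
    struct Order2Model {
        std::vector<uint16_t> p;   // P(bit == 1), scaled to 0..65535
        Order2Model() : p(65536u * 256u, 1u << 15) {}   // ~32 MB; real coders hash instead

        uint16_t predict(uint16_t ctx2, uint8_t node) const {
            return p[ctx2 * 256u + node];
        }
        void update(uint16_t ctx2, uint8_t node, int bit) {
            uint16_t &pr = p[ctx2 * 256u + node];
            if (bit) pr += (65535 - pr) >> 5;   // move toward 1
            else     pr -= pr >> 5;             // move toward 0
        }
    };
    // The predicted probability would then drive a binary arithmetic coder
    // (not shown), which is what actually codes the ROLZ output symbols.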

  6. #6
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Thanks! Tested:

    version      size (bytes)    in / out speed (kB/s)
    LZPM 0.06    13 696 668      1 910 / 9 548
    LZPM 0.07    13 581 851      1 053 / 9 866
    LZPM 0.08    13 409 100        958 / 9 866
    LZPM 0.09    13 400 783      1 028 / 9 866
    Both faster and stronger!

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Thank you!

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Another compression improvement. The new parsing scheme is more balanced and advanced than the old one. I think it's crazy, considering how many conditions it checks. There is still no actual cost function that calculates the real cost of each choice; however, I use some heuristics to emulate a cost function and skew the output statistics, and this is far more advanced than what was used with QUAD.

    OK, some testing results with LZPM 0.10:

    world95.txt: 578,813 bytes

    bible.txt: 910,800 bytes

    ENWIK8: 27,904,249 bytes
    ENWIK9: 242,350,961 bytes

    3200.txt: 4,873,698 bytes

    PariahInterface.utx: 5,523,325 bytes


  10. #10
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    You have done a very good job with LZPM!

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    With the latest versions it's just a fight for each byte!

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Brief description of current parsing

    The baseline flexible parsing is optimal in terms of the number of matches - much like the backwards parsing in QUANTUM. However, it's not optimal with adaptive encodings. In fact, in some cases it's cheaper to drop down to a single literal, or to a shorter match with a smaller distance, etc. Anyway, modern optimal parsing (dynamic programming) is not about the optimal number of matches but the optimal cost of each choice. With dynamic programming, we try to find the optimal - i.e. cheapest - choice for each position in a given block (ROLZ2 uses 4 KB blocks). After that, we walk backwards collecting units - this gives the optimum. Currently, I don't know how this can fit into my LZPM. Anyway, with the latest versions I emulate the same thing. As you know, the basic idea behind FP is:

    compare:

    current_match + followed_match

    with:
    for all variants of cutdown:
    cutdowned_current_match + followed_after_cutdowned

    If we find a greater sum, we cut down the current match in order to get higher compression.

    With LZPM, instead of directly comparing bare match lengths, I made a special cost function which returns a "score" (a value) for each choice, based on:
    match length
    match distance
    whether the current unit is a literal (literals have a high priority)

    so the new scheme looks like this:
    getvalue(current_match, current_index) + getvalue(followed_match, followed_index)

    with:
    for all variants of cutdown:
    getvalue(cutdowned_match, cutdowned_match_index) + getvalue(followed_match1, followed_index1)

    Currently, I'm tuning getvalue() to achieve better results.
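
    To illustrate the scheme, here is a rough C++ sketch of such a score-based comparison (hypothetical code, not LZPM's actual source; the constants and the weighting inside getvalue() are invented):

    #include <cmath>

    // Hypothetical score-based flexible parsing decision (not LZPM's real code).
    struct Match { int len; int dist; bool is_literal; };

    static const int LEN_WEIGHT    = 16;   // reward per matched byte (invented)
    static const int LITERAL_BONUS = 24;   // literals get a high priority (invented)

    static int getvalue(const Match &m) {
        if (m.is_literal) return LITERAL_BONUS;
        // Longer matches score higher; distant matches are penalized.
        return m.len * LEN_WEIGHT - static_cast<int>(std::log2(m.dist + 1));
    }

    // Keep the current match, or cut it down?  Compare the two sums exactly
    // as above: if a cut-down variant yields a greater total, it wins.
    static bool should_cutdown(const Match &cur, const Match &followed,
                               const Match &cut, const Match &followed_after_cut) {
        return getvalue(cut) + getvalue(followed_after_cut) >
               getvalue(cur) + getvalue(followed);
    }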


  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    New results, LZPM 0.10:

    world95.txt: 576,824 bytes

    bible.txt: 906,818 bytes

    ENWIK8: 27,829,513 bytes
    ENWIK9: 241,694,333 bytes

    3200.txt: 4,862,377 bytes


  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Well, there is an additional parameter which applies to ROLZ only, the so-called "table size". You can find a reference to it in RK's README. Anyway, check out the results with LZPM: the vanilla parsing scheme is the same in all modes, and the decompression speed is about the same in all modes, although the decoder needs some extra memory when dealing with a larger table. The disadvantage of a larger table is a drastically reduced compression speed - the compression time becomes 1.5-2x longer each time the table size doubles. Note that the memory usage during compression is the same in all modes.
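
    To make the "table size" parameter concrete: a ROLZ coder keeps, for each context, a small table of recent positions; the encoder searches that table for matches and transmits only a table index, while the decoder just indexes into it. A rough C++ sketch (illustrative only - the context order, layout and names are assumptions, not LZPM's actual structures):

    #include <cstdint>
    #include <vector>

    // ROLZ position table sketch, assuming an order-1 context (previous byte).
    template <unsigned TABLE_SIZE>        // e.g. 1024 ("1K") .. 16384 ("16K")
    struct RolzTable {
        std::vector<uint32_t> pos;        // 256 contexts * TABLE_SIZE slots
        std::vector<uint32_t> head;       // next slot to overwrite, per context
        RolzTable() : pos(256u * TABLE_SIZE, 0), head(256u, 0) {}

        void add(uint8_t ctx, uint32_t position) {           // O(1) on both sides
            pos[ctx * TABLE_SIZE + head[ctx]] = position;
            head[ctx] = (head[ctx] + 1) % TABLE_SIZE;
        }
        uint32_t get(uint8_t ctx, uint32_t index) const {    // all the decoder needs
            return pos[ctx * TABLE_SIZE + index];
        }
    };
    // The decoder only does get(), so its speed is unaffected and its memory
    // grows linearly with TABLE_SIZE. The encoder, however, walks up to
    // TABLE_SIZE candidates per position to find the best match, which is
    // where the 1.5-2x slowdown per doubling comes from.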

    Results of LZPM 0.10 with ENWIK8:
    LZPM 0.10, 1K: 29,361,878 bytes
    LZPM 0.10, 2K: 28,507,021 bytes
    LZPM 0.10, 4K: 27,837,078 bytes (current)
    LZPM 0.10, 8K: 27,360,382 bytes
    LZPM 0.10, 16K: 27,083,501 bytes
    LZPM 0.10, 32K: 26,960,879 bytes
    LZPM 0.10, 64K: 26,914,552 bytes

    Having said that, starting with 16K the compression becomes very slow, even on my Core 2 Duo.

    Probably, 8K is more than adequate.

    Look at the results with ENWIK9, for comparison:
    LZPM 0.10, 8K: 237,794,407 bytes


  15. #15
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Probably, 8K is more than adequate.
    I agree!

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    OK, some detailed results with ENWIK8:
    program, table size: compressed size, compression time, memory needed for decompression

    LZPM 0.10, 1K: 29,361,878 bytes, 29 sec, 17 MB
    LZPM 0.10, 2K: 28,507,021 bytes, 52 sec, 18 MB
    LZPM 0.10, 4K: 27,837,078 bytes, 104 sec, 20 MB
    LZPM 0.10, 8K: 27,360,382 bytes, 215 sec, 24 MB
    LZPM 0.10, 16K: 27,083,501 bytes, 423 sec, 32 MB

    To get approx. timings for ENWIK9 multiply it by 10.

    For Matt's machine, multiply it by 25!!!

    215 * 25 = 5375 sec !!
    423 * 25 = 10575 sec !!

    TOO SLOW!

    Anyway, the new LZPM 0.10, even with 4K, already introduces many improvements, including some code optimizations - i.e. speed improvements along with the compression gain!


  17. #17
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    It's great to provide many options for users - from very fast compression to super-tight - so I suggest adding this to the program's parameters.

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Well, the table size is hard-coded - it's better for LZPM's performance. But the main option should be the choice of parsing method. Let's look at some results with various parsing strategies:

    ENWIK8, current LZPM 0.10:
    Greedy parsing: 30,581,042 bytes, 20 sec
    Lazy matching with 1-byte lookahead: 29,156,102 bytes, 48 sec
    Lazy matching with 2-byte lookahead: 28,759,282 bytes, 60 sec
    FP with 1-byte lookahead: 29,014,783 bytes, 45 sec
    FP with 2-byte lookahead: 28,538,462 bytes, 58 sec
    FP with full search: 27,832,683 bytes, 104 sec

    The current Flexible Parsing is OK, I guess. It should also be noted that simplified FP with just a one- or two-byte lookahead is superior to advanced Lazy Matching in both speed and compression. (NOTE: the Lazy Matching used in this test takes into account the distance - or index, in ROLZ terms - of each match and provides higher compression than the bare scheme.)
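
    For reference, the strategies compared above differ mainly in how far they look ahead before committing to a match. A compact C++ sketch of the greedy and lazy decision rules (hypothetical code, not LZPM's source; MIN_MATCH and the score field are invented stand-ins for the getvalue() idea from post #12):

    #include <vector>

    struct Cand { int len; int score; };            // best candidate at a position

    static const int MIN_MATCH = 3;                 // invented threshold

    // Greedy: take the match at 'pos' whenever one exists.
    bool greedy_take(const std::vector<Cand> &best, int pos) {
        return best[pos].len >= MIN_MATCH;
    }

    // Lazy matching with k-byte lookahead: skip the match (emit a literal)
    // if a strictly better candidate starts within the next k positions.
    bool lazy_take(const std::vector<Cand> &best, int pos, int k) {
        if (best[pos].len < MIN_MATCH) return false;
        for (int i = 1; i <= k && pos + i < (int)best.size(); ++i)
            if (best[pos + i].score > best[pos].score) return false;
        return true;
    }
    // Flexible parsing extends this by also trying cut-down variants of the
    // current match (see post #12), so the choice is made by total score,
    // not by raw length.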


  19. #19
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    With the maximum table size, compression improved by about 3%. I think such a mode will be interesting for users (is decompression speed the same?)

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Quote Originally Posted by Bulat Ziganshin
    decompression speed is the same?
    Yes, exactly the same, maybe even slightly faster (more matches = faster decompression). I'm just shocked by the speed impact. IMHO, at some point LZPM with a large table crosses the line. However, if I strictly limit the hash chain lengths I can trade speed against compression. Anyway, the larger table size gives an improvement mainly on text and large files. On binary ones, like EXEs, there is little improvement, if any...
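
    In case it helps: "limiting the hash chain lengths" means capping how many earlier positions the match finder visits per hash bucket before giving up. A minimal C++ sketch of such a cap (illustrative only; the cap value, names and data layout are not LZPM's):

    #include <cstdint>

    static const int MAX_CHAIN = 32;   // invented cap; smaller = faster, weaker

    // Walk the hash chain of earlier positions sharing 'hash', but stop after
    // MAX_CHAIN steps. This is the knob that trades compression for speed.
    int longest_match(const uint8_t *buf, int pos, int buf_len,
                      const int *head, const int *prev, uint32_t hash) {
        int best = 0;
        int cand = head[hash];                     // most recent position with this hash
        for (int steps = 0; cand >= 0 && steps < MAX_CHAIN; ++steps) {
            int len = 0;
            while (pos + len < buf_len && buf[cand + len] == buf[pos + len]) ++len;
            if (len > best) best = len;
            cand = prev[cand];                     // follow the chain to an older position
        }
        return best;
    }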

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    world95.txt:
    LZPM 0.09: 579,933 bytes
    LZPM 0.10, 4K: 576,674 bytes
    LZPM 0.10, 8K: 574,887 bytes
    LZPM 0.10, 16K: 573,687 bytes

    bible.txt:
    LZPM 0.09: 913,315 bytes
    LZPM 0.10, 4K: 906,667 bytes
    LZPM 0.10, 8K: 899,516 bytes
    LZPM 0.10, 16K: 895,462 bytes

    3200.txt:
    LZPM 0.09: 4,898,392 bytes
    LZPM 0.10, 4K: 4,861,777 bytes
    LZPM 0.10, 8K: 4,752,334 bytes
    LZPM 0.10, 16K: 4,679,430 bytes

    calgary.tar:
    LZPM 0.09: 873,452 bytes
    LZPM 0.10, 4K: 869,924 bytes
    LZPM 0.10, 8K: 867,494 bytes
    LZPM 0.10, 16K: 866,175 bytes


  22. #22
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Most of these files are just too small. Only 3200 has a compressed size >1 MB.

  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Original sizes:
    world95.txt: 2,988,578 bytes
    bible.txt: 4,047,392 bytes
    3200.txt: 16,013,962 bytes
    calgary.tar: 3,152,896 bytes

    Later I will post results with larger files...

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    pak0.pak (183,997,730 bytes):
    LZPM 0.10, 4K: 90,071,415 bytes, 91 sec
    LZPM 0.10, 8K: 89,758,842 bytes, 138 sec
    LZPM 0.10, 16K: 89,564,352 bytes, 208 sec

    Redline.bgd (175,091,407 bytes):
    LZPM 0.10, 4K: 81,712,031 bytes, 68 sec
    LZPM 0.10, 8K: 81,528,811 bytes, 98 sec
    LZPM 0.10, 16K: 81,473,612 bytes, 149 sec

    As you can see, with non-textual data the speed penalty is not so huge.


  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    I just want to ask LovePimple: look at the results - what do you think? Worth it?

    Higher compression is good; I think that even with 16K, LZPM can beat TurboLZ on ENWIK9. But I think LZPM should not cross the line regarding compression speed - it must stay usable for most people.

    Having said that, I will do some additional tests with reduced hash chains - maybe I can gain compression with no speed loss...

  26. #26
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    I just want to ask LovePimple: look at the results - what do you think? Worth it?
    Possibly! I'm curious as to how it will perform on slightly older machines, such as Bulat's Duron and my own Sempron.

  27. #27
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    I swear to never run lzpm in super-compression mode.

  28. #28
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Some results with reduced hash chain length:

    ENWIK8

    LZPM 0.10, 8k, chain len. 8: 28,749,962 bytes, 36 sec
    LZPM 0.10, 8k, chain len. 16: 28,168,227 bytes, 52 sec
    LZPM 0.10, 8k, chain len. 32: 27,785,277 bytes, 77 sec
    LZPM 0.10, 8k, chain len. 64: 27,565,020 bytes, 108 sec

    LZPM 0.10, 16k, chain len. 8: 28,692,673 bytes, 42 sec
    LZPM 0.10, 16k, chain len. 16: 28,068,018 bytes, 67 sec
    LZPM 0.10, 16k, chain len. 32: 27,634,668 bytes, 106 sec
    LZPM 0.10, 16k, chain len. 64: 27,365,878 bytes, 161 sec

    In addition, I have an idea about how to speed up the current scheme. If LZPM achieves the same compression but with higher compression speed, it will be far more promising... The idea is to kick out all other LZ coders...

  29. #29
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    In addition, I have an idea about how to speed up the current scheme. If LZPM achieves the same compression but with higher compression speed, it will be far more promising...
    I agree!

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    OK, I decided to share various versions of LZPM (4K, 8K, 16K).

    These are just DEBUG versions.

    I PROHIBIT POSTING RESULTS OF THESE VERSIONS ON OFFICIAL BENCHMARKS (I.E. LARGE TEXT COMPRESSION, BLACK_FOX'S TEST AND ALL OTHERS). POST THEM ONLY HERE!!!

    Anyway, you can check out what I'm talking about!

    Here is a pack with all three executables:
    lzpm-dbg-pack.rar (54 KB)

    Enjoy!
