Thread: LZPM 0.08 is here!

  1. #1
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Why wait?

    Enjoy the new release now!

    http://www.encode.su/lzpm/index.htm


  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Why wait?
    Exactly!

    Thanks Ilia!

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    A10.jpg > 832,221
    AcroRd32.exe > 1,481,357
    english.dic > 955,049
    FlashMX.pdf > 3,760,791
    FP.LOG > 643,043
    MSO97.DLL > 1,892,075
    ohs.doc > 830,167
    rafale.bmp > 1,067,373
    vcfiu.hlp > 691,677
    world95.txt > 584,426

    Total = 12,738,179 bytes


    On my machine, LZPM compression times were too slow to keep retesting.


    e.g.

    First run (uncached) for LZPM v0.08 compressed FP.log to 643,043 bytes in 000:00:13:16.542 (796.542 Seconds).

    First run (uncached) for LPAQ1 (7) compressed FP.log to 402,796 bytes in 000:00:02:47.986 (167.986 Seconds).

    First run (uncached) for CCMx v1.23 (c 5) compressed FP.log to 437,856 bytes in 43.322 Seconds.

    First run (uncached) for QUAD v1.12 (-x) compressed FP.log to 619,701 bytes in 22.399 Seconds.

    First run (uncached) for QUAD v1.12 compressed FP.log to 717,207 bytes in just 3.379 Seconds.

  4. #4
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    fp.log represents a corner case. This file contains a lot of long matches, and Flexible Parsing tries all variants within a match to find the best choice.

    For comparison, on my machine LZPM compresses fp.log within 65 seconds.
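
    Here is a minimal sketch of why that parsing strategy is expensive on such files. The cost model and names below are invented for illustration; this is not LZPM's actual code:

    #include <stdio.h>

    #define MIN_MATCH 2

    /* Toy price of coding a match of length len (a real coder derives
       this from its statistical models). */
    static unsigned price(int len) { return len > 8 ? 20 : 24; }

    /* Toy estimate of the cost of coding the remaining bytes. */
    static unsigned est_future(int pos, int end) { return (unsigned)(end - pos) * 3; }

    static int pick_match_len(int pos, int max_len, int end)
    {
        int best_len = max_len;
        unsigned best = price(max_len) + est_future(pos + max_len, end);
        /* Flexible Parsing: price every shorter variant of the match too. */
        for (int len = MIN_MATCH; len < max_len; len++) {
            unsigned c = price(len) + est_future(pos + len, end);
            if (c < best) { best = c; best_len = len; }
        }
        return best_len; /* greedy parsing would simply return max_len */
    }

    int main(void)
    {
        /* On fp.log-like data max_len is often huge, so the loop above
           runs thousands of times at every position. */
        printf("chosen length: %d\n", pick_match_len(0, 1000, 4096));
        return 0;
    }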


  5. #5
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    The catch is in decompression: LZPM decompresses fp.log in less than half a second.

    If decompression were as slow as compression, of course I would never allow such speeds. But getting extra compression with no loss in decompression speed is a good idea. In practice, you compress a file just once and decompress it many, many times.


  6. #6
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    For comparison, on my machine LZPM compresses fp.log within 65 seconds
    Clearly that is a very reasonable compression time.

    Perhaps someone can explain why LZPM compression times are more than 12x faster on your machine than they are on my Sempron 2400+?

  7. #7
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    In each pair of timings below, the first is compression and the second is decompression.

    nero.exe
    36,003,840

    lzpm08
    7,064,254

    Process Time = 122.203 = 00:02:02.203 = 99%
    Global Time = 122.328 = 00:02:02.328 = 100%

    Process Time = 1.921 = 00:00:01.921 = 75%
    Global Time = 2.546 = 00:00:02.546 = 100%


    rar -m5
    6,779,595

    Process Time = 17.375 = 00:00:17.375 = 166%
    Global Time = 10.437 = 00:00:10.437 = 100%

    Process Time = 1.593 = 00:00:01.593 = 82%
    Global Time = 1.922 = 00:00:01.922 = 100%


    cabarc lzx:21
    7,578,441

    Process Time = 33.296 = 00:00:33.296 = 99%
    Global Time = 33.359 = 00:00:33.359 = 100%

    Process Time = 0.546 = 00:00:00.546 = 63%
    Global Time = 0.859 = 00:00:00.859 = 100%

    ==========================================

    WINWORD.EXE
    Microsoft Office Word 11.0.6568
    12,061,896

    lzpm08
    5,449,623

    Process Time = 12.734 = 00:00:12.734 = 100%
    Global Time = 12.719 = 00:00:12.719 = 100%

    Process Time = 1.687 = 00:00:01.687 = 89%
    Global Time = 1.891 = 00:00:01.891 = 100%


    rar -m5
    5,340,035

    Process Time = 11.109 = 00:00:11.109 = 175%
    Global Time = 6.328 = 00:00:06.328 = 100%

    Process Time = 0.609 = 00:00:00.609 = 99%
    Global Time = 0.610 = 00:00:00.610 = 100%


    cabarc lzx:21
    5,311,938

    Process Time = 18.328 = 00:00:18.328 = 100%
    Global Time = 18.328 = 00:00:18.328 = 100%

    Process Time = 0.265 = 00:00:00.265 = 106%
    Global Time = 0.250 = 00:00:00.250 = 100%

  8. #8
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Quote Originally Posted by LovePimple
    Perhaps someone can explain why LZPM compression times are more than 12x faster on your machine than they are on my Sempron 2400+?
    The key is not only a faster CPU (Intel Core 2 Duo @ 2.40 GHz) but also faster memory; 2 GB of DDR2 @ 800 MHz is self-explanatory.


  9. #9
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    A small L1 cache causes many cache misses, so the CPU has to use the slower L2 cache, or even RAM. IMHO.

  10. #10
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    On a dual-core Pentium D 945 @ 3.4 GHz with 2 x 2048 KB of L2 cache and dual-channel DDR2 @ 266.7 MHz, it took 284 seconds to compress fp.log and 0.6 seconds to decompress it.
    Core 2 Duo rules!

  11. #11
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    I think modern games simply force users to purchase new hardware! One of the reasons for buying my new PC was the game F.E.A.R.

  12. #12
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    The only optimisations modern games get are optimisations for income...

  13. #13
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    According to http://agner.org/optimize/microarchitecture.pdf the Core 2 has:
    The level-1 data cache is 32 kB, dual port, 8 way, 64 byte line size. The level-1 code cache
    has the same size as the data cache.
    There is one level-1 cache for each core, while the level-2 cache and bus interface unit is
    shared between the cores. The level-2 combined cache is 2 or 4 MB, 16 ways.
    So the L1 cache of the Core 2 Duo is half the size of the L1 cache of an Athlon, Duron, or Sempron; but the Athlon's L1 cache is only 2-way, and its L2 cache 8-way. The Core 2 has a much larger L2 cache than current Athlons, and also much better cache logic, i.e. it decides more wisely which cache lines should be kept and which should be overwritten.

    In short, the Core 2 is a completely different architecture from the K8/K9, and benchmark results (speed-wise) are likely to be much different than on Athlons.

    The K8 architecture puts the emphasis on memory read/write speed, while the Core 2 emphasizes better use of the L2 cache.

  14. #14
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks to everyone for attempting to explain. I really didn't think that the difference in hardware was enough to account for a more-than-12x speed difference. Maybe 5x or 6x I could understand, but not more than 12x!

    It would be interesting if more people were to post their benchmark timings and hardware spec for direct comparison.


    Quote Originally Posted by nimdamsk
    On a dual-core Pentium D 945 @ 3.4 GHz with 2 x 2048 KB of L2 cache and dual-channel DDR2 @ 266.7 MHz, it took 284 seconds to compress fp.log and 0.6 seconds to decompress it.
    Why is the compression time so much slower than the 65 seconds on Ilia's 2.4 GHz Core 2 Duo machine?

  15. #15
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    IMHO, the main catch is in RAM: 266.7 MHz vs. 800 MHz. And again, the Core 2 Duo is superior to the Pentium D anyway.

    Why can faster memory make the difference?
    LZPM uses 256 MB to store hash chains:

    128 MB for "HEAD"
    and
    128 MB for "PREV"

    During searching, we first access HEAD to find the first (most recent) entry. After that, we traverse PREV, finding previous occurrences of the current string. But here we traverse the memory with large backward jumps, so fast memory access certainly plays a huge role. I guess that with LZPM, faster memory is more important than a larger L2 cache, since we need fast random access to 256+16 MB of memory. A larger cache can benefit decompression, since things like the literal model and the other models can fit entirely in the cache.
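
    Here is a minimal sketch of such a hash-chain match finder. The table sizes, hash function, and names are illustrative only, not LZPM's internals:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define HASH_BITS 24
    #define HASH_SIZE (1u << HASH_BITS)
    #define WIN_BITS  25                  /* illustrative window size */
    #define WIN_SIZE  (1u << WIN_BITS)
    #define WIN_MASK  (WIN_SIZE - 1)

    static uint32_t *head; /* newest position for each hash value        */
    static uint32_t *prev; /* previous position with the same hash value */

    /* Hash the next 4 bytes down to HASH_BITS bits. */
    static uint32_t hash4(const uint8_t *p)
    {
        uint32_t x;
        memcpy(&x, p, 4);
        return (x * 2654435761u) >> (32 - HASH_BITS);
    }

    /* Link position pos into the chains. */
    static void insert_pos(const uint8_t *buf, uint32_t pos)
    {
        uint32_t h = hash4(buf + pos);
        prev[pos & WIN_MASK] = head[h]; /* remember previous occurrence */
        head[h] = pos;                  /* this position is now newest  */
    }

    /* Walk the chain: every step is a large backward jump through
       hundreds of MB of memory, hence the sensitivity to RAM speed. */
    static int chain_len(const uint8_t *buf, uint32_t pos, int max_chain)
    {
        int n = 0;
        uint32_t p = head[hash4(buf + pos)];
        while (p != 0 && max_chain-- > 0) { /* 0 is the end-of-chain sentinel */
            n++; /* a real matcher would compare buf + p with buf + pos here */
            p = prev[p & WIN_MASK];
        }
        return n;
    }

    int main(void)
    {
        static const uint8_t buf[] = "abracadabra abracadabra";
        head = calloc(HASH_SIZE, sizeof *head);
        prev = calloc(WIN_SIZE, sizeof *prev);
        for (uint32_t i = 0; i + 4 <= sizeof buf - 1; i++) {
            int n = chain_len(buf, i, 32); /* search before inserting */
            if (n > 0)
                printf("pos %2u: %d earlier occurrence(s) on the chain\n",
                       (unsigned)i, n);
            insert_pos(buf, i);
        }
        free(head);
        free(prev);
        return 0;
    }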


  16. #16
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts

  17. #17
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Tested:

    version      compressed size   in / out speed (kB/s)
    LZPM 0.06    13,696,668        1,910 / 9,548
    LZPM 0.07    13,581,851        1,053 / 9,866
    LZPM 0.08    13,409,100          958 / 9,866

  18. #18
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Thanks for testing!

    Looks like LZPM has brought itself to a new level, totally outperforming QUAD!


  19. #19
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  20. #20
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Thank you, Matt!

    The new parsing trick carries a bigger time penalty than I expected. Also, the CALL translator adds a few seconds at decompression; anyway, the record is kept.

    Maybe in future versions I will reduce the memory usage by slimming down the HEAD structure.

    Currently, I hash a 32-bit value (4 bytes) to a 24-bit hash.

    I already tried 20...23-bit hashes. Generally speaking, compression stayed the same - sometimes the compression sped up, sometimes it slowed down, but only by a little bit.

    Looks like LZMA uses 20 bits for hash4 values.

    If I change the hash size, LZPM will use:
    20-bit hash: 128 MB + 8 MB + 16 MB = 152 MB
    21-bit hash: 128 MB + 16 MB + 16 MB = 160 MB
    22-bit hash: 128 MB + 32 MB + 16 MB = 176 MB (preferable)
    23-bit hash: 128 MB + 64 MB + 16 MB = 208 MB
    ...
    24-bit hash: 128 MB + 128 MB + 16 MB = 272 MB (current)

    Probably N/4 is the optimum (22 bits).
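
    Those figures are consistent with a fixed 128 MB PREV table, 16 MB of other structures, and 8 bytes per HEAD slot. Here is a quick check; the 8-bytes-per-slot layout is my inference from the numbers above, not confirmed internals:

    #include <stdio.h>

    int main(void)
    {
        const unsigned prev_mb = 128, other_mb = 16; /* fixed parts */
        for (unsigned bits = 20; bits <= 24; bits++) {
            unsigned head_mb = ((1u << bits) * 8) >> 20; /* 8 bytes per slot */
            printf("%u-bit hash: %u MB\n", bits, prev_mb + head_mb + other_mb);
        }
        return 0; /* prints 152, 160, 176, 208 and 272 MB */
    }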


  21. #21
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Testing results:

    ENWIK8
    20-bit hash: 28,302,346 bytes, 106 sec (152 MB mem use)
    21-bit hash: 28,273,448 bytes, 107 sec (160 MB mem use)
    22-bit hash: 28,265,901 bytes, 107 sec (176 MB mem use)
    23-bit hash: 28,260,579 bytes, 108 sec (208 MB mem use)
    24-bit hash: 28,259,984 bytes, 110 sec (272 MB mem use)

    I expect that on other machines with smaller and/or slower memory, the benefit of a smaller hash can be greater.

    A smaller hash can be used not for a speed improvement but for lower memory requirements. Like I said, with a 22-bit hash LZPM will use 176 MB instead of 272 MB, with just a tiny compression loss; this is preferable on large files.

    For comparison, LZPM with 22-bit hash compresses ENWIK9 to 245,266,715 bytes.

    Just one question: is it worth it?

    Maybe not...


  22. #22
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    Looks like LZMA uses 20 bits for hash4 values.
    for an N MB dictionary, it uses exactly 4N MB for prev and 4N MB for head

    Quote Originally Posted by encode
    IMHO, the main catch is in RAM: 266.7 MHz vs. 800 MHz.
    his memory is really 533 MHz; DDR2 speeds start from 400 MHz (which is just 100 MHz of real clock speed multiplied by 4x)

  23. #23
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Quote Originally Posted by Bulat Ziganshin
    for N mb dictionary, it uses exactly 4N mb for prev and 4N mb for head
    Note that this does not mean LZMA uses a hash size of N.

    LZMA uses 2-3-4 byte hashing.

    hash[] is the "head" in deflate terms
    son[] is the "prev"

    It stores offsets in hash[] in the following manner:
    p->hash[hash2Value] =
    p->hash[kFix3HashSize + hash3Value] =
    p->hash[kFix4HashSize + hashValue] = p->pos;

    In other words, hash[] keeps the offsets for the 2-, 3- and 4-byte hashes.

    /* LzHash.h */

    #define kHash2Size (1 << 10)
    #define kHash3Size (1 << 16)
    #define kHash4Size (1 << 20)

    #define kFix3HashSize (kHash2Size)
    #define kFix4HashSize (kHash2Size + kHash3Size)
    #define kFix5HashSize (kHash2Size + kHash3Size + kHash4Size)
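
    A quick sanity check on those constants (my arithmetic, not SDK code), confirming the 20-bit hash4 and the total size of the fixed tables that sit back-to-back in hash[]:

    #include <stdio.h>

    int main(void)
    {
        unsigned kHash2Size = 1u << 10; /* 2-byte hash:     1,024 slots */
        unsigned kHash3Size = 1u << 16; /* 3-byte hash:    65,536 slots */
        unsigned kHash4Size = 1u << 20; /* 4-byte hash: 1,048,576 slots = 20 bits */
        unsigned kFix5HashSize = kHash2Size + kHash3Size + kHash4Size;
        printf("%u fixed slots = %.2f MB at 4 bytes each\n",
               kFix5HashSize, kFix5HashSize * 4.0 / (1 << 20));
        return 0; /* 1,115,136 slots = 4.25 MB */
    }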


  24. #24
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Look at this more carefully: LZMA now uses 5+4+3+2 hashing, and the size of the 5-byte hash isn't defined here. I was writing about version 4.43, which used the 4-3-2 model.

  25. #25
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    I just briefly looked at LZMA SDK 4.49...

  26. #26
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Bulat Ziganshin
    his memory is really 533 MHz; DDR2 speeds start from 400 MHz (which is just 100 MHz of real clock speed multiplied by 4x)
    Whose memory is really only 533 MHz?

  27. #27
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by LovePimple
    Whose memory is really only 533 MHz?
    We were talking about nimdamsk's.

  28. #28
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Bulat Ziganshin
    We were talking about nimdamsk's.
    Thanks Bulat!

  29. #29
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Played with some parsing ideas and further improved LZPM!

    Some new results for LZPM 0.09:

    ENWIK8: 27,986,111 bytes
    ENWIK9: 242,929,442 bytes

    world95.txt: 579,933 bytes

    3200.txt: 4,898,392 bytes

    book1: 267,448 bytes

    bible.txt: 913,315 bytes

    In addition, I made a few code optimizations.

