
Thread: lzpm 0.03 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Okay! Let me introduce the brand new lzpm featuring hash chaining!

    This version has:
    + Slightly higher compression
    + Faster compression (in some cases the speed improvement is just crazy)
    + Increased memory usage for compression (176 MB). Note that decompression memory usage stays the same (20 MB).

    Enjoy!

    Link:
    lzpm003.zip (26 KB)


  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    This is AWESOME!!!

    Thanks Ilia!

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    Test Machine = AMD Sempron 2400+


    MC SFC:

    A10.jpg > 832,328
    AcroRd32.exe > 1,634,410
    english.dic > 1,015,867
    FlashMX.pdf > 3,768,999
    FP.LOG > 800,097
    MSO97.DLL > 2,012,471
    ohs.doc > 836,869
    rafale.bmp > 1,103,015
    vcfiu.hlp > 725,726
    world95.txt > 617,266

    Total = 13,347,048 bytes


    ENWIK8:

    Compressed Size = 29,248,641 bytes

    Compression Time = 00:04:55.782

    Decompression Time = 00:00:15.250

  4. #4
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Thanks encode! It's compressing more than twice as fast as 0.02

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    One thing - I'm not happy with the memory usage. Somehow Malcolm wrote a program which operates at 2*N, while my current version would need 17*N. However, I've made a few restrictions to reduce memory usage.

    Hash chains perform better than the older scheme: text compression is better, and compression is much faster, especially on files that don't compress well. However, it looks like on text files the speed is slightly slower.

    I should run some additional experiments with hash chains:
    1. Actually, prev[] can contain fewer elements than N (DICTSIZE), so the memory usage can be below 4*N in the LZ77 case.
    2. The hash size is also variable. However, it looks like the optimal variant is for prev[] and hash[] to have the same number of elements, equal to DICTSIZE (as Deflate does).
    3. I can play with heuristics for truncating hash chains. Currently I use a nice heuristic, but it can be improved further (see the sketch below).
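
    For reference, here is a minimal sketch of the head[]/prev[] scheme being discussed. All names and constants (HASHBITS, MAXCHAIN, hash3) are illustrative assumptions, not lzpm's actual sources; with a 16 MB block the tables cost roughly 4*N for prev[] plus the head array:

    #include <cstdint>
    #include <vector>

    const int HASHBITS = 17;
    const int HASHSIZE = 1 << HASHBITS;   // number of chain heads
    const int DICTSIZE = 1 << 24;         // 16 MB block
    const int MAXCHAIN = 128;             // chain-length limit (heuristic)

    std::vector<int> head(HASHSIZE, -1);  // newest position per hash, -1 = empty chain
    std::vector<int> prev_(DICTSIZE);     // previous position with the same hash

    inline uint32_t hash3(const uint8_t* p)  // hash of the next 3 bytes
    {
        return (((uint32_t)p[0] << 16 | p[1] << 8 | p[2]) * 2654435761u) >> (32 - HASHBITS);
    }

    void insert(const uint8_t* buf, int pos)
    {
        uint32_t h = hash3(buf + pos);
        prev_[pos] = head[h];             // link to the older position with this hash
        head[h] = pos;                    // the new position becomes the chain head
    }

    int findMatch(const uint8_t* buf, int pos, int avail, int& bestPos)
    {
        int bestLen = 0, depth = 0;
        for (int cur = head[hash3(buf + pos)]; cur >= 0 && depth < MAXCHAIN; cur = prev_[cur], depth++)
        {
            int len = 0;
            while (len < avail && buf[cur + len] == buf[pos + len]) len++;
            if (len > bestLen) { bestLen = len; bestPos = cur; }
        }
        return bestLen;
    }

    The MAXCHAIN limit (and any distance cutoff inside the loop) is where the truncation heuristics mentioned in point 3 would live.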


  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    i think that with closed sources and undefined terms ("some nice heuristic") you are talking to yourself

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Yepp!

  8. #8
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    I find it strange that on my machine QUAD v1.12 always compresses ENWIK8 much faster than any LZPM version. Does anyone else get the same result?

    lzpm v0.03
    Compression Time = 00:04:55.782

    quad v1.12
    Compression Time = 00:01:05.972

    quad v1.12 (-x)
    Compression Time = 00:01:44.406

  9. #9
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by LovePimple
    I find it strange that on my machine QUAD v1.12 always compresses ENWIK8 much faster than any LZPM version. Does anyone else get the same result?
    I think it's supposed to be so - LZPM is made to be fast at decompression

    compressor      - compression time - decompression time (process times)
    quad v1.12 (-x) - 00:00:54.390 - 00:00:13.937
    quad v1.12      - 00:00:34.031 - 00:00:14.156
    lzpm v0.03      - 00:01:18.312 - 00:00:06.656

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Quote Originally Posted by Black_Fox
    I think it's supposed to be so - LZPM is made to be fast at decompression
    Exactly!

    Like I said, LZPM should potentially be slower than QUAD at compression, but much faster at decompression. Anyway, I'm still working on the match-searching stage. Currently I've already explored a cut-down version of hash chains which requires 2*N with just a slight compression loss. My target is 2*N...4*N with no compression loss - it is possible, since the current scheme inefficiently grabs too much RAM. Furthermore, theoretically, this implementation of ROLZ can run with just 1.5*N. By the way, I tested a super fast modification of LZPM which is fully compatible with the current one: instead of hash chains (head + prev) we can use just one head - i.e. a hash table with single hash bins...

    Anyway, why wasn't LZPM 0.03 tested at maximumcompression.com?

  11. #11
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode
    By the way, I tested a super fast modification of LZPM which is fully compatible with the current one.
    Will it just increase the (de)compression speed, or also decrease the RAM requirements?
    Quote Originally Posted by encode
    Anyway, why wasn't LZPM 0.03 tested at maximumcompression.com?
    Maybe Werner didn't notice it was released...

    EDIT: I see, he will now

  12. #12
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks! I'm happy now that I know it's the same on all machines.

    Quote Originally Posted by encode
    Anyway, why wasn't LZPM 0.03 tested at maximumcompression.com?
    I don't know. I was very disappointed when I found that LZPM had not been tested.

  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    Instead of hash chains (head + prev) we can use just one head
    using 2 entries instead of 1 increases compression by 10%, and then we can use 3, 4, 8... entries. anyway, a matrix hash should be faster than chains - it decreases the number of memory accesses ~2x
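
    For contrast, a rough sketch of the K-way tabular ("matrix") hash described here: each bucket is a short row of recent positions, so one lookup touches one contiguous row instead of chasing scattered prev[] links. Row width and names are illustrative, not tornado's actual code:

    #include <cstdint>
    #include <vector>

    const int ROWBITS = 16;
    const int ROWS    = 1 << ROWBITS;     // number of hash rows
    const int WAYS    = 8;                // entries per row (2, 4, 8, ...)

    std::vector<int> table(ROWS * WAYS, -1);

    inline uint32_t rowOf(const uint8_t* p)
    {
        return (((uint32_t)p[0] << 16 | p[1] << 8 | p[2]) * 2654435761u) >> (32 - ROWBITS);
    }

    // Insert: shift the row down one slot and put the new position first (MRU order).
    void insertRow(const uint8_t* buf, int pos)
    {
        int* row = &table[rowOf(buf + pos) * WAYS];
        for (int i = WAYS - 1; i > 0; i--) row[i] = row[i - 1];
        row[0] = pos;
    }

    // Search: the candidates are just the row contents, newest first.
    int findMatchRow(const uint8_t* buf, int pos, int avail, int& bestPos)
    {
        const int* row = &table[rowOf(buf + pos) * WAYS];
        int bestLen = 0;
        for (int i = 0; i < WAYS && row[i] >= 0; i++)
        {
            int cur = row[i], len = 0;
            while (len < avail && buf[cur + len] == buf[pos + len]) len++;
            if (len > bestLen) { bestLen = len; bestPos = cur; }
        }
        return bestLen;
    }

    A per-row circular write index would avoid shifting the whole row on every insert, which is the "circular hash line" idea that comes up below.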

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Simply compare LZPM 0.02 and LZPM 0.03...

    I think that hash chains are able to provide a variable-length set of offsets. If we need a large number of entries (text files), hash chains will be more efficient. If we need the smallest set (compressed data), hash chains will again be more efficient.

  15. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    this only tells us about your implementation

    have you tried to make a circular hash row? aside from the cost of moving the whole hash row on each update and of testing strings beyond the actual end of the row, i don't see any reason for hash chains to be faster. do you?

  16. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    I think that hash chains are able to provide a variable length set of offsets
    typically, all compressors (zip, rar, 7zip) limit hash chain length to rather small values, 16-256. once you use a tabular hash with the same row width, i can't imagine reasons why HC may be faster

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Quote Originally Posted by Bulat Ziganshin
    have you tried to make a circular hash row?
    Yes! Hash chains are again faster.

    Quote Originally Posted by Bulat Ziganshin
    i can't imagine reasons why HC may be faster
    I think the key is in my implementation. With hash chains I use a larger hash and a trick to reduce the chain length - I stop the match search if the found chain entry is too distant.

    Using a larger hash and fewer hash bins (to restrict memory use) will hurt compression.

    Quote Originally Posted by encode
    Layout:
    <hash size, bits>-<number of hash bins>: <compressed size> <time>
    ...
    world95.txt (2,988,578 bytes)
    18-32: 626,067 bytes, 0.969 sec
    17-64: 621,738 bytes, 1.250 sec
    16-128: 619,408 bytes, 1.671 sec
    Anyway, currently I'm working on further improvements...

  18. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    With hash chains I use a larger hash and a trick to reduce the chain length - I stop the match search if the found chain entry is too distant
    you can do the same with a tabular hash

    Quote Originally Posted by encode
    Using a larger hash and fewer hash bins (to restrict memory use) will hurt compression.
    you compare a small tabular hash with a large chained hash. try to compare equal-sized ones

  19. #19
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts


    I think it's no coincidence that LZMA makes use of a hash chain (along with other match finders).

  20. #20
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    of course it's no coincidence. i already wrote to you that it was the best solution in the old times, and no one except Malcolm has found a better one since. with tornado, i'm pretty happy with the performance of the tabular hash

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    I hope someday I'll write the same...

    And now we can disassemble mcomp, read and analyze all posts by Malcolm, etc. At least my LZPM is faster at decompression than ROLZ2.

    Some time ago, I asked to purchase the M Software General Compression Library, which contains the ROLZ sources. Malcolm said that he cannot provide me with this library, since he is trying to make money with his software, and I could steal all the ideas and release an open-source project...

  22. #22
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    mcomp.exe

    Compiler: Microsoft Visual C++ 7.0 [Debug]

    Looks like this is a debug build (assertions enabled). From the assertions we can recover some source file names (plus variable names):

    \src\rkPWCMCodec.cpp
    \src\rkFPWCompressor.cpp
    \src\rkFPWDecompressor.cpp
    \src\rkFPWCodec.cpp
    \src\FastCompressor.cpp
    \src\FastDecompressor.cpp
    \src\ROLZMatchFinder.cpp
    \src\FastMatchFinder.cpp
    \src\rkZipDeflate.cpp
    \src\rkZipInflate.cpp
    \src\rkData.cpp
    \src\ArithDecoder.cpp



  23. #23
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    just for comparison:

    hash16x: 57.0 sec
    hash32x: 84.7 sec
    hash64x: 134.2 sec

    hash16x: 116.6 sec
    hash32x: 194.3 sec
    hash64x: 346.4 sec

    -mc16: 178.1 sec
    -mc32: 245.0 sec
    -mc64: 391.5 sec

    the first group shows compression times using a tabular hash with reduced updates (when a match is found, only the first and last string of the match are inserted into the hash); the second group - with full updates; the third - 7zip (i think it uses full updates, although i haven't checked this assumption)

    as you can see, with full updates the 16-way tabular hash is 1.5x faster, but the speeds become pretty close for the 64-way hash. i hope that with circular hashing my 64-way hash will become much faster, but i haven't tested this assumption yet

    the 1.5x speed difference for the 16-way hash may be explained by the use of a "cached hash" (like the change in quad 1.12), which improved speed 1.5 times, afair. so this table suggests that for a 16-way hash the speed should be about the same (if both hashes are caching, or both non-caching), and for 64-way the tabular hash should definitely be slower (at least if 7zip really does full updates). maybe with a circular organization the tabular hash can outperform hash chains - at least i don't see any reason why it should be slower..
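
    As an aside, the "reduced updates" scheme mentioned here could be sketched like this, reusing the insert() routine from the hash-chain sketch earlier in the thread (the helper name is mine):

    // After coding a match of length matchLen at position pos, full updates insert
    // every covered position into the hash; the reduced-update variant inserts only
    // the first and the last covered position, trading some match quality for speed.
    void updateAfterMatch(const uint8_t* buf, int pos, int matchLen, bool reducedUpdates)
    {
        if (reducedUpdates) {
            insert(buf, pos);                   // first string of the match
            insert(buf, pos + matchLen - 1);    // last string of the match
        } else {
            for (int i = 0; i < matchLen; i++)  // full updates
                insert(buf, pos + i);
        }
    }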

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts


    Better to measure the timings for identical algorithm implementations that differ only in match searching - hash chains vs. tabular hash. The second aspect is search quality: in my experiments, chains do a better search.

    Anyway, I chose hash chains.

    Topic closed.

    Currently, I'm experimenting with hashed linked lists (hash chains are just a variation of this). With this approach we can restrict the number of entries to, say, 1M.

  25. #25
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    my study shows that 7zip in fast mode uses full updates and lazy matching. this is the criterion for abandoning the current match because the next one will be better:

    if (_longestMatchLength >= lenMain && newDistance < backMain ||                      // at least as long and closer
        _longestMatchLength == lenMain + 1 && !ChangePair(backMain, newDistance) ||      // one longer, not much farther
        _longestMatchLength > lenMain + 1 ||                                             // two or more longer
        _longestMatchLength + 1 >= lenMain && lenMain >= 3 && ChangePair(newDistance, backMain))  // at most one shorter, much closer

    here lenMain/backMain are the length/distance of the match at P, and _longestMatchLength/newDistance are for the match at P+1. also:

    static inline bool ChangePair(UInt32 smallDist, UInt32 bigDist)
    {
        return ((bigDist >> 7) > smallDist);  // true when bigDist is more than ~128x smallDist
    }

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Hmm, good idea about the heuristic with the current and next distances. I should try it.

    Thanks Bulat!

    Some info:
    Using hash chains with prev[DICTSIZE] we are able to hold head[HASHSIZE] + prev[DICTSIZE] nodes.

    Such a large number of nodes will never actually be added.

    Even with incompressible input (where at each step we add a new node) we can add at most DICTSIZE nodes. In practice, we add just 2,000,000...7,000,000 nodes for each 16 MB block. In other words, there is a lot of memory wastage, and we could use much less memory.

    The first idea is to add a new "prev" variable to each node, so "pos" will no longer identify the previous node as it does with hash chaining.

    In this case we can add new nodes to prev[] sequentially, step by step increasing the number of available nodes in prev[]. The "prev" variable in each node will link to the previous node.

    If the maximum number of nodes is reached we can:
    + Reset prev[] and head[]
    + Rebuild prev[] and head[]
    + Treat prev[] as a circular table

    Continue digging...

    (Any ideas are welcome )
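
    A minimal sketch of the node-pool idea described above, using the "treat prev[] as a circular table" option; MAXNODES and all other names are hypothetical, and the stale references this creates are exactly the problem discussed in the next posts:

    #include <cstdint>
    #include <vector>

    const int HASHSIZE = 1 << 17;
    const int MAXNODES = 1 << 20;            // e.g. 1M nodes, independent of DICTSIZE

    struct Node { int pos; int prevNode; };  // explicit link instead of indexing by position

    std::vector<int>  head(HASHSIZE, -1);    // hash -> newest node index, -1 = empty
    std::vector<Node> pool(MAXNODES);
    int nextNode = 0;                        // next slot to (re)use

    void insertNode(uint32_t h, int pos)
    {
        int n = nextNode;
        nextNode = (nextNode + 1) % MAXNODES;  // circular reuse once the pool is full
        pool[n].pos = pos;
        pool[n].prevNode = head[h];
        head[h] = n;
    }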

  27. #27
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    in my arjz this heuristic was rather simple:

    if newlen==oldlen+1 && newdist/64>olddist
    then abandon new match

    7-zip's one seems much more advanced

    one more idea you can reuse is the cabarc style of huffman tables: instead of making separate length and distance tables, we join them. say, zip has ~32 slots for lengths and ~32 slots for distances; instead, we can make 1024 combined slots. this allows taking the dependencies between length and distance into account. in my quick test the difference was 1% (well, i tested only 1 file)

    you may borrow this code from tornado
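
    A toy illustration of the combined-slot idea; the slot mapping below is a simplified log2 bucketing, not cabarc's or tornado's actual tables:

    #include <cstdint>

    // Deflate-style coders pick a length slot and a distance slot and code them with
    // two separate tables. The cabarc idea is to code the pair (lenSlot, distSlot) as
    // one symbol from a single ~32*32 = 1024-entry table, so the model can exploit
    // the correlation between length and distance. Real coders send extra bits to
    // refine the exact value inside each slot.
    inline int log2Slot(uint32_t v)
    {
        int s = 0;
        while (v > 1) { v >>= 1; s++; }
        return s;                            // 0..31
    }

    inline int combinedSymbol(uint32_t matchLen, uint32_t matchDist)
    {
        int lenSlot  = log2Slot(matchLen);   // ~32 length slots
        int distSlot = log2Slot(matchDist);  // ~32 distance slots
        return lenSlot * 32 + distSlot;      // one symbol from a 1024-entry alphabet
    }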



    Quote Originally Posted by encode
    The first idea is to add a new "prev" variable to each node
    yes, the idea of hash chains/bintrees in rar, zip, 7zip, cabarc is that you can omit this array and use direct indexing instead. the drawback is that we can't continue hash chains outside of sizeof(prev). in your variant, memory requirements will be 2x higher, but the solution is more flexible. the problem is reusing the space: it's hard to use prev as a circular table because you need to clear the newer references

    imagine that you have prev[1000] referencing prev[2000]. when you recycle prev[2000], how will you clear the reference in prev[1000]?

    in the zip-style scheme, at each buffer shift we scanned the whole prev[]. if for example we shift by 8k, all prev[] values less than 8k are cleared. you may do the same
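
    Roughly, that zip-style rescan could look like this (a sketch of the description above; deflate itself subtracts the window size and clamps, which amounts to the same thing):

    #include <vector>

    // On a buffer shift of 'shift' bytes, positions below 'shift' leave the window:
    // clear them, and rebase the survivors so they still point at the same data in
    // the shifted buffer. The same loop is run over both head[] and prev[].
    void rebaseLinks(std::vector<int>& links, int shift)
    {
        for (int& v : links)
            v = (v >= shift) ? v - shift : -1;   // -1 = empty / out of window
    }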

  28. #28
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Quote Originally Posted by Bulat Ziganshin
    imagine that you have prev[1000] referencing prev[2000]. when you recycle prev[2000], how will you clear the reference in prev[1000]?
    With a circular table each new node will overwrite an older one. Anyway, in my tests, at some point the encoder can "freeze" - cyclically traversing some route of nodes - so we must fight this side effect. For example, we can look at "pos": if the "pos" of the previous node is bigger than the "pos" of the current node, we break the loop.
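
    In code, that loop-break guard might look like this, building on the hypothetical node pool sketched earlier:

    // Positions along a healthy chain strictly decrease (newest first). If a recycled
    // node makes them jump up again, we have left the valid chain and must stop,
    // otherwise the search can cycle ("freeze").
    void walkChain(int chainHead, int curPos)
    {
        int lastPos = curPos;
        for (int n = chainHead; n >= 0; n = pool[n].prevNode)
        {
            if (pool[n].pos >= lastPos) break;   // stale link left by circular reuse
            lastPos = pool[n].pos;
            // ... compare the string at pool[n].pos with the one at curPos ...
        }
    }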

    Quote Originally Posted by Bulat Ziganshin
    in the zip-style scheme, at each buffer shift we scanned the whole prev[]. if for example we shift by 8k, all prev[] values less than 8k are cleared. you may do the same
    I called this rebuilding.

    Will make some experiments...

  29. #29
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by encode
    Better to measure the timings for identical algorithm implementations that differ only in match searching - hash chains vs. tabular hash
    afaik at this moment there are no more differences between tor and 7zip - both use lazy matching and full updates. the other parts don't take too much time in my program, and i think the same is true for 7zip, although i'm not 100% sure

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    An observation:

    LZ77 can decode a match from any previous position, restricted only by the window (order-0).
    ROLZ can decode a match only from positions restricted by the order-X context.
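
    To illustrate the difference, here is a hypothetical ROLZ-style offset table (order-2 context, illustrative sizes - lzpm's real parameters are not public). Encoder and decoder run the same update, so a match can be coded as a small index into the current context's row instead of a full LZ77 offset:

    #include <cstdint>
    #include <vector>

    const int CTXBITS = 16;                  // order-2 context = previous 2 bytes
    const int SLOTS   = 32;                  // candidate positions kept per context

    std::vector<uint32_t> offsets(((size_t)1 << CTXBITS) * SLOTS, 0);

    inline uint32_t context(const uint8_t* buf, uint32_t pos)  // requires pos >= 2
    {
        return (uint32_t)buf[pos - 2] << 8 | buf[pos - 1];
    }

    // Called at every coded position on both sides to keep the tables in sync.
    void rolzUpdate(const uint8_t* buf, uint32_t pos)
    {
        uint32_t* row = &offsets[(size_t)context(buf, pos) * SLOTS];
        for (int i = SLOTS - 1; i > 0; i--) row[i] = row[i - 1];   // MRU shift
        row[0] = pos;
    }

    // Decoder side: a match is (index, length); the index selects a position from the
    // current context's row, so only positions "restricted by order-X" are reachable.
    uint32_t rolzMatchPos(const uint8_t* buf, uint32_t pos, int index)
    {
        return offsets[(size_t)context(buf, pos) * SLOTS + index];
    }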

