Link:
lzpm004.zip (26 KB)
Enjoy!
Thanks!
Awesome!!! Thanks, Ilia!
Quick test...
LZPM 0.03:
A10.jpg > 832,328
AcroRd32.exe > 1,634,410
english.dic > 1,015,867
FlashMX.pdf > 3,768,999
FP.LOG > 800,097
MSO97.DLL > 2,012,471
ohs.doc > 836,869
rafale.bmp > 1,103,015
vcfiu.hlp > 725,726
world95.txt > 617,266
Total = 13,347,048 bytes
ENWIK8 > 29,248,641 bytes
LZPM 0.04:
A10.jpg > 832,328
AcroRd32.exe > 1,634,410
english.dic > 1,015,867
FlashMX.pdf > 3,768,999
FP.LOG > 799,963 ***
MSO97.DLL > 2,012,471
ohs.doc > 836,869
rafale.bmp > 1,103,015
vcfiu.hlp > 725,726
world95.txt > 617,266
Total = 13,346,914 bytes
ENWIK8 > 29,297,905 bytes
Thank you!
Another quick test...
Test Machine = AMD Sempron 2400+
Test File = "lzpmtest.tar" (574,381,056 bytes), containing exe, dll, txt, html, ini, bmp, wav, etc.
Results:
LZPM 0.03:
Ratio = 42.2%
Compression Time = 00:10:40.625
Compressed Size = 242,253,866 bytes
LZPM 0.04:
Ratio = 42.19%
Compression Time = 00:09:32.985
Compressed Size = 242,320,379 bytes
0.04 is a little larger and slower than 0.02 on enwik8/9.
http://cs.fit.edu/~mmahoney/compression/text.html# 2545
The goal of 0.04 is to introduce the brand-new circular hash chains. A match finder based on hash chains is somewhat slower on text files, while it is much faster on other files such as binary and compressed data. In addition, in some cases hash chains are able to provide a better match search (world95.txt, etc.). Note that LZPM is not intended to provide the best performance on ENWIK8/9.
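The circular-hash-chain idea can be sketched roughly as follows. This is a minimal toy match finder, not LZPM's actual code: the window size, the order-3 hash, and the chain-walk cap are all invented for illustration. "Circular" here means the chain array is indexed by position modulo the window size, so old links are simply overwritten as the window slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIN_BITS   16               /* toy window: 64 KB, a power of two   */
#define WIN_SIZE   (1u << WIN_BITS)
#define WIN_MASK   (WIN_SIZE - 1)
#define HASH_BITS  15
#define HASH_SIZE  (1u << HASH_BITS)
#define MIN_MATCH  3
#define NIL        0xFFFFFFFFu

static uint32_t head[HASH_SIZE];    /* hash bucket -> most recent position */
static uint32_t chain[WIN_SIZE];    /* pos & WIN_MASK -> previous position */

/* order-3 hash of the bytes at p (hash function is illustrative) */
static uint32_t hash3(const uint8_t *p) {
    return (((uint32_t)p[0] << 8) ^ ((uint32_t)p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
}

void mf_init(void) {
    memset(head, 0xFF, sizeof head);   /* all buckets empty (NIL) */
}

/* insert position pos into the chain for its hash bucket */
void mf_insert(const uint8_t *buf, uint32_t pos) {
    uint32_t h = hash3(buf + pos);
    chain[pos & WIN_MASK] = head[h];   /* link to previous occurrence */
    head[h] = pos;
}

/* walk the chain; return the longest match length at pos (0 if below
 * MIN_MATCH); *match_pos receives where it was found */
uint32_t mf_find(const uint8_t *buf, uint32_t pos, uint32_t end,
                 uint32_t *match_pos) {
    uint32_t best = 0, tries = 32;     /* cap chain walks for speed */
    uint32_t cur = head[hash3(buf + pos)];
    while (cur != NIL && tries--) {
        if (cur >= pos || pos - cur >= WIN_SIZE)
            break;                     /* stale link: out of window */
        uint32_t len = 0;
        while (pos + len < end && buf[cur + len] == buf[pos + len])
            len++;
        if (len > best) { best = len; *match_pos = cur; }
        cur = chain[cur & WIN_MASK];   /* follow the circular chain back */
    }
    return best >= MIN_MATCH ? best : 0;
}
```

The stale-link check and the capped walk are what make the circular reuse of `chain` safe: an overwritten slot may point anywhere, but any link leading outside the window terminates the walk.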
Starting with the next version (0.05), I'll add improved lazy matching (as proposed by Malcolm Taylor), which gives better compression, especially on text, at the cost of some compression time. Anyway, it's worth it. I'll post some figures later.
OK, that explains it. Also, I just noticed that 0.03 uses more memory, so it has the top spot now.
http://cs.fit.edu/~mmahoney/compression/text.html# 2544
Thanks for testing!
LZPM gets much more benefit from optimized parsing compared to QUAD. However, the speed is also really affected: simply compare QUAD in default mode with LZPM 0.04; both use the same parsing scheme. So I guess adding Flexible Parsing to LZPM is impractical. Anyway, we can use something in between lazy matching and flexible parsing: an improved lazy matching.
Check out how parsing affects the compression:
world95.txt (2,988,578 bytes):
greedy: 658,452 bytes
lazy1: 617,266 bytes (current)
lazy2: 603,251 bytes
calgary.tar (3,152,896 bytes):
greedy: 929,462 bytes
lazy1: 899,738 bytes (current)
lazy2: 891,079 bytes
A nice gain, isn't it?
And with new improved parsing (lazy2) LZPM compresses:
ENWIK8 to 28,896,680 bytes
ENWIK9 to 251,111,835 bytes
With the lazy2 scheme, speed is not really affected. However, in some cases compression can be slightly worse. Currently I'm thinking about implementation details (optimizations, etc.) and about the theory: for example, why the new scheme provides less compression in some cases, and why on some files (Canterbury corpus) lazy matching provides better results than Flexible Parsing.
I'm interested in using this scheme in Tornado. Can you explain the details?
Both schemes don't make strong guarantees. They are just heuristics, after all.
It's all simple. With QUAD/LZPM 0.04 I use lazy matching with a one-byte lookahead. With the new LZPM 0.05 I'm experimenting with a two-byte lookahead. No more, no less. As Malcolm said, he often uses lazy matching with a two-byte lookahead.
I don't understand what you mean here. Can you show this as pseudocode?
Hm, okay.
curmatch = getmatch(pos); // get current match

// one byte lookahead (lazy1)
if (getmatch(pos + 1) > curmatch)
    curmatch = 0; // discard current match

// two bytes lookahead (lazy2)
if ((getmatch(pos + 1) > curmatch) || (getmatch(pos + 2) > curmatch))
    curmatch = 0; // discard current match
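For completeness, the heuristic above can be made runnable like this. The naive `getmatch()` here (a hypothetical helper that scans the whole already-seen prefix) only stands in for a real match finder; in practice hash chains or a binary tree would be used.

```c
#include <assert.h>
#include <stddef.h>

/* naive "longest match into the already-seen prefix" finder;
 * a stand-in for a real match finder, for illustration only */
static size_t getmatch(const unsigned char *buf, size_t pos, size_t len) {
    size_t best = 0;
    for (size_t p = 0; p < pos; p++) {
        size_t l = 0;
        while (pos + l < len && buf[p + l] == buf[pos + l])
            l++;
        if (l > best) best = l;
    }
    return best;
}

/* lazy1: accept the match at pos only if a one-byte lookahead
 * does not find a longer one; return 0 to emit a literal instead */
size_t lazy1(const unsigned char *buf, size_t pos, size_t len) {
    size_t cur = getmatch(buf, pos, len);
    if (pos + 1 < len && getmatch(buf, pos + 1, len) > cur)
        cur = 0;                   /* discard current match */
    return cur;
}

/* lazy2: the same idea with a two-byte lookahead */
size_t lazy2(const unsigned char *buf, size_t pos, size_t len) {
    size_t cur = getmatch(buf, pos, len);
    if ((pos + 1 < len && getmatch(buf, pos + 1, len) > cur) ||
        (pos + 2 < len && getmatch(buf, pos + 2, len) > cur))
        cur = 0;                   /* discard current match */
    return cur;
}
```

For example, in the buffer "abxbcdefyabcdef" the match at position 9 ("ab", length 2) is discarded by lazy1 because position 10 offers "bcdef" (length 5).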
Originally Posted by Malcolm Taylor (comp.compression)
Thanks, now it's clear. It needs more time than the usual lazy matching, but it may be useful for the highest modes.
Of course...
But why did you say "With the lazy2 scheme, speed is not really affected"?
Because in my implementation the speed indeed is not affected that much. Again, the impact depends on the data, but overall compression becomes just slightly slower. In other words, the compression gain is worth it.
Okay, some testing results. This time I'm going to check the difference between Flexible Parsing and the new lazy matching, to see how much air the new approach leaves inside compressed files.
fp.log:
fp: 682,006 bytes
lazy2: 797,806 bytes
world95.txt:
fp: 590,190 bytes
lazy2: 603,251 bytes
acrord32.exe:
fp: 1,625,916 bytes
lazy2: 1,630,322 bytes
calgary.tar:
fp: 888,055 bytes
lazy2: 891,079 bytes
All in all, the new scheme works pretty well. It must be said that here I used a simplified version of Flexible Parsing, so slightly higher compression is still possible. Anyway, as you can see, the new lazy matching leaves not too much air while being MUCH faster than FP.
More results with FP:
ENWIK8: 28,567,778 bytes
ENWIK9: 247,950,599 bytes
However, check out results with canterbury.tar:
fp: 545,035 bytes
lazy2: 540,426 bytes
Here, FP is beaten.
Another thing: I have an idea to rename LZPM to LZCOMP (lzcomp 0.05), as an easier-to-read name.
Results are looking promising!
I prefer the name "LZPM".
Maybe LZPM is indeed better. LZPM stands for LZ77-PM.
Note that lzpm 0.05 will NOT use FP. I'm just experimenting/testing...
I understand!
By the way, the release is soon! This weekend I'm expecting the MC update. Even if this doesn't happen, I'll release a new version next week!
lzpm 0.05 at sfc
A10.jpg: 832,282 bytes
acrord32.exe: 1,630,322 bytes (e8e9: 1,474,521 bytes)
english.dic: 1,031,885 bytes
FlashMX.pdf: 3,766,309 bytes
fp.log: 797,806 bytes
mso97.dll: 2,010,427 bytes (e8e9: 1,906,772 bytes)
ohs.doc: 836,570 bytes
rafale.bmp: 1,094,750 bytes
vcfiu.hlp: 715,795 bytes
world95.txt: 603,251 bytes
Total: 13,319,397 bytes
LZPM 0.04 has been tested at MC! Very cool!
I like its performance; look at the MFC results:
67.34%, comp: 118 sec, dec: 27 sec.
In other words, it outperforms QUAD and PIM, and to be honest, LZPM is my favorite now!
The next version will have a bit higher compression while keeping the same decompression speed (and compatibility; by the way, all versions of LZPM since 0.02 are fully compatible!), so it must be pretty cool! I think I can make LZPM a bit slower at the cost of somewhat higher (especially text) compression.
Keep in mind that LZPM has no filters; not even an E8/E9 transform is included. QUAD and especially PIM have preprocessing...
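For readers unfamiliar with E8/E9 preprocessing: it rewrites the relative targets of x86 CALL (0xE8) and JMP (0xE9) instructions as absolute addresses, so repeated calls to the same function become identical byte strings that an LZ match finder can then exploit. A minimal sketch follows; it is not from LZPM, QUAD, or PIM, and real filters add range checks on the decoded targets to reduce false positives.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

static uint32_t get32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void put32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* forward pass (before compression): relative target -> absolute */
void e8e9_encode(uint8_t *buf, size_t len) {
    for (size_t i = 0; i + 5 <= len; ) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            uint32_t rel = get32(buf + i + 1);
            /* x86 rel32 is relative to the end of the instruction */
            put32(buf + i + 1, rel + (uint32_t)(i + 5));
            i += 5;
        } else {
            i++;
        }
    }
}

/* inverse pass (after decompression): absolute target -> relative */
void e8e9_decode(uint8_t *buf, size_t len) {
    for (size_t i = 0; i + 5 <= len; ) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            uint32_t tgt = get32(buf + i + 1);
            put32(buf + i + 1, tgt - (uint32_t)(i + 5));
            i += 5;
        } else {
            i++;
        }
    }
}
```

After the forward pass, two calls to the same function carry the same 4-byte operand regardless of where the call sites are, which is exactly what LZ matching wants.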
Anyway, the main catch is fast decompression: the fastest of all my software. Also, I have no ideas for improving the decompressor, which is why I've kept compatibility for so long.
Will this higher compression be available with the release of version 0.05?
Yes, like I posted before, LZPM 0.05 will have "lazy2" parsing.
But that's not all that I'm planning for LZPM. Probably I will add a binary tree match finder or improved hash chains, although the current match finder is pretty good.
Anyway, the main target is a parsing scheme with a cost function. I will try to implement FP with pricing, or even true optimal parsing, with or without pricing. There is also a variant like Uwe did with his brilliant ALZ (AFAIK): some sort of lazy matching with a price function, for better choices based on the real literal/match cost.
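A price-based lazy choice could look something like the sketch below. The cost model here is entirely invented for illustration (fixed literal cost, a rough log2 distance term); a real coder would price literals and matches from its actual entropy-coder statistics, as the ALZ-style variant mentioned above presumably does.

```c
#include <assert.h>
#include <stddef.h>

/* number of bits needed to represent values below v (rough log2) */
static int log2_ceil(size_t v) {
    int b = 0;
    while (((size_t)1 << b) < v) b++;
    return b;
}

/* hypothetical cost model, in bits (all constants invented) */
static int literal_cost(void) { return 6; }
static int match_cost(size_t len, size_t dist) {
    (void)len; /* toy model: flat length slot */
    return 4 + log2_ceil(dist + 1) + 2; /* header + distance + length slot */
}

/* price-based lazy choice at pos: take the match now only if its cost
 * per coded byte beats "literal now, then the match at pos+1".
 * len0/dist0 describe the match at pos; len1/dist1 the one at pos+1. */
int take_match_now(size_t len0, size_t dist0,
                   size_t len1, size_t dist1) {
    if (len0 == 0) return 0;    /* nothing to take */
    if (len1 == 0) return 1;    /* no competition from pos+1 */
    /* compare cost/byte by cross-multiplying to stay in integers:
     * now covers len0 bytes; the alternative covers 1 + len1 bytes */
    long now   = (long)match_cost(len0, dist0) * (long)(1 + len1);
    long later = (long)(literal_cost() + match_cost(len1, dist1)) * (long)len0;
    return now <= later;
}
```

With this model, a short match at pos is deferred when a much longer match waits one byte ahead, which is the same decision plain lazy matching makes, but now the tie-breaking accounts for how expensive each choice is to encode.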
To be continued...
The next release should be quite a powerhouse!