Okay, new version is here:
http://www.encode.su/lzpm/lzpm.htm
Enjoy!
Thanks!
Decompression is very fast!
Thanks Ilia!
Great! Will test it this weekend.
Quick test:
QUAD v1.12
AcroRD32.exe
Compressed Size: 1,503,119 bytes
Compression Time: 4.460 Seconds
Decompression Time: 2.973 Seconds
rafale.bmp
Compressed Size: 1,036,312 bytes
Compression Time: 2.742 Seconds
Decompression Time: 1.297 Seconds
world95.txt
Compressed Size: 625,831 bytes
Compression Time: 1.741 Seconds
Decompression Time: 0.739 Seconds
LZPM v0.07
AcroRD32.exe
Compressed Size: 1,623,309 bytes
Compression Time: 7.864 Seconds
Decompression Time: 1.070 Seconds
rafale.bmp
Compressed Size: 1,070,492 bytes
Compression Time: 38.093 Seconds
Decompression Time: 0.700 Seconds
world95.txt
Compressed Size: 585,659 bytes
Compression Time: 31.226 Seconds
Decompression Time: 0.418 Seconds
QUAD compresses much faster than LZPM on my machine.
LZPM will potentially be slower than QUAD, since it is more purely LZ than anything else. For example, the built-in PPM in QUAD plays a role: sometimes it gives nice compression, sometimes it reduces decompression speed.
Some properties of the current hash chains (data type - average speed):
Incompressible data - fastest
Binary data - fast
Text data - slowest
On text files, the hash-chain match finder generates long hash chains, and LZPM checks them all. Apparently, with binary data the hash chains are shorter, and with compressed/random data they are shortest.
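A minimal hash-chain match finder shows why text is the slow case. This is a sketch, not LZPM's code: the 3-byte multiplicative hash, the 16-bit table, and the names `HashChainFinder` and `chain_cost` are assumptions. Like the match finder described above, it walks every entry of a chain. On repetitive text many earlier positions share a hash, so the walks get long, while random data almost never hits a chain at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash-chain match finder (a sketch, not LZPM's implementation).
// head[h] holds the most recent position whose first 3 bytes hash to h;
// prev[pos] links each position to the previous one with the same hash.
struct HashChainFinder {
    static constexpr int kHashBits = 16;
    const std::vector<uint8_t>& buf;
    std::vector<int32_t> head;
    std::vector<int32_t> prev;
    std::size_t visited = 0;  // chain entries inspected so far (a cost measure)

    explicit HashChainFinder(const std::vector<uint8_t>& data)
        : buf(data), head(std::size_t(1) << kHashBits, -1), prev(data.size(), -1) {}

    static uint32_t hash3(const uint8_t* p) {
        const uint32_t v = uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16);
        return (v * 2654435761u) >> (32 - kHashBits);
    }

    // Link pos into its chain; requires pos + 3 <= buf.size().
    void insert(std::size_t pos) {
        const uint32_t h = hash3(&buf[pos]);
        prev[pos] = head[h];
        head[h] = int32_t(pos);
    }

    // Walks the WHOLE chain (no depth limit) and returns the longest match length.
    std::size_t longest_match(std::size_t pos, std::size_t* match_pos) {
        std::size_t best = 0;
        if (pos + 3 > buf.size()) return 0;
        for (int32_t cand = head[hash3(&buf[pos])]; cand >= 0; cand = prev[cand]) {
            ++visited;
            std::size_t len = 0;
            while (pos + len < buf.size() && buf[cand + len] == buf[pos + len]) ++len;
            if (len > best) { best = len; *match_pos = std::size_t(cand); }
        }
        return best;
    }
};

// Searches every position before inserting it and reports the total chain work.
std::size_t chain_cost(const std::vector<uint8_t>& data) {
    HashChainFinder f(data);
    std::size_t match_pos = 0;
    for (std::size_t i = 0; i + 3 <= data.size(); ++i) {
        f.longest_match(i, &match_pos);
        f.insert(i);
    }
    return f.visited;
}
```

In this sketch, `chain_cost` on 1,000 bytes of repeated "the " visits on the order of 10^5 chain entries, versus a handful for the same amount of pseudo-random bytes. Many LZ implementations cap the chain depth to bound this cost; walking the full chain trades that speed for the longest available match.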
The compression is very good but slow! Decompression, naturally, is excellent!
The beauty of any LZ-based compressor is that the user or programmer can choose between different parsing strategies to trade speed against compression.
For example, with LZPM I already tried:
1. Greedy parsing - fastest
2. Lazy matching with one byte lookahead - fast
3. Lazy matching with two bytes lookahead - fast/normal
4. Flexible parsing - currently slowest, but providing the best results with ROLZ scheme, at least to my current knowledge.
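The difference between the first three strategies can be sketched with a brute-force match finder standing in for LZPM's real one, which this does not reproduce. `kMinMatch = 3` and the rule "defer the match if any of the next `lookahead` positions has a longer one" are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Brute-force longest match at pos against any earlier start (illustration only;
// a real compressor would use hash chains or ROLZ tables).
std::size_t longest_match(const std::string& s, std::size_t pos) {
    std::size_t best = 0;
    for (std::size_t cand = 0; cand < pos; ++cand) {
        std::size_t len = 0;
        while (pos + len < s.size() && s[cand + len] == s[pos + len]) ++len;
        if (len > best) best = len;
    }
    return best;
}

struct Token {
    bool is_match;
    std::size_t len;  // 1 for a literal
};

constexpr std::size_t kMinMatch = 3;  // hypothetical minimum match length

// lookahead == 0: greedy parsing (take the longest match right away).
// lookahead == k > 0: lazy matching; if any of the next k positions has a
// longer match, emit a literal now and reconsider at the next byte.
std::vector<Token> parse(const std::string& s, std::size_t lookahead) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        std::size_t len = longest_match(s, pos);
        if (len >= kMinMatch) {
            for (std::size_t k = 1; k <= lookahead && pos + k < s.size(); ++k) {
                if (longest_match(s, pos + k) > len) { len = 0; break; }
            }
        }
        if (len >= kMinMatch) {
            out.push_back({true, len});
            pos += len;
        } else {
            out.push_back({false, 1});
            pos += 1;
        }
    }
    return out;
}
```

On "abcd_bcdef_abcdef", greedy parsing (lookahead 0) emits 12 tokens: it takes "abcd" and is left with "ef" as two literals. Lazy matching with one byte of lookahead emits 11: a literal "a", then "bcdef" as a single match. Each extra lookahead byte costs another match search, which is where the speed goes.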
I think the current LZPM is not too slow. I simply compared it to ROLZ2 from mcomp.exe and to LZMA from the LZMA SDK. Its speed is also acceptable compared to many other modern LZ-based compressors, like CABARC. Looking at the LZPM 0.06 results at MFC, I thought I could add something to improve compression at the cost of speed. Lazy matching is probably more efficient in terms of compression speed vs. ratio, but flexible parsing moved LZPM to a new stage, especially on text files.
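Flexible parsing can be sketched as a one-phrase lookahead: at each position, instead of always taking the longest match, choose the length that lets the next phrase reach furthest. This is a simplified illustration on a brute-force match finder, not LZPM's ROLZ implementation; `kMinMatch = 3`, the `next_phrase` helper, and the tie-breaking rule are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Brute-force longest match at pos against any earlier start (illustration only).
std::size_t longest_match(const std::string& s, std::size_t pos) {
    std::size_t best = 0;
    for (std::size_t cand = 0; cand < pos; ++cand) {
        std::size_t len = 0;
        while (pos + len < s.size() && s[cand + len] == s[pos + len]) ++len;
        if (len > best) best = len;
    }
    return best;
}

constexpr std::size_t kMinMatch = 3;  // hypothetical minimum match length

struct Token {
    bool is_match;
    std::size_t len;  // 1 for a literal
};

// Length covered by the phrase starting at pos: a match if one is long enough,
// otherwise a single literal (0 at the end of the input).
std::size_t next_phrase(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return 0;
    const std::size_t m = longest_match(s, pos);
    return m >= kMinMatch ? m : 1;
}

// flexible == false: greedy. flexible == true: pick the phrase length l (a literal,
// or any match length kMinMatch..longest) maximizing l + next_phrase(pos + l).
// Assumed tie-breaking: prefer a match over a literal, and the shorter match.
std::vector<Token> parse(const std::string& s, bool flexible) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        const std::size_t longest = longest_match(s, pos);
        std::size_t best_len = longest >= kMinMatch ? longest : 1;
        if (flexible && longest >= kMinMatch) {
            std::size_t best_reach = 0;
            for (std::size_t l = kMinMatch; l <= longest; ++l) {
                const std::size_t reach = l + next_phrase(s, pos + l);
                if (reach > best_reach) { best_reach = reach; best_len = l; }
            }
            // A literal wins only if it lets the next phrase reach strictly further.
            if (1 + next_phrase(s, pos + 1) > best_reach) best_len = 1;
        }
        out.push_back({best_len >= kMinMatch, best_len});
        pos += best_len;
    }
    return out;
}
```

On "abcd_bcdef_abcdef" the flexible parse has 11 phrases against greedy's 12: at the second "abcd" it deliberately takes only "abc" so that "def" can follow as a match, instead of "abcd" plus two literals. Evaluating every candidate length needs a match search per candidate, which is consistent with flexible parsing being the slowest strategy in the list.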
0.07 moves up 3 spots on enwik9 but is 2.5 times slower for compression and uses 3.5x more memory for compression. Decompression still uses only 20 MB and is just as fast.
http://cs.fit.edu/~mmahoney/compression/text.html#2464
Thank you for testing!
It is also nice to see that there is no compressor that compresses better with faster decompression.
Compression timings for LZPM 0.07 on my machine (in reply to LovePimple):
acrord32.exe:
Kernel Time = 0.093 = 00:00:00.093 = 5%
User Time = 1.531 = 00:00:01.531 = 93%
Process Time = 1.625 = 00:00:01.625 = 99%
Global Time = 1.641 = 00:00:01.641 = 100%
rafale.bmp:
Kernel Time = 0.187 = 00:00:00.187 = 2%
User Time = 7.343 = 00:00:07.343 = 97%
Process Time = 7.531 = 00:00:07.531 = 100%
Global Time = 7.516 = 00:00:07.516 = 100%
world95.txt:
Kernel Time = 0.062 = 00:00:00.062 = 2%
User Time = 2.796 = 00:00:02.796 = 97%
Process Time = 2.859 = 00:00:02.859 = 99%
Global Time = 2.860 = 00:00:02.860 = 100%
It's funny, but I improved compression again! Okay, some test results for LZPM 0.08:
world95.txt: 584,426 bytes
fp.log: 643,043 bytes
ENWIK8: 28,259,984 bytes
ENWIK9: 245,221,254 bytes
A little test on a Pentium D 820:
nero.exe: 36,003,840 bytes (Nero Burning ROM 7.5.7.0)
quad: 7,572,146 bytes
quad -x: 7,407,461 bytes
lzpm: 7,448,386 bytes
cabarc lzx:21: 7,578,441 bytes
quad compression
Process Time = 6.640 = 00:00:06.640 = 100%
Global Time = 6.625 = 00:00:06.625 = 100%
quad decompression
Process Time = 3.656 = 00:00:03.656 = 84%
Global Time = 4.328 = 00:00:04.328 = 100%
quad -x compression
Process Time = 15.375 = 00:00:15.375 = 99%
Global Time = 15.391 = 00:00:15.391 = 100%
quad -x decompression
Process Time = 3.671 = 00:00:03.671 = 86%
Global Time = 4.250 = 00:00:04.250 = 100%
lzpm compression
Process Time = 8.937 = 00:00:08.937 = 100%
Global Time = 8.922 = 00:00:08.922 = 100%
lzpm decompression
Process Time = 1.953 = 00:00:01.953 = 77%
Global Time = 2.531 = 00:00:02.531 = 100%
cabarc lzx:21 compression
Process Time = 33.296 = 00:00:33.296 = 99%
Global Time = 33.359 = 00:00:33.359 = 100%
cabarc decompression
Process Time = 0.546 = 00:00:00.546 = 63%
Global Time = 0.859 = 00:00:00.859 = 100%
Note that CABARC has an E8 transformer and QUAD has an E8/E9 transformer, while LZPM uses a bare ROLZ algorithm. This certainly changes the real picture in this test.
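For readers unfamiliar with it, an E8 (or E8/E9) transformer can be sketched in a few lines. This is a common form of the technique, not the actual code of CABARC, QUAD, or LZPM; it assumes one whole file per buffer and uses the file offset as the base address. `exe_filter` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint32_t load_le32(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
           (uint32_t(p[3]) << 24);
}

static void store_le32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v);
    p[1] = uint8_t(v >> 8);
    p[2] = uint8_t(v >> 16);
    p[3] = uint8_t(v >> 24);
}

// x86 CALL (E8) and JMP (E9) store a 32-bit little-endian offset relative to the
// next instruction. Converting it to an absolute target makes repeated calls to
// the same function produce identical bytes, which LZ can then match.
// encode == true converts relative -> absolute; false undoes it.
void exe_filter(std::vector<uint8_t>& buf, bool encode, bool with_e9) {
    for (std::size_t i = 0; i + 5 <= buf.size(); ++i) {
        const uint8_t op = buf[i];
        if (op != 0xE8 && !(with_e9 && op == 0xE9)) continue;
        const uint32_t next = uint32_t(i + 5);  // offset of the next instruction
        const uint32_t v = load_le32(&buf[i + 1]);
        store_le32(&buf[i + 1], encode ? v + next : v - next);
        i += 4;  // skip the operand so encoder and decoder see the same opcodes
    }
}
```

`with_e9 = false` is the E8-only variant and `true` the E8E9 one. Opcode bytes are never modified and each operand is skipped, so the decoder finds the same opcode positions as the encoder; the transform is lossless on any input and, on non-executable data, can only cost some compression.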
I also briefly compared LZPM 0.07 and 0.08:
nero.exe (nero 6) (13,983,802 bytes)
LZPM 0.07: 3,471,743 bytes
LZPM 0.08: 3,464,135 bytes
Impressive result from lzpm! (in reply to nimdamsk)
Are your filters for PIM effectively compatible with LZPM too?
Some more tests
In each entry, the first pair of times is for compression and the second for decompression.
lzpm01
7,579,372
Process Time = 17.609 = 00:00:17.609 = 100%
Global Time = 17.593 = 00:00:17.593 = 100%
Process Time = 2.578 = 00:00:02.578 = 81%
Global Time = 3.157 = 00:00:03.157 = 100%
lzpm02
7,447,288
Process Time = 17.250 = 00:00:17.250 = 99%
Global Time = 17.266 = 00:00:17.266 = 100%
Process Time = 2.156 = 00:00:02.156 = 77%
Global Time = 2.797 = 00:00:02.797 = 100%
lzpm03
7,449,582
Process Time = 7.437 = 00:00:07.437 = 99%
Global Time = 7.453 = 00:00:07.453 = 100%
Process Time = 2.109 = 00:00:02.109 = 77%
Global Time = 2.719 = 00:00:02.719 = 100%
lzpm04
7,463,407
Process Time = 7.640 = 00:00:07.640 = 99%
Global Time = 7.656 = 00:00:07.656 = 100%
Process Time = 2.078 = 00:00:02.078 = 76%
Global Time = 2.704 = 00:00:02.704 = 100%
lzpm05
7,444,864
Process Time = 9.046 = 00:00:09.046 = 100%
Global Time = 9.031 = 00:00:09.031 = 100%
Process Time = 1.984 = 00:00:01.984 = 76%
Global Time = 2.609 = 00:00:02.609 = 100%
lzpm06
7,448,386
Process Time = 8.703 = 00:00:08.703 = 100%
Global Time = 8.687 = 00:00:08.687 = 100%
Process Time = 2.062 = 00:00:02.062 = 77%
Global Time = 2.672 = 00:00:02.672 = 100%
Yes, of course. At the very least I can add the EXE filter, because many users have already tested LZPM on their EXE files. (in reply to Black_Fox)
I also tested LZPM with delta/multimedia filters. Unfortunately, PPM-based algorithms do much better in this case, although compression with such filters is still better than without. One problem is multimedia (MM) detection. The PIM archiver has a large dedicated module called "detector" that determines file types, reads their headers, and chooses and configures the MM filters accordingly. I just won't add such a thing to LZPM, because the source of that module alone is larger than the current LZPM source!
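A delta filter of the kind mentioned here is simple on its own; the hard part, as noted, is knowing when and with which parameters to apply it. A minimal sketch, assuming the channel stride is already known (for example 3 for 24-bit RGB pixels); `delta_encode` and `delta_decode` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte-wise delta filter with a fixed stride. Choosing the stride is exactly
// the "detection" problem; here it is simply passed in.
void delta_encode(std::vector<uint8_t>& buf, std::size_t stride) {
    // Walk backwards so each byte is differenced against its ORIGINAL neighbour.
    for (std::size_t i = buf.size(); i-- > stride;) {
        buf[i] = uint8_t(buf[i] - buf[i - stride]);
    }
}

void delta_decode(std::vector<uint8_t>& buf, std::size_t stride) {
    // Walk forwards so each neighbour has already been restored.
    for (std::size_t i = stride; i < buf.size(); ++i) {
        buf[i] = uint8_t(buf[i] + buf[i - stride]);
    }
}
```

On a smooth gradient the encoded bytes collapse to a short repeating pattern that an LZ or PPM back end compresses well; with the wrong stride, or on data without such structure, the filter can make things worse, which is why detection matters.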
To get a taste of what an E8/E9 transformer does, consider the following figures:
acrord32.exe:
LZPM 0.08: 1,619,211 bytes
LZPM 0.08+EXEFLT: 1,464,231 bytes
mso97.dll:
LZPM 0.08: 1,998,513 bytes
LZPM 0.08+EXEFLT: 1,895,526 bytes
Photoshop.exe:
LZPM 0.08: 7,332,764 bytes
LZPM 0.08+EXEFLT: 6,286,536 bytes
Doom3.exe:
LZPM 0.08: 1,860,838 bytes
LZPM 0.08+EXEFLT: 1,735,073 bytes
MPTRACK.EXE:
LZPM 0.08: 529,540 bytes
LZPM 0.08+EXEFLT: 507,765 bytes
Reaktor.exe:
LZPM 0.08: 2,218,921 bytes
LZPM 0.08+EXEFLT: 2,082,418 bytes
nero.exe:
LZPM 0.08: 3,464,135 bytes
LZPM 0.08+EXEFLT: 3,193,284 bytes
Can't stop
A 3 Mpx nature photo:
IMG_0862.bmp
9,437,238
lzpm06.exe
8,080,899
Process Time = 3.671 = 00:00:03.671 = 99%
Global Time = 3.672 = 00:00:03.672 = 100%
Process Time = 2.203 = 00:00:02.203 = 93%
Global Time = 2.359 = 00:00:02.359 = 100%
lzpm07.exe
8,080,252
Process Time = 3.546 = 00:00:03.546 = 99%
Global Time = 3.547 = 00:00:03.547 = 100%
Process Time = 1.953 = 00:00:01.953 = 92%
Global Time = 2.109 = 00:00:02.109 = 100%
cabarc lzx:21
7,781,731
Process Time = 13.750 = 00:00:13.750 = 99%
Global Time = 13.765 = 00:00:13.765 = 100%
Process Time = 0.234 = 00:00:00.234 = 107%
Global Time = 0.219 = 00:00:00.219 = 100%
FFMPEG-devel maillist archive
2007-March.txt
4,898,478
lzpm06.exe
709,683
Process Time = 1.515 = 00:00:01.515 = 101%
Global Time = 1.500 = 00:00:01.500 = 100%
Process Time = 0.250 = 00:00:00.250 = 84%
Global Time = 0.297 = 00:00:00.297 = 100%
lzpm07.exe
686,082
Process Time = 5.531 = 00:00:05.531 = 100%
Global Time = 5.516 = 00:00:05.516 = 100%
Process Time = 0.218 = 00:00:00.218 = 77%
Global Time = 0.281 = 00:00:00.281 = 100%
cabarc lzx:21
706,263
Process Time = 5.843 = 00:00:05.843 = 100%
Global Time = 5.829 = 00:00:05.829 = 100%
Process Time = 0.062 = 00:00:00.062 = 132%
Global Time = 0.047 = 00:00:00.047 = 100%
Registry hive file
software
26,574,848
lzpm06.exe
4,821,118
Process Time = 15.828 = 00:00:15.828 = 100%
Global Time = 15.828 = 00:00:15.828 = 100%
Process Time = 1.484 = 00:00:01.484 = 77%
Global Time = 1.906 = 00:00:01.906 = 100%
lzpm07.exe
4,744,439
Process Time = 63.734 = 00:01:03.734 = 99%
Global Time = 63.782 = 00:01:03.782 = 100%
Process Time = 1.234 = 00:00:01.234 = 71%
Global Time = 1.734 = 00:00:01.734 = 100%
cabarc lzx:21
4,460,355
Process Time = 34.796 = 00:00:34.796 = 99%
Global Time = 34.859 = 00:00:34.859 = 100%
Process Time = 0.406 = 00:00:00.406 = 43%
Global Time = 0.938 = 00:00:00.938 = 100%
7-Zip sources
7z444.tar
5,368,832
lzpm06.exe
611,012
Process Time = 5.343 = 00:00:05.343 = 99%
Global Time = 5.360 = 00:00:05.360 = 100%
Process Time = 0.218 = 00:00:00.218 = 77%
Global Time = 0.281 = 00:00:00.281 = 100%
lzpm07.exe
573,910
Process Time = 93.812 = 00:01:33.812 = 99%
Global Time = 93.875 = 00:01:33.875 = 100%
Process Time = 0.203 = 00:00:00.203 = 72%
Global Time = 0.282 = 00:00:00.282 = 100%
cabarc lzx:21
614,718
Process Time = 6.078 = 00:00:06.078 = 100%
Global Time = 6.078 = 00:00:06.078 = 100%
Process Time = 0.062 = 00:00:00.062 = 132%
Global Time = 0.047 = 00:00:00.047 = 100%
I think I should add an EXE transformer to the next release of LZPM (0.08).
Let me know whether you are for or against such an addition.
Why just E8? I made a few tests: E8/E9 can give slightly higher compression on some EXE files, but on non-executable data it hurts compression a little, and more than E8 alone. Also, on some executables E8 alone actually gives higher compression. In short, E8 hurts only a little in the worst case.
Have you tried the exe transformation from cabarc? Maybe it will be better. It has a heuristic to avoid transforming non-executable data, e.g. if the relative offset is bigger than 12345678 or smaller than -12345678, it is not transformed to an absolute offset. Details are in the CAB SDK docs.
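A range check of this kind can be sketched as follows. Note that a plain "leave it alone if the offset is out of range" rule is not reversible by itself, because the decoder only sees the translated value; mapping a window of values onto itself fixes that. The window `[-pos, size)` and the mapping below are an assumption based on how LZX-style translation is commonly described, so check the CAB SDK documentation for the exact rule; `e8_ranged` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint32_t load_le32(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
           (uint32_t(p[3]) << 24);
}

static void store_le32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v);
    p[1] = uint8_t(v >> 8);
    p[2] = uint8_t(v >> 16);
    p[3] = uint8_t(v >> 24);
}

// Range-checked E8 filter (sketch). Only operands inside the window
// [-pos, size) are translated; the window is mapped onto itself, so the
// decoder can apply the same test to the translated value and invert it.
// pos is the offset of the E8 byte; size is the (assumed) file size.
// Assumes buffers smaller than 2 GB so all values fit in 32 bits.
void e8_ranged(std::vector<uint8_t>& buf, int64_t size, bool encode) {
    for (std::size_t i = 0; i + 5 <= buf.size(); ++i) {
        if (buf[i] != 0xE8) continue;
        const int64_t pos = int64_t(i);
        const int64_t v = int32_t(load_le32(&buf[i + 1]));
        if (v >= -pos && v < size) {
            const int64_t out = encode ? ((v < size - pos) ? v + pos : v - size)
                                       : ((v >= 0) ? v - pos : v + size);
            store_le32(&buf[i + 1], uint32_t(int32_t(out)));
        }
        i += 4;  // skip the operand
    }
}
```

Random 4-byte values rarely land in a window the size of the file, so most false 0xE8 hits in non-executable data are left untouched.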
For! (in reply to encode)
I have exactly the same here! (in reply to donkey7)
Anyway, if you run it over non-executable data, you'll see some tiny loss. (in reply to donkey7)
Maybe add some threshold, e.g. if more than 50% of the last 50 E8 sequences were failures (non-converted offsets), disable the E8 transformer completely for the moment.
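The suggested threshold can be layered on a range check of the kind described for cabarc: an operand counts as a failure when it falls outside the translation window, and translation pauses while more than half of the last 50 E8 sequences failed. The window test, `kHistory = 50`, and the name `e8_adaptive` are assumptions for illustration. It stays decodable because the in-window test gives the same answer before and after translation, so the decoder rebuilds the same failure history without any stored flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

static uint32_t load_le32(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
           (uint32_t(p[3]) << 24);
}

static void store_le32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v);
    p[1] = uint8_t(v >> 8);
    p[2] = uint8_t(v >> 16);
    p[3] = uint8_t(v >> 24);
}

constexpr std::size_t kHistory = 50;  // how many recent E8 sequences are tracked

// Adaptive E8 filter (sketch). An operand "fails" when it lies outside the
// window [-pos, size); translation is paused while more than half of the last
// kHistory E8 sequences failed. Failures are tracked even while paused.
void e8_adaptive(std::vector<uint8_t>& buf, int64_t size, bool encode) {
    std::deque<bool> history;  // true = failure
    std::size_t failures = 0;
    for (std::size_t i = 0; i + 5 <= buf.size(); ++i) {
        if (buf[i] != 0xE8) continue;
        const int64_t pos = int64_t(i);
        const int64_t v = int32_t(load_le32(&buf[i + 1]));
        const bool in_window = v >= -pos && v < size;
        const bool enabled = failures * 2 <= history.size();
        if (enabled && in_window) {
            // Bijective mapping of the window onto itself, so it can be undone.
            const int64_t out = encode ? ((v < size - pos) ? v + pos : v - size)
                                       : ((v >= 0) ? v - pos : v + size);
            store_le32(&buf[i + 1], uint32_t(int32_t(out)));
        }
        history.push_back(!in_window);
        if (!in_window) ++failures;
        if (history.size() > kHistory) {
            if (history.front()) --failures;
            history.pop_front();
        }
        i += 4;  // skip the operand
    }
}
```

A run of out-of-range operands switches translation off quickly, and a later stretch of in-range operands switches it back on.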
Actually, the current E8 filter works pretty well. In many cases the loss is just a few bytes; on plain ASCII text there is no difference at all, since 0xE8 is outside the ASCII character set. I won't add any extra analysis to the filter, since I want to keep MAX speed. I already tested it on ENWIK8: the decompression speed stayed almost untouched. In addition, this filter is implemented more cleverly than, say, QUAD's. For example:
int &addr = *(reinterpret_cast<int *>(&buf[i]));
Afterwards we can modify addr directly, instead of copying the value into a local variable, modifying it, and writing the new value back.
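A note on the `reinterpret_cast` line above: writing an `int` through a pointer into a byte buffer is technically undefined behavior in C++ (it breaks the strict-aliasing rule, and `&buf[i]` is generally not aligned for `int`). It works on x86 with typical compilers, but a `std::memcpy` version is well-defined, and mainstream optimizing compilers usually turn it into the same single load and store. A minimal sketch with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Adds delta to the 32-bit value stored at p in native byte order (little-endian
// on x86, which matches the E8 operand encoding there).
inline void add_to_u32(unsigned char* p, uint32_t delta) {
    uint32_t v;
    std::memcpy(&v, p, sizeof v);  // well-defined for any alignment
    v += delta;
    std::memcpy(p, &v, sizeof v);
}
```

With `p` pointing at the operand, an encoder would add the position and a decoder would subtract it; whether this is any faster or slower than the cast in practice depends on the compiler, so it is worth measuring.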
The only problem is that in some cases the additional E9 processing gives extra compression, especially on executables with a large amount of code, like doom3.exe and photoshop.exe. But I am targeting large file sets with mixed content, and there E8 alone works better anyway.
I tested decompression with ENWIK9: the penalty is less than one second on my PC! Note that when decompressing executables, decompression will be even faster (remember the LZ77 rule: higher compression == more matches == faster decompression). I have already included this filter in LZPM. Even the junkies from Microsoft put such a toy into their LZX!
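The speed rule follows from how an LZ77 decoder spends its time. A minimal sketch with a hypothetical token format (not LZPM's): every token costs one decode step, and a match then emits `len` bytes with a plain copy loop, so fewer, longer tokens mean less work per output byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token format: a literal byte, or a copy of len bytes starting
// dist bytes back in the output.
struct Token {
    bool is_match;
    char literal;
    std::size_t dist;
    std::size_t len;
};

std::string lz77_decode(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) {
        // In a real decoder, reading and entropy-decoding each token is typically
        // the expensive part; the copy below is cheap per byte.
        if (!t.is_match) {
            out.push_back(t.literal);
            continue;
        }
        assert(t.dist >= 1 && t.dist <= out.size());
        const std::size_t from = out.size() - t.dist;
        // Byte-by-byte copy so overlapping matches (dist < len) work.
        for (std::size_t k = 0; k < t.len; ++k) out.push_back(out[from + k]);
    }
    return out;
}
```

Here three tokens (two literals and one overlapping match) produce eight bytes; the longer the matches, the more output each decode step buys.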
acrord32.exe:
LZPM 0.08: 1,481,357 bytes
mso97.dll:
LZPM 0.08: 1,892,075 bytes
encode
Did you use my approach?
If you want max speed, you can use the following strategy:
- process 4 KB of data with the exe filter,
- if the number of failures is lower than the threshold, process further,
- otherwise suspend the exe filter for 20 KB of data (do not try to transform this data) and restart the filter after that.
Note that the threshold I've mentioned (50%) is an ad hoc value. It should probably be much lower (or higher, I don't know).
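The probe-and-pause strategy can be sketched on top of a translation-window check like the one cabarc is described as using. The 4 KB / 20 KB sizes and the 50% threshold come from the post; the window test, the names, and the rule that a probe block with no E8 bytes counts as healthy are assumptions. The decoder must reach the same decisions, which works here because the window test gives the same answer on original and translated operands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint32_t load_le32(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
           (uint32_t(p[3]) << 24);
}

static void store_le32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v);
    p[1] = uint8_t(v >> 8);
    p[2] = uint8_t(v >> 16);
    p[3] = uint8_t(v >> 24);
}

constexpr std::size_t kProbe = 4096;         // bytes filtered before each check
constexpr std::size_t kPause = 20480;        // bytes skipped after a bad probe
constexpr std::size_t kMaxFailPercent = 50;  // ad hoc threshold

// Probe-and-pause E8 filter (sketch). Within a probe block, operands inside
// [-pos, size) are translated and the rest count as failures. At the end of a
// block with too many failures, the next kPause bytes are left untouched.
void e8_probe_pause(std::vector<uint8_t>& buf, int64_t size, bool encode) {
    std::size_t i = 0, block_start = 0, seen = 0, failed = 0;
    while (i + 5 <= buf.size()) {
        if (i >= block_start + kProbe) {
            const bool bad = failed * 100 > seen * kMaxFailPercent;
            block_start = bad ? i + kPause : i;  // jump over the paused region
            i = block_start;
            seen = failed = 0;
            continue;
        }
        if (buf[i] != 0xE8) { ++i; continue; }
        const int64_t pos = int64_t(i);
        const int64_t v = int32_t(load_le32(&buf[i + 1]));
        ++seen;
        if (v >= -pos && v < size) {
            const int64_t out = encode ? ((v < size - pos) ? v + pos : v - size)
                                       : ((v >= 0) ? v - pos : v + size);
            store_le32(&buf[i + 1], uint32_t(int32_t(out)));
        } else {
            ++failed;
        }
        i += 5;  // opcode plus operand
    }
}
```

While paused, the loop jumps straight past the 20 KB, so paused regions cost nothing to scan in either direction.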
I guess you meant to write "longer matches". (in reply to encode)
I did. Some time ago I tried almost ALL possible variants. The current approach is one of the best so far! (in reply to donkey7)
By the way, try the new LZPM:
http://www.encode.su/lzpm/index.htm
These "junkies" invented both the E8 transformation and price-optimal parsing. (in reply to encode)