Originally Posted by encode
16K + 32MB buffer
I think compression speed is not so important as decompression.
Originally Posted by Squxe
Most people have at least 1GB RAM these days. I vote for the 64MB buffer.
OK, will keep 16K index and 64 MB buffer as a base. At least all benchmarkers have 1 GB RAM. LZPM goes large!
Now I will play with hashing and string searching - have a few ideas.
Malcolm Taylor in his ROLZ2 uses Hash Tables for Fast mode and Binary Tree for higher ones.
Hash Tables have a fast update, but at the same time the search is poorly bounded - that's why they are not efficient with optimal parsing.
Binary Trees are better for optimal parsing since the string search becomes faster. Currently I just have no idea how exactly to use binary trees for string searching. Probably I should look at the LZMA sources.
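For illustration only, here is a minimal hash-chain match finder sketch; the table size, the 4-byte hash and all names below are my own assumptions, not the actual LZPM or ROLZ2 code. It shows why the update is cheap while the search needs an explicit step limit:

// Minimal hash-chain match finder (illustrative sketch, not LZPM/ROLZ2 code).
// head[] gives random access (by hash) to the most recent position,
// prev[] chains older positions with the same hash backwards.
#include <cstdint>
#include <cstring>
#include <vector>

struct HashChains {
    static const int HASH_BITS = 16;              // assumed head-table size
    static const int MIN_MATCH = 4;               // assumed minimum match length
    std::vector<int> head, prev;
    const uint8_t* buf;
    int size;

    HashChains(const uint8_t* data, int n)
        : head(1 << HASH_BITS, -1), prev(n, -1), buf(data), size(n) {}

    int hash4(int pos) const {                    // hash of the next 4 bytes
        uint32_t x;
        std::memcpy(&x, buf + pos, 4);            // caller ensures pos + 4 <= size
        return int((x * 2654435761u) >> (32 - HASH_BITS));
    }

    void insert(int pos) {                        // fast O(1) update
        int h = hash4(pos);
        prev[pos] = head[h];
        head[h] = pos;
    }

    // Poorly bounded search: without maxSteps the chain walk can degenerate,
    // which is exactly what hurts optimal parsing.
    int find(int pos, int maxSteps, int& matchPos) const {
        int best = 0;
        for (int p = head[hash4(pos)]; p >= 0 && maxSteps-- > 0; p = prev[p]) {
            int len = 0;
            while (pos + len < size && buf[p + len] == buf[pos + len]) ++len;
            if (len > best) { best = len; matchPos = p; }
        }
        return best >= MIN_MATCH ? best : 0;
    }
};

A binary-tree match finder (as in the LZMA sources) keeps the same positions in a sorted tree instead of a flat chain, so the longest match is found with far fewer string comparisons - which is what makes it attractive for optimal parsing.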
Thanks Ilia!
Some results with new LZPM:
ENWIK8: 26,513,271 bytes
ENWIK9: 229,206,234 bytes
gimp-2.0.0.tar: 9,477,271 bytes
Memory needed for compression: 512 + 128 + 64 = 704 MB
The debug-pre-release of LZPM 0.11 is here:
lzpm011dbg.zip (47 KB)
(only for personal testing, not outside this forum)
Just a powerhouse-like release. If you like it, I will finally release v0.11 very soon!
Check the fast modes - IT'S SICK!!
Happy testing!
Thanks Ilia!
I'm sure this release will be a true PowerHouse!!!
Very quick test with FP.log...
(1) 778 KB (796,905 bytes)
Elapsed Time: 00:00:38.14 (38.14 Seconds)
(8) 736 KB (754,389 bytes)
Elapsed Time: 00:02:57.64 (177.64 Seconds)
Good compression speed!
Quick test with SFC at lowest (1) setting...
A10.jpg > 832,155
AcroRd32.exe > 1,661,229
english.dic > 1,043,341
FlashMX.pdf > 3,768,997
FP.LOG > 796,905
MSO97.DLL > 2,030,758
ohs.doc > 839,929
rafale.bmp > 1,126,286
vcfiu.hlp > 737,812
world95.txt > 644,182
Total = 13,481,594 bytes
Compression is still good even at the lowest setting.
I'll run more tests tomorrow.
More tests...
LZPM (1)
A10.jpg > 832,155
AcroRd32.exe > 1,661,229
english.dic > 1,043,341
FlashMX.pdf > 3,768,997
FP.LOG > 796,905
MSO97.DLL > 2,030,758
ohs.doc > 839,929
rafale.bmp > 1,126,286
vcfiu.hlp > 737,812
world95.txt > 644,182
Total = 13,481,594 bytes
LZPM (2)
A10.jpg > 832,164
AcroRd32.exe > 1,642,634
english.dic > 1,022,625
FlashMX.pdf > 3,761,541
FP.LOG > 776,765
MSO97.DLL > 2,013,842
ohs.doc > 835,722
rafale.bmp > 1,102,195
vcfiu.hlp > 726,469
world95.txt > 613,538
Total = 13,327,495 bytes
LZPM (3)
A10.jpg > 832,173
AcroRd32.exe > 1,634,885
english.dic > 1,015,535
FlashMX.pdf > 3,756,691
FP.LOG > 768,875
MSO97.DLL > 2,009,222
ohs.doc > 835,194
rafale.bmp > 1,092,312
vcfiu.hlp > 710,894
world95.txt > 598,423
Total = 13,254,204 bytes
LZPM (4)
A10.jpg > 832,173
AcroRd32.exe > 1,633,323
english.dic > 1,010,246
FlashMX.pdf > 3,755,575
FP.LOG > 764,299
MSO97.DLL > 2,007,172
ohs.doc > 834,961
rafale.bmp > 1,089,177
vcfiu.hlp > 710,058
world95.txt > 592,372
Total = 13,229,356 bytes
LZPM (5)
A10.jpg > 832,173
AcroRd32.exe > 1,631,500
english.dic > 1,000,441
FlashMX.pdf > 3,754,544
FP.LOG > 761,176
MSO97.DLL > 2,004,878
ohs.doc > 834,432
rafale.bmp > 1,087,340
vcfiu.hlp > 705,505
world95.txt > 589,635
Total = 13,201,624 bytes
LZPM (6)
A10.jpg > 832,173
AcroRd32.exe > 1,630,453
english.dic > 989,329
FlashMX.pdf > 3,753,560
FP.LOG > 756,963
MSO97.DLL > 2,003,865
ohs.doc > 834,205
rafale.bmp > 1,083,531
vcfiu.hlp > 703,909
world95.txt > 588,135
Total = 13,176,123 bytes
LZPM (7)
A10.jpg > 832,173
AcroRd32.exe > 1,629,416
english.dic > 979,415
FlashMX.pdf > 3,753,148
FP.LOG > 754,918
MSO97.DLL > 2,002,878
ohs.doc > 834,104
rafale.bmp > 1,078,426
vcfiu.hlp > 701,594
world95.txt > 586,875
Total = 13,152,947 bytes
LZPM (8)
A10.jpg > 832,173
AcroRd32.exe > 1,629,139
english.dic > 973,003
FlashMX.pdf > 3,752,812
FP.LOG > 754,389
MSO97.DLL > 2,002,410
ohs.doc > 833,986
rafale.bmp > 1,075,903
vcfiu.hlp > 700,590
world95.txt > 585,630
Total = 13,140,035 bytes
LZPM (9)
A10.jpg > 832,173
AcroRd32.exe > 1,626,230
english.dic > 962,032
FlashMX.pdf > 3,750,555
FP.LOG > 614,714
MSO97.DLL > 2,000,508
ohs.doc > 828,583
rafale.bmp > 1,056,836
vcfiu.hlp > 687,544
world95.txt > 573,036
Total = 12,932,211 bytes
Awesome compressor!
Good work Ilia!
I've found that this 5-byte string match finder is not efficient in terms of binary compression - we lose many short matches. So I decided to go back to the old 4-byte string search. This leads to higher binary and slightly higher text compression at the cost of extra compression time (a few times slower). It took a few hours to compress ENWIK9 on my PC... However, the max is max, right? OK, some results with the new modification:
ENWIK8: 26,498,178 bytes
ENWIK9: 229,062,912 bytes
gimp-2.0.0.tar: 9,466,969 bytes
On small and binary files the gain is much more notable.
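For illustration, the difference between the two match finders can be reduced to the width of the head-table hash; everything below (names, constants) is my assumption, not the LZPM source:

// Sketch: 4-byte vs 5-byte hashing of the head table (assumed names/constants).
// A 5-byte hash never brings positions that share only a 4-byte prefix into
// the same chain, so length-4 matches (common in binary data) are lost;
// the 4-byte hash finds them, at the price of longer chains and a slower search.
#include <cstdint>
#include <cstring>

static const int HASH_BITS = 16;                  // assumed index size

inline uint32_t hash4(const uint8_t* p) {
    uint32_t x;
    std::memcpy(&x, p, 4);
    return (x * 2654435761u) >> (32 - HASH_BITS);
}

inline uint32_t hash5(const uint8_t* p) {
    uint64_t x = 0;
    std::memcpy(&x, p, 5);                        // low 5 bytes (little-endian assumed)
    return uint32_t((x * 0x9E3779B97F4A7C15ull) >> (64 - HASH_BITS));
}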
Right!
Originally Posted by README.TXT
When will version 0.11 be released?
LZPM 0.11 is almost written - now I'm just testing it. It will be released within one week. Maybe even tomorrow - it depends on my tests. Anyway, the release is very soon!
I have an idea for a hash chains improvement.
The data structures are accessed in the following manner:
head -> prev -> prev -> prev ...
i.e. we start from "head", then we traverse backwards through "prev".
The head is accessed randomly (the index is defined by a hash).
The prev entries are accessed sequentially, but backwards. The idea is a cheap trick so that we are able to traverse forwards, which is assumed to be faster. To do so, we just invert each "pos":
pos = prev[N - pos];
So each time we go forward through memory, which may lead to faster memory access.
Simple, isn't it? I will test this idea tomorrow.
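My reading of the trick as a sketch (assumed names; the real code may differ):

// Sketch of the inverted-prev idea (assumed names, not the actual LZPM code).
// Normal chains:  prev[pos] holds the previous position; the walk decreases pos,
//                 so reads of prev[] move backwards through memory.
// Inverted:       store the link at index N - pos; as pos decreases, N - pos
//                 increases, so the walk reads prev[] in forward order.
void insert(int* head, int* prev, int N, int h, int pos) {
    prev[N - pos] = head[h];      // link stored at the mirrored index
    head[h] = pos;
}

int nextInChain(const int* prev, int N, int pos) {
    return prev[N - pos];         // the "pos = prev[N - pos]" step from above
}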
It's random access anyway. Forward/backward access may have different speed when it's *sequential*.
Indeed, this gives no improvement. At least I tested the idea...
Some time ago, Uwe proposed an idea for a parsing improvement:
With my FP, each time we compare the combination of two matches:
a + b
with
shorter a + b
If a + b = (MAXMATCH * 2) - 1,
we cannot outperform such a value inside the parsing scheme, so we just eliminate the search.
My idea: if b == MAXMATCH, then we cannot outperform the current combination, so we completely eliminate further optimization.
The case a + b = (MAXMATCH * 2) - 1 is not frequent; on some files it will never happen. b == MAXMATCH is far more frequent - as a result we can skip many match searches while providing the SAME compression.
This is already added to LZPM 0.11. However, even with this improvement, LZPM with "9" is damn slow. Anyway, with this technique we can save some time with no compression loss!
In practice, we can calculate such an upper limit dynamically; it can be less than MAXMATCH, getting a further speed-up, especially on highly redundant files like fp.log.
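As a sketch of how the early exits could sit inside a two-match FP step (the loop structure, MINMATCH/MAXMATCH and findMatch() are my assumptions about such code, not the actual LZPM 0.11 source):

// Sketch: early exits in a two-match flexible-parsing step (assumed structure).
int a = findMatch(pos);                          // longest match at pos
int bestTotal = a + findMatch(pos + a);          // baseline combination a + b
int bestLenA = a;
for (int lenA = a - 1; lenA >= MINMATCH; --lenA) {
    // Uwe's bound: (MAXMATCH * 2) - 1 cannot be beaten inside this scheme.
    if (bestTotal >= (MAXMATCH * 2) - 1) break;
    int b = findMatch(pos + lenA);               // match after the shortened 'a'
    if (lenA + b > bestTotal) { bestTotal = lenA + b; bestLenA = lenA; }
    // encode's addition: once b hits MAXMATCH, shortening 'a' further can only
    // lower lenA + b, so stop searching - same parse, fewer match searches.
    if (b == MAXMATCH) break;
}

The dynamic upper limit from the last post would presumably replace MAXMATCH in that final check with a per-position bound that can be smaller, triggering the exit even earlier on redundant data like fp.log.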