Thanks Encode!
Creating archive: d:a.arc using lzpm4k
Compressed 1 file, 28.183.463 => 4.429.857 bytes. Ratio 15.7%
Compression time 148.36 secs, speed 190 kb/s. Total 149.66 secs
Creating archive: d:a.arc using lzpm8k
Compressed 1 file, 28.183.463 => 4.397.292 bytes. Ratio 15.6%
Compression time 246.85 secs, speed 114 kb/s. Total 249.92 secs
Creating archive: d:a.arc using lzpm16k
Compressed 1 file, 28.183.463 => 4.384.432 bytes. Ratio 15.5%
Compression time 396.67 secs, speed 71 kb/s. Total 403.44 secs
Creating archive: d:a.arc using 7z
Compressed 1 file, 28.183.463 => 4.112.522 bytes. Ratio 14.5%
Compression time 92.60 secs, speed 304 kb/s. Total 95.09 secs
Creating archive: d:a.arc using ppmd:16:384mb
Compressed 1 file, 28.183.463 => 3.939.418 bytes. Ratio 13.9%
Compression time 27.05 secs, speed 1.042 kb/s. Total 31.67 secs
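The Ratio and speed figures in the log above can be recomputed from the raw sizes and times. This is a minimal sketch that makes two assumptions, inferred from the printed numbers rather than from the archiver's documentation: "kb/s" means 1,000 bytes per second (rounded), and the ratio is truncated, not rounded, to one decimal place. With those two conventions it reproduces every line of the log.

```python
# Original size and (method, compressed bytes, compression seconds), copied from the log above.
ORIGINAL = 28_183_463
RUNS = [
    ("lzpm4k", 4_429_857, 148.36),
    ("lzpm8k", 4_397_292, 246.85),
    ("lzpm16k", 4_384_432, 396.67),
    ("7z", 4_112_522, 92.60),
    ("ppmd:16:384mb", 3_939_418, 27.05),
]


def ratio_percent(original: int, compressed: int) -> float:
    """Compressed/original as a percentage, truncated (assumption) to one decimal."""
    tenths = compressed * 1000 // original  # integer math avoids float rounding surprises
    return tenths / 10


def speed_kb_s(original: int, seconds: float) -> int:
    """Input bytes per second, assuming 1 kb = 1000 bytes, rounded to an integer."""
    return round(original / seconds / 1000)


if __name__ == "__main__":
    for name, size, secs in RUNS:
        print(f"{name:14} ratio {ratio_percent(ORIGINAL, size):.1f}%  "
              f"speed {speed_kb_s(ORIGINAL, secs)} kb/s")
```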
Some timings for LZPM 0.09 on my old Intel P3 EB (Coppermine) 750 MHz, 512MB RAM, WinME machine.
Test file is FP.log.
LZPM v0.09
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 774.915 = 00:12:54.915 = 100%
Decompression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 1.882 = 00:00:01.882 = 100%
Compressed Size: 628 KB (643,807 bytes)
QUAD v1.12
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 6.386 = 00:00:06.386 = 100%
Decompression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 2.728 = 00:00:02.728 = 100%
Compressed Size: 700 KB (717,207 bytes)
I will test the various versions of LZPM (4K, 8K, 16K) on the same machine and post the results later.
fp.log is too small and too unusual a file.
fp.log represents the worst case for Flexible Parsing.
How to calculate the minimum distance for ROLZ, i.e. how far the algorithm can look back in the worst case (random data):
256 * TABSIZE (if we use 1-byte context)
256 * 4096 = 1048576 (1 MB) (current)
256 * 8192 = 2097152 (2 MB)
256 * 16384 = 4194304 (4 MB)
In practice, these values should be multiplied by, say, 4 or 8. For highly redundant data the actual distance is far longer.
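The 256 * TABSIZE figure follows from random data: each of the 256 possible context bytes turns up on average once every 256 bytes, so a context's table holding its TABSIZE most recent positions spans about 256 * TABSIZE bytes. The sketch below computes the formula and checks it with a small simulation. The simulation (seeded random bytes, one deque per context) is an illustrative assumption, not LZPM's code; the 4x to 8x factor is the rule of thumb from the post.

```python
import random
from collections import deque

CONTEXTS = 256  # 1-byte context: one table per possible previous byte


def worst_case_reach(tab_size: int) -> int:
    """Formula from the post: 256 * TABSIZE bytes."""
    return CONTEXTS * tab_size


def simulated_reach(tab_size: int, n: int = 200_000, seed: int = 0) -> float:
    """Average distance from the current position back to the oldest entry
    of a full context table, measured on seeded random bytes."""
    rng = random.Random(seed)
    tables = [deque(maxlen=tab_size) for _ in range(CONTEXTS)]
    spans = []
    prev = 0  # treat the byte before the start of input as 0
    for pos in range(n):
        table = tables[prev]
        if len(table) == tab_size:
            spans.append(pos - table[0])  # table[0] is the oldest stored position
        table.append(pos)
        prev = rng.randrange(256)  # the random byte at pos is the next context
    return sum(spans) / len(spans)


if __name__ == "__main__":
    for tab in (4096, 8192, 16384):
        mb = worst_case_reach(tab) / 2**20
        # 4x..8x is the post's practical rule of thumb, not a derived bound
        print(f"TABSIZE {tab:5}: {mb:.0f} MB worst case, ~{4 * mb:.0f}-{8 * mb:.0f} MB in practice")
    print(f"simulated, TABSIZE 16: {simulated_reach(16):.0f} bytes "
          f"(formula: {worst_case_reach(16)})")
```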
Try to use something more sophisticated than linear search,
for example suffix trees.
Somewhere I read about 'sliding' suffix trees (i.e. ones where you can not only add new symbols but also remove old ones). With such a structure you can achieve linear time for building the suffix tree plus linear time for parsing.
Such a thing would surely beat all other LZ algorithms.
(Additionally, with suffix trees you can use matches thousands of bytes long, up to the size of the sliding window, without a speed penalty.)
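For reference, here is the kind of linear search a suffix tree would replace. This is a simplified sketch with hypothetical names (greedy parsing, 1-byte context, no entropy coding), not LZPM's actual implementation. Every position scans all stored candidates for its context, so the work per byte grows with the table size, which is one reason larger tables compress more slowly.

```python
from collections import deque


def rolz_greedy_parse(data: bytes, tab_size: int, min_len: int = 3):
    """Greedy ROLZ-style parse with a linear scan of each context's table.

    Returns (tokens, checks): tokens are ("lit", byte) or
    ("match", table_index, length), where table_index 0 is the newest
    candidate; checks counts how many candidates were examined.
    """
    tables = [deque(maxlen=tab_size) for _ in range(256)]
    tokens, checks, pos = [], 0, 0
    while pos < len(data):
        ctx = data[pos - 1] if pos else 0
        best_len, best_idx = 0, -1
        # Linear search: compare against every stored position, newest first.
        for idx, cand in enumerate(reversed(tables[ctx])):
            checks += 1
            n = 0
            while pos + n < len(data) and data[cand + n] == data[pos + n]:
                n += 1
            if n > best_len:
                best_len, best_idx = n, idx
        if best_len >= min_len:
            tokens.append(("match", best_idx, best_len))
            step = best_len
        else:
            tokens.append(("lit", data[pos]))
            step = 1
        # Register every consumed position in the table of its own context.
        for p in range(pos, pos + step):
            tables[data[p - 1] if p else 0].append(p)
        pos += step
    return tokens, checks
```

A match only needs a small table index instead of a full offset, because the decoder rebuilds the same per-context tables as it goes.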
Some timings for LZPM on my old Intel P3 EB (Coppermine) 750 MHz, 512MB RAM, WinME machine.
Test file is FP.log.
LZPM4K
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 581.403 = 00:09:41.403 = 100%
Compressed Size: 635 KB (650,310 bytes)
LZPM8K
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 869.416 = 00:14:29.416 = 100%
Compressed Size: 631 KB (646,874 bytes)
LZPM16K
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 1175.059 = 00:19:35.059 = 100%
Compressed Size: 630 KB (645,864 bytes)
[quoting Bulat Ziganshin]
What file do you suggest?
Which test file did you use for the results above?
[quoting LovePimple]
A text file of at least 10 MB: sources, natural text and so on. I've used the GHC sources; you can download something similar from http://www.haskell.org/ghc/dist/6.4/ghc-6.4-src.tar.bz2
Thanks Bulat!
I have downloaded the archive (http://www.haskell.org/ghc/dist/6.4/ghc-6.4-src.tar.bz2) but still can't find the file of 28.183.463 bytes that you used in your test above. I'm sure you will understand that there is little point in posting results if we don't all have access to the same test files.
But you didn't ask me for my file. It's http://haskell.org/bz/ghc-src.7z
Here are the results for my test set (45,596,394 bytes in 14 files):
version / size (bytes) / compression speed (kb/s) / decompression speed (kb/s)
LZPM 0.09 / 19,543,628 / 280 / 6,500
LZPM 0.09 4K / 19,515,844 / 297 / 6,092
LZPM 0.09 8K / 19,462,255 / 224 / 6,289
LZPM 0.09 16K / 19,431,576 / 165 / 5,919
7-Zip 4.52b (ultra, word 64, LZMA only) / 17,901,244 / 677 / 9,119
I made a mistake in the new parsing scheme, which led to some compression loss, sometimes a notable one (look at the fp.log results).
Results with the fixed scheme:
world95.txt: 575,958 bytes
fp.log: 631,821 bytes (617 KB!)
In addition, I have an idea how to improve parsing further... I will do some experiments.
[quoting Bulat Ziganshin]
Thanks Bulat!
I will download it ASAP.
Some timings for LZPM on my old Intel P3 EB (Coppermine) 750 MHz, 512MB RAM, WinME machine.
Test file is ghc-src.
LZPM4K
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 102.605 = 00:01:42.605 = 100%
Compressed Size: 4.22 MB (4,429,856 bytes)
LZPM8K
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 174.498 = 00:02:54.498 = 100%
Compressed Size: 4.19 MB (4,397,291 bytes)
LZPM16K
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 288.412 = 00:04:48.412 = 100%
Compressed Size: 4.18 MB (4,384,431 bytes)
QUAD v1.12
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 21.573 = 00:00:21.573 = 100%
Compressed Size: 4.46 MB (4,678,053 bytes)
QUAD v1.12 (-x)
Compression Time:
Kernel Time = 0.000 = 00:00:00.000 = 0%
User Time = 0.000 = 00:00:00.000 = 0%
Process Time = 0.000 = 00:00:00.000 = 0%
Global Time = 50.641 = 00:00:50.641 = 100%
Compressed Size: 4.29 MB (4,505,182 bytes)
Thanks Bulat! This is an excellent test file.
If we can get several more people posting results from this same test file, it will give us some idea of whether the 16K version is worth it or not.
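One way to frame the "is 16K worth it" question is to compare the bytes saved against the extra compression time, using the ghc-src timings on the P3 above. A small sketch; it covers a single file on a single machine, so treat the output as a rough indication only.

```python
# ghc-src results on the P3 750 MHz, copied from the timings above:
# table size -> (compressed bytes, compression seconds)
RESULTS = {
    "4K": (4_429_856, 102.605),
    "8K": (4_397_291, 174.498),
    "16K": (4_384_431, 288.412),
}


def tradeoff(results: dict, base: str = "4K") -> dict:
    """Bytes saved, percent saved and time multiple of each entry versus `base`."""
    base_size, base_time = results[base]
    out = {}
    for name, (size, secs) in results.items():
        saved = base_size - size
        out[name] = (saved, 100 * saved / base_size, secs / base_time)
    return out


if __name__ == "__main__":
    for name, (saved, pct, slower) in tradeoff(RESULTS).items():
        print(f"{name:>3}: {saved:6,} bytes smaller ({pct:.2f}%), "
              f"{slower:.2f}x the time of 4K")
```

On these numbers, 16K saves about 1% of the output for roughly 2.8 times the compression time of 4K.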
Much faster than my 1 GHz Duron, probably due to the 512 KB cache and better cache organization.
[quoting Bulat Ziganshin]
Here is my CPU spec (from CPU-Z v1.40.5):
Processor 1 (ID = 0)
Number of cores: 1
Number of threads: 1 (max 1)
Name: Intel Pentium III EB
Codename: Coppermine
Specification:
Package: Socket 370 FC-PGA (platform ID = 4h)
CPUID: 6.8.6
Extended CPUID: 6.8
Brand ID: 2
Core Stepping: cC0
Technology: 0.18 um
Core Speed: 747.8 MHz (7.5 x 99.7 MHz)
Stock frequency: 1000 MHz
Instructions sets: MMX, SSE
L1 Data cache: 16 KBytes, 4-way set associative, 32-byte line size
L1 Instruction cache: 16 KBytes, 4-way set associative, 32-byte line size
L2 cache: 256 KBytes, 8-way set associative, 32-byte line size
FID/VID Control: no
The machine originally came with a 600 MHz Celeron chip whose performance was little more than a joke. I replaced it with a second-hand 1 GHz Pentium III chip (7.5 x 133 MHz). The motherboard has a maximum bus speed of only 100 MHz, so the CPU runs at 750 MHz (7.5 x 100 MHz). The machine is now MUCH FASTER and far, far more reliable.
I'm sure it would make a BIG difference to performance if you installed a 1 GHz (or better) Athlon chip in that machine of yours.
Results on my machine with the newer LZPM 0.10:
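The clock figures above are just multiplier times bus frequency. A trivial sketch using the numbers from the post and the CPU-Z listing; the function name is hypothetical. Note that the nominal "133 MHz" bus actually runs at 133.33 MHz (400/3), which is what gives the rated 1000 MHz.

```python
def core_clock_mhz(multiplier: float, bus_mhz: float) -> float:
    """Core clock = clock multiplier x front-side bus frequency."""
    return multiplier * bus_mhz


if __name__ == "__main__":
    print(core_clock_mhz(7.5, 400 / 3))  # rated part on its 133.33 MHz bus
    print(core_clock_mhz(7.5, 100))      # same part on this 100 MHz motherboard
    print(core_clock_mhz(7.5, 99.7))     # bus speed as measured by CPU-Z
```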
ghc-src: 4,406,897 bytes, 17 sec
Hm, I will do some optimizations...
LZPM from the test pack (LZPM 4K):
ghc-src: 4,429,856 bytes, 15 sec
[quoting LovePimple]
Oh, yes, 256 KB. And AFAIR the EB series should have a 133 MHz bus?
[quoting Bulat Ziganshin]
Correct! See my notes after the CPU spec.
http://www.buildorbuy.org/p3-ram.html
The "E" and "B" designators distinguish between Intel Pentium III processors with the same core frequency but different system bus frequencies and/or cache implementations.
B = 133 MHz System Bus
E = Processors with "Advanced Transfer Cache" (CPUID 068x and greater only if a frequency overlap exists)
[quoting LovePimple]
The difference between the Duron and the Athlon is much smaller than between Celeron and P3 processors (and that is the reason why the Duron was so popular 5 years ago).