-
The Founder
What's new:
+ Added fully-featured order-2-0 PPMC coder
Link:
Download TC 5.0dev6 (28 KB)
-
-
The Founder
It's just a DRAFT test version. In the next versions I'll:
+ Improve the literal/match length coding mechanism, leading to higher compression
+ Speed up the hashtable
-
-
The Founder
TC 5.0dev6 on Calgary Corpus:
bib: 30,729 bytes
book1: 263,369 bytes
book2: 177,833 bytes
geo: 62,232 bytes
news: 122,664 bytes
obj1: 10,747 bytes
obj2: 79,953 bytes
paper1: 17,560 bytes
paper2: 27,173 bytes
pic: 53,819 bytes
progc: 13,110 bytes
progl: 15,541 bytes
progp: 10,830 bytes
trans: 17,318 bytes
total: 902,878 bytes
-
-
The Founder
TC 5.0dev6 on SFC (maximumcompression.com):
A10.jpg: 859,025 bytes
acrord32.exe: 1,692,298 bytes
english.dic: 844,988 bytes
FlashMX.pdf: 3,810,493 bytes
fp.log: 677,336 bytes
mso97.dll: 2,058,718 bytes
ohs.doc: 866,934 bytes
rafale.bmp: 1,092,785 bytes
vcfiu.hlp: 731,924 bytes
world95.txt: 661,748 bytes
total: 13,296,249 bytes
-
-
The Founder
TC 5.0dev6 on Canterbury Corpus:
alice29.txt: 48,148 bytes
asyoulik.txt: 42,770 bytes
cp.html: 7,855 bytes
fields.c: 3,120 bytes
grammar.lsp: 1,232 bytes
kennedy.xls: 129,290 bytes
lcet10.txt: 122,069 bytes
plrabn12.txt: 162,492 bytes
ptt5: 53,819 bytes
sum: 13,226 bytes
xargs.1: 1,747 bytes
total: 585,768 bytes
-
-
The Founder
TC 5.0dev6 on Large Text Compression Benchmark:
ENWIK8: 29,544,971 bytes
ENWIK9: 257,416,397 bytes (compression 279 sec / decompression 279 sec)
P4 3.0 GHz, 1 GB RAM, Windows XP SP2
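For anyone who prefers bits per character and throughput, the figures above can be converted with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and it relies on enwik8 and enwik9 being 10^8 and 10^9 bytes respectively.

```python
# Original file sizes: enwik8 is 10**8 bytes, enwik9 is 10**9 bytes.
ENWIK8_SIZE = 10**8
ENWIK9_SIZE = 10**9

def bits_per_char(compressed_bytes, original_bytes):
    """Average number of output bits spent per input byte."""
    return 8 * compressed_bytes / original_bytes

def throughput_mb_per_s(original_bytes, seconds):
    """Input megabytes (10**6 bytes) processed per second."""
    return original_bytes / seconds / 1e6

bpc8 = bits_per_char(29_544_971, ENWIK8_SIZE)
bpc9 = bits_per_char(257_416_397, ENWIK9_SIZE)
speed9 = throughput_mb_per_s(ENWIK9_SIZE, 279)

print(f"ENWIK8: {bpc8:.3f} bpc")    # ENWIK8: 2.364 bpc
print(f"ENWIK9: {bpc9:.3f} bpc")    # ENWIK9: 2.059 bpc
print(f"ENWIK9: {speed9:.2f} MB/s") # ENWIK9: 3.58 MB/s
```

The same 279 seconds apply to decompression, so the throughput figure holds in both directions on the test machine.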
-
-
Thanks!
Will this version perform better on the SFC test?
-
-
The Founder
Without a doubt, this version has higher compression than TC 5.0dev5.
-
-
My previous post should have read:
Will this version perform better on the MFC test?
Apologies for the error!
-
-
The Founder
...99.9 percent it will!
Let's wait and see!
-
-
OK!
-
-
The Founder
Well, after a few days of 24-hour testing and experimenting, I've found that the current TC 5.0dev6 is good enough. It can achieve higher compression with a larger PPM hashtable and/or with different scaling, but the difference will be about 1...2% or less, if any. So, for now I'm just collecting ideas...
-
-
The Founder
Results for TC 5.0dev6x on the 'enwik' files. Note that, compared to dev6, this version uses a 16 MB PPM hashtable instead of 4 MB.
ENWIK8:
TC 5.0dev6x: 28,990,965 bytes
TC 5.0dev6: 29,544,971 bytes
ENWIK9:
TC 5.0dev6x: 251,876,150 bytes
TC 5.0dev6: 257,416,397 bytes
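To put the dev6x gain in percentage terms, here is a small sketch that computes the relative saving from the sizes listed above. The dictionary layout and the function name are only illustrative.

```python
# Compressed sizes in bytes, copied from the results above.
results = {
    "ENWIK8": {"dev6": 29_544_971, "dev6x": 28_990_965},
    "ENWIK9": {"dev6": 257_416_397, "dev6x": 251_876_150},
}

def relative_saving(old_size, new_size):
    """Fraction by which new_size is smaller than old_size."""
    return (old_size - new_size) / old_size

savings = {
    name: relative_saving(sizes["dev6"], sizes["dev6x"])
    for name, sizes in results.items()
}

for name, saving in savings.items():
    print(f"{name}: dev6x output is {saving:.2%} smaller than dev6")
# ENWIK8: dev6x output is 1.88% smaller than dev6
# ENWIK9: dev6x output is 2.15% smaller than dev6
```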
Also note that in some cases a larger hashtable can give lower compression. That's because a smaller hashtable efficiently drops 'outdated' contexts, unlike a large one.
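The effect of table size on 'outdated' contexts can be illustrated with a toy model. This is not TC's actual hashtable: the `ContextTable` class, its hash function and the replace-on-collision policy are all assumptions made for the sake of the example.

```python
class ContextTable:
    """Toy fixed-size table of order-2 contexts (hypothetical, not TC's code).

    Each slot is owned by one context. When a different context hashes to
    the same slot, the old owner's statistics are thrown away.
    """

    def __init__(self, slots):
        self.slots = slots
        self.tags = [None] * slots    # context currently owning each slot
        self.counts = [None] * slots  # symbol counts for that context

    def _index(self, context):
        a, b = context                # two bytes of an order-2 context
        return (a * 256 + b) % self.slots

    def update(self, context, symbol):
        i = self._index(context)
        if self.tags[i] != context:
            # Empty slot or collision: previous statistics are dropped.
            self.tags[i] = context
            self.counts[i] = {}
        self.counts[i][symbol] = self.counts[i].get(symbol, 0) + 1

    def lookup(self, context):
        i = self._index(context)
        return self.counts[i] if self.tags[i] == context else None


small = ContextTable(16)
large = ContextTable(65536)

for table in (small, large):
    for _ in range(3):
        table.update(b"th", "e")  # an "old" context seen early in the data
    table.update(b"ax", "y")      # a later context; shares slot 8 in the small table

print(small.lookup(b"th"))  # None: evicted by the newer context
print(large.lookup(b"th"))  # {'e': 3}: still kept
```

In the small table the old statistics disappear as soon as a newer context lands on the same slot, while the large table keeps them indefinitely. Whether that eviction helps or hurts depends on the data: it costs compression when the old context comes back, and can help when the old statistics no longer match what follows.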
-