fcm2 - catch me if you can!
What's new:
+ Added dynamic mixer
+ Switched to pseudo order-2 (12-bit context)
+ Fixed wrong definition of _CRT_DISABLE_PERFCRIT_LOCKS
+ Optimized compile
fcm2.zip
Thanks Ilia!
Mirror: Download
http://shelwien.googlepages.com/cm200805.htm updated again
Ah, btw.
Please use c/d commands instead of e/d in next fpaq-like coders.
Because I had to patch out the argv[1] checks in fcm1 and fcm2
original executables as it was easier than making a separate test
script for it.
Well, add "-" there or something.
I mean ( (argc!=4) || ((-argv[1][0]&254)!=156) )
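The masked check works because 'c' (99) and 'd' (100) differ only in the lowest bit once negated, so `&254` maps both to 156. A minimal sketch verifying this (the helper name `accepts` is hypothetical):

```cpp
// Hypothetical wrapper around the masked command-letter test quoted above.
// 'c' == 99 and 'd' == 100; -99 = ...10011101b and -100 = ...10011100b,
// so (-cmd & 254) clears the low bit and yields 156 for both letters.
int accepts(int cmd) {
    return (-cmd & 254) == 156;
}
```

A single comparison thus accepts both command letters without two separate checks.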
I also prefer c/d.
Originally Posted by Shelwien
Here's my own speed optimised compile for Pentium 3 or later processors. This version also includes Shelwien's mod.
ENJOY!
http://shelwien.googlepages.com/cm200805.htm updated again
No luck
Thanks Shelwien!
I'm surprised that it's not as quick as the original compile on your machine. Compression of ENWIK8 was several seconds faster on my Sempron 2400+ machine.
Quick test on my Pentium 3 @750 Mhz, Windows 2000 SP4 machine...
Test File: ENWIK8 (100,000,000 bytes)
fcm2
Compressed Size: 41,157,683 bytes
Compression Time: 147.17 Seconds
00 Days 00 Hours 02 Minutes 27.17 Seconds
fcm2p3
Compressed Size: 41,157,683 bytes
Compression Time: 137.14 Seconds
00 Days 00 Hours 02 Minutes 17.14 Seconds
I believe such results are possible only on old-school machines. On modern PCs the results are the opposite:
ENWIK8:
fcm2 - 22 sec.
fcm2p3 - 24 sec.
And again looking at processing time on your machine I'm crying...
MC SFC test...
A10.jpg > 835,804
AcroRd32.exe > 1,807,692
english.dic > 973,857
FlashMX.pdf > 3,883,671
FP.LOG > 4,502,965
MSO97.DLL > 2,139,028
ohs.doc > 1,253,019
rafale.bmp > 1,036,125
vcfiu.hlp > 1,219,774
world95.txt > 1,232,676
Total = 18,884,611 bytes
I hope to see more people testing this compressor.
A small update - fcm2a - added an extra context to the mixer:
http://encode.su/downloads/fcm2a.zip
Thanks Ilia!
These are redundant. Just remember the mixer's outputs within the predictor and pass them to the mixer's update functions.
Code:
class TMixer {
private:
    int wt;
    int x1;
    int x2;
};
And your weights should be unsigned bytes or words - I'm sure there is no speed penalty.
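Putting that suggestion into code - a minimal sketch of the same mixer with x1/x2 removed, where the predictor remembers the inputs and passes them back to Update(). The 6-bit weight scale and the 32..63 bounds follow the snippets in this thread; the initial weight and the output scaling are my assumptions:

```cpp
// Sketch of the suggested stateless mixer (assumptions noted above).
class TMixer {
    int wt;                               // weight of the second input, 32..63 of 64
public:
    TMixer() : wt(48) {}                  // initial weight is an assumption
    int PP(int p1, int p2) const {        // P = p1 + (p2-p1)*wt/64
        return p1 + ((p2 - p1) * wt >> 6);
    }
    void Update(int y, int p1, int p2) {  // predictor passes p1,p2 back in
        if (y ? p2 > p1 : p2 < p1)
            wt += (wt < 63);              // favour the better-predicting input
        else
            wt -= (wt > 32);              // lower bound keeps input 1 at >= 1/2
    }
};
```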
I'm just too lazy...
I already did it anyway.
Here are my own speed optimised compiles. One for Pentium Pro or later, the other for P3 or later.
ENJOY!
Here are my own speed optimised compiles of Shelwien's "fcm2sh4". One for Pentium Pro or later, the other for P3 or later.
ENJOY!
Thank you!
Interpretation of this type of mixer:
dH/dw = d/dw -ln( (1-y) + (2*y-1)*P1) )Code:void Update(int y) { if (y?x2>x1:x2<x1) // y==1 && p2-p1>0 || y==0 && p2-p1<0 -> sgn(2*y-1)*sgn(p2-p1) > 0 wt+=(wt<63); else // y==1 && p2-p1<0 || y==0 && p2-p1>0 -> sgn(2*y-1)*sgn(p2-p1) < 0 wt-=(wt>32); }
= 1/((not y) - P1) * (p2-p1)
with
P1 = (1-w)*p1 + w*p2 = p1 + (p2-p1)*w
The weight update in conjugate gradient direction is:
w(k+1) = w(k) + d
w(k+1) = w(k) - c * dH/dw, 0 < c < 1 (step length)
So the "direction" d is given by:
sgn(-dH/dw) = sgn(p2-p1) / sgn(P1 - (not y)) = sgn(p2-p1)*(2*y-1)
sgn(p2-p1)*(2*y-1) > 0 -> y==1 && p2-p1>0 || y==0 && p2-p1<0 -> d > 0
sgn(p2-p1)*(2*y-1) < 0 -> y==1 && p2-p1<0 || y==0 && p2-p1>0 -> d < 0
This mixer approximates -c*dH/dw as sgn(p2-p1)*(2*y-1).
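The sign identity can be checked numerically - a quick sketch (my own, not from the thread) evaluating dH/dw from the closed form above:

```cpp
// Numeric check of sgn(-dH/dw) = sgn(p2-p1)*(2*y-1) for
// H = -ln((1-y) + (2*y-1)*P1) with P1 = p1 + (p2-p1)*w.
int sgn_neg_dHdw(int y, double p1, double p2, double w) {
    double P1 = p1 + (p2 - p1) * w;
    double dHdw = (p2 - p1) / ((y ? 0.0 : 1.0) - P1);  // = 1/((not y) - P1) * (p2-p1)
    double nd = -dHdw;
    return (nd > 0) - (nd < 0);
}
```

For y==1 the sign follows p2-p1 directly; for y==0 it flips - matching sgn(p2-p1)*(2*y-1).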
During some communication with encode I showed some code for this and an approximation:
Why not try CG methods, which require saving the last d? Or do a blockwise optimization, which should help to cut down fluctuations.
Code:
// Update using gradient descent
inline void Update( const int y ) {
    const int d = bitmodel::SCALE-(y<<bitmodel::LD_SCALE)-pm>>8;
    if (abs(d)!=0) {
        const int dw = 6*(pb-pa<<8)/d + (1<<12)>>13;
        int wy = w[0]-dw;
        if (wy<=0) wy=1; else if (wy>=W) wy=W-1;
        w[0] = w[0]*5 + wy*11>>4;
    }
}

template<typename T>
T BSR( T x ) {
    asm volatile ( "bsr %0,%0\n" : "+r" (x) :: "cc" );
    return x;
}

// Update using a gd approximation
inline void Update( const int y ) {
    sint16 d = bitmodel::SCALE-(y<<bitmodel::LD_SCALE)-pm>>1;
    if (abs(d)>128) {
        sint16 n = pb-pa>>1;
        const int dir = n>0 ? 1-2*y : 2*y-1;
        d = BSR<uint16>(abs(d));
        n = BSR<uint16>(abs(n));
        int s = n-d+6;
        if (s<2) return;
        const int wy = (11+5)*w[0] - 11*(dir<<s)>>4;
        if (wy<=0) w[0]=1;
        else if (wy>=W) w[0]=W-1;
        else w[0]=wy;
    }
}
Any more comments? This update method used in conjunction with my mixer showed good results - but its overall performance was worse (both speed and compression-wise). Even the approximation was slower than the original one - looks like processors don't like these bit string ops.
The logic is much simpler there actually...
That update is just shifting the weight in the direction of the submodel
which has better prediction, and it also has a limit (in the form of wt>32), which
blocks order0 from getting too much weight.
So, in practice, it is able to set the weight to 0.5 when order0 is better in a long run,
and to 63/64 when order1 is better, and that's what basically matters in most cases.
The result would probably be similar even if we didn't adaptively mix anything there,
but just switched to one or the other extreme case based on (wt<4) or something.
Well, what I want to say is that there's no reason to measure the theoretical
performance of this method, because it's unrelated to the Shannon metric and
obviously wouldn't be good e.g. in a situation with equally applicable submodels
or with just any random two inputs - btw, that's probably why the counters in
the pairs are not adaptively mixed there.
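The switching behaviour described here is easy to demonstrate - a toy simulation (my sketch, reusing the posted update rule; the starting weight is assumed) shows the weight saturating at 63 when input 2 keeps predicting better, and at the lower bound 32 (i.e. weight 1/2) when it doesn't:

```cpp
// Toy demonstration of weight saturation under the sign-based update.
struct Mixer {
    int wt = 48, x1 = 0, x2 = 0;  // starting weight of 48 is an assumption
    int PP(int p1, int p2) {
        x1 = p1; x2 = p2;
        return p1 + ((p2 - p1) * wt >> 6);  // P = p1 + (p2-p1)*wt/64
    }
    void Update(int y) {
        if (y ? x2 > x1 : x2 < x1) wt += (wt < 63);  // reward input 2
        else                       wt -= (wt > 32);  // but never below 1/2
    }
};
```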
Well, actually I do not adaptively mix counters due to speed restrictions - I don't want to lose too much speed. In addition, such counters are good even with static mixing - it looks just like an advanced 17-bit precision counter.
Theoretically, such type of mixer might work with two counters - if we tune the number of points, thresholds, ... don't know how well it will perform, however.
Just for fun I'll try to do that...
Better test what I said first, I mean,
Maybe try other boundary values in the switch condition too.
Code:
int PP(int p1, int p2) {
    x1=p1; x2=p2;
    if( wt<48 )
        return ((p1<<6)+((p2-p1)*32))>>11;
    else
        return ((p1<<6)+((p2-p1)*63))>>11;
}
This bounding of the weights and the fast "switching" behaviour are simply a consequence of such a simple implementation.
It still doesn't change the fact that it can be viewed as an approximation of coding cost minimization (even if it wasn't intended by the implementation) - I just wanted to point this out and encourage encode to do some experiments with more advanced mixing.
OK, I'll test the idea about straight switching later - just have no time.
Maybe smooth switching will be supreme - since we balance two counters.
I briefly added a mixer to each counter, no tunings, same mixer:
sfc.7z
fcm2a - 18,893,176 bytes
fcm2b - 18,801,732 bytes
MPTRACK.EXE
fcm2a - 576,038 bytes
fcm2b - 561,344 bytes
...
That means such a mixer somehow works with counters as well...
And this is without any tuning...