OK, let me introduce some encoder. Continuing fpaq0f2-like, experimental encoders, I made a new one - check out my variant of an order-1 cm. Source code included:
fcm1.zip
OK, let me introduce some encoder. Continuing fpaq0f2-like, experimental encoders, I made a new one - check out my variant of an order-1 cm. Source code included:
fcm1.zip
Brief description:
Each counter consists from a two states a la FPAQ0P - fast and slow one, which are summarized together.
For mixing a probabilities from order-1 and order-0 we use followed formula:
((p2*3)+p1)/4
p2 - Probability from an order-1 context
p1 - Probability from an order-0 context
If:
p2 = 0.5
i.e. order-1 context for the first time occurs, we use probability from an order-0 only.
Any thoughts?
EDIT:
In final version, we use:
((p2*15)+p1)/16
Thanks Ilia! I have uploaded it to my site along with my own speed optimised compile.
Mirror: Download
One can't generalize this, but you give far too much weight to order 0. The 2d SSE visualizations (SSE for mixing) in the old form showed that the SSE function mostly looked like a plane, which was dominated by the order 1 model. That means, order 0 gave almost no contribution.
How much does your slow counter/fast counter mix improve over a standalone counter?
Updated archive with a new compile which is at almost 2X times faster!
Hm as i said giving order 0 such a high weight is totally wrong:
-> 22.459.785 bytes (plain order1) vs. 23.783.905 bytes (orders01) on SFC
I just changed this:
Code:// int P() { // int p1=counter0[c0].P(); // int p2=counter1[c1][c0].P(); // // return (p2!=(1<<16)?((p2*3)+p1)>>7:p1>>5); // } int P() { return counter1[c1][c0].P()>>5; }
I'm not so sure. It's a matter of tuning for specific file type and specific situation...
Static weights form PAQ1 are: 4, 9, 16, ...
sfc.7z:
p2*3: 23,789,646 bytes
p2*7: 22,869,851 bytes
p2*15: 22,535,420 bytes
plain order-1: 22,487,664 bytes
p2*31: 22,425,582 bytes
p2*63: 22,403,145 bytes
...
ATM you exactly confirm what i said - "order 0 such a high weight is totally wrong" and your results get better the higher you weighted order 1...
Actually, it's true only for such "simple" and light version. BALZ uses similar but more heavy weight and complex CM encoder, and there weight=3 is the best so far. Also, for such simple model we may try to tune counters speed - i.e. not 3/5 but 3/6, 4/6, etc...
Counter tuning is not very exciting, believe me - i'm doing it the whole time :/ But you can really squeeze out additional performance. As i said the gain is somewhere between .1 - 1 % for my counter in its simplest form (over ">>x-counters").
If you refer to "shift right by 3 or 5 bits", well, that's to croase. Simply changing the weight of old statistics from 7/8 (=shr 3) to, e.g. 15/16 (=shr 4) (these are fractions) leaves several quantization levels untouched.
However good results will require at least one multiplication per bit model adaption (see my fpaq0 variant), which will make things slower.
EDIT: BTW, is your weighting in BALZ static?
Have a look at the parameter optimization thread:
http://encode.su/forum/showthread.php?t=18
EDIT: meanwhile i found better parameters for enwik8; since i use an iterative optimization the optimal result depends on the initial estimate. If you want, i can post them next week (left at university, i'm at home right now)
Last edited by toffer; 23rd May 2008 at 23:42.
Thanks Ilia, and toffer! I have updated my archive.
Mirror: Download
And as a final update - I again updated the archive - this time I set weights tuned for enwik8. For enwik the best value is 15.
I'll wait a while before I update my archive again!
I have updated my archive again. All files are included.
Mirror: Download
I modified fpaq0f2 to pure order 1.
enwik8 43,898,745, 45 sec, 46 sec
enwik9 413,824,018, 491 sec
(compare with fcm1: 45,402,225, 23, 26 sec; 447,305,681, 228, 261 sec).
Predictor code is below. I didn't post the whole program but you can just paste this into fpaq0f2.cpp. Rest of program is unchanged.
Recall that fpaq0f2 maps each context to an 8 bit history, them maps that along with the context to a prediction using a StateMap. To make it order 1, the StateMap size is increased to 2^24 (8 bit history, 16 bit context) using 64 MB memory. This makes it twice as slow due to cache misses. The only other change was to tune the StateMap limit to 255 (was 95).
Yeah, I know, is it really order 1? I think so because there are 64K counters and I am modeling whether each one is stationary or nonstationary.
Code:class Predictor { int cxt; // Context: 0=not EOF, 1..255=last 0-7 bits with a leading 1 int c1; // last byte << 8 StateMap sm; unsigned char state[0x10000]; public: Predictor(); // Assume stream of 9-bit symbols int p() { return sm.p(c1+cxt<<8|state[c1+cxt]); } void update(int y) { sm.update(y, 255); (state[c1+cxt]*=2)+=y; if ((cxt+=cxt+y) >= 256) { c1=cxt<<8&0xff00; cxt=0; } } }; Predictor::Predictor(): cxt(0), c1(0), sm(0x1000000) { for (int i=0; i<0x10000; ++i) state[i]=0x66; }
Thanks a lot Matt!
The only purpose of fcm1 is to test the idea of mixing the probabilities from different orders straight, without bit counts, etc. As you can see it has a nice compression/speed balance since it is still faster than initial fpaq0.
Why don't you add some blockwise optimization there?
I mean, you can store the probability sets for eg. 64k blocks,
then find the optimal weights and store them in the block header.
Like that, decoding would have the same speed as now, but
compression might significantly improve.
Well, it's about another idea. Here I just testing some thoughts...
The second thing I will try is an adaptive weighing - i.e. each time check what kind of order was more correct - i.e.:
During update:
A dummy example from the top of my head. Of course weighs may be context sensitive etc. Simple and fast approach.Code:if (y) { if (p2>p1) wt2++; else wt1++; } else { if (p2<p1) wt2++; else wt1++; }
Such statistics (number of occurences when context's prediction was the best) are used in ppmy/ash where they're called "optimals".
So, I found that its another useful value for SSE context (along with escape counts etc), but even an optimized weighting function based on these values
is worse than a proper adaptive mixer... and slower too.
But still you may check out the weighting function in ash's source.
I doubt about that is slow. Check this out:
Code:// wt is our weigh for counters balance - ranging from 1 to 63, if we use 64 points int w1=64-wt; int w2=wt; return (((p1*w1)+(p2*w2))>>X);
Well, I meant _this_ is slow:
Code:inline float Weighten( Node& p, // context node int Ord, // context order int N // symbols in context ) { int& CurOrder = Order; // model order int i = __min(CurOrder,MY-1); // table limits int j = __min(Ord,MX-1); int k = N==1; // determinated context? float WEsc,WOpt,WInv; WOpt = WO[k][i-1][j]; // static parameters WEsc = WE[k][i-1][j]; // WA[] and WB[] are the same WInv = WA[k][i][j]*p.W + WB[k][i][j]; // p.W/p.T is escape probability WInv*= (WOpt+p.O); // p.O/p.Q is best prediction probability WInv/= (WEsc+p.Q)*(WOpt+p.T); if( WInv<0 ) WInv=0; return WInv; }
Briefly tested the idea:
sfc.7z:Code:#define WSCALE 128 // ... // initially wt may be initialize to WSCALE/4 or so. TPredictor(): wt(WSCALE/4), c0(1), c1(0) {} int P() { int p1=counter0[c0].P(); int p2=counter1[c1][c0].P(); int w1=WSCALE-wt; int w2=wt; return (((p1*w1)+(p2*w2))>>12); } void Update(int y) { int p1=counter0[c0].P(); int p2=counter1[c1][c0].P(); if (y) { if (p2>=p1) wt+=(wt<(WSCALE-1)); else wt-=(wt>(WSCALE/4)); } else { if (p2<=p1) wt+=(wt<(WSCALE-1)); else wt-=(wt>(WSCALE/4)); } counter0[c0].Add(y); counter1[c1][c0].Add(y); if ((c0+=(c0+y))>=256) { c1=(unsigned char)c0; c0=1; } } // ...
fcm1: 22,535,420 bytes
fcm1w: 22,246,539 bytes
On ENWIKs this trick gives no improvement, since it's about stationary nature and order-1 working pretty correct in this case. Well, maybe if we will use weight with some context...
Damn, of course for weight context we SHOULD use at least c0 i.e. bits of a current byte!
sfc.7z:
22,181,086 bytes
And with 64 points, sfc.7z:
22,164,766 bytes
Now we may separate the entire mixing stuff in a TMixer class... Also I will do some experiments with weight update.