Let us consider the following code:
Code:
```
class TPredictor {
private:
    unsigned short pr;

public:
    TPredictor(): pr(2048) {}

    int P() {
        return (pr);
    }

    void Update(int y) {
        if (y)
            pr+=((4095-pr)>>5);
        else
            pr-=(pr>>5);
    }
};
```
This is an FPAQ0P-style predictor that predicts the probability of a one bit.
And here is a modification à la FPAQ0M:
Code:
```
class TPredictor {
private:
    unsigned short p1;
    unsigned short p2;

public:
    TPredictor(): p1(2048), p2(2048) {}

    int P() {
        return ((p1+p2)>>1);
    }

    void Update(int y) {
        if (y) {
            p1+=((4095-p1)>>3);
            p2+=((4095-p2)>>6);
        }
        else {
            p1-=(p1>>3);
            p2-=(p2>>6);
        }
    }
};
```
For comparison, here is a classical example of a stationary/non-stationary (semi-stationary) predictor:
Code:
```
class TPredictor {
private:
    unsigned char n0;
    unsigned char n1;

public:
    TPredictor(): n0(0), n1(0) {}

    int P() {
        return (((n1+1)<<12)/((n0+n1)+2));
    }

    // signature restored; the name is assumed (it was missing in the original post)
    void UpdateStationary(int y) {
        if (y) {
            if (++n1>=255) {
                n0>>=1;
                n1>>=1;
            }
        }
        else {
            if (++n0>=255) {
                n0>>=1;
                n1>>=1;
            }
        }
    }

    void UpdateNonStationary(int y) {
        if (y) {
            if (n1<255) n1++;
            if (n0>2) n0=(n0>>1)+1;
        }
        else {
            if (n0<255) n0++;
            if (n1>2) n1=(n1>>1)+1;
        }
    }
};
```
The question is: how can we add SSE/weighted mixing inside this class? I.e. no additional tables, no additional classes; all stuff inside one class. The second example needs four bytes per bit prediction; I think we may use eight bytes, i.e. we may add one more 32-bit integer or two 16-bit integers, etc. What we possibly can do:
+ Add weighted mixing of both probabilities (fast and slow ones)
+ Add weighted mixing of one or two probabilities with a static model (prob. = 0.5)
+ Add a kind of SSE/APM inside
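For instance, a weighted mix of the fast and slow probabilities with one extra adaptive weight could look like this (just a hypothetical sketch of the first option; the shift rates are made up, and the whole thing fits in six bytes per prediction):
Code:
```
class TPredictor {
private:
    unsigned short p1, p2;  // fast and slow probabilities (12-bit)
    unsigned short w;       // adaptive mixing weight, 0..4095

public:
    TPredictor(): p1(2048), p2(2048), w(2048) {}

    int P() {
        // weighted mix: w near 4095 favors p1, w near 0 favors p2
        return (p1*w + p2*(4096-w)) >> 12;
    }

    void Update(int y) {
        // move the weight towards whichever probability was closer to y
        int e1 = (y<<12) - p1;
        int e2 = (y<<12) - p2;
        if (e1*e1 < e2*e2)
            w += (4095-w)>>5;
        else
            w -= w>>5;

        if (y) {
            p1 += (4095-p1)>>3;
            p2 += (4095-p2)>>6;
        }
        else {
            p1 -= p1>>3;
            p2 -= p2>>6;
        }
    }
};
```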
Any ideas/suggestions?

2. Maybe it's a good idea to use an approach similar to FPAQ0F2:
Code:
```
int n=t[cxt]&255, p=t[cxt]>>14;  // count, prediction
if (n<limit) ++t[cxt];
t[cxt]+=((y<<18)-p)*dt[n]&0xffffff00;
```
In other words:
Code:
`pr+=((y<<N)-pr)*wt; // or *rate`
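Spelled out as a standalone counter, this idea might look like the following hypothetical sketch (using a plain division instead of fpaq0f2's dt[] reciprocal table, and a 16-bit probability):
Code:
```
struct Counter {
    unsigned short pr;  // P(bit==1) in 16-bit fixed point
    unsigned char n;    // times this context was seen, capped at limit

    Counter(): pr(32768), n(0) {}

    int P() { return pr; }

    void Update(int y, int limit=255) {
        if (n < limit) ++n;
        // adaptation rate ~1/(n+1): aggressive at first, slowest at the limit
        pr += ((y<<16) - pr) / (n+1);
    }
};
```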

2. SSE encapsulation isn't a good idea, because SSE is really useful
only with good contexts.
3. A counter is just an independent model for the context history.
So you can strip some example file into bit sequences, make
a good model for compressing those, and then optimize it until
it fits your restrictions on size and speed.
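A hypothetical sketch of that stripping step, assuming the usual bit-tree context layout (any other context function would do):
Code:
```
#include <cstdio>
#include <vector>

// Split a file into the bit sequences that each order-0 bit-tree
// context would feed to its counter, and report their lengths.
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    FILE* f = fopen(argv[1], "rb");
    if (!f) return 1;
    std::vector<unsigned char> seq[256];  // one sequence per context
    int c;
    while ((c = getc(f)) != EOF) {
        int cxt = 1;  // already coded bits of the current byte
        for (int i = 7; i >= 0; --i) {
            int bit = (c>>i)&1;
            seq[cxt].push_back(bit);  // the bits a counter at cxt sees
            cxt = cxt*2 + bit;
        }
    }
    fclose(f);
    for (int i = 1; i < 256; ++i)
        printf("cxt %3d: %zu bits\n", i, seq[i].size());
    return 0;
}
```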

4. OK, thanks!

I'm just interested in what's going on inside FPAQ0F2: why does the so-called "bit history" help so dramatically on enwik9, even beating FPAQ0MW?

I tested FPAQ0F2 with an order-1 coder - nothing happens; compression is even worse compared to FPAQ0M. What is a "bit history"? If I'm correct, we keep a count of how many times this context has been seen, incrementing it by one each time until some defined "limit" is reached. This count affects the update speed: initially the update is the most aggressive, and once the counter reaches the limit the update is slowest. If this is a correct explanation of what's inside, then by itself it may NOT be called an order-1 coder; however, why doesn't this technique work if we apply it to an order-1 coder?

I noticed that, generally speaking, an FPAQ0P-like model works poorly at order-0, but more than correctly at pure order-1. Maybe FPAQ0F2 just fixes some behavior at order-0 but doesn't work at all with order-1?

5. Originally Posted by Encode
FPAQ0F is using additional tables - the bit history.

Originally Posted by Shelwien
SSE encapsulation isn't a good idea, because SSE is really useful
only with good contexts.
Exactly. It does not make too much sense to use SSE inside your everyday counter - you should use it to e.g. 'stabilize' the output of multiple counters or sub-models.

Originally Posted by Encode
This count affects the update speed: initially the update is the most aggressive, and once the counter reaches the limit the update is slowest. If this is a correct explanation of what's inside
The explanation lacks an important detail. FPAQ0F's counter uses the bit history as an additional context. Knowing this, it's easy to explain its better performance. Although FPAQ0F is 'order-0', its performance and memory requirements are closer to an order-1 coder (because of the bit history).

Originally Posted by Encode
Maybe FPAQ0F2 just fixes some behavior at order-0 but doesn't work at all with order-1?
It's obvious that at order-1 FPAQ0F shouldn't work that well - the bit histories take too long to 'warm up'.

6. Originally Posted by Christian
FPAQ0F
Important note: FPAQ0F is NOT FPAQ0F2 - check both sources...

7. Got it:
Code:
```
int p() {
    return sm.p(cxt<<8|state[cxt]);
}

void update(int y) {
    sm.update(y, 90);
    int& st=state[cxt];
    (st+=st+y)&=255;
    if ((cxt+=cxt+y) >= 256)
        cxt=0;
}
```
Cheating indeed... And all the while I thought that the variable update rate was what helped...

8. Originally Posted by Christian
Important note: FPAQ0F is NOT FPAQ0F2 - check both sources...
Of course it's not the same - but it's very similar. You may as well replace FPAQ0F with FPAQ0F2 in my previous text.

Additionally, you might want to check FPAQ0F2's sources for this line...

Code:
`U32 *t;       // cxt -> prediction in high 24 bits, count in low 8 bits`

9. Originally Posted by Christian
Additionally, you might want to check FPAQ0F2's sources for this line...

Code:
`U32 *t;       // cxt -> prediction in high 24 bits, count in low 8 bits`
Nothing special - we just pack our probability and context seen count into one 32-bit integer. Nothing about bit histories or tables...

10. Variable update speed really doesn't help much (the gain is tiny, especially with lower-order models). If you want to benefit from a longer bit history, use a delayed counter, as Shelwien suggested; two or three bits should be enough. Somewhere within the forum archive there was a discussion about this. And optimized prediction/update parameters help, too, when compared against something like p += p>>8 or p>>5.

11. Nothing special - we just pack our probability and context seen count into one 32-bit integer. Nothing about bit histories or tables...
Please, just look at the sources more carefully. You can start with the parameter of StateMap::p(...) with which the table t is addressed. :)

12. Code:
```
// Initialize assuming low 8 bits of context is a bit history.
StateMap::StateMap(int n): N(n), cxt(0) {
    alloc(t, N);
    for (int i=0; i<N; ++i) {
        // Count 1 bits to determine initial probability.
        U32 n=(i&1)*2+(i&2)+(i>>2&1)+(i>>3&1)+(i>>4&1)+(i>>5&1)+(i>>6&1)+(i>>7&1)+3;
        t[i]=n<<28|6;
    }
    if (dt[0]==0)
        for (int i=0; i<256; ++i)
            dt[i]=32768/(i+i+3);
}

// ...

Predictor::Predictor(): cxt(0), sm(0x10000) {
    for (int i=0; i<0x100; ++i)
        state[i]=0x66;
}
```

13. The predictor is called like this:

Code:
`sm.p(cxt<<8|state[cxt])`
The param is the context from which the probability is retrieved. Its low byte (state[cxt]) is the bit history, and the higher bits are its 'order-0 context' (IOW, the already coded bits of the current symbol). Hope this helps.

14. Originally Posted by toffer
If you want to benefit from a longer bit history, use a delayed counter, as Shelwien suggested, two or three bits should be enough.
Checked the archive. Delayed counters seemed to be an interesting subject...

15. 1. A bit history is a sequence of bits encountered with a given context.
2. fpaq0f2 is pretty much an order-1 coder, just with another context instead of the previous byte.
3. fpaq0f2 _properly_ switched to order-1 should work better than a common order-1 coder,
though I'm not sure whether it'd reach order-2 level. Probably would.
4. It's not a matter of "warming up". When the mapping is initialized the right way, a coder
would have _at_least_ the same compression as without the mapping.
Basically, you make a working order-1 coder first, then add a _static_ mapping and ensure
that compression doesn't significantly change (so the mapping is properly initialized),
then you slowly increase the mapping update rate - see the sketch below.
5. Variable update rate does help. But mainly in higher-order contexts.
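To make point 4 concrete, here's a hypothetical sketch of an SSE/APM stage over the 12-bit probability scale, with its table initialized as the identity mapping, so that before any updates it returns its input probability unchanged and the update rate can then be raised gradually:
Code:
```
struct SSE {
    unsigned short t[33];  // 32 intervals over the 12-bit probability scale

    SSE() {
        // identity initialization: P(pr) == pr exactly, so the static
        // stage cannot hurt compression
        for (int i=0; i<=32; ++i)
            t[i] = i*128;
    }

    int P(int pr) const {  // pr in 0..4095
        int i = pr>>7, w = pr&127;
        // linear interpolation between the two nearest vertices
        return (t[i]*(128-w) + t[i+1]*w) >> 7;
    }

    void Update(int pr, int y, int rate) {  // larger rate = slower updates
        int i = pr>>7;
        // nudge the two surrounding vertices towards the observed bit
        t[i]   += ((y<<12) - t[i])   >> rate;
        t[i+1] += ((y<<12) - t[i+1]) >> rate;
    }
};
```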

16. OK, thanks for the info!

17. If you want a more cache-efficient implementation of some "narrowed" bit history (as I said in some post, going beyond 2 bits doesn't improve compression that drastically), here's a nibble model (sized 64 bytes):

Code:
```
// ctx4 - nibble context (like paq)

class model4
{
protected:
    static const uint N=15*4;  // 15 bit-tree nodes, 4 history states each
    uint32 states;             // packed 2-bit bit history per node
    bitmodel bitTree[N];

public:

    void Clear();

    inline int P1( const uint ctx4 )
    {
        // select the bit model by node index and its 2-bit history
        return bitTree[4*ctx4+((states>>(2*ctx4))&3)].P1();
    }

    inline void Update( const int bit, const uint ctx4 )
    {
        Assert( ctx4<15 );
        const uint state = (states>>(2*ctx4))&3;
        Assert( 4*ctx4+state<N );
        bitTree[4*ctx4+state].Update(bit, C, D);
        // shift the new bit into this node's 2-bit history
        states = (states&~(3u<<(2*ctx4))) | (((2*state+bit)&3)<<(2*ctx4));
    }
};

void model4::Clear()
{
    states = 0;
    for (uint i=0; i<N; ++i) {
        bitTree[i].Set();
    }
}
```
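Usage might look like this (hypothetical; Encoder and its Encode() stand in for whatever rangecoder the framework provides):
Code:
```
// Code one nibble with model4 by walking the bit tree, paq-style.
void CodeNibble(model4& m, Encoder& enc, const int nibble)
{
    uint ctx4 = 1;  // tree node index, 1..15
    for (int i=3; i>=0; --i) {
        const int bit = (nibble>>i)&1;
        enc.Encode(bit, m.P1(ctx4-1));  // nodes are addressed 0..14
        m.Update(bit, ctx4-1);
        ctx4 = 2*ctx4 + bit;  // descend to the child node
    }
}
```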
With some clustering this might be practical for order1. I haven't checked it out.

EDIT: Adding a class which holds the state and a pointer to the active bit model might increase the performance a bit.

18. hi encode,
Maybe I can give an idea about counters. I'm working on light-weight context mixing (nothing special, just mixing orders 0-5 with a neural network and 2-dimensional SSE in limited memory - e.g. 70 MB, no additional sub-models). I have two well-used bit models:
1. The pr+=((4095-pr)>>5) style
2. Semi-stationary
Their characteristics are different. In my SSE counters, I'm using the first one; I'm using the second one in literal coding. This gives good results. But I know they are actually poor. We must move on to newer things.

19. It's actually completely wrong to just take some known counter sample and use it.
A counter is basically a standalone model for the symbols in a context.
And traditional counters are only good for almost-random bit sequences.
While looking at the actual data, you may find some patterns.
E.g. if there are contextual dependencies in the context history, then the right idea
probably would be adding these history bits to the context (or the SSE context),
not optimizing the counter parameters or something like that.

20. Note that I already use something based on bit histories (like the previous whole byte, the high 3 bits of the previous whole byte, etc.). But when I changed the bit model, compression became worse or better. That is why I'm looking for new ideas about counters. I think SSE needs a completely different bit model for its entries. Maybe a different model for each entry (e.g. a different adaptation speed). But I'm not sure.

21. Originally Posted by osmanturan
Note that I already use something based on bit histories (like the previous whole byte, the high 3 bits of the previous whole byte, etc.).
That's not a bit history. A bit history is the bit sequence you encounter under a context after contextual separation.

Originally Posted by osmanturan
But when I changed the bit model, compression became worse or better. That is why I'm looking for new ideas about counters. I think SSE needs a completely different bit model for its entries. Maybe a different model for each entry (e.g. a different adaptation speed). But I'm not sure.
I hope I could help with my last post on the old forum. There are a lot of good threads about counters/SSE. Just have a look at those.

22. > But when I changed the bit model, compression became worse or better.
> That is why I'm looking for new ideas about counters.

Just think about it. A counter is a model for some subsequence of the data bits.
So you can extract that sequence and make a separate model for it;
then you'd know the performance limit for a counter working on this sequence.

> I think SSE needs a completely different bit model for its entries.
> Maybe a different model for each entry (e.g. a different adaptation speed).

SSE has two basic applications:
1. Probability stabilization. Obviously, any context has fewer statistics
than the whole data, so it may assign too-small probabilities to some rare symbols.
But the improvement from fixing that is usually insignificant.
Though it might become significant (like in paq) if we'd knowingly use
a faulty model for some reason.
2. Context clustering. There are usually too many relevant contexts, so
we cannot really collect the full dependencies. Instead, we can make
a few models with different context subsets and combine their predictions.
But there's this other way too: you can merge the statistics of some contexts
if you know they're similar.
And the best way to measure the actual similarity of two contexts is
to compare their histories - contexts with similar histories make similar
predictions, so they can be merged.
Then, since a probability is a function of the context history, it can be
used as a context similarity measure; that's why SSE allows expanding
contexts and significantly improves compression.
But there's no reason to think that a better method doesn't exist,
e.g. using the context histories directly.

23. Originally Posted by toffer
That's not a bit history. A bit history is the bit sequence you encounter under a context after contextual separation.
What is the exact meaning of contextual separation?

Originally Posted by toffer
I hope I could help with my last post on the old forum. There are a lot of good threads about counters/SSE. Just have a look at those.
I have read your post before. I implemented my SSE following your explanation. I think your explanation is very, very clear. Thanks a lot again.

I found SSE to be a highly sensitive process (context, constant, bit model, prediction preprocessing, etc. give SSE very different characteristics). Whenever I change anything, compression becomes much better or much worse (for example, on valley.cmb it varies by about ±1 MB!)
