> Tree structure may get some speed gain - not extra compression.

Compression can change due to hash collisions. In general, after switching from a hash to a tree, text compression gets slightly better, while binary data compression gets somewhat worse.

> In my ppmx, with it's order skip using trees

> is questionable. So, will keep my hashing.

As I already said, when traditional prefix contexts are used (even if selectively), it's not a problem to keep them linked with suffix pointers... and with order tables (like in ppmy/ash/tree) nothing special has to be done at all.

But with your kind of hashing, where a whole symbol distribution is looked up by a context hash, a tree is just inefficient, but it doesn't affect the compression, I guess, so it may be reasonable to keep the hashing and simplify the research.

> PPM potentially is less efficient on binary data - since

> it uses the highest order available only.

That's not the only problem, and probably not even the main one. Probability updates in frequency-based models and bitwise models are also completely different. And while frequency distributions are more compatible with stationary data, bitwise updates still seem much better balanced. Here's a good example:

Code:

book1   wcc386
436029  411381  // o0 alphabet model, sum-optimized
438861  401395  // o0 binary model, sum-optimized
350571  320331  // binary o1 only, sum-optimized
345737  332711  // binary o1 only, book1-optimized
353437  319029  // binary o1 only, wcc386-optimized
345917  337759  // escaped o1, sum-optimized
345345  340459  // escaped o1, book1-optimized
346935  337416  // escaped o1, wcc386-optimized

As we can see, the frequency-based model is better for stationary data (book1) in all cases. But on average the bitwise update is clearly superior, as it allows much faster, nonlinear adaptation. How to deal with that is still a question though.

It's possible both to implement the bitwise update along with bytewise coding and distributions, and to improve the adaptivity using SSE or some other methods. But people tend to avoid thinking about it and just use bitwise models, despite their known speed restrictions (each bit has to be processed, one way or another) and redundancy on stationary data.

> The answer is LOE. I will try this thing. As example,

> after coding byte "s" we will check each order for "s"

> probability - to find out what kind of model order was

> more accurate. After, we set this best order as initial -

> i.e. the higher orders will be just skipped.

I found that such statistics are useful for CM, but I'm not sure about PPM. It's kind of the reverse there - you shouldn't update higher-order contexts with "noisy" symbols (which can be implemented as lazy/delayed updates). Also, if you paid attention, you could see that order selection in ppmd is highly affected by alphabet sizes. For example, masked contexts with the same alphabet size as the unmasked one are simply skipped without consideration.

> The "best order" estimation can be context sensitive.

Such techniques do work, but the kinds of statistics necessary to make them work are also quite resource-intensive. Basically, you have to evaluate a few kinds of context sets per symbol and accumulate the code length (aka log-likelihood) for each case.

> Will dig into better counters implementation, not only

> SEE, LOE and proper models combination.

I also mentioned the "PPM mixed statistics problem". In short, unmasked and masked statistics have different properties, while practical PPM statistics are a random mix of both. For example, a context can be updated after an escape (as a masked context), and then next time it would serve as an unmasked one. And then there are partial updates etc. So in the end nobody knows what kind of source is approximated by these submodels - there's not even a clear "context history".