> Tree structure may get some speed gain - not extra compression.
Compression can change due to hash collisions.
In general, after switching from a hash to a tree,
text compression would get slightly better,
and binary data compression somewhat worse.
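For illustration, here's a rough sketch of why a hash can affect compression
at all (not ppmx's actual code; names and sizes are made-up assumptions):
with no stored keys and no collision check, two different contexts can land
in the same slot and share one symbol distribution, while a tree lookup is exact.
Code:
#include <cstdint>
#include <vector>

struct Dist { uint16_t freq[256] = {}; };   // symbol frequencies of one context

const uint32_t HASH_BITS = 16;
std::vector<Dist> table(1u << HASH_BITS);   // no keys stored, no collision check

Dist& lookup_o2(uint8_t c1, uint8_t c2) {
  uint32_t h = (c1 * 2654435761u) ^ ((c2 + 0x9E3779B9u) * 0x85EBCA77u);
  return table[h >> (32 - HASH_BITS)];      // colliding contexts share one Dist
}

int main() {
  Dist& d = lookup_o2('t', 'h');
  d.freq['e']++;                            // count 'e' seen in context "th"
}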
> In my ppmx, with its order skip, using trees
> is questionable. So I will keep my hashing.
As I already said, when traditional prefix contexts
are used (even if selectively), it's not a problem to
keep them linked with suffix pointers... and with
order tables (like in ppmy/ash/tree) nothing special
has to be done at all.
But with your kind of hashing, where a whole symbol
distribution is looked up by a context hash, it's
just inefficient, but it doesn't affect the compression,
I guess, so it may be reasonable to keep it, to simplify the research.
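For contrast, a rough sketch of the "traditional" layout mentioned above
(illustrative only, not ppmd/ppmy source; names are made up): each context
node carries an explicit pointer to its suffix context, so falling back to
a lower order after an escape is just pointer chasing, with no extra lookups.
Code:
#include <cstdint>
#include <map>

struct Ctx {
  std::map<uint8_t, uint32_t> freq;   // symbol -> count in this context
  Ctx* suffix = nullptr;              // same context shorter by one symbol
  std::map<uint8_t, Ctx*> child;      // longer contexts extending this one
};

// After an escape, the model simply follows the suffix pointer down one order.
Ctx* escape_to_lower_order(Ctx* c) { return c ? c->suffix : nullptr; }

int main() {
  Ctx root;                               // order-0 context
  Ctx o1; o1.suffix = &root;              // order-1 context, suffix = order-0
  root.child['a'] = &o1;
  Ctx* cur = escape_to_lower_order(&o1);  // lands in root
  (void)cur;
}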
> PPM potentially is less efficient on binary data - since
> it uses the highest order available only.
That's not the only problem, and probably not even the
main one. Also, probability updates in frequency-based
models and in bitwise models are completely different.
And while frequency distributions are a better fit for
stationary data, bitwise updates still seem to be
much better balanced overall. Here's a good example:
Code:
 book1  wcc386
436029  411381  // o0 alphabet model, sum-optimized
438861  401395  // o0 binary model, sum-optimized
350571  320331  // binary o1 only, sum-optimized
345737  332711  // binary o1 only, book1-optimized
353437  319029  // binary o1 only, wcc386-optimized
345917  337759  // escaped o1, sum-optimized
345345  340459  // escaped o1, book1-optimized
346935  337416  // escaped o1, wcc386-optimized
As we can see, the frequency-based model is better
on stationary data (book1) in all cases.
But on average the bitwise update is clearly superior,
since it allows much faster, nonlinear adaptation.
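To make the difference concrete, here's a minimal sketch of the two update
styles being compared (not the exact models behind the numbers above; the
rescale threshold and shift rates are arbitrary assumptions):
Code:
#include <cstdint>
#include <cstdio>

// Frequency-based counter: p(1) = n1/(n0+n1), approximates a stationary source.
struct FreqCounter {
  uint32_t n0 = 1, n1 = 1;                   // Laplace-style init
  double p1() const { return double(n1) / (n0 + n1); }
  void update(int bit) {
    (bit ? n1 : n0)++;
    if (n0 + n1 > 1024) { n0 = (n0 + 1) / 2; n1 = (n1 + 1) / 2; }  // rescale
  }
};

// Bitwise shift counter: exponential forgetting, nonlinear, fast to adapt.
struct BitCounter {
  uint16_t p = 1 << 15;                      // p(1) in 1/65536 units, start at 0.5
  double p1() const { return p / 65536.0; }
  void update(int bit) {
    if (bit) p += (65535 - p) >> 4;          // move toward 1
    else     p -=  p          >> 4;          // move toward 0
  }
};

int main() {
  FreqCounter f; BitCounter b;
  for (int i = 0; i < 32; i++) { f.update(0); b.update(0); }  // long run of zeros
  for (int i = 0; i <  8; i++) { f.update(1); b.update(1); }  // source switches to ones
  printf("p1 after the switch: freq=%.3f bitwise=%.3f\n", f.p1(), b.p1());
}
The frequency counter keeps averaging over the whole history until a rescale,
while the shift counter forgets old bits exponentially - which is the faster,
nonlinear adaptation mentioned above.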
How to deal with this is still a question though.
It's possible both to implement a bitwise-style update
along with bytewise coding and distributions, and
to improve the adaptivity using SSE or some other methods.
But people tend to avoid thinking about it and just
use bitwise models, despite their known speed restrictions
(each bit has to be processed, one way or another) and
their redundancy on stationary data.
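As for SSE, here's a minimal sketch of the idea, assuming a plain table
indexed by the quantized input probability (table size, quantization and
update rate are arbitrary; real implementations also interpolate between
cells and mix extra context into the index):
Code:
#include <cstdint>

// Map the primary model's p(bit=1) through a small adaptive table.
struct SSE {
  uint16_t cell[32];
  SSE() { for (int i = 0; i < 32; i++) cell[i] = uint16_t(i * 2048 + 1024); }

  // p is in 1/65536 units; the coder would use the refined value.
  uint16_t refine(uint16_t p) const { return cell[p >> 11]; }

  // After coding, pull the used cell toward the observed bit.
  void update(uint16_t p, int bit) {
    uint16_t& c = cell[p >> 11];
    if (bit) c += (65535 - c) >> 6;
    else     c -=  c          >> 6;
  }
};

int main() {
  SSE sse;
  uint16_t p = 40000;            // probability from the primary model
  uint16_t q = sse.refine(p);    // refined probability goes to the coder
  sse.update(p, 1);              // adapt the cell with the coded bit
  (void)q;
}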
> The answer is LOE. I will try this thing. For example,
> after coding byte "s" we will check each order for the "s"
> probability - to find out which model order was
> more accurate. Then we set this best order as the initial one -
> i.e. the higher orders will just be skipped.
I found that such statistics are useful for CM, but I'm not
sure about PPM. It's kinda the reverse there - you shouldn't
update higher-order contexts with "noisy" symbols
(which can be implemented as lazy/delayed updates).
Also, if you paid attention, you could see that order
selection in ppmd is highly affected by alphabet sizes.
Like, masked contexts with the same alphabet size as
the unmasked one are simply skipped without consideration.
> The "best order" estimation can be context sensitive.
Such techniques do work, but the kinds of statistics
necessary to make them work are also quite resource-intensive.
Like, you basically have to evaluate a few kinds of
context sets per symbol, and accumulate the code length
(aka log-likelihood) for each case.
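A rough sketch of what that bookkeeping could look like (illustrative only,
not ppmd's actual LOE; orders, probabilities and names are assumptions):
after each coded symbol, add the code length every candidate order would
have produced, and prefer the order with the smallest total.
Code:
#include <cmath>
#include <cstdio>

const int MAX_ORDER = 8;
double codelen[MAX_ORDER + 1] = {};   // accumulated -log2(p) per order

// Call once per coded symbol for every candidate order,
// with that order's estimate of the coded symbol (p > 0).
void account(int order, double p_of_coded_symbol) {
  codelen[order] += -std::log2(p_of_coded_symbol);
}

// The "best" starting order has the least accumulated code length.
int best_order() {
  int best = 0;
  for (int o = 1; o <= MAX_ORDER; o++)
    if (codelen[o] < codelen[best]) best = o;
  return best;
}

int main() {
  // toy data: over two symbols, order 3 predicted best
  double p[2][MAX_ORDER + 1] = {
    { .5, .5, .5, .6, .4, .3, .3, .2, .2 },
    { .5, .5, .5, .7, .3, .2, .2, .1, .1 } };
  for (int s = 0; s < 2; s++)
    for (int o = 0; o <= MAX_ORDER; o++) account(o, p[s][o]);
  printf("best order = %d\n", best_order());   // prints 3
}
In practice the totals would be decayed or split by context (as in the quote
above about context-sensitive estimation), which is where the cost comes from.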
> Will dig into better counters implementation, not only
> SEE, LOE and proper models combination.
I also mentioned the "PPM mixed statistics problem".
In short, unmasked and masked statistics have different
properties, while practical PPM statistics are a random mix of both.
Like, a context can be updated after an escape (as a masked context),
and then next time it would serve as an unmasked one. And then there are
partial updates etc. So in the end, nobody knows what kind of
source is approximated by these submodels - there's not even a
clear "context history".