From comments to http://cbloomrants.blogspot.com/2010...ession-on.html

> take CTW, the idea of working in binary and blending context models by weighting rather than selecting

It's not the CTW idea at all, and it's trivial.

The CTW idea is to use string probabilities for prediction, i.e. logistic estimation (which appears because we can only handle string probabilities in the log domain). However, Matt's logistic mixing was taken from neural networks, not from CTW.

Basically, both bitwise processing and "blending" are obvious when you try to make a compressor based on a NN (which is what Matt originally did).

Also, (linear) mixing is natural for another reason too: if we have two applicable models which give predictions p1 and p2, and a probability w of p1 being right, then p = w*p1 + (1-w)*p2 follows straight from the definition.
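A minimal sketch of that switching interpretation (function name is mine):

```python
# Linear mixing: with probability w model 1 is the "right" model,
# otherwise model 2 is.  By total probability:
#   P(bit) = w*p1 + (1-w)*p2
def mix_linear(p1, p2, w):
    return w * p1 + (1 - w) * p2

# e.g. two models predicting 0.8 and 0.3, trusted 60/40:
p = mix_linear(0.8, 0.3, 0.6)  # 0.6*0.8 + 0.4*0.3
```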

As to logistic mixing, it's less obvious, but I found that the paq mixing formulas can actually be derived from the same source model based on submodel switching, simply by considering string probabilities.
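For comparison, paq-style logistic mixing works in the log-odds domain rather than the probability domain; a sketch (helper names are mine):

```python
import math

def stretch(p):
    # probability -> log-odds (logit)
    return math.log(p / (1 - p))

def squash(x):
    # log-odds -> probability (inverse of stretch)
    return 1 / (1 + math.exp(-x))

def mix_logistic(probs, weights):
    # weighted sum of stretched predictions, squashed back
    return squash(sum(w * stretch(p) for p, w in zip(probs, weights)))

# unlike linear mixing, two agreeing models reinforce each other
# instead of averaging:
p = mix_logistic([0.9, 0.9], [1.0, 1.0])  # > 0.9
```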

> take Malcolm's ideas from Rkive such as using non-contiguous contexts like AxBx

Nobody remembers that anymore, but most still use sparse contexts, because it's natural with a hashtable-based coder.

> take SEE from PPMZ and the PPMii ideas of percolating updates to neighbors to help with sparse statistics and slow learning

I agree about SEE, but PPMII's ideas are its initialization of new counters, dynamic ranking of symbols in tree nodes, unary coding, and SSE. And as to "fuzzy" updates: it's also one of these things which everybody invents on their own.

Btw, ppmonstr also has sparse contexts.

> you pretty much have modern CM

Still, I don't see it. Modern CMs store different counters in different data structures (they have to be cache-friendly these days), mix submodel predictions with a different mixing function, and encode the bits with a different arithmetic coder (Matt's rangecoder is pretty unique, although I don't like it). So even if you can find some similarities, they're likely false.

> yes obviously there are lots of little tweaks, but so far as my understanding goes there's nothing dramatic.

Depends on where you look. Both old and new coders take data on input and write some function of it on output, and both are based on the same idea of using statistics to predict the data. But the implementations look completely different to me. And in fact their behaviour is fairly different too: old coders compressed text considerably better; considerably enough for ppmonstr to still hold the best result in 1G text compression: http://mattmahoney.net/dc/text.html (durilca is ppmonstr with an attached text preprocessor).

> the weighting in PAQ is very basic machine learning stuff so far as I can tell (simple linear online training),

It likely isn't that simple: that update formula was derived by minimizing the prediction entropy; originally Matt used a different one, borrowed from NN theory, and it didn't work well enough.
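For reference, the entropy-minimizing update (the form used in PAQ7 and later, as I understand it; helper names are mine) is gradient descent on the coding cost -log P(bit):

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

def mix_and_update(weights, probs, bit, lr=0.01):
    """One coding step: mix the predictions, then nudge the weights
    along the gradient of the coding cost -log P(bit), using
    d(-log P)/dw_i = -(bit - p) * stretch(p_i)."""
    p = squash(sum(w * stretch(pi) for w, pi in zip(weights, probs)))
    err = bit - p  # error in the probability domain
    new_w = [w + lr * err * stretch(pi) for w, pi in zip(weights, probs)]
    return p, new_w
```

Note the error term lives in the probability domain while the input lives in the logit domain; a plain NN delta rule would use a different factor here, which is the part that "didn't work well enough".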

> modern ML has lots more tools to apply.

http://prize.hutter1.net/

http://mailcom.com/challenge/

Good luck :)