
Thread: Logistic mixing & modeling

    toffer (Programmer)
    Join Date: May 2008
    Location: Erfurt, Germany

    Logistic mixing & modeling

    After thinking a bit about PAQ's mixer - which uses gradient descent to compute its weights iteratively - I thought about applying the same idea to bit models. Instead of having a counter with a state p (a probability) in linear space, one could keep a logistic counter s = St(p) and update its state in logistic space. When using logistic mixing, the input probabilities then never need to be converted into logistic space - so no stretch function has to be implemented. Altogether, a logistic counter with a gradient-based update could look like:

    p = Sq(s),  Sq(x) = 1/(1 + e^-x)  (squash)
    h = -ln(a + b*p)  (coding cost; with a = 1-y, b = 2y-1 this is -ln(p) for y=1 and -ln(1-p) for y=0)
    dh/ds = -(y - p)
    (derivation and terms like in this thread)

    s' = s + L*(y - p) = s + L*(y - Sq(s))

    That update can be derived like I did in the Mixing strategies thread - and actually equals a fixed input ("one") with a "weight" s. Hence it is just a weighted bias neuron. BTW the same update appears for bias neurons under logistic mixing - since it is the same thing. Nothing new, actually.
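    A minimal Python sketch of such a logistic counter (the class name and the concrete learning rate are my own illustrative choices, not taken from any existing coder):

    ```python
    import math

    def squash(x):
        # Sq(x) = 1 / (1 + e^-x), maps a logistic-space state to a probability
        return 1.0 / (1.0 + math.exp(-x))

    class LogisticCounter:
        """Bit model whose state s lives in logistic space.

        Illustrative sketch only: update s' = s + L*(y - Sq(s)).
        """
        def __init__(self, learning_rate=0.2):
            self.s = 0.0             # logistic state; Sq(0) = 0.5
            self.L = learning_rate   # 0 < L < 1

        def predict(self):
            # current probability estimate p = Sq(s)
            return squash(self.s)

        def update(self, y):
            # gradient step on the coding cost h: dh/ds = -(y - p),
            # hence s' = s + L*(y - Sq(s))
            self.s += self.L * (y - squash(self.s))
    ```

    Repeatedly feeding the same bit drives the prediction toward that bit, with steps shrinking as the error y - p shrinks.
    
    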

    Now I noticed that we could actually "reinvent" the wheel: when using logistic mixing along with such a model, why not take the mixing function's influence into account and adjust the bit models proportionally to their contribution to the mixed output? Integrating that into logistic mixing gives us a combined, purely logistic mixing and modeling approach:

    (1) each model i has a state s_i in logistic space

    (2) the mixer assigns weights w_i to the models s_i

    (3) a mixed prediction p in linear space is

    p = Sq( sum w_i s_i )

    (4) the coding error e is

    e = y-p

    (5) weights are updated

    w_i' = w_i + L*e*s_i

    (6) predictions are updated

    s_i' = s_i + M*e*w_i

    0 < L, M < 1 are learning rates. Note that -e*w_i is simply the partial derivative of h (the coding cost) with respect to an input s_i. Altogether this can be interpreted as a neural network: the input layer is made of bias neurons, with weights selected by a context. The output layer is a single neuron with a linear activation and squash as the output function. The hidden layer just forwards the inputs (linear neurons). It learns via error backpropagation, minimizing coding cost rather than RMSE.
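    Steps (1)-(6) can be sketched in Python as follows (all names and the initial weight/learning-rate values are illustrative assumptions, not from an actual implementation):

    ```python
    import math

    def squash(x):
        # Sq(x) = 1 / (1 + e^-x)
        return 1.0 / (1.0 + math.exp(-x))

    class LogisticMixModel:
        """Combined logistic mixing and modeling, steps (1)-(6) above."""
        def __init__(self, n, L=0.02, M=0.1):
            self.s = [0.0] * n    # (1) model states s_i in logistic space
            self.w = [0.3] * n    # (2) mixer weights w_i
            self.L = L            # weight learning rate
            self.M = M            # model learning rate

        def predict(self):
            # (3) mixed prediction p = Sq( sum_i w_i * s_i )
            return squash(sum(wi * si for wi, si in zip(self.w, self.s)))

        def update(self, y):
            # (4) coding error e = y - p
            e = y - self.predict()
            for i in range(len(self.s)):
                wi, si = self.w[i], self.s[i]
                # (5) w_i' = w_i + L*e*s_i   (6) s_i' = s_i + M*e*w_i
                self.w[i] = wi + self.L * e * si
                self.s[i] = si + self.M * e * wi
    ```

    Setting L = 0 gives the static-weight variant mentioned below: the mixer weights stay fixed, but the model updates still scale with w_i.
    
    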

    I have no idea how well this will perform in practice - but I hope it works about as well as logistic mixing. Most of this is already covered by logistic mixing; this is just an extension that integrates the modeling.

    I'd implement this when I've got some spare time.

    It might even be possible to get rather good results with *static* weights (no weight update), since the logistic model updates already take the actual weight values into account.

    Last edited by toffer; 9th June 2009 at 14:12.

Similar Threads

  1. Simple bytewise context mixing demo
    By Shelwien in forum Data Compression
    Replies: 13
    Last Post: 3rd April 2020, 17:04
  2. Context mixing
    By Cyan in forum Data Compression
    Replies: 7
    Last Post: 4th December 2009, 19:12
  3. Mixing strategies
    By Alibert in forum Data Compression
    Replies: 38
    Last Post: 25th June 2009, 00:37
  4. Graphic/Image/Modeling Benchmark
    By Simon Berger in forum Data Compression
    Replies: 23
    Last Post: 24th May 2009, 18:50
  5. CMM fast context mixing compressor
    By toffer in forum Forum Archive
    Replies: 171
    Last Post: 24th April 2008, 14:57
