
Thread: Logistic mixing & modeling

    toffer (Programmer)
    Join Date: May 2008
    Location: Erfurt, Germany

    Logistic mixing & modeling

    After thinking a bit about PAQ's mixer - which uses gradient descent to compute its weights iteratively - I thought about applying the same idea to bit models. Instead of having a counter with a state p (a probability) in linear space, one could keep a logistic counter s = St(p) and update its state in logistic space. When using logistic mixing, the input probabilities then never need to be converted into logistic space - so no stretch function has to be implemented. Altogether, a logistic counter with a gradient-based update could look like:

    p = Sq(s),  Sq(x) = 1/(1 + e^-x)  (squash)
    h = -ln(a + b*p)  (coding cost; with a = 1-y, b = 2y-1 this is -ln(p) for y=1 and -ln(1-p) for y=0)
    dh/ds = -(y - p)
    (derivation and terms like in this thread)

    s' = s + L*(y - p) = s + L*(y - Sq(s))

    That update can be derived like I did in the Mixing strategies thread - and actually equals a fixed input ("one") with a "weight" s. Hence it is just a weighted bias neuron. BTW the same update appears for bias neurons under logistic mixing - since it is the same thing. Nothing new, actually.
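    A minimal Python sketch of such a logistic counter (the class name and the concrete learning rate are my own illustrative choices, not taken from any existing coder):

    ```python
    import math

    def squash(x):
        # Sq(x) = 1 / (1 + e^-x), maps a logistic-space state to a probability
        return 1.0 / (1.0 + math.exp(-x))

    class LogisticCounter:
        """Bit model whose state s lives in logistic space.

        Illustrative sketch only: update s' = s + L*(y - Sq(s)).
        """
        def __init__(self, learning_rate=0.2):
            self.s = 0.0             # logistic state; Sq(0) = 0.5
            self.L = learning_rate   # 0 < L < 1

        def predict(self):
            # current probability estimate p = Sq(s)
            return squash(self.s)

        def update(self, y):
            # gradient step on the coding cost h: dh/ds = -(y - p),
            # hence s' = s + L*(y - Sq(s))
            self.s += self.L * (y - squash(self.s))
    ```

    Repeatedly feeding the same bit drives the prediction toward that bit, with steps shrinking as the error y - p shrinks.
    
    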

    Now I noticed that we could actually "reinvent" the wheel: when using logistic mixing along with such a model, why not take the mixing function's influence into account and adjust the bit models proportionally to their contribution to the mixed output? Integrating that into logistic mixing gives us a combined, purely logistic mixing and modeling approach:

    (1) each model i has a state s_i in logistic space

    (2) the mixer assigns weights w_i to the models s_i

    (3) a mixed prediction p in linear space is

    p = Sq( sum w_i s_i )

    (4) the coding error e is

    e = y-p

    (5) weights are updated

    w_i' = w_i + L*e*s_i

    (6) predictions are updated

    s_i' = s_i + M*e*w_i

    0 < L, M < 1 are learning rates. Note that -e*w_i is simply the partial derivative of h (the coding cost) with respect to an input s_i. Altogether this can be interpreted as a neural network: the input layer is made of bias neurons, with weights selected by a context. The output layer is a single neuron with a linear activation and squash as the output function. The hidden layer just forwards the inputs (linear neurons). It learns via error backpropagation, minimizing coding cost rather than RMSE.
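    Steps (1)-(6) can be sketched in Python as follows (all names and the initial weight/learning-rate values are illustrative assumptions, not from an actual implementation):

    ```python
    import math

    def squash(x):
        # Sq(x) = 1 / (1 + e^-x)
        return 1.0 / (1.0 + math.exp(-x))

    class LogisticMixModel:
        """Combined logistic mixing and modeling, steps (1)-(6) above."""
        def __init__(self, n, L=0.02, M=0.1):
            self.s = [0.0] * n    # (1) model states s_i in logistic space
            self.w = [0.3] * n    # (2) mixer weights w_i
            self.L = L            # weight learning rate
            self.M = M            # model learning rate

        def predict(self):
            # (3) mixed prediction p = Sq( sum_i w_i * s_i )
            return squash(sum(wi * si for wi, si in zip(self.w, self.s)))

        def update(self, y):
            # (4) coding error e = y - p
            e = y - self.predict()
            for i in range(len(self.s)):
                wi, si = self.w[i], self.s[i]
                # (5) w_i' = w_i + L*e*s_i   (6) s_i' = s_i + M*e*w_i
                self.w[i] = wi + self.L * e * si
                self.s[i] = si + self.M * e * wi
    ```

    Setting L = 0 gives the static-weight variant mentioned below: the mixer weights stay fixed, but the model updates still scale with w_i.
    
    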

    I have no idea how well this will perform in practice - but I hope it works about as well as logistic mixing. Most of this is already covered by logistic mixing; this is just an extension that integrates the modeling.

    I'd implement this when I've got some spare time.

    It might even be possible to get rather good results with *static* weights (no weight update), since the logistic model updates already take the actual weight values into account.

    Last edited by toffer; 9th June 2009 at 14:12.

Similar Threads

  1. Simple bytewise context mixing demo
    By Shelwien in forum Data Compression
    Replies: 13
    Last Post: 3rd April 2020, 17:04
  2. Context mixing
    By Cyan in forum Data Compression
    Replies: 7
    Last Post: 4th December 2009, 19:12
  3. Mixing strategies
    By Alibert in forum Data Compression
    Replies: 38
    Last Post: 25th June 2009, 00:37
  4. Graphic/Image/Modeling Benchmark
    By Simon Berger in forum Data Compression
    Replies: 23
    Last Post: 24th May 2009, 18:50
  5. CMM fast context mixing compressor
    By toffer in forum Forum Archive
    Replies: 171
    Last Post: 24th April 2008, 14:57
