
Thread: About PAQ8 Models Training

  1. #1
    Member
    Join Date
    Oct 2016
    Location
    Argentina
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question About PAQ8 Models Training

    Hi!

    Greetings from Argentina, University of Buenos Aires.

    I've been working on research about data compression using the PAQ8L code.

    I would like to stop the PAQ models' training at a particular point, so that from then on only the context is updated during encoding.

    I've already stopped some of them, but the ContextMap class is really confusing to me.

    There is a method called "mix1" that updates the model with "y" (the newest bit); I tried to change it, but the model keeps training anyway.

    Could you help me figure out which lines I need to change to stop ContextMap from learning?

    ContextMap Class code: http://pastebin.com/xGGb1hpm

    Thank you!

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > I would like to stop the PAQ models' training at a particular point, so that from then on only the context is updated during encoding.

    Actually, it gets hard to see where context stops and statistics start.
    OK, the bits in the file before the current bit are context, but what about a context history (the sequence of bits that occurred in a specific primary context)?
    And if that counts as context too, then why would the counters be any different? They are just a quantized version of the context history.

    > ContextMap Class code

    I guess you can look at mix2() there - it calls StateMap::p, which updates the counter.
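
    To make this concrete, here is a simplified sketch (not the actual paq8l StateMap: the table layout, the update shift and the "frozen" switch are assumptions for illustration). The map nudges the probability of the previously used state toward the last coded bit y on every call, and "stopping training" means skipping exactly that adjustment while still selecting the probability by state.

    Code:
    // Simplified sketch of a state -> probability map with an optional freeze.
    // Not the real paq8l StateMap: table size, update shift and the "frozen"
    // flag are illustrative assumptions.
    #include <vector>

    class ToyStateMap {
      std::vector<int> t;   // probability of a 1 bit, scaled to 0..65535
      int cxt = 0;          // state used for the previous prediction
    public:
      bool frozen = false;  // when true, the map no longer adapts

      explicit ToyStateMap(int n) : t(n, 1 << 15) {}

      // Return p(bit==1) scaled to 0..4095 for state cx, after (optionally)
      // nudging the previous state's probability toward the last coded bit y.
      int p(int cx, int y) {
        if (!frozen)
          t[cxt] += ((y << 16) - t[cxt]) >> 5;  // move toward 0 or 65536
        cxt = cx;
        return t[cx] >> 4;
      }
    };

    With frozen set, p() degenerates into a pure table lookup, so the same state always maps to the same probability.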

  3. Thanks:

    juanandreslaura (10th October 2016)

  4. #3
    Member
    Join Date
    Sep 2015
    Location
    Argentina
    Posts
    7
    Thanks
    4
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    > I would like to stop the PAQ models' training at a particular point, so that from then on only the context is updated during encoding.

    Actually, it gets hard to see where context stops and statistics start.
    OK, the bits in the file before the current bit are context, but what about a context history (the sequence of bits that occurred in a specific primary context)?
    And if that counts as context too, then why would the counters be any different? They are just a quantized version of the context history.

    > ContextMap Class code

    I guess you can look at mix2() there - it calls StateMap::p, which updates the counter.
    The main reason to stop the models' training is to always get the same probability given the same context. If you don't stop the training, you'll get different probabilities even when the same context is given.

    For example, in the Run Map:

    The state is (b, n), where b is the last bit seen (0 or 1) and n is the number of consecutive times this value was seen.
    The output is computed directly:

    t_i = (2b - 1)K log(n + 1)

    For the Run Map model, you'll always get the same output for a given context, because the output is a pure function of the (b, n) state.
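
    As a tiny worked example of that formula (only the expression t_i = (2b - 1)K log(n + 1) comes from this post; the function name and the constant K = 256 are made up for illustration):

    Code:
    // The run map's stretched output t_i = (2b - 1) * K * log(n + 1):
    // it depends only on the current (b, n) state, never on accumulated
    // statistics, so the same state always produces the same output.
    #include <cmath>
    #include <cstdio>

    double runMapOutput(int b, int n, double K = 256.0) {
        return (2 * b - 1) * K * std::log(n + 1.0);
    }

    int main() {
        std::printf("%f\n", runMapOutput(1, 3));  // last bit 1, seen 3 times in a row
        std::printf("%f\n", runMapOutput(0, 3));  // same run length, last bit 0
    }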

    But there are other models, like DMC, where for a given context (or state) you can get different probabilities because of the C0 and C1 updates. If you stop DMC training (i.e. you don't update C0 and C1), you will always get the same probability for a given context.
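
    For concreteness, a toy DMC-style node might look like this (purely illustrative, not the actual DMC implementation; the field names and priors are assumptions). The prediction is computed from the C0/C1 counts, so freezing the counts makes the prediction a fixed function of the state:

    Code:
    // Toy DMC-style node (illustrative only): the prediction comes from the
    // c0/c1 counts, so if the counts are frozen, the same state always
    // yields the same probability.
    struct DmcNode {
        float c0 = 0.2f, c1 = 0.2f;   // bit counts (small non-zero priors)
        float p1() const { return c1 / (c0 + c1); }
        void update(int bit, bool frozen) {
            if (frozen) return;       // training stopped: counts stay fixed
            (bit ? c1 : c0) += 1.0f;
        }
    };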

    Suppose you compress a file (enwik8). Then, if you stop all of PAQ's model training, you could use PAQ to predict a given sequence (for example 10110) by just compressing the bits 1,0,1,1,0 in order to update the contexts (I think...), and then you should be able to measure the probability of that sequence under the distribution learned from enwik8.
    If you don't stop training, you will update not only the contexts but also the probabilities, and you'll end up with a distribution different from the enwik8 one.
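
    A minimal sketch of that measurement, assuming a hypothetical frozen predictor (none of these names exist in paq8; the stand-in model below just keeps the last bit as its context and reads from a fixed table). The probability of the sequence is the product of the per-bit predictions, accumulated here as log2:

    Code:
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Stand-in for a frozen model: it advances a tiny context (the last bit)
    // but its probability table is fixed, so identical contexts always give
    // identical predictions. Only the accumulation loop is the point here.
    struct FrozenPredictor {
        int lastBit = 0;
        double predictP1() const { return lastBit ? 0.7 : 0.3; } // fixed table
        void update(int bit) { lastBit = bit; }                  // context only
    };

    // log2 P(bits | model): sum of per-bit log-probabilities under the model.
    double sequenceLogProb(FrozenPredictor& m, const std::vector<int>& bits) {
        double logp = 0.0;
        for (int b : bits) {
            double p1 = m.predictP1();
            logp += std::log2(b ? p1 : 1.0 - p1);
            m.update(b);   // contexts move forward; statistics stay fixed
        }
        return logp;
    }

    int main() {
        FrozenPredictor m;
        std::vector<int> seq = {1, 0, 1, 1, 0};   // the example sequence 10110
        std::printf("log2 P(10110) = %f\n", sequenceLogProb(m, seq));
    }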

    Do you think it's possible?

    Thank you very much,

    Juan

  5. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > Do you think it's possible?

    As I said, the problem is to define what counts as "allowed" context and what you want to stop adapting.
    It's clear enough in old PPM models, where we only have one type of context (prefix symbols) and a counter table/list per context.
    But it's easy to make a model without such a clear subdivision - for example, a static "order-1024" model that dynamically builds a probability distribution from the context string and then works as an adaptive order-0 model. How would you stop it from adapting?
    The same question applies to paq8, because it tracks not only prefix contexts but also all kinds of others, some of which are pretty close to statistics - context histories etc.
    So in theory you'd want to make a static model that can compute its prediction from just N context symbols (with N=16 or so), but that would require removing 90% of the paq8 code, because it relies too heavily on secondary contexts.
    And otherwise, if you let the model access all of the previously processed file data, you won't be able to force it to make the same predictions.
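
    One way to picture the "static order-1024" model described above (an illustrative sketch; none of these names exist in paq8): it keeps no trained tables at all, but rebuilds an order-0 distribution from the last N bytes of context before every prediction. There is nothing left to freeze, yet its predictions still follow the data, because the statistics live inside the context itself.

    Code:
    // Illustrative sketch of a "static order-N" model: no persistent
    // statistics to freeze, since every prediction rebuilds an order-0
    // distribution from the last N bytes of the already-seen data.
    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    double staticOrderN(const std::vector<std::uint8_t>& history,
                        std::size_t N, std::uint8_t sym) {
        std::array<std::uint32_t, 256> count{};                  // order-0 counts
        std::size_t start = history.size() > N ? history.size() - N : 0;
        for (std::size_t i = start; i < history.size(); ++i) ++count[history[i]];
        std::size_t window = history.size() - start;
        return (count[sym] + 1.0) / (window + 256.0);            // Laplace-smoothed
    }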

  6. Thanks:

    juanandreslaura (10th October 2016)

  7. #5
    Member
    Join Date
    Sep 2015
    Location
    Argentina
    Posts
    7
    Thanks
    4
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    > Do you think it's possible?

    As I said, the problem is to define what counts as "allowed" context and what you want to stop adapting.
    It's clear enough in old PPM models, where we only have one type of context (prefix symbols) and a counter table/list per context.
    But it's easy to make a model without such a clear subdivision - for example, a static "order-1024" model that dynamically builds a probability distribution from the context string and then works as an adaptive order-0 model. How would you stop it from adapting?
    The same question applies to paq8, because it tracks not only prefix contexts but also all kinds of others, some of which are pretty close to statistics - context histories etc.
    So in theory you'd want to make a static model that can compute its prediction from just N context symbols (with N=16 or so), but that would require removing 90% of the paq8 code, because it relies too heavily on secondary contexts.
    And otherwise, if you let the model access all of the previously processed file data, you won't be able to force it to make the same predictions.

    That's right. I think it's very difficult to make a clear subdivision of what a context is. However, when PAQ compresses a bit, it automatically updates all of its contexts, right? In addition, the models use those contexts to compute the probability of the next bits. What's more, some of the models use not only the contexts but also the previous probability to compute the new one, and some of them then use the compressed bit to update that probability in order to make a "better prediction" next time.

    Suppose you stop that last step (i.e. you stop updating the probability with the compressed bit) in all of the models (if possible). By doing just that, when PAQ compresses the next bit, it will still automatically update all of its contexts (and, as a consequence, the probability it outputs), but it will not learn anything from the compressed bit.
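
    A rough sketch of that separation (hypothetical names, not paq8 code): the per-bit work splits into "advance the context" and "learn from the compressed bit", and freezing means skipping only the second part.

    Code:
    // Hypothetical order-3 bit model showing the split between updating the
    // context and learning from the compressed bit. When frozen, the same
    // 3-bit context always yields the same probability, but the context
    // itself still advances with every coded bit.
    #include <array>
    #include <cstdint>

    struct ToyModel {
        std::uint32_t ctx = 0;                  // last bits seen (the context)
        std::array<int, 8> prob;                // p(bit==1)*4096 per 3-bit context
        bool frozen = false;

        ToyModel() { prob.fill(2048); }

        int predict() const { return prob[ctx & 7]; }

        void encodeBit(int bit) {
            if (!frozen)                        // learning: adjust the statistic
                prob[ctx & 7] += ((bit << 12) - prob[ctx & 7]) >> 5;
            ctx = (ctx << 1) | bit;             // the context always advances
        }
    };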

    Do you think we can obtain this by just stopping the probability updates that use the compressed bit?

    Thank you very much,

    Juan

