
Thread: Would it be SR or PPM?

  1. #1
    Member (Kraków, Poland)

    Would it be SR or PPM?

    I have an idea for a compression algorithm similar to both SR and PPM. Basically, it would have multiple contexts of variable length (as in PPM), but in every context I would maintain an MTF-like recency rank (possibly with the addition of some short match history) instead of full statistics, thus making it similar to SR.

    What should that be called?
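    To make the idea concrete, here is a minimal sketch (not from the thread; all names are made up) of what a per-context MTF recency rank might look like. Encoding a byte means emitting its current rank in the context's list and moving it to the front; an unseen byte signals an escape to a lower order, as in PPM:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical sketch: each context keeps an MTF-style recency list of
    // the bytes seen in it (front = most recently seen).
    struct ContextRank {
        std::vector<uint8_t> mtf;

        // Returns the rank of c in this context, or -1 if unseen
        // (the caller would then escape to a lower-order context),
        // and applies the move-to-front update.
        int encode(uint8_t c) {
            for (std::size_t i = 0; i < mtf.size(); i++) {
                if (mtf[i] == c) {
                    mtf.erase(mtf.begin() + i);     // move-to-front update
                    mtf.insert(mtf.begin(), c);
                    return (int)i;
                }
            }
            mtf.insert(mtf.begin(), c);             // unseen: becomes rank 0
            return -1;
        }
    };
    ```

    The variable-length contexts themselves could be kept in a hash table keyed by the preceding bytes, with one `ContextRank` per context; the actual coding of the rank (and the escape) is left to an entropy coder.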

  2. #2
    Shelwien, Administrator (Kharkov, Ukraine)
    PPMd uses such ranks (especially ppmonstr) and is still considered PPM.
    I think symbol ranking is PPM without local stats (i.e. you keep symbol ranks, but not their statistics per context).

  3. #3
    Member (Kraków, Poland)
    How does PPMd use recency ranks? I thought PPMd just encodes a flag per symbol (or a series of flags if the symbol is not found in higher-order contexts), plus possibly an interval for the specific symbol (if there's more than one unmasked symbol in the current context).

    Matt's definitions are:
    PPM (Prediction by Partial Match): order-n, modeled in longest context matched, but dropping to lower orders for byte counts of 0.
    SR (Symbol Ranking): order-n, modeled by time since last seen.
    My idea is: order-n, modeled in longest context matched by time since last seen, but dropping to lower orders if not present at current order.

    Additionally, I would want to include a history tag per context. Each history tag would consist of up to five 3-bit numbers; each number would be the rank of the symbol matched (with 111b = 7 meaning an unseen symbol, for example). For very rare contexts (up to 6 occurrences) that would provide both symbol-ranking information and full statistics, so it would be at least partially a PPM.

    Full idea is then: order-n, modeled in longest context matched by time since last seen and short history of matches, but dropping to lower orders if not present at current order.
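    The history tag described above (up to five 3-bit ranks, 7 reserved for "unseen") fits in a 16-bit word. A minimal sketch of how it might be packed, assuming newest-entry-in-the-low-bits order (that ordering is my assumption, not something specified in the post):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical packing of the per-context history tag: up to five
    // 3-bit entries (15 bits) in a uint16_t, newest match in the low bits,
    // rank 7 (111b) reserved as the "unseen symbol" escape marker.
    struct HistoryTag {
        uint16_t bits = 0;

        void push(unsigned rank) {                // rank 0..6, or 7 = escape
            if (rank > 7) rank = 7;
            // Shift older entries up; & 0x7FFF drops the sixth-oldest entry.
            bits = (uint16_t)(((bits << 3) | rank) & 0x7FFF);
        }
        unsigned entry(unsigned i) const {        // i = 0 is the newest match
            return (bits >> (3 * i)) & 7;
        }
    };
    ```

    Two bytes per context is cheap enough that the tag could double as both the "short history of matches" and, for rare contexts, a full record of what happened on each occurrence.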

  4. #4
    Matt Mahoney, Expert (Melbourne, Florida, USA)
    I'd probably classify it as a new algorithm (or maybe a type of CM depending on the model details).

    I considered something similar when I wrote SR2, but I wanted to keep it fast. The idea was to have two queues, e.g. orders 3 and 4, then emit a code indicating which queue and which position in the queue. It gets complicated because a character might be in both queues, and you would want to give it a higher probability in that case. SR2 already keeps a count of the number of consecutive occurrences of the character at the front of the queue, for modeling purposes.
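    The two-queue lookup could be sketched roughly as below (this is my illustration of the scheme described in the post, not SR2 code; the probability boost for a symbol present in both queues is left to the model):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <deque>

    // Hypothetical sketch: search the order-4 queue first, then the
    // order-3 queue, and report (queue id, position); a miss in both
    // means the byte is coded as a literal/escape.
    struct TwoQueueHit { int queue; int pos; };   // queue: 4, 3, or -1

    TwoQueueHit find(const std::deque<uint8_t>& q4,
                     const std::deque<uint8_t>& q3, uint8_t c) {
        for (std::size_t i = 0; i < q4.size(); i++)
            if (q4[i] == c) return {4, (int)i};
        for (std::size_t i = 0; i < q3.size(); i++)
            if (q3[i] == c) return {3, (int)i};
        return {-1, -1};
    }
    ```

    A real coder would also have to decide how to handle a byte found in both queues (code it from the longer context but bias its probability upward, for instance), which is exactly the complication mentioned above.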

  5. #5
    Shelwien, Administrator (Kharkov, Ukraine)
    For ppmonstr it's roughly like this:
    Code:
    // c = input byte; rank[] is sorted by frequency, rank[0] most frequent
    flag = 0;                           // in case the context is empty (N==0)
    sum = total;
    for( i=0; i<N; i++ ) {
       p = rank[i].freq*SCALE/sum;      // hit probability at this rank
       flag = (c==rank[i].symbol);
       rc.Encode( flag, SSE(p) );       // binary flag, SSE-mapped estimate
       if( flag ) break;
       sum -= rank[i].freq;             // mask out the rejected symbol
    }
    if( flag==0 ) // escape: process the lower order
    My opinion is that:
    1. CM is a model which explicitly mixes predictions from different contexts -
    implicit mixing like escapes + masking doesn't count (and gives different results anyway).
    2. PPM is a model which tries to encode a symbol in one context, then moves to the next context if it fails.
    3. SR is kinda like PPM, but doesn't keep symbol statistics per context.

  6. #6
    Matt Mahoney, Expert (Melbourne, Florida, USA)
    So ppmonstr models by frequency but uses an MTF queue to speed up the linear search? Or does it also model by rank order (implicit in the SSE context, maybe)?

  7. #7
    Shelwien, Administrator (Kharkov, Ukraine)
    1. AFAIR ppmonstr uses the same ranking by frequency as ppmd.
    (It's "AFAIR" because I don't remember whether it's a completely strict sort or some kind of approximation.)
    2. Aside from rank==0, I don't see the rank itself used anywhere in probability estimation,
    but I don't think it would become "less PPM" if I added the rank to the SSE context.

