
Thread: PPMd + TSE

  1. #1
    Member · Join Date: Apr 2011 · Location: Kyiv, Ukraine · Posts: 2 · Thanks: 0 · Thanked 0 Times in 0 Posts

    PPMd + TSE

    Recently I ran a small test on PPMd. I wanted to see how TSE behaves in practical compression.

    So I implemented unary encoding in masked contexts and added probability adjustment by TSE. There are two TSE stages: first the probability is adjusted by a TSE context selected by the previously encoded symbol, then by a TSE context selected by the current candidate symbol.

    Note that using the previous and current characters simultaneously as the context for probability adjustment is impossible with SSE (we would need at least 2^16 SSE contexts for each probability quant). A mixer like the one used in PAQ can do such an adjustment, but it would require 2 SSE maps with different contexts selected. So TSE is a good solution when adding any further flags to the SSE context only degrades compression.

    This version is 2-3 times slower than the original PPMd_sh8, partly because of the unary encoding.
    But speed was not the main goal of this version.

    Some results are in the attached files.
    Attached Files

  2. #2
    Member · Join Date: Jun 2009 · Location: Kraków, Poland · Posts: 1,474 · Thanks: 26 · Thanked 121 Times in 95 Posts
    A good technical report on SSE, SEE, TSE, mixing etc. would be very welcome. Maybe it's possible to produce a strong coder that would be feasible to implement on a GPGPU, i.e. an environment where computations are much cheaper relative to memory accesses than on a CPU.
    Last edited by Piotr Tarsa; 30th April 2011 at 04:55. Reason: Typo (?)

  3. #3
    Expert · Matt Mahoney · Join Date: May 2008 · Location: Melbourne, Florida, USA · Posts: 3,255 · Thanks: 306 · Thanked 779 Times in 486 Posts
    Maybe I don't quite understand. It looks like TSE takes a context (8 bits in proc2.inc) and a probability (freq/divizor in tse_cell.inc) and adjusts the probability using 2 stored variables, n1 and n2, initially 0 and some large scale factor. The adjustment is in the direction n1/(n1+n2). In this representation, n1 is signed and n2 > -n1. Then (according to your explanation) you make a chain of probability adjustments, first for order 1 without current bits, then order 0 with current bits.

    This looks like the same idea as an ISSE chain in ZPAQ mid.cfg or in paq9a but with different math and different contexts. Those models start with an indirect order 0 model (context -> bit history -> prediction) and make a chain of adjustments going to higher orders. Each adjustment has the form p := (w1 + p*w2) where p is the bit probability in the logistic domain log(p/(1-p)) and w1 and w2 are selected by the bit history. Then w1 and w2 are updated to reduce the prediction error. Initially w1 = 0, w2 = 1, which makes no adjustment to p.

    Is this the right idea? When you say unary coding, I guess you mean converting character frequencies to a series of binary decisions? I didn't look at all the code.

    Also, "TSE" means "? symbol estimation"?

  4. #4
    Member · Join Date: Apr 2011 · Location: Kyiv, Ukraine · Posts: 2 · Thanks: 0 · Thanked 0 Times in 0 Posts
    Unary coding is exactly that: converting a series of frequencies into a series of binary decisions. Is the first symbol in the array the encoded symbol? No? Take the next symbol and repeat. Yes? OK, move on to the next input symbol.

    And yes, I make a chain of adjustments, first for order 1, then for the symbol-candidate being encoded. This symbol-candidate is the byte-oriented counterpart of "order 0 with current bits" in bit-oriented compressors.

    TSE means "Third Symbol Estimation"; it was introduced by Dmitry Shkarin in this post: http://encode.su/threads/1040-SSE-TS...ty-estimations.
    My implementation differs from Dmitry's original because I wanted the code to be clearer.

  5. #5
    Administrator · Shelwien · Join Date: May 2008 · Location: Kharkov, Ukraine · Posts: 3,334 · Thanks: 209 · Thanked 1,007 Times in 532 Posts
    > A good technical report on SSE, SEE, TSE, mixing etc would be very welcomed

    Matt kinda has some descriptions of SSE and mixing in his book:
    http://www.mattmahoney.net/dc/dce.html#Section_432

    SEE is the same thing as SSE, when applied to escape symbols in PPM.

    As to TSE, this here is the first public demo, so there's
    no real proof that it makes any sense in practice.

    > Maybe it's possible to produce strong coder that would be feasible
    > to implement on GPGPU, ie an environment where computations are much
    > cheaper in relation to memory accessess than on CPU.

    Unfortunately it's not that simple.
    All the components are based on various kinds of statistics,
    so they still require memory accesses.
    Also there are sometimes branches, and memory accesses usually
    depend on previous calculations (both are bad for GPUs).
    But the main problem is that there's no general-purpose parallel
    compression algorithm.

Similar Threads

  1. PPMd reformatting
    By Shelwien in forum Data Compression
    Replies: 25
    Last Post: 16th April 2009, 06:06
  2. SSE, TSE & higher symbol probability estimations
    By Dmitry Shkarin in forum Forum Archive
    Replies: 4
    Last Post: 6th March 2008, 19:47
