
Thread: Poor compression of bit-version of PPM

  1. #1
    Member

    Poor compression of bit-version of PPM

    Hello,

    I've been experimenting with a bit version of PPM and the results so far
    are very frustrating: book1 -> 254278 bytes. With the byte version I got
    ~236K straight away, and that was a first prototype with almost no
    tweaking. This is a fun project, so I've been reluctant to study in depth
    what other people have done, but after quite a bit of frustration with
    the bit version of PPM I've decided to look around for some explanation
    of why it compresses worse. At this stage I just compute the entropy of
    the model output, so entropy-coding deficiencies are not an issue. An
    explanation like "the bit version covers a wider class of sources,
    therefore it should be a weaker compressor for a more specific type of
    data like ASCII text" sounds reasonable in general, but I'm still missing
    the _specific_ properties of the byte-PPM model that make it more
    efficient on ASCII text than the bit version. I would like to avoid being
    exposed to any open (C/C++) sources, so a general/specific "verbal"
    explanation would be totally sufficient and truly appreciated.

    Stefan

  2. #2
    Expert Matt Mahoney
    Isn't bit-level unbounded-order PPM the same as DMC? In PPM, if you have any symbols with zero counts, you code an escape symbol and fall back to the next lower context. With just 2 symbols you never code an escape, so it is very simple. Hook gets good compression. http://cs.fit.edu/~mmahoney/compression/text.html#1778

    You will notice that the contexts are initialized to start on byte boundaries. When a new context is added, it is extended by appending the current bit (like bit-level LZW), so all new contexts also start on byte boundaries.
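    For illustration, a minimal sketch of this byte-aligned bitwise context
    idea (hypothetical code, not Hook's): the context key is the last few
    whole bytes plus the bits seen so far in the current byte, so every
    context effectively starts at a byte boundary.
    Code:
    #include <cstdint>

    // Byte-aligned bitwise context: order-3 byte history plus the partial
    // byte, tracked behind a leading-1 sentinel so "no bits yet" is distinct.
    struct BitContext {
        uint32_t byteHistory = 0;  // last 3 whole bytes, packed
        uint32_t partial = 1;      // current byte's bits behind a 1 sentinel

        void update(int bit) {
            partial = (partial << 1) | bit;
            if (partial >= 256) {  // 8 bits collected: fold into byte history
                byteHistory = ((byteHistory << 8) | (partial & 0xFF)) & 0xFFFFFF;
                partial = 1;       // restart at the byte boundary
            }
        }
        // Combined key for a statistics table.
        uint64_t key() const { return (uint64_t)byteHistory << 9 | partial; }
    };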

  3. #3
    Administrator Shelwien
    1. PPM/CM models are not very practical unless they show <210k
    on book1 - otherwise it's better to use BWT.
    Also, a bitwise compressor generally has to use byte-aligned
    contexts for compression of texts (and other common data types);
    with unaligned contexts it's better to replace the test file
    with a jpeg or something else with unaligned bitcodes in it.
    And I'd presume that we're talking about real PPM - which tries
    to encode the symbols (bits or bytes, whatever) using statistics
    from a single context, without mixing etc. (Because with mixing
    it's actually hard to get more than 216k.)

    2. Bitwise PPM is actually more complex to implement than a bytewise one,
    as with it it's much more common to get a few contexts of higher order
    than necessary, with random statistics, and determining the optimal
    coding context is quite a problem. Imho the only practical implementation
    of something similar to bitwise PPM is DMC:
    http://en.wikipedia.org/wiki/Dynamic_Markov_Compression
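    For illustration, a minimal DMC-style cloning sketch (my own
    simplification of Cormack & Horspool's scheme; the thresholds are made
    up): each state has two counted edges, and a busy edge gets a private
    clone of its target, with the target's counts split proportionally.
    Code:
    #include <vector>

    struct DmcState { int next[2]; float count[2]; };

    struct Dmc {
        std::vector<DmcState> s{{{0, 0}, {0.2f, 0.2f}}}; // 1-state initial automaton
        int cur = 0;
        static constexpr float MIN_EDGE = 2, MIN_OTHER = 2; // cloning thresholds

        float p1() const {                                  // P(next bit = 1)
            return s[cur].count[1] / (s[cur].count[0] + s[cur].count[1]);
        }
        void update(int bit) {
            int nxt = s[cur].next[bit];
            float edge = s[cur].count[bit];
            float total = s[nxt].count[0] + s[nxt].count[1];
            // Clone if this edge is busy and the target also has traffic
            // from elsewhere.
            if (edge > MIN_EDGE && total - edge > MIN_OTHER) {
                DmcState clone = s[nxt];
                float ratio = edge / total;                 // split counts
                clone.count[0] *= ratio; clone.count[1] *= ratio;
                s[nxt].count[0] -= clone.count[0];
                s[nxt].count[1] -= clone.count[1];
                s.push_back(clone);          // indices only: push_back may reallocate
                nxt = (int)s.size() - 1;
                s[cur].next[bit] = nxt;
            }
            s[cur].count[bit] += 1;
            cur = nxt;
        }
    };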

    P.S. Seems that Matt was faster, but I'll post this anyway.

  4. #4
    Member

    fuzzy contexts

    > Isn't bit-level unbounded-order PPM the same as DMC?

    Yes, the contexts are not bounded. I tried the original Cormack DMC implementation long ago, and my recollection is that the compression was not impressive at all, so I didn't even consider it as a contender to start from. The bit version of PPM has a very simple trie organization, and the lack of escape probabilities makes it look like a very elegant and simple way to build an efficient context model; maybe it is somehow a variant of DMC with similarly poor compression though ;o(

    Has anyone tried to build a probabilistic context in such a manner that the context is represented not by 0/1 but by some state in between (say [0...255]) according to the 0/1 probability of the node state? My model allows that; I tried it and the result was not better, though there are plenty of places where I could have made some stupid mistake. It would be interesting to learn whether someone can get an advantage from using such fuzzy contexts.

    > using statistics from a single context

    It is not a single context in the sense that each new trie node borrows the statistics from the shorter context and updates them accordingly, so statistics from shorter contexts are accounted for; however, at the moment of the 0/1 prediction only the current (longest) context on the trie is used to assess P(0/1).
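    In sketch form (a hypothetical node layout, just to illustrate the
    borrowing - not my exact code):
    Code:
    #include <cstdint>

    // A new order-(k+1) node starts from scaled-down parent (order-k)
    // counts instead of zeros, so its early predictions are not random.
    struct TrieNode {
        TrieNode* child[2] = {nullptr, nullptr};
        uint32_t n[2] = {0, 0};          // bit counts seen in this context
    };

    TrieNode* extend(TrieNode* parent, int bit) {
        if (!parent->child[bit]) {
            TrieNode* node = new TrieNode;
            node->n[0] = parent->n[0] / 2;   // inherit halved parent counts;
            node->n[1] = parent->n[1] / 2;   // the divisor sets the prior's weight
            parent->child[bit] = node;
        }
        return parent->child[bit];
    }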

    > (Because with mixing it's actually hard to get more than 216k)

    Quite encouraging ;o(

    Anyway, I just checked the byte-aligned contexts you advised, and indeed the compression is getting better.

    Thanks,
    Stefan

  5. #5
    Member

    DMC vs. PPM

    Quote Originally Posted by Matt Mahoney
    Isn't bit-level unbounded-order PPM the same as DMC?
    They are in fact very similar, as shown by Bell and Moffat; on the other hand, Bunton has given a counter-example in her PhD thesis showing that they are in fact different. The main idea is the following: you can group edges in DMC into two groups, "redirected edges" (edges going to cloned states) and "cloned edges" (edges coming from cloned states). Now, if one starts with a trivial automaton in DMC and restricts redirection to "cloned edges" only, one indeed gets unbounded PPM, whereas if one (as usual) also allows redirection of "redirected edges", one gets something slightly different.

    To summarize, DMC is the more flexible model on the one hand (e.g. allowing the use of arbitrary initial automata); on the other hand, reality gives all its favors to PPM, since its simpler structure allows the application of far more optimization techniques at every level of the implementation (e.g. pruning branches when free memory is exhausted, fast implementations using hashing, ...).

    Alibert

  6. #6
    Member biject.bwts
    It is not fully optimized, but with a simple BWTS using straight two-state binary - so it's a true binary compressor with no byte alignment - book1 compresses to 235,913 bytes.

    http://bijective.dogma.net/compresa.htm#iBBW

    Some day I will clean it up for better compression, but this at least
    gives you another point of comparison, where a bit BWT can be compared
    to a bit PPM.

  7. #7
    Administrator Shelwien
    > Yes, the contexts are not bounded.

    Then I'd advise getting it working with bounded contexts
    first. Actually, I'm not aware of a PPM implementation
    which can properly handle unbounded contexts... their
    compression usually gets worse when you increase the
    context order past some point (depending on the file).

    Code:
    book1
    order  ppmd_vJ  vJ_sh5   ppmy_3c  3c+SSE
      4    216021   216065   216527   215707
      5    210767   210711   209697   209401
      6    209844   209597   207793   207654
    ppmd_vJ http://compression.ru/ds/ppmdj1.rar
    ppmd_vJ_sh5 http://ctxmodel.net/files/PPMd/ppmd_Jr1_sh5.rar
    ppmy_3c http://compression.ru/sh/ppmy_3c.rar
    ppmy_3c+SSE http://compression.ru/sh/ppmy_3c_sse.rar

    > I tried (long ago) the original Cormack DMC
    > implementation and my recollection is that the compression
    > was not impressive at all, so I didn't even consider it
    > as a contender to start from.

    Yeah, but it's old... and other implementations might be
    tuned for non-textual data...
    But there's not much choice if you want a bitwise PPM:
    1. Use bitwise statistics and coding, but bytewise escapes.
    2. Only add the "proven" contexts to the tree (the DMC way).
    3. Keep all statistics but discard "unworthy" higher orders (LOE etc.;
    see the sketch below).
    This actually applies to bytewise PPM as well, but there it's much more
    probable that the context with the maximal order would also give the
    best predictions (for stationary data at least).
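    A rough sketch of option 3, i.e. LOE-like order selection (the
    confidence test here is made up for illustration; real LOE criteria
    differ):
    Code:
    #include <cmath>
    #include <vector>

    struct Ctx { double n0, n1; };     // bit counts of one context

    // Walk from the highest order down; code with the first context whose
    // statistics look trustworthy, judged by total count and count skew.
    int chooseOrder(const std::vector<Ctx>& byOrder) {    // index = order
        for (int o = (int)byOrder.size() - 1; o > 0; --o) {
            double n = byOrder[o].n0 + byOrder[o].n1;
            if (n < 2) continue;                          // too little evidence
            double p1 = (byOrder[o].n1 + 0.5) / (n + 1);  // KT estimate
            if (std::fabs(p1 - 0.5) * n >= 1.0) return o; // confident enough
        }
        return 0;                                         // fall back to order-0
    }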

    > The bit version of PPM has a very simple trie organization

    Use hashes if you want it simple - it's kind of natural with
    bitwise statistics, and any tree is more complicated than
    a hash, especially when you try to implement something
    like a sliding window with it.
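    For example (a hypothetical layout): each (context, partial-bits) key is
    hashed into a flat table of small counters; collisions just blur the
    statistics a little, and old entries are simply overwritten.
    Code:
    #include <cstdint>
    #include <vector>

    struct Counter { uint8_t n[2]; };        // small bit counts per slot

    struct HashedCounters {
        std::vector<Counter> t;              // size is a power of two
        explicit HashedCounters(int bits) : t(size_t(1) << bits, Counter{{0, 0}}) {}

        Counter& slot(uint64_t ctx, uint32_t partial) {
            uint64_t h = (ctx ^ partial) * 0x9E3779B97F4A7C15ull;
            return t[(h >> 32) & (t.size() - 1)];
        }
        static void update(Counter& c, int bit) {
            if (c.n[bit] < 255) ++c.n[bit];
            if (c.n[1 - bit] > 2) c.n[1 - bit] /= 2;   // age the opposite count
        }
    };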

    > and the lack of escape probabilities

    Escapes would still be useful for deterministic contexts
    (where only one bit value was encountered).

    > Has anyone tried to build a probabilistic context in such
    > a manner that the context is represented not by 0/1 but by
    > some state in between (say [0...255]) according to the 0/1
    > probability of the node state?

    Here it sounds more like using the statistics as a context
    than a fuzzy context, but if you mean that, then it's common enough -
    known as SSE, APM, or quantized history mapping.
    But that's actually different from my idea of a "fuzzy context",
    which imho should be either a quantized context (by coding with
    overlapped intervals, for example), or a mix of all contextual estimations
    with weights determined by similarity.
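    SSE/APM in sketch form (loosely modeled on the well-known PAQ-style
    APM, simplified here): the input probability, quantized in the
    stretch() domain, selects an adaptive correction entry together with a
    small context.
    Code:
    #include <cmath>
    #include <vector>

    class Apm {
        static constexpr int BINS = 33;
        std::vector<float> t;   // learned output probability per (ctx, bin)
        int idx = 0;            // bin chosen by the last pp() call
    public:
        explicit Apm(int nCtx) : t(nCtx * BINS) {
            for (int c = 0; c < nCtx; ++c)           // start as identity mapping
                for (int i = 0; i < BINS; ++i)
                    t[c * BINS + i] = 1.f / (1.f + std::exp(-(i - 16) / 4.f));
        }
        float pp(float p, int ctx) {                 // refine p given a context
            float s = std::log(p / (1 - p)) * 4.f + 16.f;  // stretch, scaled to bins
            s = std::fmin(std::fmax(s, 0.f), BINS - 1.001f);
            int i = (int)s;
            float w = s - i;                         // interpolate adjacent bins
            idx = ctx * BINS + i + (w > 0.5f);       // remember bin for update()
            return t[ctx * BINS + i] * (1 - w) + t[ctx * BINS + i + 1] * w;
        }
        void update(int bit, float rate = 0.02f) {   // move chosen bin toward bit
            t[idx] += (bit - t[idx]) * rate;
        }
    };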

    > however, at the moment of the 0/1 prediction only the current
    > (longest) context on the trie is used to assess P(0/1).

    Well, the worst case for such a model would be a dictionary file
    (like english.dic) - actually PPM and BWT have the same problem
    with it, but it would be even worse if you just used the max-order
    context for a bit.
    I mean, words have a lot of common suffixes, which PPM would use
    as a context for the next word... but the statistics would always be
    completely wrong if it's a dictionary.
    And, though less frequently, the same happens with plaintext
    as well - it's quite common to get "accidental" high-order contexts,
    which provide random predictions, which is especially bad
    for PPM.

    >> (Because with mixing it's actually hard to get more than 216k)
    > Quite encouraging ;o(

    Well, it's a fact: last time, I implemented a tree for bytewise
    statistics (plain counts of context occurrences), added
    a dummy model (a linear mix of the contextual probability
    distributions, with fixed weights), and right away it was 215.7k or something.
    http://ctxmodel.net/files/tree_v1.rar
    Also, ppmy is not much smarter than that either.
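    The "dummy model" part in sketch form (the weights below are made up
    for illustration):
    Code:
    #include <vector>

    // Linear mix of the bit probabilities estimated by each order,
    // with fixed weights that grow with the order.
    double mixP1(const std::vector<double>& p1ByOrder) {
        double acc = 0, wsum = 0;
        for (size_t o = 0; o < p1ByOrder.size(); ++o) {
            double w = double(1u << o);          // higher orders trusted more
            acc  += w * p1ByOrder[o];
            wsum += w;
        }
        return wsum > 0 ? acc / wsum : 0.5;      // 0.5 if no context fired
    }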

  8. #8
    Expert Matt Mahoney
    Some results for book1:

    dmc -> 237997
    hook 1.3 -> 232857

    However, hook uses LZP preprocessing.

  9. #9
    Member
    You are right, on book1 DMC shows 237997 - actually quite good. I tested it long ago on a mixed-content tar file, compared it with RK (not sure), and DMC did not look impressive at all. Apparently I did a lousy job; thanks,
    Stefan

  10. #10
    Member
    I picked up my experimenting again. The things I read here are also valid for the algorithm I am constructing.
    Currently it's doing ~208000 on book1 and 436000 on World95.txt, so I'm happy with it.
    What I did to mix the order-X 'hits' of the unbounded PPM together is this:
    Code:
    sum = 0
    divider = 0

    for every order in current hits
    {
        e = getAverageErrorFor(order) // average error of the current hit,
                                      // taking the bit index, last byte and
                                      // order into account; 0 <= e <= 0.5
        sum = sum + stretch(confidence(p1, e)) // p1 = more or less n1 / n
        divider = divider + (1 - (e * 2))
    }

    p = squash(sum / divider)

    output = APM(p) // I made my own APM (not the PAQ version), but I call
                    // it APM here so everybody knows what it is.

    confidence = p1 adjusted, taking e into account:
    confidence = 0.5 when e = 0.5, confidence = p1 when e = 0
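    In compilable form, my reading of the scheme (not the author's actual
    code; the linear damping in confidence is an assumption based on the two
    stated endpoints):
    Code:
    #include <cmath>
    #include <vector>

    static double stretch(double p) { return std::log(p / (1 - p)); }
    static double squash(double s)  { return 1 / (1 + std::exp(-s)); }

    struct Hit { double p1; double e; };  // e = average error, 0 <= e <= 0.5
                                          // p1 assumed kept away from 0 and 1

    // Weighted average in logistic space: each active order contributes its
    // stretched, error-damped estimate; low-error orders dominate.
    double mix(const std::vector<Hit>& hits) {
        double sum = 0, divider = 0;
        for (const Hit& h : hits) {
            double w = 1 - 2 * h.e;                      // 1 = reliable, 0 = useless
            double confidence = 0.5 + (h.p1 - 0.5) * w;  // pull p1 toward 0.5 as e grows
            sum     += stretch(confidence);
            divider += w;
        }
        return divider > 0 ? squash(sum / divider) : 0.5; // feed this into the APM
    }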

    Right now I am struggling to improve it. It feels like calculating e and defining the counter statistics is the place to look for improvements, but for 2 days now I have been unable to improve things.
    If you count statistics, what do you use?

    PPMonstr surprised me: book1 results in 202700 using PPMonstr. What's the trick there? Why is it so much better than mine?

    I know that my method is some sort of weighted average in logistic space, but it works fine, because sadly my counter statistics by themselves do not lead to useful probabilities. Are there any other methods?
