
Thread: Need some advice for order-N CM

  1. #1
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts

    Need some advice for order-N CM

    I'm currently testing various things to catch up on 10 years of development.
    As Shelwien was so nice to help me a lot, I first focused on a good order-N mix.

    It's just a cascaded logistic mix, p = mix(p8, mix(p7, mix(p6, ...))), plus some low-order mix to avoid doing escapes. Finally there's also some simple SSE.

    The stats are formed as follows. No context-history modelling is done.
    Order0 to Order2 are direct, Order3 to N are each hashed with an obscure function into a different 2^22-sized table. No collision detection is done.

    Order0 uses counters minimizing squared error; Order1 and up use a slightly modified non-stationary counter from Matt.

    I tried to keep it simple and clean so the main part is about 100 lines of code.
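
    For illustration, here is a rough sketch of what I mean by the cascade (floats for clarity; this is not my actual code, and the names and learning rate are made up):

    Code:
    #include <cmath>

    // One 2-input logistic mixer: p = squash(w0*stretch(p0) + w1*stretch(p1)).
    // After coding a bit, the weights move along the log-loss gradient.
    struct Mix2 {
      double w0 = 0.5, w1 = 0.5;
      double s0 = 0, s1 = 0, p = 0.5;

      static double stretch(double q) { return std::log(q / (1.0 - q)); }
      static double squash(double s)  { return 1.0 / (1.0 + std::exp(-s)); }

      double mix(double p0, double p1) {
        s0 = stretch(p0); s1 = stretch(p1);
        return p = squash(w0*s0 + w1*s1);
      }
      void update(int bit, double rate = 0.002) {
        double err = bit - p;            // prediction error for the coded bit
        w0 += rate * err * s0;
        w1 += rate * err * s1;
      }
    };

    // The cascade p = mix(p8, mix(p7, mix(p6, ...))): start from the order-0
    // prediction and fold in one higher order per stage.
    double cascade(Mix2* m, const double* p, int n) {
      double pr = p[0];
      for (int i = 1; i < n; ++i)
        pr = m[i-1].mix(p[i], pr);
      return pr;
    }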

    Some results on calgary.tar:

    Code:
    751,734 - PAQ1 (32mb Order8)
    772,255 - Shelwien's o6mix59 (Order6)
    776,649 - mine (100mb Order8 without SSE)
    758,462 - with SSE
    742,852 - ~250mb with SSE
    Orders 7 and 8 help only a minuscule amount, and the difference between o6mix59 and mine is imo mainly because of different counters and the former being better optimized.

    So there is not much left to catch up with PAQ1.
    I looked at the source, and PAQ1 is using only one hash, so there might be some context clustering which helps here. I don't think doing exact statistics would be better.

    What is a good hashing function? Should the current partly coded byte be included in the hash, or is something like (hash<<8)+byte better?

    Looking forward to suggestions

    Regards
    Last edited by Sebastian; 16th November 2010 at 10:31.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    > It's just a cascaded logistic-mix p=mix(p8,mix(p7,mix(p6,...)

    I suspect it's not mix2? I mean, how many weights per mixer?
    Also, it's good to select mixers by contexts too... up to storing
    a mixer in each context.

    > and some low order mix to avoid doing escapes.

    I don't quite get this... Why escapes when you're mixing all the contexts?

    > 751,734 - PAQ1 (32mb Order8)
    > 772,255 - Shelwien's o6mix59 (Order6)
    > 776,649 - mine (100mb Order8 without SSE)
    > 758,462 - with SSE
    > 742,852 - ~250mb with SSE

    I'd add:
    745,196 - http://ctxmodel.net/files/mix_test/mix_test_vC.rar
    775,435 - o5 http://ctxmodel.net/files/PPMd/ppmd_Jr1_sh8.rar
    764,865 - o6
    760,622 - o7
    757,421 - o8
    755,765 - o9
    752,770 - o16
    694,825 - ash04a* /s255 // http://ctxmodel.net/files/ASH/
    668,455 - ppmonstr vJ -o16 // http://www.compression.ru/ds/ppmdj1.rar
    * ash 05-07 are memory usage optimizations, their compression is worse

    > Orders 7 and 8 help only a minuscule amount

    Actually they should help. At least there's pic where you
    can easily find very long matches.

    > and the difference between o6mix59 and mine is imo mainly because
    > of different counters and the former being better optimized.

    Maybe, so it makes sense to pay attention to component optimization first,
    before building something complex.
    Normally, logistic mixing should be able to easily beat that kind of model.

    > So there is not much left to catch up with PAQ1

    Actually paqs are imho not the best target to overcome for plain order-N CM.
    There're text-specific models and even some calgary-corpus-specific stuff (ie pic model).

    > I don't think doing exact statistics would be better.

    Depends on data type and the model. But afaik it's generally helpful for text to know
    exactly which symbols previously appeared in the current context.

    > What is a good hashing function?

    Ones with a single multiplication seem good enough.

    > Should the current partly coded byte be included in the hash or is
    > something like (hash<<8)+byte better?

    Depends on the specific structure. hash+byte or hash^byte are more common I think,
    although sometimes it's more like hash+f(byte) for cache lookup optimization.
    Anyway, a hash function calculation per bit per order would be too slow.
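
    For example, something along these lines (just a sketch; the seeding, multiplier and masking are arbitrary, the single multiply per context byte is the only point):

    Code:
    #include <stdint.h>

    enum { TABLE_BITS = 22 };                      // table of size 2^22 per order, as above

    // One multiplication per context byte; different orders get different seeds.
    uint32_t ctx_hash(const uint8_t* history, int order) {
      uint32_t h = 0x9E3779B9u * (order + 1);
      for (int i = 0; i < order; ++i)
        h = (h + history[i] + 1) * 2654435761u;    // single multiply per byte
      return h;
    }

    // Per bit, fold in the partially coded byte without rehashing (hash+byte style).
    uint32_t slot(uint32_t h, uint32_t partial_byte) {
      return (h + partial_byte) & ((1u << TABLE_BITS) - 1);
    }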

    Also, I'm much more interested to see some further experiments with trees -
    especially on combining modern counters (linear or fsm) with trees.
    With hashtables you never know what your model actually does.

  3. #3
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Quote Originally Posted by Shelwien View Post
    I suspect it's not mix2? I mean, how many weights per mixer?
    It's the mixer I posted here somewhere, so mix2 with two weights.
    They get selected by order1 and order2 hashes.

    I don't quite get this... Why escapes when you're mixing all the contexts?
    With escapes I mean, in this context, something like checking for available stats at high orders.

    I'd add:
    mix_test_vC is good, so thanks for that. What is the main difference between this and o6mix?
    How do you build the hashes? You seem to do some special masking depending on the partially coded byte?
    The other programs don't really count because they are doing much heavier stuff, though it's nice how simple it is to beat PPMD.

    Actually they should help. At least there's pic where you
    can easily find very long matches.
    I didn't check carefully, but it was a gain in the range of 2-3k on calgary.tar.

    Actually paqs are imho not the best target to overcome for plain order-N CM.
    There're text-specific models and even some calgary-corpus-specific stuff (ie pic model).
    You forgot that I ran PAQ1 with only the NonStationaryModel selected.
    The result was copied from Matt's page.
    So we have a non-adaptive-weight plain order8 in PAQ1 resulting in 751,734, and your heavily optimized adaptive logistic mix at 745,196. See what I mean?

    Depends on data type and the model. But afaik it's generally helpful for text to know
    exactly which symbols previously appeared in the current context.
    Sure. For example, if I replace the counters for order1 and up with occurrence counters I easily get 205xxx on book1, and now it's about 216.xxx, but that's not the catch here.
    Hashing causes implicit context quantization, and at this stage of development I wanted it to be simple.

    But thanks for the advice.

    edit: maybe it's PAQ's hard-coded state model that makes all the difference, because doing context->history->prob is generally better than context->prob, as you stated.
    Last edited by Sebastian; 16th November 2010 at 12:42.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    > It's the mixer I posted here somewhere, so mix2 with two weights.

    Well, I don't know about this specific case, but some of the mixers I've
    seen are not directly comparable with my mix2 (linear in Mix, logistic in mix_test).
    Like, sometimes there's an implied static prediction (so it has to be compared to mix2(mix2(p1,p2),0.5)).
    I mean, in the case of my coders there're known ways to improve the results at the cost of speed.

    > They get selected by order1 and order2 hashes.

    Afaik that's not enough for text.

    > mix_test_vC is good so thanks for that. What is the main difference between this and o6mix?

    mix_test = logistic mixing, o6mix = linear.
    Also there're hashtables in o6mix, but only plain counter tables in mix_test.

    > How do you build the hashes? You seem to do some special masking depending on the partially coded byte?

    There're no real hashes, just bits extracted from previous bytes.
    One common example would be a 0x5F mask for english text - 6 bits of context
    instead of 8, but it can actually improve compression because it discards the letter case
    ('A'=0x41, 'a'=0x61).
    And as to how... It's the output of the same parameter optimizer which I use for everything;
    it works with descriptions like this:
    Code:
    Index g1x
     mask8: x7,   &00000000
     mask7: x6,   &00000100
     mask6: x5,   &00001111
     mask5: x4,   &00011101
     mask4: x3,   &00010100
     mask3: x2,   &00000111
     mask2: x1,   &00001100
     mask1: x0,   &00001111
     mask0: ctx, b&10100110
    
    Rate1   g1wr,  16!00000000100000
    Number  g1mw, 1,1!000001111111110
    and runs the coder with various profiles until I get tired :)

    > The other programs don't really count because they are doing much
    > heavier stuff, though it's nice how simple it is to beat PPMD.

    Considering only modelling, they're much simpler actually.
    More than half of ppmd code is related to suffix tree handling,
    and that tree only stores simple 5-bit occurrence counts, which
    is nothing compared to paq counters.
    Also ppmonstr is basically ppmd+SSE, not something conceptually different.

    > You forgot that I ran PAQ1 with only the NonStationaryModel selected.

    More like I didn't know :)

    > So we have a non-adaptive-weight plain order8 in PAQ1 resulting in
    > 751,734, and your heavily optimized adaptive logistic mix at 745,196.
    > See what I mean?

    Not really. Mix_test with real hashtables and SSE would do much better.

    > Sure. For example, if I replace the counters for order1 and up with
    > occurrence counters I easily get 205xxx on book1, and now it's about
    > 216.xxx, but that's not the catch here.

    I think it's better to use plaintext (like book1) to tune the plain context models.
    I mean, when you improve compression (or in fact reduce the losses on mispredictions)
    of an order8 CM on files like pic or geo, it doesn't mean anything, because you can't correctly
    predict bits in a picture or an array of floats with prefix contexts, by definition.

    > edit: maybe it's PAQ's hard-coded state model that makes all the
    > difference, because doing context->history->prob is generally better
    > than context->prob, as you stated.

    A counter comparable to paq's fsm can look like this:
    n0 = n0*(1-wr) + (1-y)
    n1 = n1*(1-wr) + y
    But unfortunately this is only easy to implement with floats,
    and even if we don't care about speed, it's hard to ignore the
    8x difference in memory usage.
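
    Ie something like this sketch (the init values and wr are arbitrary):

    Code:
    // Decaying n0/n1 counts in floats: 2*4 bytes per counter vs 1 byte per fsm state.
    struct Counter {
      float n0 = 0.4f, n1 = 0.4f;

      float p1() const { return (n1 + 0.1f) / (n0 + n1 + 0.2f); }   // P(bit=1)

      void update(int y, float wr = 0.02f) {       // wr controls how fast old stats decay
        n0 = n0*(1 - wr) + (1 - y);
        n1 = n1*(1 - wr) + y;
      }
    };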

    I mean that a properly optimized state machine counter can do
    much better than simple approximations like in paq or ccm.

    Also SSE is certainly very important... especially real SSE
    like in ppmonstr, where it's applied to symbol probabilities
    instead of bit probabilities (SSE = Secondary Symbol Estimation,
    the next step after SEE = Secondary Escape Estimation in PPM).
    (I mean, your context->history->prob had better be adaptive.)

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Here are some results using a cascaded model in zpaq on calgary.tar for orders 0 through 8.

    Code:
    0 1717062
    1 1274266
    2 999888
    3 834827
    4 765089
    5 742790
    6 736141
    7 735014
    8 735500
    Order 8 model looks like this:
    Code:
    comp 4 4 0 0 9 (hh hm ph pm n)
      0 icm 5
      1 isse 13 0
      2 isse 18 1
      3 isse 19 2
      4 isse 20 3
      5 isse 20 4
      6 isse 20 5
      7 isse 20 6
      8 isse 20 7
    hcomp
      c++ *c=a b=c a=0 (save in rotating buffer M)
      d= 1 hash *d=a   (orders 1...8 for isse)
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      halt
    post
      0
    end
    The first component is an order 0 ICM. It maps a context (past bits of the current byte) to a bit history (8 bit state) to a prediction.

    The remaining components are order 1 through 8 ISSE. Each takes the previous prediction and a hashed context and outputs a new prediction. The context is mapped to a bit history. The bit history selects 2 weights for a logistic mixer. One mixer input is the previous prediction. The other is a fixed constant. The effect is to learn mixing function weights like p' = w1*p + w2 for each bit history where p and p' are stretched. Initially w1=1, w2=0. The last ISSE output from the highest order model is arithmetic coded.
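
    Roughly like this in pseudocode (floats for clarity; a sketch, not the zpaq source, and the learning rate is arbitrary):

    Code:
    #include <cmath>

    // One ISSE stage: the bit history (8-bit state) of the hashed context selects a
    // weight pair (w1,w2); the new prediction is w1*st + w2 in the stretched domain,
    // where st is the stretched previous prediction and the second input is a constant.
    // (Updating the bit history itself is omitted here.)
    struct ISSE {
      double w1[256], w2[256];
      ISSE() { for (int i = 0; i < 256; ++i) { w1[i] = 1.0; w2[i] = 0.0; } }

      static double squash(double s) { return 1.0 / (1.0 + std::exp(-s)); }

      double predict(double st, int state) {
        return squash(w1[state]*st + w2[state]);
      }
      void update(double st, int state, double p, int bit, double rate = 0.001) {
        double err = bit - p;            // move this history's weights toward the coded bit
        w1[state] += rate * err * st;
        w2[state] += rate * err;         // gradient w.r.t. the constant input
      }
    };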

    The first argument after each ICM or ISSE is the hash table size. Memory is 2^(n+6) bytes (e.g. 20 gives 2^26 = 64 MB). Each bit history takes 1 byte. The hash function is h = (h+c+512)*773 for each byte c of context. The exact spec is http://mattmahoney.net/dc/zpaq1.pdf

  6. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Also, I get very similar results using a single logistic mixer of all models rather than a cascade. For orders 0 through 8 on calgary.tar.

    Code:
    0 1717846
    1 1274730
    2 1001544
    3 840481
    4 772245
    5 749495
    6 741387
    7 737984
    8 735858
    Here is the model.
    Code:
    comp 4 4 0 0 10 (hh hm ph pm n)
      0 icm 5
      1 icm 13
      2 icm 18
      3 icm 19
      4 icm 20
      5 icm 20
      6 icm 20
      7 icm 20
      8 icm 20
      9 mix 0 0 9 32 0
    hcomp
      c++ *c=a b=c a=0 (save in rotating buffer M)
      d= 1 hash *d=a 
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      halt
    post
      0
    end
    Context computation is unchanged. The mixer takes no context. The argument 32 is the learning rate. You can possibly change both of these to improve compression a bit.

    Changing ICM back to ISSE like in my last post, but keeping the final MIX improves compression to 719656. This does a cascaded mix, then mixes everything again.

    Changing the mixer to order 0 improves compression to 710464. (change to "mix 8 0 9 32 255". The 8 says to use 8 bit context, masked by 255 (0xFF). 0 9 says to mix 9 components starting at 0.).

  7. #7
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Thanks Matt for your results, which are much appreciated.
    I have to try the idea with the final mix after the cascaded mix.
    Also I looked into PAQ9, and it seems your 'ISSE' is just a cascaded mix here, with the current(?) context state as mixer context.

    Code:
    for (int i=1; i<N; ++i) {
      sp[i]=&cp[i][i<4?c0:nibble];
      int st=*sp[i];
      pr=m[i-1].pp(pr, stretch(sm[i].p(st)), st+r)*3+pr>>2;
    }
    That is nearly the same as what I'm doing, but unfortunately the final static mix never works for me;
    perhaps that relies on better weight initialization. Also I often saw something like
    p=(3*p_sse+p)/4, but either I have to weight them dynamically or do it the other way round.

    I fixed some bugs and improved hashing, which results in 732,554 using 250mb,
    but I do a hell of a lot of stuff. After every counter lookup there's a small SSE stage.
    Then a cascaded mix with mixers selected by order1 (which is bad, but a quantised prob doesn't work very well) + the order number. Then there's a final SSE stage.

    Also I tried to understand the naming schemes better, but on an abstract level there are many similarities, which confuses me.

    SSE = function of a prob. dependent on context, so something like SSE[context].p[quant(p)]
    SSE2D = function of two probs., SSE[context].p[quant(p1)][quant(p2)]
    ISSE = MIX2 with a special context (here a history state)
    bit history = a state of n0,n1 counts in PAQ, but a prob. is also a quantised form of bit history

    So I tried to do a simple ICM in my LZ-flag model, which is also part of this project. First I count the occurrences of '0' and '1' depending on some context (last byte + part of the model history).
    The counts are mapped onto another counter state[n1][n0] whose probability is used for the actual prediction. After encoding, the two counters are updated (occurrence: n_y += 1; the state is adjusted to reduce the prediction error). Also state[n1][n0] is initialized with (n1+1)/(n0+n1+2).

    It works better than a counter alone, but why does this work better than using the actual prob. p=(n1+1)/(n0+n1+2) of the occurrence counter for the state, i.e. taking state[p] instead? Maybe because of the limited precision of my probabilities (12-bit) compared to the counters, which are each 8 bits wide. So it looks like another function of probability too, like SSE.
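
    In rough pseudocode it's something like this (simplified; the adjustment rate is made up, the counts are 8-bit as in my code):

    Code:
    // Per-context occurrence counts (n0,n1) index a shared adaptive table
    // state[n1][n0], whose entry is the prediction that is actually used.
    struct Flags { unsigned char n0 = 0, n1 = 0; };        // per-context counts

    struct FlagModel {
      int state[256][256];                                 // 12-bit probabilities

      FlagModel() {
        for (int n1 = 0; n1 < 256; ++n1)
          for (int n0 = 0; n0 < 256; ++n0)
            state[n1][n0] = (n1 + 1) * 4096 / (n0 + n1 + 2);   // init with (n1+1)/(n0+n1+2)
      }
      int p(const Flags& f) const { return state[f.n1][f.n0]; }

      void update(Flags& f, int bit) {
        int& pr = state[f.n1][f.n0];
        pr += ((bit << 12) - pr) >> 5;                     // adjust the shared state toward the bit
        if (bit) { if (f.n1 < 255) ++f.n1; } else { if (f.n0 < 255) ++f.n0; }
      }
    };
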
    Last edited by Sebastian; 21st November 2010 at 03:22.

  8. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    paq9a uses an ICM-ISSE cascade like the one I posted here, but also LZP preprocessing to remove long matches. In the zpaq mid.cfg model, I add a MATCH component instead of LZP to take care of long matches. In either case, the final mixer is adaptive. It is also possible to add SSE stages after the mixer. In this case you can do static or adaptive mixing of the SSE inputs and outputs. In many PAQ versions the mixing is static with 3/4 weight to the output. In zpaq max.cfg, the mixing is adaptive using a MIX2. A MIX2 is a 2-input mixer where the weights are constrained to add to 1. But sometimes this does not give as good compression as a 2-input MIX with unconstrained weights.
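
    The difference in rough pseudocode (floats; a sketch, not the zpaq source):

    Code:
    // Inputs s0,s1 and the output are in the stretched (logistic) domain;
    // err is (coded bit - squashed prediction).

    // MIX2: one free weight, the other is 1-w, so the weights always sum to 1.
    struct MIX2 {
      double w = 0.5;
      double mix(double s0, double s1) const { return w*s0 + (1 - w)*s1; }
      void update(double s0, double s1, double err, double rate = 0.002) {
        w += rate * err * (s0 - s1);
        if (w < 0) w = 0; else if (w > 1) w = 1;
      }
    };

    // MIX: two unconstrained weights, so it can also scale the prediction up or down.
    struct MIX {
      double w0 = 0.5, w1 = 0.5;
      double mix(double s0, double s1) const { return w0*s0 + w1*s1; }
      void update(double s0, double s1, double err, double rate = 0.002) {
        w0 += rate * err * s0;
        w1 += rate * err * s1;
      }
    };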

  9. #9
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Thanks for the details; not so different after all, as I see. I will add LZP and do SEE, maybe that will do some good here. Problem is the exclusion of the non-matched symbol. How to do this in bitwise CM?

    Another problem is that you don't really have much data to use, unlike in the bytewise case where you can use much more information from the context stats for secondary estimation.

    I've removed the small SSE stages again, which results in 735,936 on calgary.
    I really have to tune the different mixer and SSE contexts. I don't really like hashing and will do a tree instead.

    Has somebody done discriminant analysis on contexts yet? There's a nice paper (ECECOW) which applies this to image compression for context quantization, but with nearly no details. But the results were astonishing.

    One strange thing, which maybe Shelwien can answer: Order0 and Order1 use some type of linear counter, p'=(1-c)*p + c*y. But they are tuned in a way that they expand! the data if I use them alone. In a cascaded mix they improve compression on calgary by more than 15k over the parameters which would make more sense.

    Nonetheless, some results on enwik8 and sfc.

    Code:
    11,485,315 (sfc, 250mb)
    11,401,972 (sfc, 830mb)
    11,899,781 (my lz77 codec with 32mb window)

    21,730,146 (enwik8, 250mb)
    21,037,098 (enwik8, 830mb)
    Last edited by Sebastian; 23rd November 2010 at 02:14.

  10. #10
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    paq9a uses LZP exclusion by testing whether the leading bits of the first byte after a match equal the leading bits of the byte being modeled, and including this bit of knowledge in the context. It can do this because LZP matches are encoded as 1 bit per byte and literals as 9 bits. This is not perfect and only works for a high-order context (order 12 or more). This also doesn't work for zpaq because it uses a bytewise context model. To do LZP you need to use escape bytes to encode lengths (like min.cfg). It does not use exclusion, but it could be improved to do so using a trick like in 7zip where the mispredicted byte is XORed with the LZP prediction. As it is, LZP in min.cfg is strictly a speed optimization with a size tradeoff. You get better compression using a MATCH component instead.
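
    Roughly, that leading-bits test could look like this (a sketch with made-up names, not the actual paq9a code):

    Code:
    #include <stdint.h>

    // c0 = partially coded byte with a leading 1 marker (paq convention),
    // predicted = the byte that followed the LZP match, bits_coded = number of bits
    // of the current byte coded so far. The flag says whether the prediction is still alive.
    int leading_bits_match(int c0, uint8_t predicted, int bits_coded) {
      return (predicted >> (8 - bits_coded)) == (c0 & ((1 << bits_coded) - 1));
    }

    // Fold that one bit of knowledge into the bit model's context.
    uint32_t bit_context(uint32_t ctx, int c0, uint8_t predicted, int bits_coded) {
      return ctx * 2 + leading_bits_match(c0, predicted, bits_coded);
    }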

  11. #11
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Sebastian View Post
    One strange thing, which maybe Shelwien can answer. Order0 and Order1 use some type of linear counter p'=(1-c)*p + c*y. But they are tuned in a way that they expand! the data if I use them alone. In a cascaded mix they improve compression on calgary by more than 15k over the parameters which would make more sense.
    It's the linear counter which we use frequently (but the static mix component is missing, ie. p'=(1-w)*p + w*(y*p_s + (1-y)*(1-p_s)) ). As to the expansion, the first thing that comes to my mind is the probability convention - does p mean 1 or 0? The other thing could be a precision problem (maybe you leave a probability in a lower or higher range than required). As to the mixing, sometimes it does not surprise me, because adaptive mixers (linear or logistic, it does not matter) are really good at adapting to even really bad models. Also, adding mixer context is another boost for adaptive mixers.
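
    I.e. something like this (sketch; the w and p_s values are arbitrary):

    Code:
    // Linear counter with the static mix component: the probability moves toward
    // p_s on a 1 bit and toward 1-p_s on a 0 bit, instead of toward 1 and 0.
    struct LinearCounter {
      double p = 0.5;                                     // P(bit=1)
      void update(int y, double w = 0.04, double p_s = 0.97) {
        double target = y*p_s + (1 - y)*(1.0 - p_s);      // p_s if y==1, 1-p_s if y==0
        p = (1 - w)*p + w*target;
      }
    };
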
    BIT Archiver homepage: www.osmanturan.com

  12. #12
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Sadly I lost everything due to an HD crash.

    I don't know if you understood me correctly, osmanturan. The counters and everything are working as they should, but they work even better in a cascade if I tune them in a strange way. The static mixing part was there but is currently not used, and contexts for the mixers are frequently used.

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    > I will add LZP

    LZP or a match model?
    Afaik, LZP encodes match lengths and a match model produces
    a symbol match flag, which can be directly encoded or used
    in an SSE context etc.

    > Problem is the exclusion of the non-matched symbol. How to do this in bitwise CM?

    To mask out a single symbol, it's enough to skip bit0 coding if the higher bits match
    (ie if( byte(ctx*2)==(pc&~1) ) bit=(pc&1)^1 ), and that does actually improve compression.
    But to do it precisely, you need to compute the probability distribution for all 256
    byte values, then zero the masked symbols and normalize the result.
    Obviously that's too slow, but it might make sense to test it just to measure the redundancy
    left by the simpler workaround.
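
    Expanded a bit, that check is something like this (a sketch; the bit ordering follows the usual paq-style c0 convention):

    Code:
    #include <stdint.h>

    // c0 = partial byte with a leading 1 marker; after 7 coded bits, c0 & 127 holds
    // those bits. pc = the excluded symbol. If the 7 coded bits equal pc's top 7 bits,
    // the last bit must differ from pc's last bit, so it can be set without coding it.
    // Returns -1 when the bit still has to be coded normally.
    int forced_last_bit(int c0, uint8_t pc) {
      if ((c0 & 127) == (pc >> 1))
        return (pc & 1) ^ 1;
      return -1;
    }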

    That's why unary coding (like in ppmd/ppmonstr) is so attractive.
    Ie you assign codes to symbols, like 1, 01, 001, etc - in order of
    descending probability (or something like mtf rank), and you can
    apply any transformations to the symbol probabilities (in fact this is
    where SSE was derived from SEE), and it's really easy to skip masked
    symbols.

    There's an issue though - this doesn't work right for binary data,
    as unary coding takes too much time for large alphabets.
    Bytewise rangecoding partially compensates for that (instead of an rc call
    per unary flag with renormalizations, the symbol's interval is
    found without renorm, and only then is the rc called), but even plain
    summing of 100+ freqs already takes too much time.

    So imho the best would be some hybrid solution with unary coding for
    a few high ranked symbols and binary coding after that (ideally based
    on a static huffman tree or something).

    > Another problem is that you don't really have much data to use, unlike in
    > the bytewise case where you can use much more information from the context
    > stats for secondary estimation.

    Actually you have most of the contexts from bytewise coders anyway;
    for example, this is from Ash:
    Code:
        max_mps = ( MaxOrder>0 ) ? (**CL[MaxOrder-1]).S[0].c : -1;
    
        t2.f2 = (sseC>=0x41) 
              + ((sseC>=0x41) && (prevchar>=0x41))
              + ((max_mps>=0x41) && (prevchar>=0x41));
    
        t2.f3 = ((CurOrder>0 ) ? ( (**CL[CurOrder-1]).S[0].c == maxS ) : 0);
        t2.f4 = ((CurOrder>1 ) ? ( (**CL[CurOrder-2]).S[0].c == maxS ) : 0);
        t2.f5 = ((CurOrder>2 ) ? ( (**CL[CurOrder-3]).S[0].c == maxS ) : 0)
          +     ((CurOrder>3 ) ? ( (**CL[CurOrder-4]).S[0].c == maxS ) : 0)
          +     ((CurOrder>4 ) ? ( (**CL[CurOrder-5]).S[0].c == maxS ) : 1)
          +     ((CurOrder>5 && Rank==0 ) ? ( (**CL[CurOrder-6]).S[0].c == maxS ) : 1);
    
        t2.f6 = (Rank==0) ? sse_prevOk&1 : 0;
        t2.f7 = (Rank==0 && CurOrder>0) ? ( (**CL[CurOrder-1]).S[0].P==1.0 ) : 0;
    Here
    sseC = code of symbol for which we refine the probability of unary flag with SSE
    Rank = rank of that same symbol (in order of unary coding, across contexts)
    MaxOrder = order of longest current context
    CurOrder = order of context of currently processed symbol
    maxS = current rank0 symbol
    sse_prevOk = previously encoded flag in this SSE instance

    > I don't really like hashing and will do a tree instead.

    I agree here, but please make a good tree, at least not worse than the one in ppmd.
    I mean, with variable-length nodes and custom allocation.

    > Has somebody done discriminant analysis on contexts yet?
    > There's a nice paper (ECECOW) which applies this to image
    > compression for context quantization, but with nearly no details.
    > But the results were astonishing.

    Note that in file compression the contexts are discrete,
    so it's much harder to do. But as to context clustering,
    SSE is the most useful method, and there're others
    (including the bitmasks in mix_test_vC).
    Well, unfortunately there's not much that can be done at runtime,
    but obviously we do static context quantization - the
    most advanced open-source example is probably bsc, with
    its bruteforce context clustering by entropy estimation
    (see the qlfc thread).

    > One strange thing, which maybe Shelwien can answer. Order0 and
    > Order1 use some type of linear counter p'=(1-c)*p + c*y. But they
    > are tuned in a way that they expand! the data if I use them alone.
    > In a cascaded mix they improve compression on calgary by more than
    > 15k over the parameters which would make more sense.

    Suppose we have these predictions:
    p1: --^^---^^^^^^--^^^^----
    p2: ^^--^^^^--^-^^^----^^^^
    "-" means bad prediction (redundant) and "^" means good.
    Afaik this is what you're talking about.

    In other words, we have two different types of data and two submodels
    tuned for these, but we don't know how to detect the data type switches,
    so we use an adaptive mixer instead.

    Well, that's the theory anyway, but in practice I just start with
    separately tuned contexts and then tweak all the model's parameters
    until results stop improving.

  14. #14
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Thanks again Shelwien.

    Ok, then I'll use a match model, or PPMDet as Charles Bloom called it:
    search for the longest (or a good enough) matching context and encode a flag, with SEE, for whether the following byte matches. I have a good variant lying around which improved my PPM coder quite a bit.

    Can you please tell me what sse_prevOk means? Is it the last bit in the SSE context?

    Ok, now I'll stop asking questions because I'll need 2 weeks

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    > Can you please tell me what sse_prevOk means? Is it the last bit in the SSE context?

    Bit history actually, but only one bit is used there

  16. #16
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Sebastian View Post
    Sadly I lost everything due to an HD crash.
    That's bad news.
    Quote Originally Posted by Sebastian View Post
    I don't know if you understood me correctly, osmanturan. The counters and everything are working as they should, but they work even better in a cascade if I tune them in a strange way. The static mixing part was there but is currently not used, and contexts for the mixers are frequently used.
    Well it's ok.
    BIT Archiver homepage: www.osmanturan.com

  17. #17
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Regarding the unary coding: I don't seem to understand the advantage of coding a symbol as successive runs of escape symbols.

    I could think of using an MTF rank of the previous symbol in some context, but that's not what you're talking about.

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    > Regarding the unary coding: I don't seem to understand the advantage of
    > coding a symbol as successive runs of escape symbols.

    It improves modelling precision (significantly).
    1. Most data sources work with byte alphabets, so modelling bit prefixes
    of symbol codes is inherently wrong.
    2. You can use different contexts and parameters for different ranks.
    3. In many cases even plain symbol ranking + unary coding would compress
    the data, so you'd have fewer rc calls than with fixed 8 bits per symbol (see the sketch below).
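
    For example, a plain symbol-ranking sketch (MTF ranking and a trivial counter, just to show the structure; the coder call is left abstract):

    Code:
    #include <stdint.h>
    #include <vector>

    struct BitModel { int p = 2048; };            // 12-bit probability of "this rank"

    struct UnaryCoder {
      std::vector<uint8_t> rank;                  // rank -> symbol, kept in MTF order
      BitModel flag[256];                         // separate model per rank

      UnaryCoder() : rank(256) { for (int i = 0; i < 256; ++i) rank[i] = (uint8_t)i; }

      // code_bit(bit, p12) stands for the range coder call; masked symbols could
      // simply be skipped in the loop below.
      template <class CodeBit>
      int encode(uint8_t sym, CodeBit code_bit) {
        for (int r = 0; ; ++r) {
          int hit = (rank[r] == sym);
          code_bit(hit, flag[r].p);               // one unary flag per tried rank
          flag[r].p += ((hit << 12) - flag[r].p) >> 5;
          if (hit) {
            for (int j = r; j > 0; --j) rank[j] = rank[j-1];   // move-to-front
            rank[0] = sym;
            return r;
          }
        }
      }
    };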

    > I could think of using an MTF rank of the previous symbol in some context,
    > but that's not what you're talking about.

    It's also related to tree usage for context statistics.
    The most compact tree node representation is a table of branches,
    with one element per symbol, and you have to scan it to find the current symbol,
    which is very much compatible with unary coding.

  19. #19
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    What do you think about this?

    It extends the idea from Bloom (PPMDet) to every order, but with imo poor results, maybe because of the lack of secondary estimations.
    Last edited by Sebastian; 23rd November 2010 at 19:48.

  20. #20
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    If you mean symbol ranking, sure, it's an obvious idea
    http://mattmahoney.net/dc/text.html#2530
    But unfortunately there's BWT

