
Thread: Counters

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > compared to standard speed = 40:
    > - after hit: prob = prob + (1 - prob) / 40 = 0.5125
    > - after miss: prob = prob - prob / 40 = 0.4875
    > speed controls how much of the error should be used for update, it isn't a
    > direct update value itself.

    Well, this calls for some comments...
    I think that this prediction/error interpretation is bad for theory,
    because it's inherently associated with exact prediction of the next value in a series,
    the LMC metric, and such - while the only metric a compressor needs is the Shannon metric.

    So, there're two interpretations that I generally use, depending on the case:

    1. Combinatoric

    For a binary sequence with only the added information that there're n0 zeroes and n1 ones,
    there're (n0+n1)!/(n0!*n1!) equiprobable cases, and it's easily provable that a counter
    like this would be optimal:

    p1 = n1/(n0+n1); // static

    if( y ) c1++; else c0++; p = c1/(c1+c0); // dynamic

    Then, for nonstationary data with local density variations,
    a weighted Shannon metric can be used, and so an
    interval range sequence for optimal encoding would be

    if( y ) c1=c1*w+1,c0=c0*w; else c0=c0*w+1,c1=c1*w; p = c1/(c1+c0);

    And as a speed optimization, it can be rewritten to avoid
    division:

    p = c1/(c1+c0);
    if( y ) p'=(c1*w+1)/(c0*w+c1*w+1); else p'=(c1*w)/(c0*w+c1*w+1);
    ---
    if( y ) p'=(c1+1/w)/(c0+c1+1/w); else p'=c1/(c0+c1+1/w);
    ---
    if( y ) p'=p*(c0+c1)/(c0+c1+1/w)+(1/w)/(c0+c1+1/w);
    else p'=p*(c0+c1)/(c0+c1+1/w);
    ---
    T = c0+c1
    T' = c0*w+c1*w+1 = (T+1/w)/(1/w) = T*w+1
    wr = 1-(c0+c1)/(c0+c1+1/w) = (1/w)/(c0+c1+1/w) = 1/T'
    if( y ) p'=p*(1-wr)+wr; else p'=p*(1-wr);
    p = p'; T = T';


    Then, to compensate for supposedly completely random bits
    (not correlated with recent history), the prediction of
    the counter can be further improved by static mixing with
    a fixed "order-(-1)" distribution:

    P = p*(1-mw)+0.5*mw

    which means that supposedly the counter's prediction "p"
    would be applicable to the current symbol/bit with
    probability (1-mw), and it's a completely random bit
    otherwise. (Of course, any distribution can be used
    instead of {0.5,0.5}, but let's skip that for simplicity).

    As experience shows that this assumption is widely
    useful, the mix can be merged with the actual counter:

    P = p*(1-mw)+0.5*mw
    T = T*w+1; wr = 1/T;
    if( y ) (p*=(1-wr))+=wr; else p*=(1-wr);
    ---
    P' = ((p*(1-wr)+y*wr)*(1-mw)+0.5*mw) =
    = p*(1-wr)*(1-mw)+0.5*mw+y*wr*(1-mw) = // substituting p = (P-0.5*mw)/(1-mw)
    = ((P-0.5*mw)/(1-mw))*(1-wr)*(1-mw)+0.5*mw+y*wr*(1-mw) =
    = (P-0.5*mw)*(1-wr)+0.5*mw+y*wr*(1-mw) =
    = P*(1-wr) - 0.5*mw + 0.5*mw*wr + 0.5*mw + y*wr*(1-mw) =
    = P*(1-wr) + 0.5*mw*wr + y*wr*(1-mw) =
    = P*(1-wr) + (y*(1-mw)+0.5*mw)*wr
    // y*(1-mw)+0.5*mw == y?1-0.5*mw:0.5*mw
    ---
    T = T*w+1; wr = 1/T; omp = 0.5*mw*wr;
    P = P*(1-wr)+(y?wr-omp : omp);

    And that's exactly what I mostly use.
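
    For illustration, here's a minimal floating-point sketch of that final update; the struct and member names are ad hoc, not from any specific coder:

    Code:
     
        // minimal sketch of the merged counter, floating point for clarity
        struct Counter {
          double P;    // mixed probability of bit==1
          double T;    // accumulated count, T(k) = T(k-1)*w + 1
          double w;    // forgetting factor, e.g. 1 - 1/40
          double mw;   // weight of the flat order-(-1) distribution
 
          void Update( int y ) {
            T = T*w + 1;
            double wr  = 1/T;          // current update speed
            double omp = 0.5*mw*wr;
            P = P*(1-wr) + (y ? wr-omp : omp);
          }
        };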

    The point here is that such a counter is a generalization
    of the combinatoric counter and so is able to precisely simulate
    the implementation with dumb counts of occurrences.
    And the combinatoric counter is good because it allows
    efficient encoding of even a completely non-redundant bit sequence.
    Its only redundancy is the information contained in the parameter values
    (which include the initial values for P and T).

    2. Bit sequence hash

    When dynamic probability mapping is used (aka SSE etc),
    a hash interpretation makes a lot of sense too.
    Well, it's obvious that in fact a "probability" updated
    like that is a convolution of the bit sequence, and so it
    highly correlates with the current suffix bits, especially
    when a fast "update speed" is used (and mostly it is fast enough).

    Thus, because of the indirect use, a counter-context like that
    has a completely different optimization metric compared to the
    combinatoric counter, and so may (and actually should) be
    implemented with different properties.

    For example, the combination of a normal counter delayed by
    a few bits (that is, updated with bit[x-k] when bit[x] is modelled)
    and a static linear mapping ( a[bit[x-k..x-1]]*p+b[bit[x-k..x-1]]*(1-p) )
    does show good results.

    371699 // test with a single SSE array over normal counter array (fully tuned)
    370757 // counters delayed by 3 bits and a static mapping before SSE added

    And I consider this improvement fairly major, as there's no model structure
    change, no contexts or dynamic elements added, and the parameters for the first
    result were fully optimized by bruteforce search (well, for the second as well).
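
    A rough sketch of how the delayed counter plus static mapping stage described above could be organized (everything here - the names, the identity defaults, the update form - is an assumption for illustration, not the actual tested code):

    Code:
     
        // sketch: counter delayed by k=3 bits, plus a static linear mapping before SSE
        struct DelayedCounter {
          double p    = 0.5;                 // "normal" counter, lagging by 3 bits
          int    hist = 0;                   // the 3 most recent bits, bit[x-3..x-1]
          double a[8] = {1,1,1,1,1,1,1,1};   // static mapping, identity by default;
          double b[8] = {0};                 //   real values would be tuned offline
 
          double Predict() const {           // this value then goes into SSE
            return a[hist]*p + b[hist]*(1 - p);
          }
          void Update( int y, double wr ) {
            int delayed = (hist >> 2) & 1;   // bit[x-3], the bit the counter sees now
            p += (delayed - p)*wr;
            hist = ((hist << 1) | y) & 7;
          }
        };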

    But still, this option is highly dependent on implementation details of the
    used dynamic mapping (SSE) and is not understood as well as "normal" counters.
    That is, in the most general case of this approach the counter can be replaced by a
    static lookup table mapping the bit sequence in context to some precalculated
    hash values. But, forgetting about the necessary size of this lookup table
    (~2^40 should be a common value), there's no known way for now to optimize
    the table contents except by directly checking the compression result with
    different settings.

    > speed should be lower for probs like 0.01 or 0.99 because with high speed
    > occasional negative bit in a equal bits run would ruin the statistics.
    > ie. if we have sequence:
    > 0000000000000000000000100000000000000000000000000
    > then on sequence:
    > 0000000000000000000000
    > we should consistently lower the speed because if we encounter bit 1
    > it shouldn't change statistics very much because the following bits are
    > 0000000000000000000000000

    A simple combinatoric counter handles this situation pretty well actually.
    With a completely precise implementation it is able to map strings like
    ('0' x N1 . '1' . '0' x N2) directly to values from 0 to N-1=N1+N2
    (with pre-known N).

    Also, of course, sometimes counter tweaks give a gain in compression anyway.
    But then, if the origin of the gain can be analyzed, it is usually possible to
    add explicit support for some special cases and leave the counter alone.
    The quoted example is exactly such a case.

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Continuing about counters. Look at this draft version of the table building:
    Code:
     
        for (int i = 0; i < 4096; i++) { 
          int j = 4095 - i; 
          tab[0][i] = i - (i / (18 + (j >> 10))); 
          tab[1][i] = i + (j / (18 + (i >> 10))); 
        }


    usage: pr = tab[y][pr];

    LZPM with such a table compresses my calgary.tar to 859,191 bytes, which is close to the double counter variant. Note that on some files the compression is better compared to the double counter version. But the main thing is speed - looks like it's faster even than the straight fpaq0p variant...
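
    For reference, note what the table computes: tab[1][i] = i + (4095-i)/(18+(i>>10)) and tab[0][i] = i - i/(18+((4095-i)>>10)), i.e. a proportional-to-error update whose divisor is 18 plus the (scaled) probability of the observed bit divided by 1024 - so a strong prediction that gets confirmed adapts a bit more slowly (divisor up to 21), while a contradicted one updates at the base speed of 18. A minimal wrapper sketch (the 2048 starting value is an assumption, not necessarily what LZPM uses):

    Code:
     
        int tab[2][4096];            // built once with the loop from the post above
 
        struct BitModel {
          int pr = 2048;             // P(bit==1) on a 12-bit scale, starting at 0.5
          int  P() const     { return pr; }
          void Update(int y) { pr = tab[y][pr]; }
        };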


  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Shelwien
    Please consider that I'm talking about LZP stuff. In my compressor I plan to model literals like PPMd does (for efficient memory utilization - I want to limit the number of symbols to ~45 in an order-2 context and ~15 in an order-3 context, and use a fixed size for each context for speed).

    Binary alphabets would be used only for encoding escapes (LZP and PPM escapes).

    An LZP flag sequence 000000000100000000 would correspond to:
    aaaaaaaaaaaabbbbbbbbbbbb. A flag sequence 00000000011000000 can correspond to aaaaaaaabccccccccc or aaaaaaaaaabaaaaaaaa; I don't store that information anywhere, so estimating gets more complicated.

    Also there are regions in a file with better and worse predictability (in the sense of LZP matching). I must exploit that effect with SSE.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    @encode:
    Wonder if you could try these two:

    Code:
     
     for( int i = 1; i < 4096; i++ ) { 
       int SCALElog = 16; 
       int SCALE = 1<<SCALElog ; 
       int sp1 = (18 + ((4096-i) >> 10)); 
       int wr1 = SCALE / sp1; 
       int pr1 = i*(SCALE-wr1) + (i%sp1)*wr1; 
       int sp2 = (18 + (i >> 10)); 
       int wr2 = SCALE / sp2; 
       int pr2 = i*(SCALE-wr2) + (4096-(4096-i)%sp2); 
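       // NB: a '*wr2' factor seems to be missing on the second term here (see post #9 below)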
       tab[0][i] = __max(1,pr1>>SCALElog); 
       tab[1][i] = __min(4095,pr2>>SCALElog); 
     } 
     
     for( int i = 1; i < 4096; i++ ) { 
       int SCALElog = 16; 
       int SCALE = 1<<SCALElog ; 
       int sp1 = (18 + ((4096-i) >> 10)); 
       int wr1 = SCALE / sp1; 
       int mw1 = 10; 
       int pr1 = i*(SCALE-wr1) + mw1*wr1; 
       int sp2 = (18 + (i >> 10)); 
       int wr2 = SCALE / sp2; 
       int mw2 = 10; 
       int pr2 = i*(SCALE-wr2) + (4096-mw2); 
       tab[0][i] = __max(1,pr1>>SCALElog); 
       tab[1][i] = __min(4095,pr2>>SCALElog); 
     }

    Also consider swapping [y] and [p] indexes.

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien
    Wonder if you could try these two:
    Something is wrong with these two - the compressed file becomes even larger than the original in both cases.

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    @encode:

    Can you upload some fpaq with your lookup update?
    I seem unable to avoid mistakes without actually testing the code

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien
    Can you upload some fpaq with your lookup update?
    OK, wait a minute, I'll assemble something...

    And now check out a simple but more advanced scheme:
    Code:
     
        for (int i = 0; i < 4096; i++) {
          tab[0][i] = i - N(i);
          tab[1][i] = i + N(4095 - i);
        }
        // ...
        int N(int i) {
          int j = (i < 2048) + (i < 1024) + (i < 512) + (i < 256)
                + (i < 128) + 19;

          return (i / j);
        }

    With this one, LZPM 0.15 compresses calgary.tar to 859,066 bytes.

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    OK, check out:
    fpaq0p-sb.zip


  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Thanks, it helped.

    http://shelwien.googlepages.com/fpaq0p_sb.htm
    http://shelwien.googlepages.com/fpaq0p-sb_sh.rar

    Seems that in that post I really did miss some *wr2.
    But now it works, and we have real proof that
    randomization in the counter update is bad

    Also notice the speed difference (I compiled all
    versions myself with the same options). It's probably
    due to [y] and [p] swapped and more cache hits in
    my version.

    > j = (i<2048)+(i<1024)+(i<512)+(i<256)+(i<128)+19;

    I'd suggest to bruteforce these numbers as well,
    or you won't know how it should really look.
    Well, the best way would be to optimize the whole
    table, but it'll really take some time.

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien
    It's probably
    due to [y] and [p] swapped and more cache hits in
    my version.
    Yep, seems tab[pr][y] should be better.

    Quote Originally Posted by Shelwien
    I'd suggest to bruteforce these numbers as well,
    or you won't know how it should really look.
    Well, I did and will do more testing, to find out what's the best. Like you see my approach is very simple, and looks like it works.

    Quote Originally Posted by Shelwien
    Well, the best way would be to optimize the whole
    table, but it'll really take some time.
    What do you mean? Do you have any additional ideas?


  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Maybe I should calculate the tab using 16-bit or higher precision (for more accurate division), then round down to 12 bits...

  12. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    >> I'd suggest to bruteforce these numbers as well,
    >> or you won't know how it should really look.
    >
    > Well, I did and will do more testing, to find out what's the best.

    It's actually better to prepare a parameter tuning platform
    right from the start...

    > Like you see my approach is very simple, and looks like it works.

    If you meant your 12-bit counters with lookup update, there's
    no wonder actually. As I've already said, 12 bits is ok.
    However, I'm not sure if it'll still be ok if you try
    to use e.g. 10 different update tables along with other lookups.

    Also there's a significant gain from changing the update speed
    according to the amount of accumulated statistics - surprisingly,
    even under SSE. Simple is not always good, as it's not adaptive.

    > Well, the best way would be to optimize the whole
    > table, but it'll really take some time.
    >
    > What do you mean?

    http://shelwien.googlepages.com/fpaq0p-sb_sh2.cpp

    This is a bruteforce-optimized version of the coder from fpaq0p-sb_sh.rar

    wcc386
    399392 // fpaq0p-sb_sh
    399083 // current

    Btw, I actually don't mind giving you my optimizer,
    but it's a perl script which patches the executable
    and runs it multiple times.

    > Do you have any additional ideas?

    For what?
    I'm not really interested in making things fast at the cost of
    precision, so I won't be of much help here.

    Btw, have you considered the fact that an LZ scheme can only be used
    for "universal" file compression (and without proper format support,
    even precomp-style), while good CM techniques can be applied wherever
    a good model of some process is needed - not limited to compression at all?

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    There's a simple formula for the error cost in interval coding.
    Supposing that you have k bits of precision for the range value, then
    e = -100*log2( 1-0.5/2^k )
    would be the redundancy in percent.

    Code:
     
    In[3]:= Table[ {k, -100*Log[2, 1 - 0.5/2^k]}, {k, 1, 20}] 
     
    Out[3]= {{1, 41.5037}, {2, 19.2645}, {3, 9.31094}, {4, 4.58037}, {5, 
      2.27201}, {6, 1.13153}, {7, 0.564656}, {8, 0.282052}, {9, 
      0.140957}, {10, 0.0704613}, {11, 0.0352263}, {12, 0.0176121}, {13, 
      0.00880578}, {14, 0.00440282}, {15, 0.00220139}, {16, 
      0.00110069}, {17, 0.000550346}, {18, 0.000275173}, {19, 
      0.000137586}, {20, 0.0000687931}}
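
    The same numbers can be reproduced with a few lines of C++, purely as a re-check of the formula above:

    Code:
     
        #include <cstdio>
        #include <cmath>
 
        int main() {
          // e = -100*log2(1 - 0.5/2^k): redundancy in percent for k-bit range precision
          for (int k = 1; k <= 20; k++)
            printf("%2d  %.6g%%\n", k, -100.0*std::log2(1.0 - 0.5*std::pow(2.0, -k)));
        }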

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Tested your versions with LZPM. Well, looks like your approach gives no improvement. We get higher compression with calgary.tar and lose it on many other files, and so on. The catch is in tuning.
    Code:
     
      int N(int i) { 
        int j = 19; 
        for (int k = 128; k < 4096; k <<= 1) 
          j += (i < k); 
     
        return (i / j); 
      }
    This one works pretty well. I tried many schemes including Fibonacci, i=i+i, i=i*i, ... This one turns out the best. In addition, I analyzed what's going on with double counter - looks like we cannot do exact approximation with tables. In terms of the N(i) function, double counters use a large range of divisors.

    The brute-force stuff can help for a given file - not for all files.


  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Tested your versions with LZPM. Well, looks like your approach gives no
    > improvement.

    So what do you call an "approach" here? My specific parameter values
    were optimized on wcc386 only, so obviously they are not "common" or anything.
    Normally such an optimization should be done on a whole test dataset,
    but I don't have that kind of resources to throw away.

    > We get higher compression with calgary.tar and lose it on many
    > other files, and so on. The catch is in tuning.

    As I said the point is adaptation.
    Meaning faster initial learning and at least some dynamic mixing,
    though multidimensional dynamic mapping is always more efficient.

    > for (int k = 128; k < 4096; k <<= 1)

    The optimized values in my example were
    {529, 604, 844, 1868, 1900}
    so I'd suggest concentrating on the interval [512..2048]
    ... though that could be different in fpaq and LZPM.

    > This one works pretty well. I tried many schemes including Fibonacci, i=i+i,
    > i=i*i, ...

    Shamanic ways again

    > This one turns out the best. In addition, I analyzed what's going on
    > with double counter - looks like we cannot do exact approximation with tables.

    Here're some ideas for further experiments with the double counter.
    1. Use a mix of two table-based counters and tune their parameters
    to get a good result
    2. Try to mix the tables and use a single "mixed" counter
    (probably won't work)
    3. Still use the mixed table, but add a difference table which would
    allow restoring the original counter values
    4. The difference table can probably be indexed by a difference counter,
    and its precision can be limited, so that the mixed counter + diff.state
    could be packed into 16 bits. Further optimizations ensue.
    5. Updated counter state and probability estimation could be totally
    separated (two different tables for estimation and update).

    > The brute-force stuff can help for a given file - not for all files.

    That's wrong. Even if you use some random values like Fibonacci numbers,
    it's still the same parameter tuning.

    Of course, a proper sample should be used for parameter optimization,
    which I obviously didn't do, as my purpose wasn't to supply you with a
    perfectly optimized counter, but to show you an example of how you're
    supposed to do it.

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    OK, thank you for hints!

    I just need to do more experiments.

    Overall, I like the table-driven stuff - as I found, it's even faster than pr updates with shifts. Note that the goal of LZPM is FAST decompression - even at the cost of compression ratio. During the next few weeks or months I will do many experiments (hand-crafted brute-force), as always, and decide what to use and what not to.

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Trying to get into true brute-force, I added additional parameters to LZPM and generated a batch file - a few GB long.

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    By the way, your numbers work fairly well!
    Code:
     
      int N(int i) { 
        int j = 18 
            + (i < 1900) + (i < 1868) + (i < 844) + (i < 604) + (i < 529); 
     
        return (i / j); 
      }

    With such N(), LZPM compresses calgary.tar to 858,964 bytes.


  19. #19
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    About shamanic formulas:
    Code:
     
      int N(int i) { 
        const int a1 = 512; 
        const int a2 = a1 + 128; 
        const int a3 = a2 + 256; 
        const int a4 = a3 + 1024; 
        const int a5 = a4 + 128; 
     
        int j = 17 
            + (i < a1) + (i < a2) + (i < a3) + (i < a4) + (i < a5); 
     
        return (i / j); 
      }
    calgary.tar compressed to: 858,883 bytes

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Or even with 6-point envelope:
    Code:
     
      int N(int i) { 
        const int a1 = 256; 
        const int a2 = a1 + 256; 
        const int a3 = a2 + 128; 
        const int a4 = a3 + 256; 
        const int a5 = a4 + 1024; 
        const int a6 = a5 + 128; 
     
        int j = 18 
            + (i < a1) + (i < a2) + (i < a3) 
            + (i < a4) + (i < a5) + (i < a6); 
     
        return (i / j); 
      }
    calgary.tar -> 858,880 bytes

  21. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Trying to get into true brute-force, I added additional parameters
    > to LZPM and generated a batch file - a few GB long.

    Well, here's my optimizer, with perl included:
    http://shelwien.googlepages.com/fpaq0p-sb_sh_full.rar
    It should work right after unpacking by starting 1.bat

    But actually it's better to completely separate the
    specific module you're working on into a standalone
    program. Of course, you'd need to prepare a dataset
    for it - usually that's done by logging the necessary
    values during a normal run.

    Meaning, as I said before, that with your LZ it's certainly
    useful to log the matches you've got from processing your
    "few GB" and work with the log instead of wasting time
    regenerating these matches.

    > By the way, your numbers work fairly well!
    > int j = 18 + (i<1900) + (i<1868) + (i<844) + (i<604) + (i<529);
    > With such N(), LZPM compresses calgary.tar to 858,964 bytes.

    Wonder what you tried before then...

    > About shamanic formulas:
    > const int a1 = 512;
    > const int a2 = a1 + 128;
    > const int a3 = a2 + 256;
    > const int a4 = a3 + 1024;
    > const int a5 = a4 + 128;
    > calgary.tar compressed to: 858,883 bytes

    Well, this is not shamanic yet, just a rough
    approximation. Still, I'll bet that
    "commonly optimal" values, optimized on a large dataset,
    wouldn't be powers of two or anything else you'd
    consider simple.

  22. #22
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi!

    Did anybody ever notice this - it's nothing special...

    an "adjustment proportional to a fraction of the error" is in fact a non stationary of the form:

    p1 = n1 / n

    Update, after bit y:

    p1' = (c n1 + y) / (c n + 1) = n1' / n'

    And now take the difference:

    p1' - p1

    = (c n1 + y) / (c n + 1) - n1 / n
    = (c n1 n + y n - c n1 n - n1) / (n (c n + 1))
    = (y - n1 / n) / (c n + 1)

    Let the denominator's term be n' = c n + 1 (as stated above); in other words:

    n(k+1) = c n(k) + 1

    If n is initially zero, we get:

    n(0) = 0
    n(1) = 1
    n(2) = c+1
    n(3) = c^2 + c + 1
    ...
    n(k) = c^(k-1) + ... + c^1 + 1 = sum from i=0 to k-1 of c^i

    Since 0 < c < 1 (giving more weight to newer events) and letting k go to infinity, we get:

    n(k->infinity) = 1/(1-c)

    So the denominator is constant after enough iterations. In the "infinite" case we get:

    p1' = p1 + (y - p1) * (1-c)

    Which could be

    p1 += (y - p1) >> 5
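
    A minimal numeric check of that recursion (the bit pattern and the value of c here are arbitrary choices, just for the demonstration):

    Code:
     
        #include <cstdio>
 
        int main() {
          const double c = 1.0 - 1.0/32;      // i.e. a "speed" of 32 in shift terms
          double n = 0, p = 0.5;
          for (int k = 0; k < 500; k++) {
            int y = (k % 3 == 0);             // arbitrary bit sequence
            n = c*n + 1;                      // n(k+1) = c*n(k) + 1
            p = p + (y - p)/n;                // exact nonstationary update
          }
          // n converges to 1/(1-c) = 32, so the step (y-p)/n becomes (y-p)*(1-c)
          printf("n = %f, limit 1/(1-c) = %f\n", n, 1.0/(1.0 - c));
        }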
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Well, I did notice

    But

    1. Counter gives better predictions when you keep this n(k)
    along with the probability (see first post in this thread).
    Of course it's slower though.

    2. Updates like "p1+=(y-p1)>>5" introduce noise into the counter.
    See that post

  24. #24
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    1. Counter gives better predictions when you keep this n(k)
    along with the probability (see first post in this thread).
    Of course it's slower though.
    If you made enough adjustments, mathematically infinity, practically (due to rounding, etc) after a finite number of steps, it doesn't matter. When this counter hasn't gone into steady state, this is something like Matt's decreasing adaptation speed; I can't remember the formula he used though.


    BTW:

    When looking at my CM's probability output I thought: What would you do, if you were a mixer - Assign high weights to the most probable predictions (and some logical stuff to avoid a conflict, e.g. order1 gives p1=0, order6 gives p1=1, so trust higher orders)

    Has anybody ever tried entropy based mixing? This could be implemented just by a table lookup.

    In some experiments I noticed that the "normal" context mixing does a fairly good job at lower orders (say up to 5 or 6). Higher orders offer too few statistics to be modeled the same way, but are usually highly predictive. Including a match model (of the paq type) doesn't actually model this redundancy; it only includes a rough measure of predictability (match length). I'm sure that the NN does a better job here than my mixer.
    I think higher orders can be modeled using a "binary context" with just one symbol, if the context match is long enough.
    After some web search I found ppmdet, which does something similar. Charles Bloom also included a nice description on his page (which looks promising). What about using something like this for higher order contexts? This could give a speedup, too.
    And I noticed something else in my empirical experiments - you can easily gain compression by modelling predictive symbols even better; I found no way to model "noisy" symbols in a better way...

    EDIT: I agree that something like this is random, due to the modulo:

    pr = pr*(SCALE-wr1) + (SCALE-(SCALE-pr)%Speed)*wr1; // y=1

    But I can't see how you derive this modulo from an update like above.
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  25. #25
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > > 1. Counter gives better predictions when you keep this n(k)
    > > along with the probability (see first post in this thread).
    > > Of course it's slower though.
    >
    > If you made enough adjustments, mathematically infinity, practically (due
    > to rounding, etc) after a finite number of steps, it doesn't matter.

    I have lookup tables for mapping k to n(k) in my coders,
    their declaration looks like this:
    [...]
    DefTbl( 250, 1400 );
    DefTbl( 200, 1050 );
    DefTbl( 175, 950 );
    DefTbl( 150, 800 );
    DefTbl( 120, 600 );
    DefTbl( 100, 460 );
    DefTbl( 80, 450 );
    DefTbl( 64, 300 );
    DefTbl( 45, 200 );
    DefTbl( 40, 200 );
    DefTbl( 34, 200 );
    DefTbl( 28, 128 );
    [...]
    The first parameter here is an inverse speed (e.g. 40 means c=1-1/40 in your terms),
    and the second is the table size... that is, your "infinity".
    So it's very small actually, that's right.
    But still, I don't like the compression loss in the version with fixed
    update speed, so I have to deal with additional statistics.
    I guess the "infinity" is not always reached in all counters if the file
    is not really huge... especially at higher orders.

    > When looking at my CM's probability output I thought: What would you do, if
    > you were a mixer - Assign high weights to the most probable predictions
    > (and some logical stuff to avoid a conflict, e.g. order1 gives p1=0, order6
    > gives p1=1, so trust higher orders)

    Well, I'd advise not to try using common sense for model design -
    it just doesn't work.

    So if you really want something as a target, then gather statistics
    from paq8 - add your own counters there and see what paq8 predicts
    for given predictions of your submodels.

    > Has anybody ever tried entropy based mixing?
    > This could be implemented just by a table lookup.

    How's that? Accumulate the code length in a context and derive a weight from it?
    I tried it and it wasn't particularly good.
    Also there was an algorithm/compressor named CTW, which attempted something
    along those lines, if I'm not mistaken.

    > In some experiments I noticed that the "normal" context mixing does a
    > fairly good job at lower orders (say up to 5 or 6).

    How do you know?
    That is, how do you determine which prediction quality is good for a given submodel?

    > I think higher orders can be modeled using a "binary context" with just one
    > symbol, if the context match is long enough.

    Well, they can. With significant compression loss.
    Again, imho the right way to design a compressor is to build something with
    which you'd be sure that you cannot afford better quality, and then apply
    speed optimizations, which might include skipping or regrouping counter
    updates and such. Actually, you can even finish with something like LZ77
    along that way. But I don't believe that it's possible to get a good model
    by improving a normal LZ77, as it's much harder to remove speed optimizations
    than to apply them.

    > What about using something like this for higher order contexts?

    That's quite common actually.

    > This could give a speedup, too.

    Wonder what exactly you intend to do with your compressor...
    For comparisons it's better to concentrate on beating paq8, and for any
    practical tasks there's no sense in writing a dictionary model, as even
    simple custom models for specific formats perform significantly better.

    > EDIT: I agree that something like this is random, due to the modulo:
    >
    > pr = pr*(SCALE-wr1) + (SCALE-(SCALE-pr)%Speed)*wr1; // y=1
    >
    > But I can't see how you derive this modulo from an update like above.

    p += x/w
    p = (p*w + x - x%w)/w
    etc

    And the quoted formula is equivalent to
    p += (1-p)/Speed
    That was even proved with a real coder in that thread.
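
    A quick sanity check of that equivalence, under the extra assumption (made only for this check) that Speed divides SCALE exactly:

    Code:
     
        #include <cassert>
 
        int main() {
          const int SCALE = 4096, Speed = 32;    // Speed divides SCALE here
          const int wr = SCALE / Speed;
          for (int pr = 0; pr < SCALE; pr++) {
            int a = pr + (SCALE - pr)/Speed;     // p += (1-p)/Speed, integer form
            int b = (pr*(SCALE - wr) + (SCALE - (SCALE - pr)%Speed)*wr)/SCALE;
            assert(a == b);                      // modulo form gives the same sequence
          }
          return 0;
        }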

  26. #26
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Concerning the counter updates (I'm busy at the moment, can't write much) - is this correct:

    p(k+1) = p(k) + floor[ (y(k) - p(k)) / S ]

    You mathematically "emulate" floor by subtracting the division's remainder:

    p(k+1) = p(k) + [ (y(k)-p(k)) - (y(k)-p(k))%S ] / S

    This gives:

    p(k+1) = p(k) * (1-1/S) + y(k) * (1/S) - [ (y(k)-p(k))%S ] / S

    Where epsilon(k) = - [ (y(k)-p(k))%S ] / S is the "noise" term. You eliminate this by just using:

    p(k+1) = p(k) * (1-1/S) + y(k) * (1/S)

    Is this correct? The second thing is that you include the partial sum of the above mentioned n(k) in your counters? Anything else?
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  27. #27
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Is this correct?

    Yes, mostly. Actually I use a constant parameter instead of that noise,
    as explained in the first post of this thread.

    > above mentioned n(k) in your counters?

    Just an index for the lookup table with n(k)=c*n(k-1)+1 values

    > Anything else?

    Delayed update?

  28. #28
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Actually every integer division (or right shift) can be written like that, so you can't overcome this noise. I wonder how you want to avoid that?
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  29. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Actually every integer division (or right shift) can be written like that, so
    > you can't overcome this noise. I wonder how you want to avoid that?

    By not using any small divisors?

    Anyway, it's a fact that p-=(p>>5) kind of updates can be rewritten
    to use multiplication by the inverse and the division remainder without any
    changes in the probability sequence.
    And then, it's a fact that replacing that remainder with a constant improves
    the compression.
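
    Concretely, for the p>>5 case the rewrite can be checked directly; the constant C mentioned in the comment below is a tuned parameter standing in for the 0.5*mw term of the first post, expressed in p's scale:

    Code:
     
        #include <cassert>
 
        int main() {
          // p -= p>>5 rewritten via multiplication and the division remainder:
          // identical probability sequence for every 12-bit p
          for (int p = 0; p < 4096; p++)
            assert( p - (p>>5) == ((p*31 + (p & 31)) >> 5) );
          // replacing the remainder (p & 31) by a constant C then gives the
          // "noise-free" form:  p = (p*31 + C) >> 5;
          return 0;
        }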

  30. #30
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Taking this:

    p(k+1) = p(k) * (1-1/S) + y(k) * (1/S) = p(k) * a + y(k) * b

    In practice, a and b are scaled by a factor N = 2^M, so you need to rescale after an adjustment:

    a = A/N, b = B/N
    (where A and B are in [0,2^M -1])

    p(k+1) = floor( (p(k) * A + y(k) * B)/N )

    This can be rewritten like in the previous posts. The rescaling by N introduces this noise. I still don't understand how you want to avoid this; you still need to do a division/shift (when working with ints).

    I don't get it...
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

