
Thread: Counters

  1. #31
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    Of course there'd be some noise either way, when calculating with limited precision.
    But the noise with small divisors is "more random" or something like that.

    Or there's another way: you can use interval arithmetic
    for all calculations and select some specific value from
    the output interval, instead of the effectively random one
    you get without interval arithmetic.
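
    For illustration, a minimal sketch of that idea (names are illustrative,
    not from any real coder): carry the probability as a [lo,hi] interval
    through the update, rounding down for the lower bound and up for the
    upper one, then pick a deterministic point from the result.
    Code:
    // probability in 1/65536 units
    struct PInterval { unsigned lo, hi; };

    // interval version of p' = p + (1-p)/speed on a hit;
    // p' is monotone in p, so the bounds map to bounds
    PInterval update_hit(PInterval p, unsigned speed) {
        unsigned lo = p.lo + (65536 - p.lo) / speed;              // round down
        unsigned hi = p.hi + (65536 - p.hi + speed - 1) / speed;  // round up
        return PInterval{ lo, hi };
    }

    // deterministic choice from the output interval, e.g. the midpoint
    unsigned select_p(PInterval p) { return (p.lo + p.hi) / 2; }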

  2. #32
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Ok, it is obvious that masking out more bits reduces randomness, since higher bits contribute more significantly to the variable's value. This could be an explanation.
    How do you select the term for "noise replacement" (you know what I mean)? Do you give more weight to more probable symbols? Or which kind of distribution do you use to mix?
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  3. #33
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    > How do you select the term for "noise replacement" (you know what I mean)?

    By trying all of them and finding the best value.
    Though rough estimations can be derived from the "look" of the data -
    basically, you set a higher weight for the static part
    for less predictable data.

    > Do you give more weight to more probable symbols?

    Usually not, though sometimes it might make sense.

    > Or which kind of distribution do you use to mix?

    Not sure if I got it right, but for the
    counter-mixed-with-static-distribution model from the first post
    I normally use a flat distribution, so it works like a measure
    of the amount of random patches in the data, or something like that.

    Btw, random values (remainders) have kinda the same effect too
    (due to their non-zero average),
    so the compression would probably get worse if you just removed
    the "noise component".

  4. #34
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    > compared to standard speed = 40:
    > - after hit: prob = prob + (1 - prob) / 40 = 0.5125
    > - after miss: prob = prob - prob / 40 = 0.4875
    Simpler code for power of 2 speed:
    Code:
    // Prob is P(1) in fixed point; ONE = 1 << PROB_BITS is fixed-point 1.0
    Prob -= Prob >> n; 
    if ( hit ) Prob += ONE >> n;   // not integer 1, which would shift to zero

    See PPMd .

    Quote Originally Posted by Shelwien
    For a binary sequence with the only added information being that there are n0 zeroes and n1 ones,
    there are (n0+n1)!/n0!/n1! equiprobable cases, and it's easily provable that a counter
    like this would be optimal:
    p1 = n1/(n0+n1); // static
    It is an approximate solution for large n0 & n1. For the precise solution, see MP-codes.
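    (For example, with n0=2 and n1=1 there are 3!/2!/1! = 3 equiprobable
    strings - 001, 010, 100 - so the static estimate is p1 = 1/3.)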

  5. #35
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    > Simpler code for power of 2 speed:
    > Prob -= Prob >> n;
    > if ( hit ) Prob += ONE >> n;
    > See PPMd .

    Really? I thought you only use frequencies there.

    >> For a binary sequence with the only added information being that there are n0
    >> zeroes and n1 ones,
    >> there are (n0+n1)!/n0!/n1! equiprobable cases, and it's easily provable that a counter
    >> like this would be optimal:
    >> p1 = n1/(n0+n1); // static
    >
    > It is approximate solution for large n0 & n1.

    Well, it's globally optimal for static coding.
    And also the dynamic formula is optimal for adaptive coding, though
    it's simpler to prove, as the product of probabilities with properly
    decremented counters equals that binomial coefficient.

    > For precise solution, see MP-codes.

    Is that supposed to be some binomial-based coding?
    Like this: http://compression.ru/sh/mark1.rar ?

  6. #36
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Really? I thought you only use frequencies there.
    See SSE for binary contexts.

    Quote Originally Posted by Shelwien
    Well, its globally optimal for static coding.
    Oh, sorry, wrong quote. Right quote:
    Quote Originally Posted by Shelwien
    if( y ) c1++; else c0++; p = c1/(c1+c0); // dynamic
    Quote Originally Posted by Shelwien
    Is that supposed to be some binomial-based coding?
    Like this: http://compression.ru/sh/mark1.rar ?
    Yes, something similar, but IMHO your formula is incorrect; the correct formula does not include factorials and C(n,k).

  7. #37
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Is that supposed to be some binomial-based coding?
    Like this: http://compression.ru/sh/mark1.rar ?
    Ok. It is simpler to write the correct formula:
    n - length of bitstring s;
    k - number of 1s in string s;
    + - concatenation of strings;
    p(1|s)=
    P(s+1)/(P(s+1)+P(s+0))=
    (k+1)^(k+1)/k^k/((k+1)^(k+1)/k^k+(n-k+1)^(n-k+1)/(n-k)^(n-k))
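
    For reference, a direct transcription of this formula (a sketch, using
    the 0^0 = 1 convention for k = 0 and k = n):
    Code:
    #include <cmath>

    // t(m) = (m+1)^(m+1) / m^m, so p(1|s) = t(k) / (t(k) + t(n-k))
    double t(int m) { return pow(m + 1.0, m + 1.0) / (m ? pow((double)m, m) : 1.0); }

    double mp_p1(int n, int k) { return t(k) / (t(k) + t(n - k)); }

    E.g. mp_p1(2,2) gives (27/4)/(27/4+1) ~= 0.871, where the Laplace rule would give 3/4.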

  8. #38
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    >> Really? I thought you only use frequencies there.
    >
    > See SSE for binary contexts.

    I see only SEE there
    But if that's what you meant, I'd just believe you, as the code there
    isn't quite readable and I'm not that interested.

    And what I am interested in is "stateless" counters, i.e., ideally,
    a way to dynamically compute a good enough estimation directly from
    the context history. Alas, it's not as simple as it seems, because
    with short histories (e.g. 16 bits is short) the prediction of a normal
    counter is usually much better, and non-lookup implementations are damn slow.

    Btw, the reason is that such an approach really improves the speed
    of brute-force parameter optimization.

    >> Well, its globally optimal for static coding.
    >
    > Oh, sorry wrong quote. Right quote:
    >
    >> if( y ) c1++; else c0++; p = c1/(c1+c0); // dynamic

    Well, for small c0,c1 there's no single "precise solution" anyway,
    as it all depends on which sequences you consider more frequent.

    >> Is that supposed to be some binomial-based coding?
    >> Like this: http://compression.ru/sh/mark1.rar ?
    >
    > Yes, something similar, but IMHO your formula is incorrect,
    > correct formula does not include factorials and C(n,k).

    Well, there's a working coder without much redundancy, so I don't
    care if you think it's incorrect.
    Also there're obviously different ways to do the same thing... with
    additional proof in the form of the recent ABS coder... who'd have thought
    it was still possible to create a really different interval-coding
    alternative.

  9. #39
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    > Ok. It is simpler to write correct formula:
    > n - length of bitstring s;
    > k - number of 1s in string s;
    > '+' - concatenation of strings;
    > p(1|s )= P(s+1)/(P(s+1)+P(s+0)) =
    > (k+1)^(k+1)/k^k/((k+1)^(k+1)/k^k+(n-k+1)^(n-k+1)/(n-k)^(n-k))

    Ah. Quite as I suspected.
    So it's not that it "doesn't include factorials and binomial coefficients",
    it just uses an (in)famous factorial approximation...
    http://en.wikipedia.org/wiki/Stirling%27s_approximation
    Btw, it's also only good for large numbers, so I don't think there's
    actually any difference... but thanks anyway, as I haven't heard about
    this particular implementation. Guess I'd appreciate any links or articles
    if you have any.

  10. #40
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    I see only SEE there
    For binary contexts p(SSE)=1-p(SEE)
    Quote Originally Posted by Shelwien
    And what I am interested in is "stateless" counters, i.e., ideally,
    a way to dynamically compute a good enough estimation directly from
    the context history. Alas, it's not as simple as it seems, because
    with short histories (e.g. 16 bits is short) the prediction of a normal
    counter is usually much better, and non-lookup implementations are damn slow.
    The formula must be tabulated, of course.
    Quote Originally Posted by Shelwien
    Well, there's a working coder without much redundancy, so I don't
    care if you think it's incorrect.
    Redundancy can be reduced with a proper formula. I heard You are concerned with precise models, are You not?

  11. #41
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Btw, it's also only good for large numbers, so I don't think there's
    actually any difference... but thanks anyway, as I haven't heard about
    this particular implementation. Guess I'd appreciate any links or articles
    if you have any.
    AFAIR, it is by Shtarkov and it is not an approximation, but a precise formula.

  12. #42
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    > > I see only SEE there
    >
    > For binary contexts p(SSE)=1-p(SEE)

    I know, but my text editor's search function doesn't.

    > > And what I am interested in is "stateless" counters, i.e., ideally,
    > > a way to dynamically compute a good enough estimation directly from
    > > the context history. Alas, it's not as simple as it seems, because
    > > with short histories (e.g. 16 bits is short) the prediction of a normal
    > > counter is usually much better, and non-lookup implementations are damn slow.
    >
    > Formula must be tabulated, of course.

    Well, it's not that it "must be", actually... it's all just a matter
    of the speed impact on the optimizer.

    But still I'd like to find some counter decomposition/approximation
    which would be precise enough and much faster than the recurrent update.
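
    For what it's worth, the exponential update at least has a closed form
    over a short history, so the prediction can be replayed (or, better,
    tabulated) directly from the history bits; a rough sketch:
    Code:
    // Replay p += (bit - p) * r over the last m history bits.
    // Closed form: p = (1-r)^m * p0 + r * sum_i bit_i * (1-r)^i,
    // where bit 0 is the newest. In practice this function would be
    // tabulated over all 2^m histories instead of looped.
    double history_p(unsigned history, int m, double r, double p0) {
        double p = p0;
        for (int i = m - 1; i >= 0; --i)   // oldest bit first
            p += (((history >> i) & 1) - p) * r;
        return p;
    }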

    > > Well, there's a working coder without much redundancy, so I don't
    > > care if you think it's incorrect.
    >
    > Redundancy can be reduced with a proper formula. I heard You are concerned with
    > precise models, are You not?

    Certainly not "You"

    Also that's not a proper formula for nonstationary data, so I wouldn't
    really use it even if it were ideal for stationary data, which it isn't.

    > > Btw, it's also only good for large numbers, so I don't think there's
    > > actually any difference... but thanks anyway, as I haven't heard about
    > > this particular implementation. Guess I'd appreciate any links or articles
    > > if you have any.
    >
    > AFAIR, it is by Shtarkov and it is not an approximation, but a precise formula.

    Why don't you just look at that wikipedia page?
    Also afair Shtarkov really liked the Stirling formula even back then, when
    I'd seen some articles of his. It allows turning a simple binomial formula
    into a scary half-page monster, so I kinda understand why, though

    Code:
     
    n! ~= Sqrt(2*Pi*n)*n^n/Exp(n) 
     
    C(x+y,x) = (x+y)!/x!/y! 
     
    = Sqrt(2*Pi*(x+y))*(x+y)^(x+y)/Exp(x+y) / ( Sqrt(2*Pi*x)*x^x/Exp(x) * Sqrt(2*Pi*y)*y^y/Exp(y) ) 
     
    = (x+y)^(x+y+.5) / x^(x+.5) / y^(y+.5) / Sqrt(2*Pi)

    Also about that formula for probability:

    Code:
     
    C(x+y,x) - number of bit strings with x zeros and y ones. 
     
    p1 = C(x+1+y,x+1) / (C(x+1+y,x) + C(x+1+y,x+1)) 
     
    = (x+1+y)^(x+1+y+.5) / (x+1)^(x+1.5) / y^(y+.5) / 
    ( (x+1+y)^(x+1+y+.5) / (x+1)^(x+1.5) / y^(y+.5) + 
      (x+1+y)^(x+1+y+.5) / x^(x+.5) / (y+1)^(y+1.5) ) 
     
    = x^(x+.5) * (y+1)^(y+1.5) / ( x^(x+.5) * (y+1)^(y+1.5) + (x+1)^(x+1.5) * y^(y+.5) )

    Doesn't that look similar now? Guess that .5 was removed for ugliness, though

    And finally:

    Code:
     
    p1 = C(x+1+y,x+1) / (C(x+1+y,x) + C(x+1+y,x+1)) 
     
    = (x+1+y)!/(x+1)!/y! / ( (x+1+y)!/x!/(y+1)! + (x+1+y)!/(x+1)!/y! ) 
     
    = x!*(y+1)! / ( (x+1)!*y! + x!*(y+1)! ) = 
     
    = x!*y!*(y+1) / ( x!*y!*(x+1) + x!*y!*(y+1) ) = 
     
    = (y+1) / (x+y+2)
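
    The last identity is easy to check numerically (a throwaway sketch):
    Code:
    #include <cstdio>

    // C(n,k) via the multiplicative formula
    double C(int n, int k) { double r=1; for (int i=1; i<=k; i++) r=r*(n-k+i)/i; return r; }

    int main() {
        for (int x=0; x<4; x++)
            for (int y=0; y<4; y++)
                printf("x=%d y=%d: %.4f == %.4f\n", x, y,
                       C(x+1+y,x+1)/(C(x+1+y,x)+C(x+1+y,x+1)), (y+1.0)/(x+y+2.0));
        return 0;
    }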

  13. #43
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Also afair Shtarkov really liked the Stirling formula even back then, when
    I'd seen some articles of his. It allows turning a simple binomial formula
    into a scary half-page monster, so I kinda understand why, though
    The maximum-likelihood probability of a bitstring is:
    P(s)=const(n)*(k/n)^k*((n-k)/n)^(n-k)
    Substitute it into
    p(1|s) = P(s+1)/(P(s+1)+P(s+0))
    and You will get the Shtarkov formula for MP-codes. As You can see, there are no approximations and no Stirling formula, but You can use it for further simplifications at large n0 & n1, and You will get the Dirichlet estimator:
    p(1|s)=(n1+1/2)/(n0+n1+1)
    MP-codes are useful for small n0 & n1; for larger ones, the Dirichlet estimator is enough.
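
    In code the two differ only in the prior d (a sketch):
    Code:
    // d = 0.5 gives the Dirichlet (KT) estimator above; d = 1 gives Laplace.
    double estimator_p1(int n0, int n1, double d) {
        return (n1 + d) / (n0 + n1 + 2*d);
    }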
    Quote Originally Posted by Shelwien
    I know, but my text editor's search function doesn't.
    Code:
    #define GET_MEAN(SUMM,SHIFT) ((SUMM+ROUND) >> SHIFT)
    inline void PPM_CONTEXT::encodeBinSymbol(int symbol)
    {
        STATE& rs=oneState();
        WORD& bs=BinSumm[QTable[rs.Freq-1]][NS2BSIndx[suff()->NumStats]+PrevSuccess+
                Flags+((RunLength >> 26) & 0x20)];
        UINT tmp=rcBinStart(BSumm=bs,TOT_BITS); bs -= GET_MEAN(BSumm,PERIOD_BITS);
        if (rs.Symbol == symbol) {
            bs += INTERVAL; rcBinCorrect0(tmp);
            FoundState=&rs; rs.Freq += (rs.Freq < 196);
            RunLength++; PrevSuccess=1;
        } else {
            rcBinCorrect1(tmp,BIN_SCALE-BSumm); CharMask[rs.Symbol]=EscCount;
            NumMasked=PrevSuccess=0; FoundState=NULL;
        }
    }

    It is also described in my paper on PPMII.

  14. #44
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    >> Also afair Shtarkov really liked the Stirling formula even back then, when
    >> I'd seen some articles of his. It allows turning a simple binomial formula
    >> into a scary half-page monster, so I kinda understand why, though
    >
    > The maximum-likelihood probability of a bitstring is:
    > P(s)=const(n)*(k/n)^k*((n-k)/n)^(n-k)

    Do I really need to repeat that it's an approximation?
    Because there're exactly n!/k!/(n-k)! possible bitstrings of length n with k 1s,
    so P(s) = k!*(n-k)!/n!

    > Substitute it into
    > p(1|s) = P(s+1)/(P(s+1)+P(s+0))
    > and You will get the Shtarkov formula for MP-codes. As You can see, there are no
    > approximations and no Stirling formula, but You can use it for further
    > simplifications at large n0 & n1, and You will get the Dirichlet estimator:
    > p(1|s)=(n1+1/2)/(n0+n1+1)

    Whatever, the method of derivation doesn't change anything, and I can do this:
    Code:
     
    number of strings with x zeroes and y+1 ones, ending with 1 is: C(x+1+y,x)/(y+1) 
    number of strings with x+1 zeroes and y ones, ending with 0 is: C(x+1+y,x+1)/(x+1) 
     
    p1 = C(x+1+y,x)/(y+1) / ( C(x+1+y,x)/(y+1) + C(x+1+y,x+1)/(x+1) ) 
     
    = C(x+1+y,x)*(x+1) / ( C(x+1+y,x) + C(x+1+y,x+1)*(y+1) ) 
     
    = C(x+2+y,x+1) / (C(x+1+y,x) + C(x+2+y,x+1))


    and then replace C(n,k) with the Stirling formula and get more or less the same
    result (which I more or less already did in the previous post).

    > MP-codes are useful for small n0 & n1; for larger ones, the Dirichlet estimator is
    > enough.

    Your observations don't change anything either
    (that is, if you have any and aren't just quoting some article),
    as I'm certain that you can get even better empirical results
    than "MP-codes" by adjusting the d value in the p1=(y+d)/(x+y+2*d) formula.

    Also I don't see how n! is worse than n^n, computation-wise.

    >> I know, but my text editor's search function doesn't.
    >
    > #define GET_MEAN(SUMM,SHIFT) ((SUMM+ROUND) >> SHIFT)

    Yeah, I've seen this, and it's not getting any more readable by posting it here.

    Also I wasn't arguing with this from the start; it's just that it's the same
    update function as the one that paq and derivatives use,
    and I consider it noisy and imprecise due to the fixed set of available "update speeds".

  15. #45
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Do I really need to repeat that it's an approximation?
    It is not an approximation, it is the binomial distribution.
    Quote Originally Posted by Shelwien
    Because there're exactly n!/k!/(n-k)! possible bitstrings of length n with k 1s,
    so P(s) = k!*(n-k)!/n!
    Are You sure? You select one string with k 1s and length n, but the same source can generate strings with the same length n but with a different k.
    Quote Originally Posted by Shelwien
    Your observations don't change anything either
    (that is, if you have any and aren't just quoting some article),
    as I'm certain that you can get even better empirical results
    than "MP-codes" by adjusting the d value in the p1=(y+d)/(x+y+2*d) formula.
    It is the KT-estimator for a binary alphabet. Asymptotically, redundancy is minimized for d=1/2.
    Quote Originally Posted by Shelwien
    Also I don't see how n! is worse than n^n, computation-wise.
    I am not talking about simplicity of computation, but about precision of estimation.
    Quote Originally Posted by Shelwien
    Also I wasn't arguing with this from the start; it's just that it's the same
    update function as the one that paq and derivatives use,
    and I consider it noisy and imprecise due to the fixed set of available "update speeds".
    I am talking about the same thing. For example, We can use MP-codes for small n0 & n1 and switch to the Dirichlet estimator after gathering enough statistics. All things connected with nonstationarity must be applied to the Dirichlet estimator, because at the beginning of coding We do not have statistics even for a stationary source.

  16. #46
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    btw:

    dmitry, can you find some spare time to reformat the ppmd sources? currently they are hard to read (you write two instructions per row, which can fool the reader).

    also can you write some new complete article on ppmii in english? currently on your site i see two articles, one in russian (my russian is very poor) and one in english which is a little outdated (it refers to ppmdh while ppmdi and ppmdj are noticeably better) and requires the reader to understand the previous article written in english. also your english has probably improved

    i would be very grateful to you if you do that

  17. #47
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by donkey7
    dmitry, can you find some spare time to reformat the ppmd sources? currently they are hard to read (you write two instructions per row, which can fool the reader).
    Such formatting is convenient for me; You can reformat it as You want, of course.
    Quote Originally Posted by donkey7
    also can you write some new complete article on ppmii in english? currently on your site i see two articles, one in russian (my russian is very poor) and one in english which is a little outdated (it refers to ppmdh while ppmdi and ppmdj are noticeably better) and requires the reader to understand the previous article written in english.
    PPMdi & PPMdj are simply tunings of old ideas; there is nothing fundamentally new in modeling since var.G and nothing fundamentally new in tree building since var.H.
    Quote Originally Posted by donkey7
    also your english has probably improved
    It has degraded.

  18. #48
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    >> Because there're exactly n!/k!/(n-k)! possible bitstrings of length n with k 1s,
    >> so P(s) = k!*(n-k)!/n!
    >
    > Are You sure? You select one string with k 1s and length n, but the same source
    > can generate strings with the same length n but with a different k.

    Whatever k there is, the number of binary strings of length n with k ones is
    C(n,k)=n!/k!/(n-k)!
    So given that we know the current k and n, there're C(n+1,k)/(n+1-k) different
    strings of length n+1 with zero at last position, and C(n+1,k+1)/(k+1) strings
    with one at last position, so we can calculate a relative frequency aka
    probability.

    > It is not an approximation, it is the binomial distribution.

    The formula can be conjured by any magic originally, but when I can reach
    the same result by replacing some functions with approximations, I'd call it an approximation.

    Also there's nothing bad about that, and depending on the context, a given approximation
    might be precise.

    Actually, if you forgot already, I just tried to explain the relation of your
    formula to binomial coefficients.

    >> Your observations don't change anything either
    >> (that is, if you have any and don't just quote some article),
    >> as I'm certain that you can get even better empirical results
    >> than "MP-codes" by adjusting the d value in the p1=(y+d)/(x+y+2*d) formula.
    >
    > It is KT-estimator for binary alphabet. Asimptotically, redundancy is minimized
    > for d=1/2.

    Dunno how you can get that asymptotically.
    Though instead I know that ( D[f(x),x]=f'(x) )
    Code:
     
     In[1]:= Solve[ D[x*Log[((x+d)/(x+y+2*d))]+y*Log[((y+d)/(x+y+2*d))],d]==0,d] 
    Out[1]:= {{d->0}}
    And that's what I'd call an asymptotic estimation.
    Also, we can exclude d=0 (as it can't be used with x=y=0)
    by the substitution d->1/(1+e^2) and taking the derivative by e.
    Then it says that the minimum is at e=0, so d=1.
    Pity that I don't have any good grass, so I can't immediately do something
    to make it d=.5

    ...Well, with all that I only wanted to show that there's no single ideal solution.

    >> Also I don't see how n! is worse than n^n, computation-wise.
    >
    > I am not talking about simplicity of computation, but about precision of estimation.

    And I'm talking about the reason for using an approximation when it's as slow
    as the precise formula, but imprecise.

    >> Also I wasn't arguing with this from the start, its just that its the same
    >> update function as the one that paq and derivatives use,
    >> and I consider it noisy and imprecise due to fixed set of available "update speeds".
    >
    > I am talking about the same thing. For example, We can use MP-codes for small n0 & n1
    > and switch to the Dirichlet estimator after gathering enough statistics. All
    > things connected with nonstationarity must be applied to the Dirichlet estimator,
    > because at the beginning of coding We do not have statistics even for a stationary source.

    But we do have them. Instead, it's really hard to find a case where a single or
    really few counters would be used.
    Normally there're quite enough counters in a context model that we
    can collect our statistics. So, e.g., you can test the model with different
    values of d and find the one that gives the smallest output. Of course it's
    also possible to use d1,d2,... for different ranges and optimize them all.

    Btw, I kinda mix a static distribution into all my counters; guess it has
    the same effect. (I don't actually mix it, but keep the already-mixed value
    in the counter and update it accordingly.)

    Though anyway, no purely mathematical solution would have any practical application.
    I mean, if you care about the initial predictions of counters, you just have to use
    something like:

    t = Counter.Occurrences
    if( t<Limit ) {
      Counter.P = p = table[Counter.History];
      Counter.History = (Counter.History<<1) + bit;
    } else {
      p = Counter.P;
      Counter.Update( bit );
    }

    and tune the values in table[] to your data.
    Also you can simulate any theoretical formula with it, so it's superior; see the sketch below.
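
    A self-contained sketch of the above (all names illustrative); the table
    can be seeded from any theoretical formula, e.g. the Laplace rule, and
    then tuned entry by entry:
    Code:
    enum { Limit = 4 };
    static double table[1 << Limit];   // indexed by the short bit history

    // seed table[] from p1 = (k+1)/(n+2); tuning would adjust entries further
    void init_table() {
        for (int h = 0; h < (1 << Limit); h++) {
            int k = 0;
            for (int i = 0; i < Limit; i++) k += (h >> i) & 1;  // count ones
            table[h] = (k + 1.0) / (Limit + 2.0);
        }
    }

    struct Counter {
        int Occurrences = 0; unsigned History = 0; double P = 0.5;
        void Update(int bit, double r = 0.05) { P += (bit - P) * r; }
    };

    double predict(Counter &c, int bit) {
        double p;
        if (c.Occurrences < Limit) {
            p = c.P = table[c.History];
            c.History = (c.History << 1) + bit;
        } else {
            p = c.P;
            c.Update(bit);
        }
        c.Occurrences++;
        return p;
    }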

  19. #49
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Whatever k there is, the number of binary strings of length n with k ones is
    C(n,k)=n!/k!/(n-k)!
    So given that we know the current k and n, there're C(n+1,k)/(n+1-k) different
    strings of length n+1 with zero at last position, and C(n+1,k+1)/(k+1) strings
    with one at last position, so we can calculate a relative frequency aka
    probability.
    C(n,k) is the number of strings with the given n & k. I agree.
    These strings are equiprobable and You can simply count them to get a probability estimation. I agree again (except that the estimation will be different from the one You wrote).
    But after concatenation of the old string with 0 & 1, the strings will not remain equiprobable, because 0 & 1 are not equiprobable and We can not simply count them. We disagree here.
    Quote Originally Posted by Shelwien
    Instead, it's really hard to find a case where a single or really few counters would be used.
    Yes, that is the trouble. It is fully useless for PPM, but I hoped it could get some use for the weighting of predictions.
    Quote Originally Posted by Shelwien
    Also you can simulate any theoretical formula with it, so its superior.
    You can generate any text using a million monkeys with typewriters. Is that a superior way of writing texts?

  20. #50
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    > But after concatenation of the old string with '0' & '1', the strings will not remain
    > equiprobable, because '0' & '1' are not equiprobable and We can not simply count
    > them. We disagree here.

    Look, it's totally similar to what you did with likelihoods, but likelihood is
    an approximation, and the real P(s) is 1/C(n,k), because there're C(n,k) possible
    strings.

    >> Instead, it's really hard to find a case where a single or really few counters
    >> would be used.
    >
    > Yes, that is the trouble. It is fully useless for PPM, but I hoped it could get some
    > use for the weighting of predictions.

    1. Even if you don't have multiple similar counters within a model, it's still
    possible to collect the necessary statistics with different datasets.

    2. Have you seen that thread?
    http://www.encode.su/forums/index.ph...ic=586#msg7087

    >> Also you can simulate any theoretical formula with it, so it's superior.
    >
    > You can generate any text using a million monkeys with typewriters.
    > Is that a superior way of writing texts?

    Guess it could be, if there were a known measure of text quality.
    Luckily, we do have such a measure for compression quality.

  21. #51
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Look, it's totally similar to what you did with likelihoods, but likelihood is
    an approximation
    Why?
    Quote Originally Posted by Shelwien
    and the real P(s) is 1/C(n,k), because there're C(n,k) possible
    strings.
    Grrrr... It is true only if You select one string out of the set of strings with equal n & k, but there can be other k...

  22. #52
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    >> Look, it's totally similar to what you did with likelihoods, but likelihood is
    >> an approximation
    >
    > Why?

    It's P(n,k) = O(n) * (k/n)^k * ((n-k)/n)^(n-k), right?

    1. There are no corresponding 1/P(n,k) different cases, so it's obviously an
    approximation, as there are no underlying "physical objects".

    2. By applying Stirling's approximation to C(n,k), you'd get

    n! ~= Sqrt(2*Pi*n)*n^n/Exp(n) = C*n^(n+.5)/Exp(n)
    C(n,k) = n!/k!/(n-k)!
    = C*n^(n+.5)/Exp(n) / C*k^(k+.5)/Exp(k) / C*(n-k)^(n-k+.5)/Exp(n-k)
    = n^(n+.5) / k^(k+.5) / (n-k)^(n-k+.5) / C =
    = Sqrt(n/k/(n-k))/C * n^n / k^k / (n-k)^(n-k) =
    = Sqrt(n/k/(n-k))/C * (n/k)^k * (n/(n-k))^(n-k)
    and 1/C(n,k) = C*Sqrt(k*(n-k)/n) * (k/n)^k * ((n-k)/n)^(n-k) =
    = O(n) * (k/n)^k * ((n-k)/n)^(n-k) = likelihood

    That's why I said that if you get something by approximation, then
    it's an approximation.

    >> and the real P(s) is 1/C(n,k) because there're C(n,k) possible
    >> strings.
    >
    > Grrrr...

    That's my line

    > It is true only if You select one string out of set of strings with
    > equal n & k, but there can be other k...

    Yeah, and we're talking about probability estimation based only on
    distributions, not on contexts. So if the current state is (n,k), then
    we can have (n+1,k) or (n+1,k+1) on the next step, C(n+1,k)+C(n+1,k+1)
    combinations in total, as they don't overlap.

    Edit: Guess I'd better add that to get a probability we have to count
    the combinations ending with 1 - that'd be C(n,k-1)+C(n,k)=C(n+1,k)

    So p1 = C(n+1,k) / (C(n+1,k)+C(n+1,k+1)) = (k+1)/(n+2)
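
    (Check with n=3, k=1: C(4,1)=4 of the C(4,1)+C(4,2)=10 combinations
    end with 1, giving p1 = 4/10 = 2/5 = (1+1)/(3+2).)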

  23. #53
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    1. There are no corresponding 1/P(n,k) different cases, so it's obviously an
    approximation, as there are no underlying "physical objects".
    There are 2^n different cases, because the source can generate any string of length n, and our selection p=k/n maximizes the probability of our string s.
    Quote Originally Posted by Shelwien
    Yeah, and we're talking about probability estimation based only on
    distributions, not on contexts. So if the current state is (n,k), then
    we can have (n+1,k) or (n+1,k+1) on the next step, C(n+1,k)+C(n+1,k+1)
    combinations in total, as they don't overlap.
    Grrr.... even GRRRRR.... these cases do not overlap, but they have different probabilities and You can not simply count them.

    ES> So p1 = C(n+1,k) / (C(n+1,k)+C(n+1,k+1)) = (k+1)/(n+2)

    Great result!

  24. #54
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    > > 1. There are no corresponding 1/P(n,k) different cases, so it's obviously an
    > > approximation, as there are no underlying "physical objects".
    >
    > There are 2^n different cases, because the source can generate
    > any string of length n, and our selection p=k/n maximizes
    > the probability of our string s.

    Well, ok, that's your source, so it can do what you want.
    But then at least explain what k!=0 means, e.g. for the string
    of all zeros, which is one of the 2^n?

    > > Yeah, and we're talking about probability estimation based only on
    > > distributions, not on contexts. So if current state is (n,k) then
    > > we can have (n+1,k) or (n+1,k+1) on next step, C(n+1,k)+C(n+1,k+1)
    > > combinations in total as they don't overlap.
    >
    > Grrr.... even GRRRRR....

    Please don't be hasty; read my previous post as many times
    as you need to understand it. I really paid attention when
    writing it, so I'm pretty sure there are no mistakes.

    > these cases do not overlap, but
    > they have different probabilities and You can not simply count
    > them.

    Why do they have different probabilities?
    We supposedly know only that there're k ones in the current
    string of length n, and nothing more specific. Also, we don't
    know the probability of the next bit. So we can enumerate
    all the possible cases at the next step, then select the
    cases ending with 1, and get our probability by dividing these
    numbers.

    > > So p1 = C(n+1,k) / (C(n+1,k)+C(n+1,k+1)) = (k+1)/(n+2)
    >
    > Great result!

    1. I already posted it a few times in this thread

    2. Your formula with k^k etc. can be derived from it by
    replacing the factorials with the Stirling approximation.

  25. #55
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    This is getting tiresome.
    "I swear on my mother, yes!"
    Code:
     
    #include <stdio.h> 
    #include <stdlib.h> 
    #include <math.h> 
     
    /* FRAND is assumed to return a uniform double in [0,1) */ 
    #define FRAND() (rand() / (RAND_MAX + 1.0)) 
     
    enum { I=1000, N=10 }; 
    int main() 
    { 
        int i, k, n, bit; 
        double p, IdealCodeLen, CodeLen[3][N][N], Summ[3], Err[3], ttt[N+1]; 
        // ttt[k] = (k+1)^(k+1)/k^k, the terms of the MP-estimator 
        for (ttt[i=0]=1.0;i < N;i++) 
                ttt[i]=(ttt[i+1]=pow(i+1.0,i+1.0))/ttt[i]; 
        for (n=0;n < N;n++) 
            for (k=0;k <= n;k++) { 
                CodeLen[0][n][k]=-log(ttt[k]/(ttt[k]+ttt[n-k]))/log(2.0);  // MP 
                CodeLen[1][n][k]=-log((k+0.5)/(n+1.0))/log(2.0);           // Dirichlet 
                CodeLen[2][n][k]=-log((k+1.0)/(n+2.0))/log(2.0);           // Laplace 
            } 
        for (Err[0]=Err[1]=Err[2]=i=0;i < I;i++) { 
            p=FRAND(); 
            for (IdealCodeLen=Summ[0]=Summ[1]=Summ[2]=k=n=0;n < N;n++, k += bit) { 
                bit=(FRAND() <= p); 
                Summ[0] += (bit)?(CodeLen[0][n][k]):(CodeLen[0][n][n-k]); 
                Summ[1] += (bit)?(CodeLen[1][n][k]):(CodeLen[1][n][n-k]); 
                Summ[2] += (bit)?(CodeLen[2][n][k]):(CodeLen[2][n][n-k]); 
                IdealCodeLen += -((bit)?(log(p)):(log(1.0-p)))/log(2.0); 
            } 
            Err[0] += fabs((Summ[0]-IdealCodeLen)/IdealCodeLen/N); 
            Err[1] += fabs((Summ[1]-IdealCodeLen)/IdealCodeLen/N); 
            Err[2] += fabs((Summ[2]-IdealCodeLen)/IdealCodeLen/N); 
        } 
        printf("Average relative error for        MP-estimator: %8.6f\n",Err[0]/I); 
        printf("Average relative error for Dirichlet estimator: %8.6f\n",Err[1]/I); 
        printf("Average relative error for    Laplas estimator: %8.6f\n",Err[2]/I); 
        return 0; 
    }
    Program output:
    Code:
     
    Average relative error for        MP-estimator: 0.109592 
    Average relative error for Dirichlet estimator: 0.124685 
    Average relative error for    Laplas estimator: 0.168420

  26. #56
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    > This is getting tiresome.

    Well, that certainly explains things.
    I really meant a different data source/model.
    The only question is whether there are any applications where the source really generates data using fixed probabilities.

  27. #57
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    The program contains a bug: there is no need to divide by N.
    Output of the corrected program for N=10:
    Code:
    Average relative error for        MP-estimator: 1.095918 
    Average relative error for Dirichlet estimator: 1.246853 
    Average relative error for    Laplas estimator: 1.684201
    Output of the corrected program for N=100:
    Code:
    Average relative error for        MP-estimator: 0.112389 
    Average relative error for Dirichlet estimator: 0.126469 
    Average relative error for    Laplas estimator: 0.180794
    Thus, We can conclude that N=10 is too small a string length for convergence.
    Quote Originally Posted by Shelwien
    The only question is whether there are any applications where the source really generates data using fixed probabilities.
    As a rule, We do not have enough statistics even for the estimation of stationary sources; for nonstationary ones the size of typical data must be much larger.

  28. #58
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    Well, I've rewritten your program to show the accumulated code lengths instead of some random averages.

    http://shelwien.googlepages.com/mp.rar

    And my result somehow is like this:
    Code:
     
    compressed bitlength of 1000 random binary strings of length 100 
                  Ideal: 71945.546875 
           MP-estimator: 75128.820313 
    Dirichlet estimator: 74939.500000 
       Laplas estimator: 74749.976563

  29. #59
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    And what?
    Rougher methods give better results for nearly equiprobable sources (p ~ 0.5), and your accumulated sums consist mainly of the encoded output of such sources, because for p ~ 0.0 or p ~ 1.0 the encoded strings are too short to have a visible impact on your accumulated sums.

  30. #60
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,915
    Thanks
    291
    Thanked 1,273 Times in 720 Posts
    http://shelwien.googlepages.com/mp.cpp

    Ok, here I added a couple of lines:
    Code:
    p = FRAND();
    float r = 0.25;
    p = (p<0.5) ? p*r : 1-(1-p)*r;

    Code:
    compressed bitlength of 1000 random binary strings of length 100 
                  Ideal: 31728.228516 
           MP-estimator: 34714.800781 
    Dirichlet estimator: 34734.050781 
       Laplas estimator: 35370.746094 
    (k+0.40)/(n+0.80) estimator: 34685.050781
    Though actually, that's already additional information.
    I mean, the probability range is limited.

