
Thread: Modelling... what exactly?

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts

    Modelling... what exactly?

    For statistical compression, it is most common to code the next symbol using
    the probability distribution that minimizes the codelength of the already
    known data.

    More advanced implementations may actually take into account the next
    symbol too, and minimize the codelength of already known data + next
    symbol.

    But either way, it seems to be common sense to think that the model in
    compression predicts the next symbol, i.e. computes the probability
    distribution of the next symbol, which is then immediately used for coding.

    Now, is that actually correct?

    Well, actually we have a perfect example of arithmetic coding with
    completely precise modelling and coding - enumeration for a memoryless
    source with known symbol counts.

    It can be implemented in a way which would look exactly like a standard
    statistical coder with model + AC, but it would still assign codes of
    precisely the same length to all data permutations, without any redundancy
    at all.

    The actual coding is the same, so the model is what makes it so perfect.

    Now, what's so different about the model in this case?

    Well, it outputs the number of each symbol's occurrences until the end of the input,
    which lets us enumerate the future strings and precisely subdivide the code
    space among strings starting with each symbol.

    In short, in the example where the model is known to be correct and
    precise, that model actually predicts the number of each symbol's occurrences
    until the end of the input, instead of the chance that the next symbol will be a
    specific one.

    So, why don't we try doing the same in a "universal" model?

    Actually, knowing the number of remaining symbols in the data is quite
    common (mainly because of performance considerations - checking for EOF is
    much slower than storing the length and counting symbols).

    Suppose that we have p=P(bit==0) and n bits left.
    The probability of the next n bits containing k zeroes is Binomial[n,k]*p^k*(1-p)^(n-k)
    (it's the sum of likelihoods of strings with k zeroes, divided by the sum of
    likelihoods of all strings, which is 1).
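    As a quick sanity check, a small sketch of that formula (the helper names are made up for illustration), evaluating P(k zeroes in n bits) via lgamma and confirming the values sum to 1 over k=0..n:
    Code:
    #include <stdio.h>
    #include <math.h>

    /* log Binomial[n,k] via lgamma, to avoid overflow for larger n */
    static double log_binomial(int n, int k) {
      return lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0);
    }

    /* P(exactly k zeroes among the n remaining bits), given p = P(bit==0) */
    static double prob_k_zeroes(int n, int k, double p) {
      return exp(log_binomial(n, k) + k * log(p) + (n - k) * log(1.0 - p));
    }

    int main(void) {
      int n = 100, k;
      double p = 0.1, sum = 0;
      for (k = 0; k <= n; k++) sum += prob_k_zeroes(n, k, p);
      printf("P(k=10 | n=100, p=0.1) = %.5f, sum over all k = %.5f\n",
             prob_k_zeroes(n, 10, p), sum);
      return 0;
    }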

    We have to choose the best k value, so it has to be the most probable one.
    Then let's differentiate it with respect to k and find the root. It's a little
    complicated, but in the end Mathematica shows it as
    Log[(1-p)/p] = HarmonicNumber[n-k]-HarmonicNumber[k] // HarmonicNumber[n] = Sum[1/i,{i,1,n}]
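    For reference, a sketch of where this condition comes from: treating k as continuous via the Gamma function, with \psi denoting the digamma function and \psi(m+1) = H_m - \gamma, so the Euler constants cancel:

    \frac{d}{dk}\,\log\!\left[\binom{n}{k}\,p^{k}(1-p)^{n-k}\right]
      = -\psi(k+1) + \psi(n-k+1) + \log\frac{p}{1-p} = 0
    \;\Longrightarrow\;
    \log\frac{1-p}{p} = H_{n-k} - H_{k}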

    Now, let's compute the actual estimations: http://nishi.dreamhosters.com/u/target1.png
    f[p0_,n0_]:=k/.FindRoot[(HarmonicNumber[k]-HarmonicNumber[-k+n]+Log[(1-p)/p])/.p->p0/.n->n0,{k,p0*n0}]
    Plot[f[p,100]/100-p,{p,10^-3,1-(10^-3)}]



    f[10/100,100] = 9.59634 // optimal k for n=100,p=10/100
    f[90/100,100] = 90.4037 // optimal k for n=100,p=90/100

    And as we can see, the resulting k/n is somewhat different from the probability p.

    Am I doing something wrong?

  2. #2
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    555
    Thanks
    208
    Thanked 193 Times in 90 Posts
    This is something I often thought about when looking at PAQ and possibilities to improve it. What kept me away from it was essentially the reasoning behind that graph - it's very good to see an actual visual and mathematical representation of it instead of just skimming over it. I don't think you've done anything wrong here.

    What the graph shows is that knowing "there will be exactly 49 zeroes in the next 100 bits" is useful, but it won't really help/change prediction or compression much, because the bit distribution in the remaining codespace is only slightly skewed. But as you get very close to the extremes (near 0 or 100 zeroes), the skew gets stronger and it will actually help by ruling out predictions that aren't possible anymore because you don't have enough 0 or 1 bits left.

    I never really tried to implement something like this because I thought it would end up as another typical tradeoff - you'll have to store the symbol counts in some way, too, and the cases where you actually compress better will likely be edge cases. Also, the graphs for larger n will likely approach a straight line, so you'll have to count symbols for small blocks instead of the whole file to get better results. On the other hand, in programs like PAQ, symbol counts might have a much stronger effect for individual models, as they might have only a few out of 10000 cases left that contain exactly 49 zeroes (PAQ might be a bad example as its models usually predict one bit instead of the next 100, but I think you get the idea).
    Last edited by schnaader; 30th April 2012 at 10:28.
    http://schnaader.info
    Damn kids. They're all alike.

  3. #3
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    The Binomial Distribution assumes a sampling with replacement. Try a Hypergeometric Distribution instead.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Binomial[n,k] is Mathematica syntax for n!/k!/(n-k)!
    and I used it to compute the probability of a bit string of length n with k zeroes, given p=P(bit==0).

    Also, the topic is the meaning of the rangecoder's inputs and whether the probability of getting p*n zeroes
    among the next n bits is the same as the probability p of the single next bit being zero.

  5. #5
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    Binomial[n,k]*p^k*(1-p)^(n-k)

    is the Binomial Distribution and it assumes that you draw k values out of n values but put them back, which is obviously not what you are doing.

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Ah. But I'm not sampling Binomial[n,k] strings, I'm computing a probability of a single sampled string being one of these.

  7. #7
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    but you're sampling Binomial[n,k] bits out of a string of length n

    Suppose you have a string of 100 bits where exactly 10 bits are "0".

    your model (as I understand it):
    pick a bit, look at it, put it back again, do this 100 times
    the probability of 10 bits being "0" is: Binomial[100,10] * 0.1^(10)*(0.9)^90=~0.13

    hypergeometric model:
    pick a bit, put it away, do this 100 times
    the probability of 10 bits being "0" is: Binomial[10,10]*Binomial[90,0]/Binomial[100, 100]=1

    Sure, the difference between the two models gets smaller as the total length increases. For example, if you sample these 100 bits out of 1000 bits, the hypergeometric model also gives around ~0.13.
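    A small sketch evaluating both models side by side, using lgamma for the binomial coefficients (the 1000-bit case follows the example above):
    Code:
    #include <stdio.h>
    #include <math.h>

    /* log Binomial[n,k] via lgamma */
    static double log_binom(int n, int k) {
      return lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0);
    }

    int main(void) {
      /* binomial model: exactly 10 zeroes in 100 draws with replacement, p = 0.1 */
      double b = exp(log_binom(100, 10) + 10 * log(0.1) + 90 * log(0.9));
      /* hypergeometric model: 10 zeroes in a 100-bit sample drawn without
         replacement from a 1000-bit string containing exactly 100 zeroes */
      double h = exp(log_binom(100, 10) + log_binom(900, 90) - log_binom(1000, 100));
      printf("binomial: %.4f  hypergeometric (1000-bit pool): %.4f\n", b, h);
      return 0;
    }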

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    http://en.wikipedia.org/wiki/Combination
    There're 2^n strings total, and Binomial[n,k] of them have k zeroes.

    Also, there're Binomial[n-1,k-1] strings which start with a zero,
    and Binomial[n-1,k] strings which start with a one, so we can
    implement a coder like this:
    Code:
    range = Binomial[n,k];   // number of n-bit strings with k zeroes
    low = 0; n0=0;           // n0 = zeroes seen so far
    for( i=0; i<n; i++ ) {
      // strings with a 0 at this position occupy the low part of the interval
      low += (bit[i]==1) ? range*(k-n0)/(n-i) : 0;
      // shrink the interval to the strings consistent with bit[i]
      range *= (bit[i]==0) ? (k-n0) : (n-i)-(k-n0);
      range /= (n-i);
      n0 += (bit[i]==0);
    }
    which would directly convert a (n;k)-string to its number, without any redundancy.
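    For illustration, a minimal runnable version of the same ranking, together with the matching decoder; it assumes n is small enough that Binomial[n,k] fits into 64 bits, and the helper names are arbitrary:
    Code:
    #include <stdio.h>
    #include <stdint.h>

    /* Binomial[n,k] in exact 64-bit integer arithmetic (small n assumed) */
    static uint64_t binom(int n, int k) {
      uint64_t r = 1;
      int i;
      if (k < 0 || k > n) return 0;
      for (i = 1; i <= k; i++) r = r * (uint64_t)(n - k + i) / (uint64_t)i;
      return r;
    }

    /* rank of an (n;k)-string among all n-bit strings with k zeroes,
       using the same subdivision as above (zeroes take the low part) */
    static uint64_t encode(const int *bit, int n, int k) {
      uint64_t low = 0;
      int i, n0 = 0;
      for (i = 0; i < n; i++) {
        if (bit[i] == 1) low += binom(n - i - 1, k - n0 - 1); /* skip the 0-branch */
        else n0++;
      }
      return low;
    }

    static void decode(uint64_t code, int n, int k, int *bit) {
      int i, n0 = 0;
      for (i = 0; i < n; i++) {
        uint64_t zpart = binom(n - i - 1, k - n0 - 1); /* strings with a 0 here */
        if (code < zpart) { bit[i] = 0; n0++; }
        else              { bit[i] = 1; code -= zpart; }
      }
    }

    int main(void) {
      int s[8] = {1,0,0,1,0,1,1,0}, t[8], i, n = 8, k = 4; /* k = number of zeroes */
      uint64_t c = encode(s, n, k);
      decode(c, n, k, t);
      printf("rank %llu of %llu, decoded: ",
             (unsigned long long)c, (unsigned long long)binom(n, k));
      for (i = 0; i < n; i++) printf("%d", t[i]);
      printf("\n");
      return 0;
    }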

    As you can see, this is basically a simplified rangecoder without renormalization.
    And I'm wondering whether we're right to use a bit's probability estimation for coding,
    while actually it has to be the number of remaining bits.

    Example: suppose the bit freqs are stored before the string.
    Then, at some point we would know that there're no more zeroes,
    while an adaptive model would still predict zeroes with a high probability.
    How do we determine a point where decreasing counts become more
    important than an extra order0 model which can be mixed in?

  9. #9
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    It doesn't matter how you construct the probability, and I think I understand it well enough. Combinatorics are somewhat restricted when it comes to priors and insight into the "stochastic process" behind the combinatorial modelling.

    I think you wonder why 0.13 != 0.10, or the other way round with the k-value.
    Last edited by Sebastian; 2nd May 2012 at 00:31.

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Ok, there's an unknown bitstring of length n.
    We know that the probability of each bit in that string being zero is p.
    What's the probability of that string having k zeroes?

    I'm talking about the fact that, given a bit probability estimation p,
    we can go further and compute the probability of encountering x=0..n zeroes
    within the remaining n bits of data, then feed k/n to the rangecoder, where
    k is the most likely number of zeroes.

  11. #11
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    Quote Originally Posted by Shelwien View Post
    And I'm wondering whether we're right to use a bit's probability estimation for coding,
    while actually it has to be the number of remaining bits.
    The probability estimation is done via a Bernoulli distribution; successive Bernoulli trials form a binomial distribution, so this is basically the same thing.
    Knowing the total number of bits makes no difference to the classical method (because you draw with replacement).

    Quote Originally Posted by Shelwien View Post
    Example: suppose the bit freqs are stored before the string.
    Then, at some point we would know that there're no more zeroes,
    while an adaptive model would still predict zeroes with a high probability.
    How do we determine a point where decreasing counts become more
    important than an extra order0 model which can be mixed in?
    I consider knowing only the total counts of 0 and 1 useless. There should be freqs for many contexts. You could then mix the semi-static model with the adaptive one.

    Quote Originally Posted by Shelwien View Post
    Ok, there's an unknown bitstring of length n.
    We know that a probability of each bit in that string being zero is p.
    What's the probability of that string having k zeroes?
    It depends on how you look at that string - either binomial or hypergeometric.
    Hypergeometric: the probability of a 100-bit string with p=0.1 having 10 zeroes is 1!
    Binomial: the probability of a 100-bit string with p=0.1 having 10 zeroes is ~0.13

    Quote Originally Posted by Shelwien View Post
    I'm talking about the fact that, given a bit probability estimation p,
    we can go further and compute the probability of encountering x=0..n zeroes
    within the remaining n bits of data, then feed k/n to the rangecoder, where
    k is the most likely number of zeroes.
    Yes, I understand, but you're doing nothing different from probability estimation with a Bernoulli distribution.
    Last edited by Sebastian; 2nd May 2012 at 01:18.

  12. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Knowing the total number of bits makes no difference to the classical method

    It does, because P(bit==0)=0.99 can't be correct when there're only 10 remaining bits.
    The right probability can be 0.9 or 1, but not 0.99.

    And it's possible to encode lengths for relatively small blocks without any noticeable redundancy,
    so the difference is not purely theoretical.

    > I consider knowing only the total counts of 0 and 1 useless.

    It's not useless if there are still 100 bits to decode and p0=0.99, but the zero counter has already expired.

  13. #13
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    Quick question: where does p come from?
    Is that a prior pass, with statistics, or a guesstimation using any kind of modeler?

    Also, knowing n is trivial for an order-0 coder, but what about order-n?
    How to know n for each context?

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Quick question: where does p come from?
    > Is that a prior pass, with statistics, or a guesstimation using any kind of modeler?

    The usual bit probability estimation from a usual model.
    I'm discussing whether we actually need a different kind of input
    for the arithmetic coder, one which is only approximately similar to that probability.

    So I'm building a secondary model for that "different kind of input", based
    on the known bit probability estimation, but obviously it's not the only way
    to make such a model, and hardly the most precise one.

    > Also, knowing n is trivial for an order-0 coder, but what about order-n?
    > How to know n for each context?

    For example, we can collect statistics on the distances between context occurrences
    and extrapolate them to estimate the remaining occurrences (knowing the total file size).

    Anyway, it could get fairly complicated... let's find out first whether
    it makes any sense for order0 :)

  15. #15
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    Quote Originally Posted by Shelwien View Post
    > Knowing the total number of bits makes no difference to the classical method
    It does, because P(bit==0)=0.99 can't be correct when there're only 10 remaining bits.
    The right probability can be 0.9 or 1, but not 0.99.
    As I wrote, it makes no difference with your method. You must use the hypergeometric model instead.


    Quote Originally Posted by Shelwien View Post
    It's not useless if there are still 100 bits to decode and p0=0.99, but the zero counter has already expired.
    Maybe for very small blocks.

  16. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > You must use the hypergeometric model instead.

    As I said, I'm not sampling strings from a pool - there's only one string,
    so the distribution is binomial.
    I'm talking about the probability of the next bit being zero vs. the number of zeroes among the remaining bits.

    > Maybe for very small blocks.

    As I said, any data can be split into small blocks - in fact many popular compressors like rar and zip
    do just that.

  17. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Consider a string x of length n bits with k bits equal to 0. Then p(x) = p(n) p(k|n) p(x|k,n). If the bits of x are i.i.d. then p(x|k,n) is just the binomial model described above. But you still have to model n and k. So given an ideal range coder, I don't see where there would be any coding savings.
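    In codelength terms (a sketch, under the assumption that given k and n all Binomial[n,k] arrangements are equally likely, so that p(x|k,n) = 1/Binomial[n,k]):

    -\log_2 p(x) = -\log_2 p(n) - \log_2 p(k\mid n) + \log_2 \binom{n}{k}

    So with an ideal coder the enumerative part costs exactly log2 Binomial[n,k] bits, and any difference would have to come from how n and k themselves are modelled.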

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > But you still have to model n and k

    It can be assumed that we know n (the filesize can be stored separately),
    and we can even make it reasonably small.
    And as to modelling k - yes, that's exactly what I'm suggesting here.
    I tried to compute the maximum-likelihood estimation for k, and k/n seems to
    be slightly different from p. However, it looks like a linear mix with 0.5, except at the edges,
    and such a mixing with a tuned weight would commonly improve the compression,
    so it's a little hard to test my theory, as we won't know whether the compression
    improvement comes from our misinterpretation of the rc inputs or simply from the noise in the data.

  19. #19
    Member
    Join Date
    Apr 2010
    Location
    El Salvador
    Posts
    43
    Thanks
    0
    Thanked 1 Time in 1 Post
    The probability model is naturally for an infinite string; you could consider the joint set of all used input blocks (separate files over a lifetime) as an "infinite" string. That we consider throwing away the statistics of the previously coded files has practical reasons. The result is some sort of block-step non-stationary modeling, from the PoV of the model. Like, every time you move to another place, you force yourself to forget your past. But this is an imposed behaviour, for practical reasons, not a property of the model.

    If you didn't reset the model, then the previous file would pay the (bit-)price that the next file doesn't. And nothing gets lost.

    Your combinatorial model works only on finite strings; if you'd use a very long finite string you'd see the two models converge.

    Interestingly, for long strings the overhead of n & k becomes negligible, but so does the gain of the combinatorial approach. For short strings n & k become much larger (vs. string length), but so does the gain of the combinatorial approach. Looks like you're just re-distributing bits.
    Last edited by Ethatron; 3rd May 2012 at 18:09.

  20. #20
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    Quote Originally Posted by Matt Mahoney View Post
    If the bits of x are i.i.d. then p(x|k,n) is just the binomial model described above.
    The attempt to use the numbers of "0" and "1" bits to make better estimations makes it non-i.i.d.

    You're right about the binomial model. But the binomial model is just the compound process of the Bernoulli model (which every CM uses). Both ML estimations for p are exactly the same.

    Quote Originally Posted by Shelwien
    I tried to compute the maximum-likelihood estimation for k, and k/n seems to
    be slightly different from p.
    I still don't get your problem, because I don't know which p you mean.
    If f[p,k,n] is the binomial model from above, then it sounds like you "hope" that f[0.1,10,100]=0.10?
    It also could be that your estimator for k is biased but you can check that.

  21. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    @Ethatron:

    > The probability model is naturally for an infinite string

    Both modelling precision and coding precision are limited,
    so it's far from natural.

    > throwing away the statistics of the previously coded files has practical reasons

    I'm not talking about solid compression here, but it's actually helpful.
    Frequently developers don't consider the nature of contexts when initializing the statistics,
    and then any actual stats appear better than the initial ones.
    For example, this applies to lzma's masked literal model.

    > Your combinatorial model works only on finite strings; if you'd use
    > a very long finite string you'd see the two models converge.

    Yes, in theory. But with imprecise models/coders and nonstationary data
    it's not so simple.

    > Interestingly, for long strings the overhead of n & k becomes negligible,
    > but so does the gain of the combinatorial approach

    It's actually the opposite, in fact.
    Because of coding overhead, already at 1000 symbols explicit coding of the
    data size becomes better than an EOF symbol in the alphabet (same with EOF flags).

    Anyway, this is actually all about "probability extrapolation".
    First there appeared a reason to apply a sq(C*st(p)) transformation
    to model predictions (to convert a prior probability estimation to a posterior one),
    and now I've found another potential candidate.

    It may be hard to provide theoretical foundations for various parameter
    values etc., when it involves the various overheads of practical implementations,
    but it's easy to see the improvements in practice.

    @Sebastian:

    > I still don't get your problem, because I don't know which p you mean.

    In this thread, I'm arguing that the actual inputs for the rangecoder
    are symbol counts, not the probability.
    Thus it may be possible that a model which predicts optimal symbol counts
    could do better than a (usual) model which predicts the probability distribution of
    the next bit.
    I'm only comparing p with k/n (if that's what you meant in the quote) to show
    that with different models there would be different rangecoder inputs
    (and thus compression would also be different).
    Also, we don't have to predict n, because it's easier to assume that it's known -
    in practice there's no noticeable coding overhead from that.

    In fact we could store k too - how to make use of that additional information,
    in conjunction with a usual context model, is also an interesting question.
    Btw, one possible answer is to predict k using the context model and mix that with the side info.

    > If f[p,k,n] is the binomial model from above, then it sounds like you "hope" that f[0.1,10,100]=0.10?

    f[p,n] is k, where k is a maximum likelihood parameter for binomial model, given p and n.

    > It also could be that your estimator for k is biased but you can check that.

    That's actually what I meant by "doing something wrong".
    Mathematica defines the harmonic numbers for fractional n (actually for complex n).
    So atm I'm not sure whether it's ok to use the fractional k's provided by f[p,n], or
    maybe it's better to explicitly compare the likelihoods for various integer k's.
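    One way to check - a sketch that maximizes the same log-likelihood twice, once over real k by ternary search on the lgamma-based expression (which is concave in k), and once over integer k by direct comparison; the real-valued maximum should reproduce f[10/100,100] from the first post:
    Code:
    #include <stdio.h>
    #include <math.h>

    /* log( Binomial[n,k] * p^k * (1-p)^(n-k) ), with k allowed to be fractional */
    static double loglik(double k, double n, double p) {
      return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
           + k * log(p) + (n - k) * log(1 - p);
    }

    /* continuous maximizer via ternary search */
    static double kmax_real(int n, double p) {
      double lo = 0, hi = n;
      int it;
      for (it = 0; it < 200; it++) {
        double a = lo + (hi - lo) / 3, b = hi - (hi - lo) / 3;
        if (loglik(a, n, p) < loglik(b, n, p)) lo = a; else hi = b;
      }
      return (lo + hi) / 2;
    }

    /* integer maximizer, by explicitly comparing the likelihoods */
    static int kmax_int(int n, double p) {
      int k, best = 0;
      for (k = 1; k <= n; k++)
        if (loglik(k, n, p) > loglik(best, n, p)) best = k;
      return best;
    }

    int main(void) {
      int n = 100;
      double p = 0.1;
      printf("fractional k = %.5f, integer k = %d, p*n = %.1f\n",
             kmax_real(n, p), kmax_int(n, p), p * n);
      return 0;
    }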

    Anyway, it seemed like an interesting topic to discuss

  22. #22
    Member
    Join Date
    Apr 2010
    Location
    El Salvador
    Posts
    43
    Thanks
    0
    Thanked 1 Time in 1 Post
    > > Interestingly, for long strings the overhead of n & k becomes negligible,
    > > but so does the gain of the combinatorial approach

    > It's actually the opposite, in fact.

    I think what I said might be a bit convoluted:
    Sending size+zeroes requires a few bits (sn&sk). Those bits vs. the bits of a very long string are negligible: ilength/(olength+sn+sk) ~= ilength/olength. I meant the overhead of sending n&k, not any overhead in the coding utilizing n&k.

    Anyway. It may be interesting to look at this kind of file and see what happens:

    1111...11110000...0000

    The probability model starts and ends with p=0.5, while your combinatoric model would stop producing anything at all when it reaches the first 0 - or, in general, the last occurrence of either of the two bit values. And in between?
    Last edited by Ethatron; 5th May 2012 at 03:12.

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > I meant the overhead of sending n&k

    It's necessary to encode these values anyway, explicitly or implicitly - there're no infinite streams in practice.
    And then explicit coding is usually better, because the overhead of the various kinds of unary coding is too high.

    > The probability model starts and ends with p=0.5

    An adaptive model won't.

    > your combinatoric model would stop producing anything at all

    Not sure what you mean here. The binomial model from the first post is based on a usual model (which provides p).
    But if we'd explicitly store counts of zeroes and ones, then yeah, there'd be nothing to decode when any of the counts becomes zero.

    > And in between?

    Well, we'd have a probability estimation p=n0/(n0+n1) and update (bit ? n1-- : n0--).
    When n0 and n1 are both far from zero, it can be used like any other model - mixed with other estimations etc.
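    A tiny sketch of that decrementing-count model on the 1111...0000 example from the quoted post - it prints the per-bit estimates and the total codelength, which comes out to Log2[Binomial[14,7]] bits:
    Code:
    #include <stdio.h>
    #include <math.h>

    int main(void) {
      const char *s = "11111110000000"; /* the 1111...0000 example: 7 ones, then 7 zeroes */
      int n0 = 7, n1 = 7;               /* stored counts of zeroes and ones */
      int i;
      double bits = 0;
      for (i = 0; s[i]; i++) {
        int num = (s[i] == '0') ? n0 : n1;      /* remaining count of the coded symbol */
        printf("bit %c  p = %d/%d\n", s[i], num, n0 + n1);
        bits += -log2((double)num / (n0 + n1));
        if (s[i] == '0') n0--; else n1--;       /* decrement the coded symbol's count */
      }
      printf("total: %.4f bits, log2 Binomial[14,7] = %.4f\n", bits, log2(3432.0));
      return 0;
    }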

  24. #24
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Ethatron View Post
    > > Interestingly, for long strings the overhead of n & k becomes negligible,
    > > but so does the gain of the combinatorial approach

    > It's actually the opposite, in fact.

    I think what I said might be a bit convoluted:
    Sending size+zeroes requires a few bits (sn&sk). Those bits vs. the bits of a very long string are negligible: ilength/(olength+sn+sk) ~= ilength/olength. I meant the overhead of sending n&k, not any overhead in the coding utilizing n&k.

    Anyway. It may be interesting to look at this kind of file and see what happens:

    1111...11110000...0000

    The probability model starts and ends with p=0.5, while your combinatoric model would stop producing anything at all when it reaches the first 0 - or, in general, the last occurrence of either of the two bit values. And in between?

    Actually, if you look at the work of Howard and Vitter: when you do a simple adaptive arithmetic coder, OR if you store
    the number of ones and zeroes and then use the exact probability based on the exact number of ones and zeroes remaining,
    you get the same compression. Having all ones followed by all zeroes really makes no difference. The compression is the
    same regardless of the permutation of the ones and zeroes; it's a function of the actual number of ones and zeroes. And
    there are many ways to do the same thing. Actually, when you write a real bijective arithmetic compressor based on any
    of these methods, you end up computing something like, say, 100.625 bits. What this really means is that the compression
    is not really to the same number: some permutations will be 100 bits while others will be 101 bits, since you have to map
    the fraction of bits to 2 different values.
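    A quick sketch checking that numerically: with exact remaining counts, the total codelength comes out the same for any permutation of the bits (the values and the shuffle below are just for illustration):
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* codelength of a bit string under the exact remaining-counts model */
    static double cost(const int *bit, int n) {
      int i, n0 = 0, n1 = 0;
      double bits = 0;
      for (i = 0; i < n; i++) { if (bit[i]) n1++; else n0++; }  /* total counts */
      for (i = 0; i < n; i++) {
        double p = bit[i] ? (double)n1 / (n0 + n1) : (double)n0 / (n0 + n1);
        bits += -log2(p);
        if (bit[i]) n1--; else n0--;
      }
      return bits;
    }

    int main(void) {
      int b[14] = {1,1,1,1,1,1,1,0,0,0,0,0,0,0}, i, j, t;
      printf("all ones first : %.4f bits\n", cost(b, 14));
      srand(1);
      for (i = 13; i > 0; i--) {                /* random permutation of the same bits */
        j = rand() % (i + 1);
        t = b[i]; b[i] = b[j]; b[j] = t;
      }
      printf("random shuffle : %.4f bits\n", cost(b, 14));
      return 0;
    }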

    When I wrote ARB255, the simple arithmetic compressor, I actually compress a string that is less than the number of bytes.
    I suspect there may be cases, because of the endings I chose to use, where if you look at some files and start compressing
    all permutations of the file, you end up with 3 values, say N, N+1 and N+2. However, I don't remember ever getting more
    than two values, N and N+1, for various permutations of a file. ARB255 was for simple bijective adaptive compression of
    IID files of up to 256 symbols. I plan to update the code one of these days and post it here.

  25. #25
    Member
    Join Date
    Apr 2010
    Location
    El Salvador
    Posts
    43
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Shelwien View Post
    > The probability model starts and ends with p=0.5

    An adaptive model won't.
    It will. 1/2,2/3,3/4,4/5,5/6,6/7,7/8,7/9,7/10,7/11,7/12,7/13,7/14 counts/total for 1s - p=0.5 at the beginning and at the end. My example is intentionally symmetric.

    Quote Originally Posted by Shelwien View Post
    > your combinatoric model would stop producing anything at all

    Not sure what you mean here. The binomial model from the first post is based on a usual model (which provides p).
    But if we'd explicitly store counts of zeroes and ones, then yeah, there'd be nothing to decode when any of the counts becomes zero.

    > And in between?

    Well, we'd have a probability estimation p=n0/(n0+n1) and update (bit ? n1-- : n0--).
    When n0 and n1 are both far from zero, it can be used like any other model - mixed with other estimations etc.
    7/14,6/13,5/12,4/11,3/10,2/9,1/8,0/6,0/5,0/4,0/3,0/2,0/1 counts/total for 1s then. The zero probability is not a problem as no 1s are coded anymore (no collapse-interval-to-0), and 0s are coded on the full interval [0,total,total] (low, high, total).

    I would say the last "1" pays the price that the remaining "0"s don't. The more likely the event of the last symbol of its kind, the worse the probability of that symbol becomes. But it's a correct estimation, nothing to complain about.
    Last edited by Ethatron; 6th May 2012 at 05:20.

