# Thread: Counting 1 or 0?

1. I have noticed that some programmers use "zero" for probability calculation (e.g. Igor Pavlov, Shelwien and me), while others use "one" (e.g. Matt Mahoney). From my point of view it's just a design decision. What are the under-the-hood facts? Which choice is suggested by information theory? Which do you use, and why?

2. Here's my opinion:

The coding cost ld(1/p) of an event with probability p approaching 0 approaches infinity, so coding such an event would require an infinite number of bits (ignoring that a division by zero isn't defined).

But how "bad" is infinity? In practice, the worst output that can happen is a complete flush of your arithmetic coding registers if they become equal (hi = lo, or range = 0). Implementations normally use 32-bit registers, so "infinity" is actually 32 bits.

Maybe there's something else caused by rounding or symmetry in the arithmetic code space; I never bothered with that.

3. Thanks! But my question is about modelling the data, not coding itself. I mean: if I have an entropy coder which is suitable for P(0) and I want to use P(1) instead, I will implement my model based on P(1) and implement my coder to encode (1 - Symbol). In this explanation, assume that Symbol is the bit which is actually coded.

On the other hand, we can expand my question based on your answer. Also, what about the entropy coder side?

4. You are talking about binary arithmetic coding (i.e. modelling data bit by bit), right? In that case the above applies too.

You have to consider that your model should output "stuff" (here, probabilities) which your coder can handle. So when talking about modelling, it is tied to your entropy coder; this is the motivation for answering your question in terms of coding.

With arithmetic coding this refers to avoiding overflows where hi < lo. This might happen if your P(1) - P(0) >= S, where probabilities are mapped from [0...1) to the integer range [0...S-1). So when modelling P(1) and coding P(0), make sure the above can't happen.

But again, this is implementation specific....

Maybe some of the experts here can give you a better answer.

BTW: my CM coders always code an EOF flag (either 0 or 1) preceding the next symbol, with the probability of EOF=1 set to zero (which causes an EOF to flush the arithmetic state, outputting 32 bits). Previously I coded and modelled P(0); now I use P(1), since bit models can be implemented in a simpler manner.

5. Yes, I'm talking about modelling data bit by bit. To put my question in other words:

In bit models (data modelled bit by bit), which is the suitable way of coding data: counting 1 or 0? Don't care about the entropy coding phase; just focus on modelling.

6. Do you mean which symbol you should model? Modelling S0 (the zero bit) or S1 (the one bit)? You confused me; I thought you meant allowing a probability P(Sx) to have the value zero.

In that case, the simple answer is: it doesn't matter.

If you use bit models, which are updated proportionally to the coding error 1-P(Sx):

P(Sx) += (1-P(Sx)) / R

You can gain a *tiny* amount of speed when modelling S1:

P(S1) += (y-P(S1)) / R

where y is the bit. If you use models based on S0:

P(S0) += ((~y) - P(S0)) / R

You see - one more instruction, the "not", which inverts y=0 into y=1 (and vice versa).

When you use table-based updates, it doesn't matter.

7. So sorry for confusing you. I've been talking about exactly which symbol we should model. Sorry again, and thanks for the replies!

8. Originally Posted by toffer
exactly.

If the sum of the probabilities of 0 and 1 is constant (as in fpaq0p), then you don't model the probability of 0 or the probability of 1; you model the ratio of the probabilities of 0 and 1.

If the sum of the probabilities of 0 and 1 is not constant (as in fpaq0), then you must model two probabilities: that of 0 and that of 1.

9. It doesn't really matter for which value you store/update the probability, as P1 = 1 - P0.
Though optimal P1-based and P0-based implementations of the same models would
have slightly different formulas for updates and such, mostly due to rounding.
Also there are no real speed- or source-simplicity-based preferences, though Matt's
counters and such look more complex when directly converted to a P0 version.

So I think that Matt's choice was based on a "common sense" association of
0 with a zero increment and 1 with a non-zero one.

As for me (and the whole compression.ru group, I guess), there wasn't any conscious
choice. It's just that for some time the "mainstream" was models with a byte alphabet
and P0..P255 distributions updated all at once (it's still quite useful btw, especially
if you want a fast compressor). In that case it's obvious that one should increase
the probability of the encountered byte and decrease the others. And then there just
wasn't any sense in changing anything after switching from alphabet size 256 to 2
(for unary coding etc.), as the counters worked well without any change.
Also, it's a habit by now, so a P0-based model is easier for me to understand.