# Thread: Relation between n-ary coding for n = 1 and n > 1

1. Suppose we're doing unary coding. If the alphabet size |Sigma| > 1, then the minimum number of flags needed to code a symbol is 1 and the maximum is |Sigma| - 1. Every symbol except the highest needs as many flags to encode as its rank; the last symbol needs one flag less, because once |Sigma| - 1 symbols are excluded, the remaining symbol is known.
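For concreteness, here is how those flag counts look for |Sigma| = 4 (a Python sketch; `unary_code` is just an illustrative helper, not from any of the coders discussed):

```python
def unary_code(rank, alphabet_size):
    # one '0' flag per skipped symbol, then a terminating '1';
    # the last symbol needs no terminator, because it is the only
    # symbol left after |Sigma|-1 exclusions
    if rank < alphabet_size:
        return "0" * (rank - 1) + "1"
    return "0" * (alphabet_size - 1)

codes = [unary_code(r, 4) for r in range(1, 5)]  # ['1', '01', '001', '000']
```

So the flag count runs from 1 up to |Sigma| - 1 = 3, and the last two symbols both cost 3 flags.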

Question: is it possible to assign probabilities to the flags in unary coding so that the effective compression ratio equals that of a previously given bytewise or bitwise compression scheme? If so, how can the flag probabilities be computed adaptively?

A weaker question, regarding incompressible data: is it possible to assign probabilities to the flags so that incompressible data won't be expanded?

2. We have A = {1, 2, ..., n}, n>2. We have a fixed probability distribution P on A.

1. Unary decomposition.
dec(a) = b_1 b_2 ... b_a = 0^(a-1) 1 = 00...01 (0 appears a-1 times, 1 appears once)

2. You want a distribution Q on 0^k 1, k = 0, 1, ..., n-1, such that: P(a) = Q(dec(a)).

P(1) = Q(1),
P(2) = Q(01) = Q(0) Q(1|0) = (1-Q(1)) Q(1|0) <=> Q(1|0) = P(2)/(1-Q(1)),
P(3) = Q(001) = (1-Q(1)) (1-Q(1|0)) Q(1|00) <=> Q(1|00) = P(3)/[(1-Q(1)) (1-Q(1|0))]

...

Q(1|0^(a-1)) = P(a) / prod_i=0..a-2 (1-Q(1|0^i))
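The recursion above can be checked numerically; a small Python sketch (`flag_probs` is a made-up name) derives the conditional flag probabilities from P, and the last flag comes out certain, which matches the "last symbol needs one flag less" remark:

```python
def flag_probs(P):
    # conditional flag probabilities Q(1|0^i) for a symbol
    # distribution P over {1..n}, following the derivation above
    Q, prefix = [], 1.0          # prefix = prod of (1 - Q(1|0^i)) so far
    for p in P:
        q = p / prefix           # Q(1|0^(a-1)) = P(a) / prefix
        Q.append(q)
        prefix *= 1.0 - q
    return Q

P = [0.5, 0.25, 0.15, 0.10]
Q = flag_probs(P)                # approximately [0.5, 0.5, 0.6, 1.0]
```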

Cheers

EDIT2: As you can see, I just wondered, since I used n for the alphabet size above. But the terminology is misleading: "unary" coding is a scheme that maps letters from an alphabet A, |A| > 2, to words over a binary alphabet ("bin-ary", i.e. "2-ary"). It is somewhat weird to use an abbreviation such as "n-ary" coding to refer to unary coding, where n = 1.

3. toffer, unary coding for N is N ones followed by a zero bit

4. Thanks toffer! I've somewhat abused the terminology, but you got the point anyway.

So it seems that unary coding can reach the same compression ratio as a binary or bytewise one. I'll check it out

5. As toffer said, but sadly it's only practical with freq-based models.
With freqs you just need to subtract the freqs of the skipped symbols from the total:
Code:
```total=N0+...+Nm;                                 // total of all remaining symbol freqs
p=N0/total; bit=decode(p); if( bit ) return;       // rank-0 symbol?
p=N1/(total-=N0); bit=decode(p); if( bit ) return; // rank-1 symbol, N0 excluded
p=N2/(total-=N1); bit=decode(p); if( bit ) return; // and so on
[...]```
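A runnable Python version of the same idea (names like `encode_unary` are illustrative): the probabilities actually fed to the bit coder telescope, so their product comes back to freq/total with no renormalisation needed:

```python
def encode_unary(sym, freqs):
    # returns the (flag, probability) pairs a freq-based unary coder
    # would feed to the bit coder; skipped symbols' freqs are
    # subtracted from the running total instead of renormalising
    total = sum(freqs)
    bits = []
    for k, f in enumerate(freqs):
        p = f / total
        if k == sym:
            bits.append((1, p))   # "it is this symbol" flag
            break
        bits.append((0, p))       # skip, exclude its freq
        total -= f
    return bits

bits = encode_unary(2, [3, 2, 5])  # [(0, 0.3), (0, 2/7), (1, 1.0)]
```

The product of the coded flag probabilities is 0.7 * (5/7) * 1.0 = 0.5 = 5/10, i.e. exactly the symbol's bytewise probability.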

6. Why sadly, and why is it only practical with freq-based models?

Does anybody have an idea how to adaptively adjust the Q predictions after coding? Assuming we coded a symbol with value k, only Q(1|0^0), Q(1|0^1), ..., Q(1|0^k) should be adjusted.

The adjustment should, for example, be analogous to adjusting P(k) by (1 - P(k)) * factor, where factor is some small positive value (much lower than 1); the post-adjustment prediction can be written as P'(k) for simplicity.
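A minimal sketch of that adjustment rule in Python. Rescaling the non-coded symbols by (1 - factor) to keep the distribution normalised is my assumption, not something stated in the post:

```python
def update(P, k, factor=0.05):
    # move P(k) toward 1 by (1 - P(k)) * factor and rescale the
    # other symbols by (1 - factor); the total then stays exactly 1
    return [p + (1.0 - p) * factor if j == k else p * (1.0 - factor)
            for j, p in enumerate(P)]

P1 = update([0.5, 0.3, 0.2], 1)  # P'(1) grows, the others shrink
```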

BTW:
http://www.toffer86.tk/ points me to some spam site.

7. Because probability counters, like FSMs, are more efficient,
but ppmd-like unary coders using good counters are not efficient at all.
So we got stuck with hashtable-based bitwise models.

> and why it's only practical with freq-based models?

Because with freqs it works automatically, while with probabilities
there're precision issues, especially on updates.

> Anybody has an idea how to adaptively adjust Q predictions after
> coding? Assuming we coded symbol with value k then only Q(1|0^0),
> Q(1|0^1), ..., Q(1|0^k) should be adjusted.

If you're trying to simulate update from a bytewise model, then it'd
be necessary to compute the actual symbol's probability (by multiplying
all the probs in the chain), update that, then divide by the prefix
probability.

But actually its quite ok (or even better) if you'd just update
the probabilities of flags in unary code independently, like
usual linear counters or something.
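As a sketch of that independent-update idea (Python; the class name and initial values are arbitrary): coding symbol k touches only the counters for ranks 0..k, each moved linearly toward the bit it just coded:

```python
class UnaryFlagModel:
    # one independent linear counter per unary flag
    def __init__(self, n, factor=0.05):
        self.q = [0.5] * (n - 1)   # q[i] = Q(1|0^i); last symbol implied
        self.factor = factor

    def update(self, k):
        # only the flags actually coded for symbol k are adjusted
        for i in range(min(k + 1, len(self.q))):
            bit = 1.0 if i == k else 0.0
            self.q[i] += (bit - self.q[i]) * self.factor
```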

> Adjustment should for example be analogous to adjusting P(k) by (1 - P(k)) * factor

I suppose http://ctxmodel.net/files/newCM_v0.rar /o0alf/sh_node256.inc

8. I've fixed this.

9. > But actually its quite ok (or even better) if you'd just update
> the probabilities of flags in unary code independently, like
> usual linear counters or something.

So, assuming long enough stationary data, unary coding with an independent linear counter per flag should achieve the same compression ratio as bytewise coding with linear counting?

10. > So, assuming long enough stationary data, unary coding with an
> independent linear counter per flag should achieve the same
> compression ratio as bytewise coding with linear counting?

Yes, with dynamic ranking and counter update parameters tuned for each rank
it's quite likely - in fact it should be better for binary data,
because updating multiple counters per symbol provides faster adaptation.

But it's impractical.
Dynamic ranking is a must, because otherwise you'd have to encode 254 bits
for each 0xFF byte.
And dynamic ranking means that we'd have to recompute the probabilities:
Code:
```rank1p=p1; // symbol=='a' flag
rank2p=p2; // symbol=='b' flag, given symbol!='a'

// ranks of 'a' and 'b' are swapped:

rank1p=p2*(1-p1);        // symbol=='b' flag
rank2p=p1/(1-p2*(1-p1)); // symbol=='a' flag, given symbol!='b'```
so we have to store the probabilities as doubles for good precision
(or lose compression if precision is bad),
and either way the multiple extra divisions and multiplications per symbol are bad.
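For what it's worth, the swap formulas above can be verified numerically. In this Python sketch I read rank1p/rank2p as the probabilities that the ranked symbol hits (the reading consistent with the formulas), so both symbols keep their overall probabilities after the swap:

```python
p1, p2 = 0.6, 0.3            # rank-1 hits 'a', rank-2 hits 'b'
Pa, Pb = p1, (1 - p1) * p2   # symbol probabilities before the swap

# recompute as in the snippet: 'b' moves to rank 1, 'a' to rank 2
rank1p = p2 * (1 - p1)             # = P(b)
rank2p = p1 / (1 - p2 * (1 - p1))  # = P(a) / (1 - P(b))

Pb_new = rank1p
Pa_new = (1 - rank1p) * rank2p     # both symbols keep their probabilities
```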

11. But what if we don't recompute the probabilities on symbol swapping? I.e., what if we emulate an MTF transform? On BWT output that shouldn't do badly, and there are probably other scenarios where MTF ranks could be good - I'm thinking of high (unbounded) order PPM. And instead of direct probabilities in the model, we can have some level of indirection.

In unbounded-order PPM, frequencies are usually very low and thus rather inaccurate. I hope that MTF ranks and MTF rank history (i.e. the history of MTF ranks of matched symbols), used as a context for a probability table, would alleviate that.

At least in my tests, simplified Information Inheritance in PPMdII doesn't work well with high orders, i.e. > 20 or so - assuming, of course, that there are lots of such long contexts in the input file.

12. You mean keep a fixed table of counters per rank and a dynamic symbol table which you swap around?
Well, look at Symbol Ranking results.
It won't be any good even for BWT data (because MTF is inefficient there and orders higher than 2 are useless).
Also note that ppmonstr's compression is comparable with paq even with these inaccurate freqs.
