# Thread: Minimal Ashford arithmetic-coder termination

1. ## Minimal Ashford arithmetic-coder termination

I could use some help solving a minimal-termination problem I have with the Ashford-coder. The coder is based on the CACM-coder, but its underflow-handling differs significantly (the 5Cent-coder is what you can reach from CACM). CACM-style coders can be terminated with at most 2 disambiguating bits:

Code:
```    /* Output two bits that select the quarter that
     * the current code range contains.
     */
    underflow += 1;
    clearUnderflow(!(low < firstRange));```
The number of bits flushed is "1 + outstanding" [clearUnderflow] + "1" [+= 1].
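For reference, in the CACM87 coder this is the `done_encoding` step; here is a minimal sketch of it, with illustrative names (`bit_plus_follow`, `out_bits`) rather than the original identifiers:

```c
#include <assert.h>

/* Sketch of CACM87-style termination.  Names are illustrative,
 * not the original identifiers. */
#define FIRST_QTR 0x4000u            /* first quarter of a 16-bit code range */

static int out_bits[64], out_n, underflow;

/* Emit one bit, then flush all pending underflow bits as its opposite. */
static void bit_plus_follow(int bit)
{
    out_bits[out_n++] = bit;
    while (underflow > 0) {
        out_bits[out_n++] = !bit;
        underflow--;
    }
}

/* Terminate: bump the underflow count, then emit the bit that selects
 * a quarter known to lie inside [low, high]. */
static void done_encoding(unsigned low)
{
    underflow += 1;
    bit_plus_follow(low >= FIRST_QTR);
}
```

With no outstanding underflow bits this writes exactly 2 bits; each pending underflow bit adds one more, matching the "1 + outstanding" + "1" count above.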

In the Ashford-coder, clearUnderflow is always "1 + (outstanding - 1)", and there are more differences in the renormalization (like minRange being 0x4001). The interval-decoding, though, is regular CACM-style:

Code:
`    target = (((((value - low) + 1) * total) - 1) / (range + 1))`
What I'd like to know, in terms of interval-fit, is the smallest possible number of bits that has to be put out in the CACM case. The DCC-coder has an extremely intuitive algorithm for determining the number of bits:

Code:
```    /* output the nbits integer bits (1, 2, 3) */
    for (nbits = 1; nbits <= regsze; nbits++) {
        roundup = (1 << (regsze - nbits)) - 1;
        bitfield = (low + roundup) >> (regsze - nbits);
        value = bitfield << (regsze - nbits);

        /* the value-interval (value + whatever bits)
         *   [value,value+roundup)
         * falls now always between the interval
         *   [low,low+range)
         * =>
         *  low <= value <= value+roundup <= low+range
         */
        if (((value + 0) >= (low + 0)) && ((value + roundup) <= (low + (range - 1))))
            break;
    }```
But that is a result of its specific implementation and doesn't apply to CACM-style coders. For the moment I haven't come far; maybe you have an idea. Charles just throws out 0x80.
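For experimenting, the search above can be packaged as a self-contained function (my restatement of the quoted snippet, not the actual DCC source):

```c
#include <assert.h>

/* Smallest nbits such that some nbits-bit prefix, padded with arbitrary
 * trailing bits, always falls inside [low, low+range).  A restatement
 * of the DCC-style search quoted above; regsze is the register width. */
static int min_disambig_bits(unsigned low, unsigned range, int regsze)
{
    int nbits;
    for (nbits = 1; nbits <= regsze; nbits++) {
        unsigned roundup  = (1u << (regsze - nbits)) - 1;
        unsigned bitfield = (low + roundup) >> (regsze - nbits);
        unsigned value    = bitfield << (regsze - nbits);

        /* [value, value+roundup] must fit in [low, low+range-1] */
        if (value >= low && value + roundup <= low + (range - 1))
            break;
    }
    return nbits;
}
```

For example, the interval [0x4000,0x8000) in a 16-bit register needs exactly the 2 bits "01".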

Ashford-coder:

2. While fiddling around with arithmetic coders in general, I also noticed that it's possible to raise the precision of just about any coder implementation (even carry-less ones), if you have a varying incoming "total". All coders, when range-underflow occurs, either maintain an underflow-queue or set the range to a safe value. All coders (that I know of) say range-underflow occurs when range goes below a static magnitude (0x4000 or 0x8000 or whatever, which is the maximum "total" you can feed to the coder). When you build up real statistics by counting, "total" is not static, and the underflow case won't actually occur where it's assumed to. So instead of:

1. range-contraction
2. flush known bits
3. raise range for underflow (against 0x4000)

You do the underflow-check before range-contraction and you check against the current total:

1. raise range for underflow (against total)
2. range-contraction
3. flush known bits

This has two effects. In the general case (with or without an underflow-queue) you output fewer terminating bits (you allow range to go below the safety margin after the last interval has been coded, which is completely fine; just assume the next interval you want to code is [0,1] out of a total of 1: range only needs to be above 1 to code that symbol). In the case of carry-less coders, besides having the underflow case itself occur less frequently, you can set the forced range to a lower value. The famous Subbotin-coder becomes:

((ast.range = (-(ast.low)) % total), 1)

((ast.range = (-(ast.low)) & (minRange - 1)), 1)
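A toy illustration (mine, not any real coder) of why checking against the current "total" makes the underflow case fire less often:

```c
#include <assert.h>

/* Toy model: the range halves at each coded symbol (a stand-in for
 * interval contraction); when it falls below the chosen bound the coder
 * must reset.  With the dynamic bound (the current "total") the reset
 * fires later and therefore less often.  Purely illustrative. */
static int count_resets(unsigned static_bound, unsigned total,
                        int steps, int use_dynamic)
{
    unsigned range = 0xFFFFFFFFu;
    int resets = 0, i;
    for (i = 0; i < steps; i++) {
        unsigned bound = use_dynamic ? total : static_bound;
        if (range < bound) {                /* underflow check */
            resets++;
            range = 0xFFFFFFFFu;            /* reset the coder */
        }
        range >>= 1;                        /* crude contraction */
    }
    return resets;
}
```

With a static bound of 0x10000 but a real total of only 0x100, the dynamic check resets roughly a third less often in this toy.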

The Subbotin-coder (as an example) has 850 bytes waste on an example source; applying this while running a normal order-0 rescaling (freq + 1 >>= 1) model lowers the waste to approximately half, 450 bytes. If you use Charles' range-determination, it goes down to 260, and if you consider that you also have a shift after setting range (range can be >> 8 smaller), you pretty much end up with just 15 bytes waste (on a 10MB file, vs. the DCC-coder). One of Eugene's carryless range coders takes the last "trick" into account already (setting 0xFF, then <<= 8).

I just want to note we're not talking about percent gains here; this is counting gained bits with two hands.
But as I am trying to find minimum-waste coders (resetting the range-coder every 256 intervals does lead to enormous waste), it's quite interesting that this goes unmentioned in the entire literature. It's not even in the Carpinelli/Neal-coder (aka. the DCC-coder), and that one tries not to waste even a single bit.

3. Originally Posted by Ethatron
The famous Subbotin-coder becomes:

((ast.range = (-(ast.low)) % total), 1)

((ast.range = (-(ast.low)) & (minRange - 1)), 1)
Ah, well, that doesn't work out that easily (eg. "total - (low % total)"). I think there should be a valid solution adjusting low in addition (eg. "low += 0x10000 - total" works but is counter-productive). It's a bit complex.
Anyway here are measures of the verified combinations:

enwik7 with order-0/-1 fenwick-tree and the Subbotin-coder:

1412 bytes waste with regular coding "if (range < 0x10000) range = 0x10000 - (low & 0xFFFF)"
0570 bytes waste with Charles' coding "if (range < 0x10000) range = maximum(), low = low - (low & 0xFFFF) + (0x10000 - range)"
0340 bytes waste with delayed underflow coding "if (range < total) range = 0x10000 - (low & 0xFFFF)"
0315 bytes waste with delayed underflow and Charles' coding "if (range < total) range = maximum(), low = low - (low & 0xFFFF) + (0x10000 - range)"
~272 bytes waste with an expected working total-adjustment of range

4. 1. Please try making your posts more comprehensible, or are you really talking to yourself?
2. In theory full arithmetic coders (with variable "total" and divisions) are more interesting,
but many optimizations only apply to bitwise coders with 2^n fixed total, and that's what
most modern compressors use, eg. lzma, bsc.
3. My best coder optimized for short strings is in http://ctxmodel.net/files/guess/bit_guess_v1.rar
It implies that all the bytes after EOF are 0xFF, and is thus able to cut off any trailing FFs from the code.
Imho it's quite obvious how to implement it (although not simple at all) - the last "code" value in the decoder
has to be in last symbol's interval, so we have to put enough bits for that into the stream.
To that end, it may be important to ensure that most probable symbol takes the predefined outermost interval
([0..freq) or [1-freq;1)), so that it would be easier to assign shorter codes.
Although with a non-bitwise alphabet, I suspect, it might be the best to reorder the symbols
by huffman code.
But EOF, if it's available, can be interpreted as any fixed bitstring and provide additional savings.

5. ## About arithmetic coders

Originally Posted by Ethatron
Ah, well, that doesn't work out that easily (eg. "total - (low % total)"). I think there should be a valid solution adjusting low in addition (eg. "low += 0x10000 - total" works but is counter-productive). It's a bit complex.
Anyway here are measures of the verified combinations:

enwik7 with order-0/-1 fenwick-tree and the Subbotin-coder:
I am curious as far as general arithmetic file coders go. It's really hard to beat
the many nonstationary coders that are commonly used, since they are
designed and optimized using what is hopefully typical data, something
that I am not sure you are concerned with. However, if you want to check
out an optimal arithmetic coder, I suggest you try arb255; it will compress
any permutation of any file to some fixed number of bytes. Due to
the bijective nature of things, it will either compress to a given n for all
permutations, or to n or n+1 bytes for other files when testing all
permutations. The arithmetic coders that are nonstationary will generally
compress real, useful, targeted files smaller, but permutations of the
file will not compress as small.

6. Originally Posted by Shelwien
1. Please try making your posts more comprehensible, or are you really talking to yourself?
You may read "Bounds on Redundancy in Constrained Delay Arithmetic Coding", or "The Delay-Redundancy Tradeoff in Lossless Source Coding". Otherwise there is no specific in-depth material about how to write the least amount of disambiguating bits for generic arithmetic-encoder termination. Apparently it's not a well-attended topic, BTW; how else do you explain that your coders mostly terminate with five shiftCarry? There is not a single range-coder implementation out there with a decent termination (well, I gave the algorithm in the first post, so now one could terminate eg. Subbotin's coder decently).

Originally Posted by Shelwien
2. In theory full arithmetic coders (with variable "total" and divisions) are more interesting
Ah, well ... I'm actually interested in minimum arithmetic-coder termination in general; you can terminate shift/add-coders as well, I think ...
It may be that nobody gives a damn about the Ashford-coder in particular, but I know Scott has some love for the termination-thing.

And here is a little jewel showing that you don't need a pow2 total (a carry-less BIAC cross-over), just as you don't need a pow2 total for Barney's coder ... they are pretty stripped down, but still: don't think you need a pow2 total.

7. Originally Posted by biject.bwts
I am curious as far as general arithmetic file coders go. It's really hard to beat
the many nonstationary coders that are commonly used, since they are
designed and optimized using what is hopefully typical data, something
that I am not sure you are concerned with.
You're right, I'm not concerned with any model, file, distribution or anything; I'm just looking for some exactness which can be applied to arithmetic-coders; not this wishy-washy "delete ari;", I'm-done stuff. Not even the Guazzo/F0-coder has a correct termination. Of the non-stoneage coders, apparently only some highly industrial coders (like CABAC, Q) even bother to go through the theory. There is no explanation why the CACM-coders can terminate with 2 disambiguating bits, at least not as an iterative process in terms of interval-fit, like what Radford Neal did with the DCC-coder.

Originally Posted by biject.bwts
However, if you want to check
out an optimal arithmetic coder, I suggest you try arb255; it will compress
any permutation of any file to some fixed number of bytes. Due to
the bijective nature of things, it will either compress to a given n for all
permutations, or to n or n+1 bytes for other files when testing all
permutations. The arithmetic coders that are nonstationary will generally
compress real, useful, targeted files smaller, but permutations of the
file will not compress as small.
I'm not sure I can find an explanation of how you terminate your coder, but I can take a look.

8. Originally Posted by Ethatron
You're right, I'm not concerned with any model, file, distribution or anything; I'm just looking for some exactness which can be applied to arithmetic-coders; not this wishy-washy "delete ari;", I'm-done stuff. Not even the Guazzo/F0-coder has a correct termination. Of the non-stoneage coders, apparently only some highly industrial coders (like CABAC, Q) even bother to go through the theory. There is no explanation why the CACM-coders can terminate with 2 disambiguating bits, at least not as an iterative process in terms of interval-fit, like what Radford Neal did with the DCC-coder.

I'm not sure I can find an explanation of how you terminate your coder, but I can take a look.

I thought I wrote a lot about how I did it. However, I am noted for a very poor
choice of words. There are several ways to do it. Actually, the problem mentioned
in one of the papers you quoted for a 3-symbol code is really no problem at all
when you use the carry methods. My coder arb255 is very close to Matt
Timmermans' coder, except it's slightly different. I actually started with a very
short high and low, like 3 bits each, and went through every termination
possibility. Look at

http://bijective.dogma.net/compres11.htm

Once I got that working I extended to longer high and low,
until at some point no more cases appeared. I went to 64 bits; however,
if I do it again I will use 128-bit registers, since Shelwien
showed me how MinGW 64 can have them. Of course it
will not compress smaller and will run slower, but I don't
care.

I hope you can follow the code itself; I tried to really not
follow the course that most would take. For example,
everything is actually only binary coders. Sometimes the one
symbol occupies the top part of the interval, sometimes not.
Sometimes the most probable occupies the top part of the
interval, sometimes not. I really don't like the way others
did it. Over the years I have had several variations;
I code them, play with them, and then lose them.
Some are simple changes, like a KT update or Laplace or
some other weird way. I also made forms where each
symbol has the same weight, but so what. There is also
my bit-IO stuff; I have lost track of the versions over the
years, but it works. I worried about regular files
blowing up on uncompress, so I have some form
of stability output transform to prevent this.

But I would be happy to have someone who can
actually write take a look. The code is a product of
my views. Shelwien called me a bijective maniac
at one time on the web; maybe so. I think I am a
frustrated crypto person who realizes the future
of crypto is very much related to compression,
especially bijective compression. And since it
appears Americans no longer have the freedom
to express their views on encryption in code,
compression is about as close as one can come.
I hope someday that America can be free again
so that it will once again shine. When that happens
I will write more crypto. But until that time I
think I will play with obscure bijective compression
methods.

9. Sitting over Andrew Polar's carry-less range-coder design, which is an extrapolation of the Guazzo/F0-coder, I became aware of a very good way to explain what a carry-less coder is, and why the topic of minimum coder-termination is about the same thing.

Here is the basic coder-loop:

Code:
```    range = high - low;

    /* We're doing limited precision arithmetic until ... */
    high = (range * ivl->high) / ivl->total + 0;
    low += (range * ivl->low ) / ivl->total + 1;

    /* ... we don't have enough range to code the next symbol. */
    if ((high - low) < minRange)
        /* Therefore we make the coder reset by ... */
        high = low;

    /* ... executing the loop until low and high are different on enough digits. */
    while ((low ^ high) < topRange) {
        consumeHiSample(low);

        /* expand high with trailing 1s */
        high = (high << 8) | 0xFF;
        low  = (low  << 8) | 0x00;
    }```
What happens here is that, in case we run out of precision, a straight reset of the coder is issued. Let's imagine our coder's initial state is low=0, high=FFFFFFFF, minRange=0000FFFF, and we encounter the precision-underflow on something like low=0000F000, high=00010000, not enough for the upcoming interval; the code sets low and high equal, which forces the loop to shift the entire "low"-register out, and FFs into the "high"-register; voila: low=0, high=FFFFFFFF.

In effect I could also write this:

Code:
```    range = high - low;

    /* We're doing limited precision arithmetic until ... */
    high = (range * ivl->high) / ivl->total + 0;
    low += (range * ivl->low ) / ivl->total + 1;

    /* ... we don't have enough range to code the next symbol. */
    if ((high - low) < minRange) {
        /* Therefore we make the coder reset by first fullstop() ... */
        consumeAllHiSamples(low);    // 4 bytes

        /* expand high with trailing 1s */
        low  = (low  << 4 * 8) | 0x00000000;
        high = (high << 4 * 8) | 0xFFFFFFFF;

        /* ... then start() */
        low  = 0x00000000;
        high = 0xFFFFFFFF;
    }
    else {
        while ((low ^ high) < topRange) {
            consumeHiSample(low);

            /* expand high with trailing 1s */
            low  = (low  << 8) | 0x00;
            high = (high << 8) | 0xFF;
        }
    }```
Now it becomes apparent that if one could find a less wasteful way to do a stop(), one could make a more efficient carry-less coder. The task is to find a valid "low" on reset which still enables the decoder to uniquely decode the previously coded interval ("low" in its entirety is always the safe bet), but which is as short as possible (to have less waste):

Code:
```      /* Therefore we make the coder reset by first minimalstop() ... */
      X = consumeAsMuchSamplesAsNecessary(low);    // 0-4 bytes

      /* expand high with trailing 1s */
      low  = (low  << X * 8) | (0x00 + 1 << X * 8 - 1);
      high = (high << X * 8) | (0xFF + 1 << X * 8 - 1);```
We have there the possibility to search for a specific "low" in which the maximum amount of trailing digits is "irrelevant": it can be 0 or 1 or whatever, and it would still enable the decoder to decode the interval. But as we currently have the possibility to reset the coder to any (!) new state we want, we could also utilize a specific "low":

Code:
```    while ((high - low) < minRange) {
        /* Therefore we make the coder reset by first astop() ... */
        consumeHiSample(low);    // 1 byte

        /* expand high with trailing 1s */
        low  = (low  << 8) | 0x00;
        high = (high << 8) | 0xFF;

        /* ... then restart() */
        low   = 0x00000000 + low;
        range = 0xFFFFFFFF - low;
        high  = low + range;
    }```
What this code does is recycle the previous "low" (which we know will always allow the decoder to uniquely decode the next interval), which of course requires us to readjust our high accordingly (low + range must stay at or below the maximum value that the precision allows: 0xFFFFFFFF in the example).

And that is actually exactly what this line in the Shkarin/Subbotin-implementation means:

Code:
`      range = ((-low) & (RANGE_BOT - 1)) << 8;`
It means "we reset the range-coder with some small amount of digits, taking into account that we can recycle the previous low, adjusting just high".
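As a sanity check (my example, not from the implementation): with RANGE_BOT a power of two, `(-low) & (RANGE_BOT - 1)` is exactly the distance from low up to the next RANGE_BOT-aligned boundary, i.e. the largest range that keeps low's upper digits carry-safe:

```c
#include <assert.h>

/* Distance from low up to the next multiple of bot (bot a power of two).
 * Equivalent to (bot - (low & (bot-1))) & (bot-1), but branch-free. */
static unsigned dist_to_boundary(unsigned low, unsigned bot)
{
    return (0u - low) & (bot - 1);
}
```

Adding the returned distance to low always lands exactly on a boundary, which is why recycling low and shrinking only the top works.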

Now it must have become utterly clear that, once you find a sound and effective way to stop an arithmetic-coder design, you are also inherently able to make it an effective carry-less implementation; because carry-less/underflow-free is nothing else than resetting your coder once it cannot function anymore.

Okay, I just wanted to draw out this relationship; maybe it's something new for you, maybe it's just a nicer fairy-tale explanation of what's happening there. I also wanted to make clear I'm not nuts.

The mechanism above operates on multiples of 8 bits, and makes no attempt to figure out the exact position in the "low"-register where the rest of the digits is "irrelevant" (makes no difference for the decoder). This means the Subbotin-coder can be made more efficient by finding the cutoff-point in "low" and setting the irrelevant bits of "low" to zero, which in effect makes the renewed "range" bigger, which delays the next point-of-reset a little bit longer:

Code:
```    if ((high - low) < minRange) {
        low = low & (0xFFFFFFFF << (32 - minimumHiDigits(low)));
        high = 0xFFFFFFFF - low;
    }

    while ((high - low) < minRange)
        ...```
And that is then pretty much what Charles (not Ashford; Bloom it is) is doing when he takes the choice between adjusting "low" (we can do that, we're currently resetting the coder!) or rather "high", whichever of the two choices leaves us with the bigger range ("high - low"), always making sure that our chosen "low" still allows the decoder to decode the previous interval.

10. > I also wanted to make clear I'm not nuts.

Frankly, you're still barely comprehensible.
Fortunately you don't seem to have problems with English, at least.
Please understand the fact that you're not some VIP,
so people won't try to decrypt your messages
(which don't make any sense at all out of context), at least
unless there's something interesting in the obvious parts.
There's a good reason for all these "abstracts", "introductions"
and "conclusions" in academic papers.
Actually I only paid attention because there are related
(to "delays" and "termination") issues in my vectorized rc.
But I still don't understand why you need a carryless rc,
or why somebody would be concerned about "delays" with
a single rc instance.

> How do you explain your coders mostly terminate with five shiftCarry?

Not all, for example the one which I linked in my previous post doesn't.

And there are reasons for simple flush and init, even if they produce
5 bytes of redundancy per file - it's easy to prove that there are no bugs
in these parts, and it's fast and easy to maintain.
("fast" mainly applies to the extra zero byte at the start of the stream; "maintain"
implies reading/understanding, porting to various projects etc).

> very good way to explain what a carry-less coder is

Actually there's no need to explain what a carryless rangecoder is,
because its name tells enough already - it's a rangecoder that
doesn't produce a carry.

> The task is to find a valid "low" on reset which still enables the
> decoder to uniquely decode the previously coded interval ("low"
> entirely is always the safe bet), but which is as short as possible
> (to have less waste):

Actually this is where range cutting to avoid carry and "coder termination"
are different - in range cutting we need to preserve maximum possible range
such that low's MSBs won't be affected by carry, while in coder flush we
need to find the subinterval [x..x+2^y) in the [low..low+range) interval,
such that y is maximal, and x is aligned. For example:

Code:
```for( y=0; y<32; y++ ) {
  uint mask = (1<<y)-1;
  uint l = (low+range-1 - mask) & (~mask);
  uint h = l | mask;
  if( (l<low) || (h>=low+range) ) break; // h must stay below low+range
}```
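A self-contained restatement of that search (my wrapper; `max_aligned_fit` is a hypothetical name) that returns the maximal y and the matching aligned x:

```c
#include <assert.h>

typedef unsigned int uint;

/* Find the maximal y such that an aligned subinterval [x, x+2^y)
 * fits inside [low, low+range), and report that x.  A restatement
 * of the search above; assumes low+range does not wrap past 2^32. */
static int max_aligned_fit(uint low, uint range, uint *xout)
{
    int y, best = 0;
    *xout = low;                                    /* y=0 always fits */
    for (y = 1; y < 32; y++) {
        uint mask = (1u << y) - 1;
        uint l = (low + range - 1 - mask) & ~mask;  /* align downward   */
        uint h = l | mask;                          /* top of [l,l+2^y) */
        if (l < low || h >= low + range)            /* must stay inside */
            break;
        best = y;
        *xout = l;
    }
    return best;
}
```

For [0x4000,0x8000) this finds y=14 at x=0x4000, i.e. the whole interval; for [1,4) it finds [2,4).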
Note that I don't normally use "termination", because coder flush can
be used for multiple purposes and doesn't really mean the end of data.

Sure, a carryless rangecoder can be implemented using a flush, but the
only benefit of that known to me is the possibility to somewhat simplify
the decoder, compared to a range-cutting carryless.

And a flush-based carryless with a precise shortest-possible flush just
doesn't make sense, because it's (relatively) complex and the decoder would
have to resync to the input stream, which is plain awful (ie slow).

> line in the Shkarin/Subbotin-implemention

It's purely Subbotin's:
This is the original post (google didn't save it for some reason):
http://compression.ru/fido/ru.compress.0027.htm#26
And this is updated version:

> Now it must have become utterly clear, that once you find a sound
> and effective way to stop an arithmetic-coder design, you are also
> inherently able to make it an effective carry-less implementation

flush-based carryless is more redundant than range-cutting carryless
by definition - because range-cutting tries to maximize the range
and flush doesn't.

> And that is then pretty much what Charles is doing when he takes the
> choice between adjusting "low" or better "high"

1. Actually Bloom's version is incomplete. It doesn't handle the
low+range overflow anywhere, which means that another branch is
necessary. And actually even Subbotin's version, which is much
simpler than Bloom's, still has slower decoding than a full rangecoder.

Using stats from http://compression.ru/sh/coders6a.rar (2001):
Code:
```               C.Size   C-Time   D-Time
Subbotin   10,682,917    5.55     6.92  // Subbotin's rangecoder
Subb-LB    10,682,917    5.55     6.97  // L.Broukhis mod which adds a cache byte
CL-R       10,680,668    5.55     7.08  // my mod with 64-bit low
CL-Rf      10,680,351    5.44     6.70  // my flush-based carryless
Shcoder    10,680,642    6.97     8.34  // my mod of Schindler's rangecoder
Shindlet   10,680,348    5.77     6.48  // -//- with 32-bit multiplication and 64-bit low```
So 6% faster encoding with 3.5% slower decoding is the best result for carryless,
but we don't need faster encoding with slower decoding.

2. As you can see in CL-R vs Subbotin's, it's really easy to improve its precision
by extending low to 64 bits, so complicated handling with multiple branches is unnecessary.
And btw, extra operations in rc are really bad even if they're not really executed at all -
lzma has 15 rc calls per loop iteration, now imagine what happens if all of these are inlined.

3. low in CL-Rf is 64-bit too, but it only does flushes if low>>32==0xFFFFFFFF, which
is normally very rare. Also it appeared possible to implement the decoder with a 32-bit
"code" and w/o low (almost like an rc with carries), but it still has to track FFs,
which is inevitably slower.

11. Originally Posted by Shelwien
> I also wanted to make clear I'm not nuts.

Frankly, you're still barely comprehensible.
Fortunately you don't seem to have problems with english at least.
Please understand the fact that you're not some VIP,
so people won't try to decrypt your messages
(which don't make any sense at all out of context) ...
What's the point of saying this?

12. Subbotin's coder is instructive. You can see two places where you lose a little coding space. One is due to division roundoff discarding a little of the top of the range in:

low+= cum_freq*(range/=tot_freq);

To minimize this, you want tot_freq to be small, but this reduces the possible precision of representing probabilities. range is kept in the range 2^24 to 2^32, so tot_freq should probably not be bigger than about 2^16, which means 8 bits for symbol frequencies for low redundancy data. Probability errors of e add redundancy of e^2 on average, so this is only a problem when coding highly redundant data with probabilities near 1 and the relative error is large due to quantization. If tot_freq is 64K then the highest probability you could have is 255/256 leaving space for the rest of the alphabet. This wastes 1 bit every 256 bytes.

The other inefficiency is in normalization. If the range falls below 16 bits then the range is split on a 64K boundary and the top half is thrown away. This costs 1 bit on average. The range will be 24 to 32 bit, so 24 bits about 1/8 of the time. If the input has low redundancy then each byte will drop 8 bits from the range, so this range splitting can be fairly common.
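That split can be sketched as follows (a restatement of the description above, not Subbotin's actual code):

```c
#include <assert.h>

/* Subbotin-style range cut, as described above: when range has dropped
 * below 64K and the interval [low, low+range) crosses a 64K boundary,
 * keep only [low, boundary), discarding the top part.  The coding-space
 * cost is log2(range/returned) bits, about 1 bit on average. */
static unsigned cut_range(unsigned low, unsigned range)
{
    unsigned kept = (0u - low) & 0xFFFFu;   /* distance to next boundary */
    return (kept != 0 && kept < range) ? kept : range;
}
```

When low is already aligned (distance 0) or the interval doesn't cross a boundary, the range is returned unchanged.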

You can avoid these problems with 64 bit arithmetic or using bit model like PAQ instead of a byte model like PPM. In a bit model the minimum range is 1 and no range has to ever be discarded. Such a small range is rare (probability 2^-32) and only results in the bit probability rounded to 1/2 so costs at most 1 bit.

The disadvantage of a bit model is you need 8 coding operations per byte. But byte level modeling has the same complexity when you consider calculating the cumulative and total frequencies using trees. (Using the obvious linear operations would be slower).

13. > What's the point of saying this?

That's how your attitude looks to me, sorry if you didn't want to know.

Anyway, if you want somebody to really read your posts, you should consider making them readable,
by adding enough information so that people won't have to look up stuff in other places to understand what you're trying to say.

As a good example, now we have Matt's post here - imho it's pretty clear that Matt didn't get what you're talking about,
except for a few keywords.
And actually I doubt that I got 100% of it either, because you never explained your context/environment details, like what you're
trying to do and why known solutions don't work.

14. Originally Posted by Matt Mahoney
Subbotin's coder is instructive ...
Thanks, Matt. I know all that. (no sarcasm)

Originally Posted by Matt Mahoney
If the range falls below 16 bits then the range is split on a 64K boundary and the top half is thrown away.
Which means you agree on the terms "coder termination" and "range/coder reset" for that operation? What you describe is resetting the coder/range. One could also initialize the Subbotin-coder at the very beginning with [0,0x0000FFFF] or [0,0xFFFF0000] if one prefers symmetry.

----------------------

But back to the Ashford coder. I have a guess why the regular CACM procedure doesn't work. Charles does a smart thing and adds the low-remainder to the range:

Code:
```    t = ivl->low * (range + 1);
    x = (t / total);
    y = (t % total);
    low += x;

    t = (high - low) * (range + 1);
    t += y;
    range = (t / ivl->total) - 1;```
I believe he cannot terminate with a code between [low,low+range]. On the decoder-side:

Code:
`    target = (((((code - low) + 1) * total) - 1) / (range + 1));`
range will be slightly "too big", and one should choose some smaller code between [low-unknown,low-unknown+range]. Empirically, all failing decodings are one too small, as expected. But this is really complicated; having no real access to the previous-to-last range, I don't think it can be calculated. What do you think?

15. Originally Posted by Shelwien
> very good way to explain what a carry-less coder is

Actually there's no need to explain what's a carryless rangecoder,
because its name tells enough already - its a rangecoder that
doesn't produce a carry.
Well, it's not that simplistic. Arithmetic-coders have precision-underflow problems because the calculation is finite. There is no single universal way to solve that; whether you call it carry-over, underflow-queue, virtual-queueing or bit-stuffing just means you're talking about a specific school/implementation of handling the problem (and Howard correctly says in his paper that they are all sides of the same coin). The carry-idea comes specifically from Rubin (if I'm not mistaken) and its renormalization approach, and you could not make just any implementation carry-free, because in some algorithms no carry occurs. That's why it's sometimes ugly to name all of them like "carry-free/underflow-free/queueless", or sometimes one just says "carry-free" but means all of them.
They are all sides of the same coin; they try to solve the precision-underflow problem. You can solve that with the specific ideas (underflow-queue, bit-stuffing, virtual queue; there is even the possibility of not using low==0), or you just reset the coder (see below for what I specifically mean by termination). I didn't say much more than that.

Originally Posted by Shelwien
> The task is to find a valid "low" on reset which still enables the
> decoder to uniquely decode the previously coded interval ("low"
> entirely is always the safe bet), but which is as short as possible
> (to have less waste):

Actually this is where range cutting to avoid carry and "coder termination"
are different - in range cutting we need to preserve maximum possible range ...
I know what you mean, but it's not very well expressed! (I don't want to mock you, just to hint that the difficulty of understanding and the difficulty of expression are a mutual task/failure, right?)

Originally Posted by Shelwien
... such that low's MSBs won't be affected by carry, while in coder flush we
need to find the subinterval [x..x+2^y) in the [low..low+range) interval,
such that y is maximal, and x is aligned. For example:

Code:
```for( y=0; y<32; y++ ) {
  uint mask = (1<<y)-1;
  uint l = (low+range-1 - mask) & (~mask);
  uint h = l | mask;
  if( (l<low) || (h>=low+range) ) break; // h must stay below low+range
}```
Look into the first post ... you just repeated what I pasted there (I added the code-tags; much better, I hope, sorry for that). You only need to be aware that that is a function specific to an implementation; it won't work with the Ashford-coder, for example. It also does not "explain" how it interacts with pending underflow-bits. (The paper is "Arithmetic Coding Revisited".)

Originally Posted by Shelwien
Note that I don't normally use "termination", because coder flush can
be used for multiple purposes and doesn't really mean the end of data.
Well, I use the word "termination" to say that I "terminate" the current consecutive dependent interval-state the coder is actually in. After "termination", the next succession of interval-states the coder is in has no relationship with the intervals from before the "termination" point. There is no norm in coder-initialization; I can define any initialization-interval which is just a valid, big enough interval; I can initialize it with random (deterministic) big enough intervals if I want to. The sole action of setting the interval to some uncorrelated value is re-initialization.

Originally Posted by Shelwien
Sure, a carryless rangecoder can be implemented using a flush, but the
only benefit of that known to me is the possibility to somewhat simplify
the decoder, compared to a range-cutting carryless.

And a flush-based carryless with a precise shortest-possible flush just
doesn't make sense, because it's (relatively) complex and the decoder would
have to resync to the input stream, which is plain awful (ie slow).
Think outside your box ... I have none (!) of the disadvantages you quote: it's not less efficient, it's not slower, and it's not really complex either. But that's not the point; you have your constraints, and I have mine, and others theirs; you maybe want 50 lines of code, I don't mind 40k lines of code. There is no universal truth about "best" or even "simple" here (eg. by not doing it nicely in the coder I may complicate the system around the coder beyond sanity).

Originally Posted by Shelwien
flush-based carryless is more redundant than range-cutting carryless
by definition - because range-cutting tries to maximize the range
and flush doesn't.
It's the same, man. Why don't we agree to disagree? I can adjust to your terminology and stop talking about "termination"; it's just that the sentences are going to get longer (case this, case that, including those and there) ...

Originally Posted by Shelwien
> And that is then pretty much what Charles is doing when he takes the
> choice between adjusting "low" or better "high"

1. Actually Bloom's version is incomplete. It doesn't handle the
low+range overflow anywhere, which means that another branch is
necessary.
No, because it's implicit. Think about it.

16. > No, because it's implicit. Think about it.

Ok, what am I doing wrong?

Code:
```low = 0xFFFFFFFF;
range = 0x0000FFFF;

while( range<(1<<16) ) { // true

  int above = ( low + range) & 0xFFFF; // (0xFFFFFFFF+0x0000FFFF)&0xFFFF = 0xFFFE
  int below = (~low) & 0xFFFF;         // (~0xFFFFFFFF)&0xFFFF = 0x0000

  if( above > below ) { // true
    low += range - above; // 0xFFFFFFFF + 0xFFFF-0xFFFE = 0x00000000 !!! OVERFLOW !!!
    range = above;        // 0xFFFE
  } else {
    range = below;
  }

  *ptr++ = low>>(RANGE_SIZE-8);
  range <<= 8;
  low <<= 8;
}```

17. The sum of low and range must be less than or equal to 0xFFFFFFFF. The interval is [0,1) or [low,low+range) or [0xFFFF0000,0xFFFFFFFF]. If you violate that, it won't work. Do you have 0x200000000 as interval-boundary?

18. Subbotin's coder: http://codepad.org/MEL3YReu

Well, ok, I guess it's a misunderstanding again.
I implied that Bloom's trick would be used kinda like that - http://codepad.org/TMZrdI8C
But actually that just doesn't work - without ((low^low+range)>=TOP) that "above" can
be larger than range etc, so it's plain incorrect.

So I guess it's supposed to look like http://codepad.org/Uspvykm9
In which case low+range<2^32 is maintained all right, but it doesn't look like an improvement at all.

19. You have to put it into the decoder-renorm too ...
Take this version; it's nicer than Subbotin's own version (out of MRP), and you'll also immediately see where Charles' code fits:
