Ways to ensure that decoder ate all the bytes that encoder produced

If the title is not meaningful here is what I mean:
- arithmetic encoding & decoding,
- I want the arithmetic encoder to produce the same number of bytes that the arithmetic decoder will consume,
- I want to avoid storing explicit markers or stream lengths,

Currently my method is to:
- do renormalization before encoding/ decoding of each symbol instead of after,
- in encoder, after encoding all the symbols, I flush the current buffer as 4 bytes to stream,

After a few quick tests, it seems that my method works.

If you encode using (low, high) or (low, range) and flush 4 bytes of low, then isn't it possible to encode more symbols at the low end of the range without causing a normalization or changing the value of high/range?

It almost sounds like your trying to do a bijective arithmetic compressor except sort of use it for streams where there can be several separated files all one after the other. One easy way to do that is to have the first field contain the length of the compressed date in the stream. You do a normal bijective arithmetic compression you get an output data set that contains exactly what the decompress needs. When you decompress you use the first field the length of the compressed stream as where an EOF would occur if doing bijective file decompression then do the standard bijective decompression. No wasted bits.

You are generally right, except that I have explicitly stated that I do not want to store stream lengths. I want to support piping, ie reading streams like stdin and writing to streams like stdout and we cannot compute lengths of such stream in general.

The ZPAQ encoder also does not store the uncompressed size. It codes an EOF flag after each byte with an effective probability of 1 - 1/range of false and 2^-32 of true. This requires 9 coding operations per byte, but only 8 of them have any significant modeling so the overhead is not much. The exact algorithm is described in section 4 of http://mattmahoney.net/dc/zpaq200.pdf with PR = 0, where 1 means EOF.

Also, the encoder throws away a tiny bit of coding space to avoid writing 4 consecutive 0 bytes. It uses this instead to mark the end of the stream so it can be found quickly without decoding.

/* the value-interval (value + whatever bits)
* [value,value+roundup)
* falls now always between the interval
* [low,low+range)
* ->
* low <= value <= value+roundup <= low+range
*/
if (((value + 0) >= (ast.low + 0)) && ((value + roundup) <= (ast.low + (ast.range - 1))))
break;
}
#endif

assert(nbits <= 18);
return nbits;
}

void injectStop() {
#define CODING_DISAMBYTS_AR_RANGELOW 3
/* the formula for calculating the upper bound on the minimum number of
* bits required is actually something like clz(maxLow - minRange) + 2
* mostly it means the bytes below botRange can simply be stripped if
* we use a carry-less coder
*
* in the case of delayed underflow-detection this number can be as large
* as the entire register (because range can approach 1); there is no extra
* production of of bits though, as normally underflow-bits would build up
* instead of additional flush-bits
*
* in effect it's very important to have a calculatable variable number of
* termination-bits for the delayed underflow-detection, because otherwise
* a static known number of bits would require us to always flush all bits
* in expectance that range could be 1.
*/

regval bitfield; int nbits = finishQueue(bitfield);
while (nbits % BITS_OF_UCHAR) bitfield <<= 1, nbits++;
for (int i = BITS_OF_UCHAR; i <= nbits; i += BITS_OF_UCHAR)
io->putSample((unsigned char)(bitfield >> (nbits - i)));

/* coders which work with code/value instead of low, manipulate the
* register while decoding (eg. data-low), this of course destroys
* trailing data from subsequent decoding; the only way to safeguard
* the trailing data is to put the entire register out on Stop().!
*
* the other possibility is to maintain two registers code and low
* then the original value can be reconstructed through code+low
*/

regval bitfield; int nbits = finishQueue(bitfield);
for (int i = 1; i <= nbits; i++)
clearUnderflow((bitfield >> (nbits - i)) & 1);

/* fixed number of bits (with trailing zeros) */
if (bpolicy && (nbits < CODING_DISAMBITS_AR_DCC96LOW)) {
bio->BitIOLeft::clearSamples(CODING_DISAMBITS_AR_DCC96LOW - nbits);
nbits = CODING_DISAMBITS_AR_DCC96LOW;
}

void injectStop() {
#define CODING_DISAMBITS_AR_CACMLOW 2
/* Output two bits that select the quarter that
* the current code range contains.
*/
ast.underflow += 1;
clearUnderflow(!(ast.low < firstRange));

Despite not being complete, I hope the functionnames and all are selfexplaining.

Now the "dejectCarryOver"/"dejectCarryOut"/"injectFlush" are simple function providing the carry abilitity and adding X guard bits etc. pp. Not that important for you I think. If you want to dig into it here's the code.

You are generally right, except that I have explicitly stated that I do not want to store stream lengths. I want to support piping, ie reading streams like stdin and writing to streams like stdout and we cannot compute lengths of such stream in general.

I was actually thinking of streams. The beginning word of the data block is just a length of that block which just happens to be the exact length of the bijectively compressed file. I am not saying to use it as an End Of Stream the stream continues on.
But yes you have to compute the value after you have compressed the file. Otherwise you have to encode an EOF so you know where in the stream your compressed data ended. If this is not what you want then it seems to be you want something that is really nothing more that the standard ending to compressing to a stream so not exactly a new problem. If you are worried about squeezing a few extra bits on the average. You can encode the number of bits needed as a universal number in front of the bijective compression which has the distance to the last ONE used in the compressed string you don't even need to write this last ONE bit thus saving an extra bit. In which case the stream can be thought of as a bit stream instead of byte stream.