Hi:
Can ANS be made 'optimal' (i.e. equal Shannon's exact number of bits)? Or is it already, except for end-of-byte padding, recording symbol frequencies in headers, etc.?
Probably every codec that works on fixed-size numbers and does periodic renormalization (i.e. virtually every codec) is not perfectly optimal. You would probably need to operate on big (precision-wise) floating-point numbers, with size comparable to the compressed file, to get the process as close to theory as possible.
ANS and arithmetic coding work on approximated probability distributions, but can get as close to a chosen distribution as we want by increasing the state size, which acts as a buffer for fractional bits.
In ANS we get ~4 times smaller redundancy (additional bits/symbol used) for each doubling of the number of states. In practice, using ~8x more states than the alphabet size, we are already operating at the level of the inaccuracy of probability estimation.
ANS additionally needs to store the final encoder state; however, we can compensate for that by storing some information in the initial encoder state.
Can ANS be made to produce exactly the same number of bits as Shannon's limit, using arbitrary-precision integers etc.? (Ignore any end-of-byte padding, recording of symbol frequencies in headers, etc.)
Huffman is exact for sources whose probabilities are powers of 2.
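To illustrate the power-of-2 case: the sketch below (my own code, not from this thread) builds Huffman code lengths and checks that for a dyadic distribution each length equals -log2(p) exactly, i.e. the average code length hits the entropy.

```python
# Sketch: Huffman is exact when all probabilities are powers of 2,
# because each code length then equals -log2(p) precisely.
import heapq
import math

def huffman_lengths(probs):
    # Heap of (probability, tiebreak id, {symbol: depth}); repeatedly
    # merge the two least probable subtrees, deepening their leaves by 1.
    heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, i1, t1 = heapq.heappop(heap)
        p2, i2, t2 = heapq.heappop(heap)
        merged = {k: v + 1 for k, v in {**t1, **t2}.items()}
        heapq.heappush(heap, (p1 + p2, min(i1, i2), merged))
    return heap[0][2]

probs = [0.5, 0.25, 0.125, 0.125]        # dyadic (power-of-2) source
lengths = huffman_lengths(probs)
for i, p in enumerate(probs):
    assert lengths[i] == -math.log2(p)   # code length matches Shannon exactly
```

For non-dyadic probabilities the lengths are forced to whole bits, which is exactly the gap that arithmetic coding and ANS close.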
In rANS we approximate probabilities with a 1/2^m denominator, so there is still some approximation; it is practically never perfect, but you can easily get a negligible redundancy, e.g. 10^-10 bits/symbol.
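This quantization cost can be measured directly: the redundancy is the Kullback-Leibler divergence between the true distribution p and the quantized one f/2^m. A small sketch (my own code; the rounding scheme is just one simple choice):

```python
# Sketch: quantize a distribution to frequencies with denominator 2^m
# and measure the resulting redundancy D(p || f/2^m) in bits/symbol.
import math

def quantize(p, m):
    M = 1 << m
    f = [max(1, round(pi * M)) for pi in p]  # every symbol needs >= 1 slot
    # crude fix-up so the frequencies sum exactly to 2^m
    while sum(f) != M:
        i = max(range(len(f)), key=lambda j: f[j])
        f[i] += 1 if sum(f) < M else -1
    return f

def redundancy(p, f):
    M = sum(f)
    # KL divergence: expected extra bits per symbol from coding with f/M
    return sum(pi * math.log2(pi * M / fi) for pi, fi in zip(p, f))

p = [0.3, 0.2, 0.15, 0.35]
for m in (4, 8, 12, 16):
    print(m, redundancy(p, quantize(p, m)))  # shrinks rapidly as m grows
```

With a few extra bits of m the redundancy quickly drops to the negligible levels mentioned above.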
With 10^-10, presumably many (the vast majority of) input files give exactly the same number of bits as Shannon's?
Using 8x more states than the alphabet size you get ~10^-3 to 10^-4 bits/symbol more ... and it reduces ~4x with every doubling of the number of states.
In rANS, using a 32-bit state for an 8-bit alphabet gives you ~10^-20 bits/symbol accuracy ... you can use a 64-bit state if that is not enough.
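On the arbitrary-precision question: if you skip renormalization entirely and let the state be an unbounded integer (as Python ints are), the final state's bit length lands essentially at the Shannon bound for the quantized model. A minimal sketch (my own toy code, not a production codec; no renormalization, so the whole state is one big number):

```python
# Sketch: rANS with an arbitrary-precision state. The compressed size is
# the bit length of the final state, which tracks the Shannon bound.
import math

def build_cum(freqs):
    cum, c = {}, 0
    for s in sorted(freqs):          # cumulative frequency table
        cum[s] = c
        c += freqs[s]
    return cum

def encode(symbols, freqs):
    M = sum(freqs.values())          # denominator of quantized probabilities
    cum = build_cum(freqs)
    x = 1                            # initial state
    for s in symbols:                # rANS step: x grows by factor ~M/f[s] = 1/p(s)
        x = (x // freqs[s]) * M + cum[s] + (x % freqs[s])
    return x

def decode(x, n, freqs):
    M = sum(freqs.values())
    cum = build_cum(freqs)
    out = []
    for _ in range(n):
        slot = x % M                 # identifies the last encoded symbol
        s = next(t for t in freqs if cum[t] <= slot < cum[t] + freqs[t])
        x = freqs[s] * (x // M) + slot - cum[s]   # inverse of the encoding step
        out.append(s)
    return out[::-1]                 # rANS decodes in reverse order

freqs = {'a': 5, 'b': 2, 'c': 1}     # model: probabilities 5/8, 2/8, 1/8
msg = list('aababacab' * 50)
x = encode(msg, freqs)
assert decode(x, len(msg), freqs) == msg
shannon = sum(-math.log2(freqs[s] / 8) for s in msg)  # model cross-entropy
print(x.bit_length(), math.ceil(shannon))             # nearly identical
```

Of course this is exactly what the earlier reply warned about: the state grows to the size of the compressed output, so every encoding step touches a huge number. Renormalization trades a few fractional bits of optimality for constant-time steps.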
Anyway, inaccuracy of AC/ANS is rather negligible in practice - what counts are better predictions, transformations, statistical models ...
Inefficiencies are usually very small. Look at the old aridemo from Eugene: http://compression.ru/sh/downl_e.htm http://compression.ru/sh/coders6a.rar
Results: same probabilities, different arithmetic / range coders that work on fixed-size number formats.
Code:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  (test file, name garbled)  17,390,634
  AriDemo Model: o0c_v2a.inc
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             CodeSize    C-Time  D-Time
  CL-D       10,680,328   9.66   11.47
  CL-R       10,680,668   5.55    7.08
  CL-Rf      10,680,351   5.44    6.70
  CL-Rfm     10,680,327   6.86    8.62
  Subbotin   10,682,917   5.55    6.92
  Subb-LB    10,682,917   5.55    6.97
  Shindlet   10,680,348   5.77    6.48
  Shcoder    10,680,642   6.97    8.34
  Ari        10,680,318  12.47   14.06
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~