Even skal's name: FSC clearly points to the reference ...
Would someone update the Facebook of Compression with the true history, please!
That's what you'd have to do to solve the full problem, making the fewest assumptions. I'm looking for other things, too, that might be more straightforward to measure or give you different information.
Huffman is a very trivial case, because the size does not depend on order, every symbol maps to a single code, and you're only dealing with integers.
ANS is a much different beast.
Last edited by nburns; 18th March 2014 at 10:46.
Contact them first. They may simply not know that there is a convention to give credit to the authors of the original work.
Looking at the FSC initial commit, I see, for example, the PutBlock function, which allows encoding data in small reversed blocks. I don't keep an eye on FSE development; is that also from your code? Maybe the entire FSC is just a reimplementation of Jarek's idea from the ground up, and it matches yours only in the things that can't be done any other way? Or, more probably, he looked into your code and then reimplemented it from scratch?
I'm sorry that I haven't yet looked into Jarek's paper, so I don't understand which ideas are yours and which are from the paper.
"or, more probably, he looked into your code and then reimplemented it from scratch?"
Yes, that's the most likely scenario.
There are a lot of minor implementation choices which can be done in many different ways, they happen to be identical in the 2 versions.
I want to make it clear that there is no legal problem in doing so.
The code is published in BSD license for this reason, so it's meant to be re-used, and adapted, and I'm glad people do it.
Building on top of someone else's code is fine.
But the normal and ethical way is to simply name the source and, if possible, explain the differences.
Last edited by Cyan; 5th May 2014 at 13:09.
I'm not sure if you've grokked the "big picture" yet, but let me take a shot at explaining it, for the benefit of any newcomers, at least:
ANS is conceptually based on two functions, rank() and reduce_precision(), with inputs and outputs like so:
S: string of weighted symbols
r: integer
rank: S -> r
i,j: integers
T,U: stacks
reduce_precision: i * T -> j * U
rank() computes a rank for a string of symbols from a weighted alphabet. It can be written as a recurrence:
Code:
q,r: integer ranks
s: symbol

f: q * s -> r

// f: compute a rank from a rank and an additional symbol
int f(int r, symbol s) {
    //...
}

// rank: compute the rank of a string
int rank(string S) {
    length <- |S|
    if(length == 0) {
        return FIRST_RANK
    } else {
        S' <- take_prefix(S, length-1) // make a new string from first length-1 symbols of S
        s <- last_symbol(S)
        return f(rank(S'), s)
    }
}
reduce_precision() removes the least-significant bit of an integer and saves it onto a stack. The result is a shorter integer and a modified stack:
Code:
int reduce_precision(int i, stack T) {
    push(low_bit(i), T)
    j = shift_right(i, 1)
    return j
}
If there is no restriction on the size of an integer, the encoding is simply the rank:
Code:
// Encoder on a machine with infinite precision:
print rank(S)
But as a rule, any interesting string will exceed the size of a machine word. Function f always increases the rank, so we need a way to detect when the rank is about to overflow:
Code:
q: integer rank
s: symbol
c: boolean

would_overflow: q * s -> c
would_overflow = true if f(q,s) exceeds a predefined limit, false otherwise
When a potential overflow is detected, we apply reduce_precision as many times as necessary, then continue with the usual rank calculation (rewritten as a loop):
Code:
r: integer rank
T: stack

r <- FIRST_RANK
T <- EMPTY_STACK
while((s = read()) != EOF) {
    while(would_overflow(r, s)) {
        r = reduce_precision(r, T)
    }
    r = f(r, s)
}
// the output is the final rank and the contents of the stack:
print r, T
This is an implementation of ANS at a high level.
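As a sanity check, here is a minimal runnable sketch of the loop above for the trivial uniform binary alphabet, where f(r,s) = 2r + s (the ordinary binary numeral system) and the predefined limit is 2L-1. This is only the symmetric special case, so it achieves no compression; it is not from Jarek's paper or any particular implementation, but it exercises the rank / would_overflow / reduce_precision machinery and its inverse:

```c
#include <assert.h>
#include <string.h>

#define L_BOUND 16u     /* states are renormalized back below 2*L_BOUND */

/* f for a uniform binary alphabet: the ordinary binary numeral system. */
static unsigned f(unsigned r, int s) { return 2u * r + (unsigned)s; }

/* Encode bits[0..n-1]; final rank in *state, truncated bits in stack[]. */
static void encode(const int *bits, int n, unsigned *state, int *stack, int *top) {
    unsigned r = 1;                              /* FIRST_RANK */
    *top = 0;
    for (int i = 0; i < n; i++) {
        while (f(r, bits[i]) > 2u * L_BOUND - 1u) { /* would_overflow */
            stack[(*top)++] = (int)(r & 1u);        /* reduce_precision */
            r >>= 1;
        }
        r = f(r, bits[i]);
    }
    *state = r;
}

/* Decode (state, stack) back into out[]; returns the number of symbols. */
static int decode(unsigned r, int *stack, int top, int *out) {
    int n = 0;
    while (!(r == 1u && top == 0)) {
        out[n++] = (int)(r & 1u);                /* invert f */
        r >>= 1;
        while (r < L_BOUND && top > 0)           /* undo reduce_precision */
            r = 2u * r + (unsigned)stack[--top];
    }
    /* symbols come out last-first; reverse into encoding order */
    for (int i = 0; i < n / 2; i++) {
        int t = out[i]; out[i] = out[n - 1 - i]; out[n - 1 - i] = t;
    }
    return n;
}
```

Note that the decoder emits symbols in reverse order, which is the backward-decoding property of ANS discussed elsewhere in the thread.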
Since reduce_precision() saves the truncated bits to a stack, and the stack is part of the output, it can be inverted. The f function is also carefully chosen to have an inverse, so the result of the algorithm is decodeable. The size of the output in the overflow-conscious algorithm (ANS) is close to the size of the exact rank(S). rank(x) is chosen to return an integer approximation of the entropy of x, for all x; therefore, ANS achieves compression close to the theoretical bound.
Since the intermediate rank is the only state maintained by the algorithm, it can be thought of as a finite state machine with a state for every possible rank.
Last edited by nburns; 14th November 2014 at 20:54. Reason: Clarified
Bulat Ziganshin (9th July 2014)
I based the final encoder pseudocode on what I've seen done, e.g. in fpaqc, for the ANS primer I posted above. After writing it, I gave it some thought, and I realized that you can construct situations that cause that algorithm to fail: you can cause it to wind up in a state that's invalid due to being outside of the range.
I went back and looked at Jarek's original paper, where he proves the validity of the algorithms, and I found that his proof uses reduce_precision() in a different way than the one I gave here (to use it the proven way, you'd have to do something like jump to a too-large state unconditionally and then reduce back into range). Has anyone else dealt with this?
Edit:
I changed the ANS primer to use the known-invertible algorithm.
Edit:
Changed back, reflecting the fact that fpaqc tested OK.
Last edited by nburns; 10th July 2014 at 03:54.
Well, I finally got around to trying to demonstrate the problem I described in my last post. Using fpaqc as subject, I exhaustively tested all possible combinations of probability and state. I found no errors in fpaqc.
I also played around with the predictor, and I seem to have made it faster. Here's the patch:
--- fpaqc.cpp 2007-12-25 17:45:34.000000000 -0800
+++ fpaqc/fpaqc.cpp 2014-07-03 22:01:34.677510612 -0700
@@ -67,8 +67,12 @@
}
void update(int y) {
+#ifndef FASTPRED
if (y) t[cxt]+=65536-t[cxt]>>5;
else t[cxt]-=t[cxt]>>5;
+#else
+ t[cxt] = t[cxt] - (t[cxt] - y >> 5) + (-y & 0x7ff);
+#endif
if ((cxt+=cxt+y)>=512) cxt=1;
}
};
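For what it's worth, the branchless FASTPRED variant can be checked against the original update exhaustively: for y in {0,1} and any t in [0, 65536), (65536-t)>>5 equals 2047 - ((t-1)>>5), so the two rules agree everywhere. Here is a small self-contained check with the two update rules extracted side by side (plain ints instead of the fpaqc table; like fpaqc, it relies on arithmetic right shift of negative values):

```c
#include <assert.h>

/* Original branchy predictor update from fpaqc. */
static int update_branchy(int t, int y) {
    if (y) t += (65536 - t) >> 5;
    else   t -= t >> 5;
    return t;
}

/* Branchless FASTPRED variant from the patch.
   For t = 0, y = 1 this needs (t - y) >> 5 == -1, i.e. arithmetic shift. */
static int update_branchless(int t, int y) {
    return t - ((t - y) >> 5) + (-y & 0x7ff);
}
```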
Last edited by nburns; 9th July 2014 at 03:19.
Bulat Ziganshin (15th May 2016),Matt Mahoney (9th July 2014)
Pascal has added 4x interleaving and alias rANS to his implementation:
https://github.com/skal65535/fsc
Cyan (4th November 2014)
I'm having a hard time figuring out how to use it. I'm not sure what he means by words. Are those words in the linguistic sense, or words as in fixed-size chunks of data larger than bytes?
With no command-line options, it works on enwik7.bwts.mtf, but on enwik8.bwts.mtf and enwik9.bwts.mtf it fails in the table-building routine with a generic error message. Am I misusing it somehow or is that a bug?
"words as in fixed-size chunks of data larger than bytes". From the readme file:
The word-based coding methods (CODING_METHOD_16B and up) will use b=2^16 and write 16b-words at a time. They also have variants with interleaving and alias method.
I think I saw that. So can we conclude that by words he means multi-byte chunks? Part of my point was my feeling that the documentation is unclear, and I still feel that way. More pointedly, I tried throwing some basic test data at it and it failed for reasons that are still obscure. What do I need to do to test with enwik9.bwts?*
*Note that enwik9.bwts ran fine, but the compression was no different than plain enwik9 and looked to match approximately the 0-order entropy of the whole file (roughly what you'd expect from, e.g., Huffman, using whole-file symbol frequencies). To appropriately compress the BWT(S) it either needs to implement some kind of local adaptation internally or accept something like MTF-transformed data (which failed for enwik8+).
Last edited by nburns; 6th November 2014 at 09:13.
Hi everybody,
oh, wow, i didn't know 'fsc' was known outside of my conversation with Jarek about ANS! Thanks for your interest.
To be clear, it's just a brute-force non-adaptive entropy coder experiment, and will probably stay so.
It was mostly written to quickly experiment with alternate spread-functions ('modulo', 'pack', 'reverse', but
also Jarek's bucket-sorting one, e.g.), as well as trying to understand ANS underlying principles.
My apologies to Yann, i didn't realize FSE was his, i initially thought "Cyan" was Jarek's nickname (and FSE
his code). My bad! Updated the README.md with full credits, as due!
I think i now have clearer ideas about ANS. I really perceive it as an elegant dynamical system, more than an arithmetic
coder, one relying on a few general principles (like: "state 'x' can encode ln(x) bits of information"). Quite novel and intriguing.
Some precisions/remarks:
* so, yes, 'fsc' is a simple tool to experiment. Usage is simple:
./fsc < input_file > compressed_file
./fsc -d < compressed_file > reconstructed_file
diff reconstructed_file input_file || die
* With time, i ended up implementing some of Ryg's and Charles' ideas too:
- interleaving (2x, 4x)
- word-based output (and not per-bit): i stuck to 16b because i'm mostly interested in ARM and mobile, where 64b calc is not a good idea...
- alias method
- specialized binary version (bANS?) using 32b-based output and 32b precision for probas
- and a 'pure' arithmetic coding version too, for comparison with the bANS (here too 32b precision, 32b output). Just run './bit_cmp'.
-...
* So far i can tell:
- bANS is not necessarily faster than arith coding
- arith coding is usually more precise
- it's hard to improve precision (deltaH) in ANS. One needs to fiddle with the spread functions quite a bit, in sometimes surprising ways...
- multi-symbol is very easy (and fast!) with ANS, much easier than arith coding. That's a great selling point.
- interleaving speed-up is yummy too!
- reverse-direction coding in ANS is a pain. Using block-based coding is ok, but you lose a few bytes at the end of each block to store the final state. And it makes adapting probabilities (and context modeling) harder, resorting to multi-pass block-coding (-> speed penalty).
- all the tricks (alias, interleaving, 16b-output, spread-tables) are 'baked into' the final bitstream. Means for instance you can use a coder with 16b-based output on a mobile phone and decode the bitstream with a 64b-based reader. Or use the alias method on the decoder but not on the encoder... Some platform tradeoffs are needed i guess.
- i'm unsure how hardware-friendly ANS is overall. Probably better than Huffman, agreed.
That's it. i overall quite like ANS and will still be fiddling around as time permits...
thx!
/skal
Bulat Ziganshin (14th November 2014),Jarek (13th November 2014)
Hi skal,
1) Regarding storing the final state, you can compensate for it by storing some information in the initial state.
For example, use initial state L+s_0, where s_0 is the first symbol - it can be decoded when the bitstream has dried out.
More sophisticated (useful for rANS) is starting from state x=1 and applying the rANS (or uABS) formulas a few times - until you get to I={L,...,bL-1}.
Alternatively, we can use this state as a checksum: fix the initial state while encoding, and if there was an error while decoding, we will start wandering randomly - most probably ending at a different state, indicating an error.
2) Regarding the approximated binary case (which I usually refer to as ABS) - let's look at the CABAC M-coder from h.264 ( http://iphome.hhi.de/marpe/download/...arpe_et_al.pdf ):
R_LPS = RTAB[m][(R >> 6) & 3];
R_MPS = R - R_LPS;
if (V < (R_MPS << BitsLeft)) {
    R = R_MPS; value = valMPS;
    Rnorm = (R_MPS >> 8) XOR 1;
} else {
    V = V - (R_MPS << BitsLeft);
    R = R_LPS; value = !valMPS;
    Rnorm = RnormTAB[R_LPS >> 3];
}
R = R << Rnorm;
BitsLeft = BitsLeft - Rnorm;
if (BitsLeft <= 0) {
    V = (V << M_BITS) | read_bits(M_BITS);
    BitsLeft = BitsLeft + M_BITS;
}
in comparison, decoding step for tABS is just (m is quantized probability):
t = DecodingTable[m][x];
value = valMPS XOR t.symbol;
x = t.NewX + read_bits(t.nbBits);
No branches, no bit-by-bit renormalization, no multiplication, single number state e.g. for SIMD ... direct handling of large alphabet - no binarization.
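To make the tABS decoding step concrete, here is a hedged sketch of how such a DecodingTable could be built for one quantized binary probability: the LPS gets q of the L slots, spread evenly (a simple stand-in for a properly tuned spread function), and each entry stores the symbol, the number of bits to read, and the renormalized NewX. Names and the spread choice are illustrative, not taken from any particular implementation:

```c
#include <assert.h>

#define L_STATES 16    /* number of states L; decoder states live in [L, 2L) */

typedef struct {
    int symbol;        /* 0 = MPS, 1 = LPS */
    int nbBits;        /* bits to read from the stream */
    unsigned newX;     /* renormalized next state, back in [L, 2L) */
} TabsEntry;

/* Build a decoding table for one quantized binary probability:
   the LPS occupies q of the L slots (0 < q <= L/2), spread evenly. */
static void build_tabs(int q, TabsEntry table[L_STATES]) {
    int spread[L_STATES] = {0};              /* default every slot to MPS */
    for (int i = 0; i < q; i++)
        spread[i * L_STATES / q] = 1;        /* drop the q LPS slots evenly */
    int next[2] = { L_STATES - q, q };       /* per-symbol slot counters */
    for (int x = L_STATES; x < 2 * L_STATES; x++) {
        int s  = spread[x - L_STATES];
        int xs = next[s]++;                  /* pre-renormalization state */
        int nb = 0;
        while ((xs << nb) < L_STATES) nb++;  /* bits needed to re-enter [L, 2L) */
        table[x - L_STATES].symbol = s;
        table[x - L_STATES].nbBits = nb;
        table[x - L_STATES].newX   = (unsigned)(xs << nb);
    }
}
```

A decode step is then exactly the three lines quoted above: look up table[x - L], XOR the symbol with valMPS, and set x = newX + read_bits(nbBits).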
However, for extremely accurate applications like ultracompressors, indeed ABS probably does not bring any advantages.
3) Regarding accuracy for large alphabet.
For rANS you can get as good as you want using 64 bit.
For tANS it is mostly a matter of large enough L. Especially such that all symbols have probability above 1/(L sqrt(2)).
Regarding symbol spread, I have managed to improve the tuning: spread using not only the quantized probabilities (q_s ~ L * p_s), but also the actual ones (p_s) - see: https://github.com/JarekDuda/Asymmet...SystemsToolkit
4) Regarding inconvenience of backward encoding for e.g. adaptive situations, it is just a matter of using a buffer: storing a sequence of pairs: (s_i, D_i) where s_i is the i-th symbol, which is from distribution number D_i.
Then encode this buffer in the backward direction (table[x] -> table[D_i][x]); decoding is then straightforward.
Here is JamesB implementation of order1 rANS: http://encode.su/threads/2017-LzTurb...ll=1#post39920
5) Regarding comparison with Huffman: tANS requires a bit more memory per L. However, smaller L can bring smaller deltaH due to handling fractional bits - see slide 5 of https://dl.dropboxusercontent.com/u/12405967/ANSsem.pdf
Big hardware advantage is no need for costly tree building - initialization is much simpler and cheaper.
Best,
Jarek
Last edited by Jarek; 14th November 2014 at 00:17.
Jarek,
thanks for the very subtle hint #1 !!
I implemented it, and it works well at recovering some of the left-over bytes for each block.
It's a little bit tricky with interleaving, though. I'll update the fsc code when i'm finished dealing
with corner cases...
/skal
In the course of trying to better understand ANS (and LaTeX), I used LaTeX to typeset some of the ANS formulas in concise form. Representing them in plain ASCII is terrible, and they can be hard to find in their respective papers. Seeing the formulas by themselves, properly typeset, is somewhat easier on the eyes and is good for reference (and interior decorating). I also played with various different notation and variable naming; you may like my choices, or you may not. I've attached uABS and rANS along with their respective LaTeX sources.
Please let me know promptly if you find any errors.
Edit:
Both are corrected.
Last edited by nburns; 15th November 2014 at 21:17.
Bulat Ziganshin (15th May 2016),Paul W. (14th November 2014)
I was wondering if it is also possible to apply the rANS algorithms to predictive bit compression (like fpaq, and not fpaqA), and whether it is necessary to encode even the bits backwards!
I am not certain I understand the question.
If you are asking if rANS can be applied to a binary alphabet, sure but there are probably no advantages (interleaving?) - here is Charles' comparison: http://pastebin.com/qqyLNXFA
Regarding getting rid of backward encoding, it is a very good question.
I was not able to do it for ANS, but maybe it can be done for a different approach ... ?
And yes - entropy coding automaton can be also made by approximating arithmetic coding (having forward encoding) - getting e.g. so called quasi arithmetic coding: http://www.ittc.ku.edu/~jsv/Papers/HoV93.qtfull.pdf
However, as range coding requires two numbers to represent the current state, the number of states of such automata would grow much faster than for ANS (~ L^2) - if we would like a better precision or larger alphabet.
And so e.g. h.264 uses a different approach: "M-coder" (see listing above).
If you are asking about wasting bits for storing the final state, we can compensate it by storing some information in the initial state - see my #1 answer 2 posts above.
Last edited by Jarek; 15th November 2014 at 18:54.
I meant that for ABS there would need to be a bit version [0 and 1 symbols] (as opposed to rANS for multiple symbols), keeping the range but with a comparable code x (rANS). The code might be this:

if ( x < (1ULL << 32) )
{
    x <<= 32;
    x |= *dword_in_ptr++;
}
uint64 xm = x & 4095;
if ( xm < p0 )
{
    x = p0 * (x >> 12) + xm;
    p0 += (4096 - p0) >> 5;
    return 0;
}
else
{
    x = (4096 - p0) * (x >> 12) + xm - p0;
    p0 -= p0 >> 5;
    return 1;
}

but to compress, does anything change with this version - should I compress in reverse?
Last edited by Nania Francesco; 15th November 2014 at 19:28.
I have just linked you something like that (rABS): http://pastebin.com/qqyLNXFA
And I wrote that (alone) it probably does not bring any advantages ... (interleaving?)
However, it can indeed be useful for adding binary symbols to a rANS stream ... you can also use the uABS formulas instead (fpaqa/b/c).
Thanks. The only thing I did not understand: is the encoding asymmetric, as in rANS?
The word "asymmetric" in ANS refers to the asymmetrization of standard numeral systems, which are optimal for a uniform probability distribution of digits.
If you mean the reverse encoding, the only forward-encoded ANS family I know of is prefix codes (but I haven't spent much time looking), whose decoder can be seen as a special case of a tANS decoder.
And generally there are lots of materials about ANS ... I have just added more of them to the list: http://encode.su/threads/2078-List-o...mplementations
You can reverse the stream either during compression or decompression. fpaqa/b/c and lpaq1a reverse it during compression.
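To illustrate reversing during compression, here is a minimal self-contained static-probability rABS sketch: symbols are encoded backwards so the decoder runs forwards. The 12-bit probabilities, 32-bit state, and 16-bit renormalization words mirror the snippets above; all names are illustrative, not from fpaqa/b/c or lpaq1a:

```c
#include <assert.h>
#include <stdint.h>

#define PROB_BITS 12
#define PROB_SCALE (1u << PROB_BITS)   /* M = 4096 */
#define RANS_L (1u << 16)              /* state stays in [2^16, 2^32) */

/* Encode one bit s with static P(0) = p0/4096 (0 < p0 < 4096).
   Renormalization words go to buf; the decoder reads them back LIFO. */
static uint32_t rabs_encode(uint32_t x, int s, uint32_t p0,
                            uint16_t *buf, int *pos) {
    uint32_t start = s ? p0 : 0;
    uint32_t freq  = s ? PROB_SCALE - p0 : p0;
    while (x >= (freq << 20)) {        /* freq << 20 == freq * (2^32 / M) */
        buf[(*pos)++] = (uint16_t)x;   /* flush low 16 bits */
        x >>= 16;
    }
    return (x / freq) * PROB_SCALE + (x % freq) + start;
}

/* Decode one bit; inverts rabs_encode exactly. */
static uint32_t rabs_decode(uint32_t x, uint32_t p0, int *s,
                            const uint16_t *buf, int *pos) {
    uint32_t xm = x & (PROB_SCALE - 1);
    *s = xm >= p0;
    uint32_t start = *s ? p0 : 0;
    uint32_t freq  = *s ? PROB_SCALE - p0 : p0;
    x = freq * (x >> PROB_BITS) + xm - start;
    while (x < RANS_L)                 /* pull back the flushed words */
        x = (x << 16) | buf[--(*pos)];
    return x;
}
```

Since decoding exactly inverts encoding, the state returns to its initial value when the last symbol is decoded, which is also how the end of the stream can be detected.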
About ANS and 1/x distribution:
i was reading Charles' blog http://cbloomrants.blogspot.fr/2014/...ing-ans-9.html,
and there's this question about where the 1/x invariant distribution comes from. I've seen it mentioned in Jarek's
paper too. Here are some hopefully helpful insights and demonstration related to this:
There's a continuous formulation of Charles' initial system that helps in understanding the 1/x distribution
(these dynamical systems are known as 'piecewise affine dynamical systems').
The original code for the discrete version would be: http://pastebin.com/NPqmsR8L
The continuous version of it is here: http://pastebin.com/RVMv1nfq
It'll show that, whatever slope 'alpha' you choose, the limiting distribution is always 1/(x ln(2)).
I've written the mathematical proof (along with some illustrations and explanation) in the two attached pages (sorry, no LaTeX yet).
Basically, 1/x is the only distribution that solves the functional equation: f(x) = a.f(a.x) for any slope a != 1
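A quick numerical sanity check of this (illustrative only): f(x) = 1/(x ln 2) satisfies f(x) = a.f(a.x) for any slope a, and it integrates to 1 over [1,2), which is what makes it the stationary density:

```c
#include <assert.h>
#include <math.h>

/* candidate stationary density on [1,2) */
static double density(double x) { return 1.0 / (x * log(2.0)); }

/* midpoint-rule approximation of the integral of density over [1,2) */
static double total_mass(int steps) {
    double h = 1.0 / steps, sum = 0.0;
    for (int i = 0; i < steps; i++)
        sum += density(1.0 + (i + 0.5) * h) * h;
    return sum;
}
```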
/skal
Bulat Ziganshin (15th May 2016),Cyan (18th November 2014)
Hi skal,
So the Pr(x) depends on the symbol spread.
For example Huffman can be seen as a special case of tANS: when symbols are spread in ranges of 2^k length. In this case we get uniform Pr(x) ~ const instead.
Generally there is always some divergence from the hyperbolic behavior, but it quickly drops with L for reasonable spreads.
I have tried to find this divergence in the previous paper ( http://arxiv.org/abs/0910.2724 ), but failed - the details turn out really tough.
Also bounding deltaH - where it is essential that sum_s q_s = 1, so that the linear term vanishes in the expansion of the Kullback-Leibler divergence - turns out to be only approximate and tough to bound (fortunately the current estimates seem to work well). There is still a lot to do at the theoretical base of ANS ... I comfort myself with the fact that the situation for arithmetic coding doesn't look much better after these few decades ...
So currently I just use simple intuitive explanation like in your note (slide 14 of https://dl.dropboxusercontent.com/u/12405967/ANSsem.pdf ): the transition is
lg(x) -> lg(x) + lg(1/p) (mod lg(b)), approximately,
having three sources of chaotic behavior: asymmetry, ergodicity, diffusivity (cryptographic applications?) - the ergodicity itself strongly suggests that we should expect an approximately uniform distribution of lg(x), which translates into a hyperbolic distribution for x.
Another intuitive explanation: x contains lg(x) bits of information, and an event of probability p carries lg(1/p) bits of information - so state x behaves like an event of probability 1/x, and we should have Pr(x) \propto 1/x.
So, i'm trying to step back a little and understand why ANS has less precision than say arithmetic coding.
Let's take the binary symbol case for simplicity.
I'm considering the case of 16b precision for probabilities (so, L=2^16), along with 16b-word I/O (that is: one reads or writes uint16 only).
The code is here: http://pastebin.com/FnNK4HDc
The relative entropy (deltaH = (H-H0)/H0) looks like this (in percent), when generating 10^7-long binary messages with probability between 0 and 1:
And as one can see, even if relatively OK, the compression of bANS is still quite off compared to arithmetic coding
with similar precision.
Is there something i'm doing obviously wrong here? And if not, how can i fix it?
Is there any magic spread-table i should be using in this very particular case of binary coding?
/skal
Hi skal,
The variant from the code is rABS (range variant of asymmetric binary system, here is Charles': http://pastebin.com/qqyLNXFA ).
So while uABS (u = uniform) is extremely accurate (Matt's fpaqa/b/c), the range variants are not (which is why they were born so late) ... unless we operate on very large x (there are some estimations in the preprint).
I see you operate on x \in I = {2^16, ... , 2^32 -1} to work on 32 bit x and renormalize 16 bits at once.
Basically, there are three ways to improve your accuracy for arithmetic formula and binary alphabet:
- switch to much more accurate uABS formulas: http://mattmahoney.net/dc/dce.html#Section_33 , or
- switch to 64bit x, e.g. x \in I = {2^48, ... , 2^64 -1} for 16bit renormalization, or
- switch to 8bit renormalization, e.g. x \in I = {2^24, ..., 2^32 -1} for 32bit x.
My previous post was written in a hurry; here is an addition after some thought.
So the problem with the big inaccuracy of your code is that the denominator of quantization M (p[s] ~ q[s] / M) and L, the lower end of I = {L, ..., bL-1} (renormalization), are the same (2^16) - like in the degenerate case here: http://encode.su/threads/2035-tANS-r...st-of-accuracy
The range variant is a very crude symbol distribution - in ranges instead of the expected uniform spread. However, it can get as good an accuracy as we want: by operating on a very high appearance number of this range ... but in your case it is just the lowest possible: the second appearance.
For range variants of ANS it is essential that the denominator for quantization of probabilities M (p[s] = q[s] / M) is much smaller than L (I = {L, ..., bL-1}) - what means that we work on high appearance of the range.
To give some numbers, at the beginning of page 9 of the last preprint there is an upper bound for rANS inaccuracy:
|epsilon| < M/x
where M is the denominator of quantization and x >= L.
Since
deltaH ~ sum_s epsilon^2 / p_s
then, as on page 15, we should expect the general behavior
deltaH ~ M^2 / L^2 (* "alphabet size"^2)
You use a huge M=2^16. Just reduce it a bit and you will get much better behavior (or use uABS instead, or increase L).
And just let me know if you meet disappointing behavior again - maybe it can be repaired ...
For a binary alphabet you can also use tABS: use a set of coding tables for all quantized binary probabilities (e.g. k/32).
In the preprint there is a written-out set for 5 and 16 states - allowing to work in the deltaH ~ 0.01 or 0.001 bits/symbol regions correspondingly (unless probabilities are tiny - we need a large number of states to handle them well).
Storing the entire set of tables in the second case is only e.g. 16x16 = 256 bytes.
Last edited by Jarek; 22nd November 2014 at 12:06.