1. Originally Posted by Cyan
...[skal's] code is basically a modified version of FSE.
But without any credit.
...
On the flip side, FSE is amply referenced in multiple places on the web. I don't think it could be erased from history at this point.

2. Even skal's name, FSC, clearly points to the reference ...

3. Would someone update the Facebook of Compression with the true history, please!

4. Originally Posted by Jarek
Pascal Massimino (skal) has made his code available: https://github.com/skal65535/fsc

ps. nburns, there is an analytical way to "verify accuracy of a table", as I have done in the paper. For Huffman you just multiply code lengths by frequencies; here it is very similar, but first you have to find the probability distribution of being in different states - the dominant eigenvector of the stochastic matrix of jumping between these states.
That's what you'd have to do to solve the full problem, making the fewest assumptions. I'm looking for other things, too, that might be more straightforward to measure or give you different information.

Huffman is a very trivial case, because the size does not depend on order, every symbol maps to a single code, and you're only dealing with integers.

ANS is a much different beast.

5. Originally Posted by Cyan
But the case of skal is a bit worse (and the reason why I'm a bit talkative today), since his code is basically a modified version of FSE.
Contact them first; they may simply not know that there is a convention of giving credit to the authors of the original work.

Looking at the FSC initial commit, I see, for example, the PutBlock function, which allows encoding data in small reversed blocks. I don't keep an eye on FSE development; is that also from your code? Maybe the entire FSC is just a reimplementation of Jarek's idea from the ground up, and it matches yours only in the things that can't be done any other way? Or, more probably, did he look into your code and then reimplement it from scratch?

I'm sorry that I haven't yet looked into Jarek's paper, so I don't understand which ideas are yours and which are from the paper.

6. Or, more probably, did he look into your code and then reimplement it from scratch?
Yes that's the most likely scenario.

There are a lot of minor implementation choices that can be made in many different ways; they happen to be identical in the two versions.

I want to make it clear that there is no legal problem in doing so.
The code is published in BSD license for this reason, so it's meant to be re-used, and adapted, and I'm glad people do it.

Building on top of someone else's code is fine.
But the normal and ethical way is to simply name the source, and if possible explain the differences.

7. Originally Posted by Bulat Ziganshin
i'm sorry that i don't yet looked into Jarek paper and don't understand what is your ideas and what is from the paper
I'm not sure if you've grokked the "big picture" yet, but let me take a shot at explaining it, for the benefit of any newcomers, at least:

ANS is conceptually based on two functions, rank() and reduce_precision(), with inputs and outputs like so:

S: string of weighted symbols
r: integer
rank: S -> r

i,j: integers
T,U: stacks
reduce_precision: i * T -> j * U

rank() computes a rank for a string of symbols from a weighted alphabet. It can be written as a recurrence:

Code:
q,r: integer ranks
s: symbol
f: q * s -> r

// f: compute a rank from a rank and an additional symbol
int f(int r, symbol s) {
    //...
}

// rank: compute the rank of a string
int rank(string S) {
    length <- |S|
    if(length == 0) {
        return FIRST_RANK
    }
    else {
        S' <- take_prefix(S, length-1)  // make a new string from first length-1 symbols of S
        s <- last_symbol(S)
        return f(rank(S'), s)
    }
}
reduce_precision() removes the least-significant bit of an integer and saves it onto a stack. The result is a shorter integer and a modified stack.

Code:
int reduce_precision(int i, stack T) {
    push(low_bit(i), T)
    j = shift_right(i, 1)
    return j
}
If there is no restriction on the size of an integer, the encoding is simply the rank:

Code:
// Encoder on an machine with infinite precision:
print rank(S)
But as a rule, any interesting string will exceed the size of a machine word. Function f always increases the rank, so we need a way to detect when the rank is about to overflow:

Code:
q: integer rank
s: symbol
c: boolean
would_overflow: q * s -> c

would_overflow = true if f(q,s) exceeds a predefined limit, false otherwise
When a potential overflow is detected, we apply reduce_precision as many times as necessary, then continue with the usual rank calculation (rewritten as a loop):

Code:
r: integer rank
T: stack

r <- FIRST_RANK
T <- EMPTY_STACK
while((s = read()) != EOF) {
    while(would_overflow(r, s)) {
        r = reduce_precision(r, T)
    }
    r = f(r, s)
}
// the output is the final rank and the contents of the stack:
print r, T
This is an implementation of ANS at a high level.

Since reduce_precision() saves the truncated bits to a stack, and the stack is part of the output, it can be inverted. The f function is also carefully chosen to have an inverse, so the result of the algorithm is decodeable. The size of the output in the overflow-conscious algorithm (ANS) is close to the size of the exact rank(S). rank(x) is chosen to return an integer approximation of the entropy of x, for all x; therefore, ANS achieves compression close to the theoretical bound.

Since the intermediate rank is the only state maintained by the algorithm, it can be thought of as a finite state machine with a state for every possible rank.

8. ## Thanks:

Bulat Ziganshin (9th July 2014)

9. I based the final encoder pseudocode on what I've seen done, e.g. in fpaqc, for the ANS primer I posted above. After writing it, I gave it some thought, and I realized that you can construct situations that cause that algorithm to fail: you can cause it to wind up in a state that's invalid due to being outside of the range.

I went back and looked at Jarek's original paper, where he proves the validity of the algorithms, and I found that his proof uses reduce() in a different way than the one I gave here (to use it the proven way, you'd have to do something like jump to a too-large state unconditionally and then reduce back into range). Has anyone else dealt with this?

Edit:
I changed the ANS primer to use the known-invertible algorithm.

Edit:
Changed back, reflecting the fact that fpaqc tested OK.

10. Well, I finally got around to trying to demonstrate the problem I described in my last post. Using fpaqc as subject, I exhaustively tested all possible combinations of probability and state. I found no errors in fpaqc.

I also played around with the predictor, and I seem to have made it faster. Here's the patch:

--- fpaqc.cpp	2007-12-25 17:45:34.000000000 -0800
+++ fpaqc/fpaqc.cpp	2014-07-03 22:01:34.677510612 -0700
@@ -67,8 +67,12 @@
}

void update(int y) {
+#ifndef FASTPRED
if (y) t[cxt]+=65536-t[cxt]>>5;
else t[cxt]-=t[cxt]>>5;
+#else
+		t[cxt] = t[cxt] - (t[cxt] - y >> 5) + (-y & 0x7ff);
+#endif
if ((cxt+=cxt+y)>=512) cxt=1;
}
};


11. ## Thanks (2):

Bulat Ziganshin (15th May 2016),Matt Mahoney (9th July 2014)

12. Pascal has added 4x interleaving and alias rANS to his implementation:
https://github.com/skal65535/fsc

13. ## Thanks:

Cyan (4th November 2014)

14. Originally Posted by Jarek
Pascal has added 4x interleaving and alias rANS to his implementation:
https://github.com/skal65535/fsc
I'm having a hard time figuring out how to use it. I'm not sure what he means by words. Are those words in the linguistic sense, or words as in fixed-size chunks of data larger than bytes?

With no command-line options, it works on enwik7.bwts.mtf, but on enwik8.bwts.mtf and enwik9.bwts.mtf it fails in the table-building routine with a generic error message. Am I misusing it somehow or is that a bug?

15. "words as in fixed-size chunks of data larger than bytes". From the readme file:

The word-based coding methods (CODING_METHOD_16B and up) will use b=2^16 and write 16b-words at a time. They also have variants with interleaving and alias method.

16. Originally Posted by hexagone
"words as in fixed-size chunks of data larger than bytes". From the readme file:

The word-based coding methods (CODING_METHOD_16B and up) will use b=2^16 and write 16b-words at a time. They also have variants with interleaving and alias method.
I think I saw that. So can we conclude that by words he means multi-byte chunks? Part of my point was my feeling that the documentation is unclear, and I still feel that way. More pointedly, I tried throwing some basic test data at it and it failed for reasons that are still obscure. What do I need to do to test with enwik9.bwts?*

*Note that enwik9.bwts ran fine, but the compression was no different than plain enwik9 and looked to match approximately the 0-order entropy of the whole file (roughly what you'd expect from, e.g., Huffman, using whole-file symbol frequencies). To appropriately compress the BWT(S) it either needs to implement some kind of local adaptation internally or accept something like MTF-transformed data (which failed for enwik8+).

17. Hi everybody,

oh, wow, i didn't know 'fsc' was known outside of my conversation with Jarek about ANS! Thanks for your interest.
To be clear, it's just a brute-force non-adaptive entropy coder experiment, and will probably stay so.
It was mostly written to quickly experiment with alternate spread functions ('modulo', 'pack', 'reverse', but
also Jarek's bucket-sorting one, e.g.), as well as to try to understand the underlying principles of ANS.

My apologies to Yann, i didn't realize FSE was his; i initially thought "Cyan" was Jarek's nickname (and FSE
his code). My bad! I've updated the README.md with full credits, as due!

I think i now have clearer ideas about ANS. I really perceive it as an elegant dynamical system, more than an arithmetic
coder, that relies on a few general principles (like: "state 'x' can encode ln(x) bits of information"). Quite novel and intriguing.

Some clarifications/remarks:

* so, yes, 'fsc' is a simple tool to experiment. Usage is simple:

./fsc < input_file > compressed_file
./fsc -d < compressed_file > reconstructed_file
diff reconstructed_file input_file || die

* With time, i ended up implementing some of Ryg's and Charles' ideas too:
- interleaving (2x, 4x)
- word-based output (and not per-bit): i stuck to 16b because i'm mostly interested in ARM and mobile, where 64b arithmetic is not a good idea...
- alias method
- specialized binary version (bANS?) using 32b-based output and 32b precision for probabilities
- and a 'pure' arithmetic coding version too, for comparison with the bANS (here too 32b precision, 32b output). Just run './bit_cmp'.
-...

* So far i can tell:
- bANS is not necessarily faster than arith coding
- arith coding is usually more precise
- it's hard to improve precision (deltaH) in ANS. One needs to fiddle with the spread functions quite a bit, in sometimes surprising ways...
- multi-symbol is very easy (and fast!) with ANS, much easier than arith coding. That's a great selling point.
- interleaving speed-up is yummy too!
- reverse-direction coding in ANS is a pain. Using block-based coding is ok, but you lose a few bytes at the end of each block to store the final state. And it makes adapting probabilities (and context modeling) harder, resorting to multi-pass block coding (-> speed penalty).
- all the tricks (alias, interleaving, 16b output, spread tables) are 'baked into' the final bitstream. That means, for instance, that you can't use a coder with 16b-based output on a mobile phone and then decode the bitstream with a 64b-based reader. Or use the alias method on the decoder but not on the encoder... Some platform tradeoffs are needed i guess.
- i'm unsure how hardware-friendly ANS is overall. Probably better than Huffman, agreed.

That's it. Overall i quite like ANS and will keep fiddling with it as time permits...

thx!
/skal

18. ## Thanks (2):

Bulat Ziganshin (14th November 2014),Jarek (13th November 2014)

19. Hi skal,

1) Regarding storing the final state: you can compensate for it by storing some information in the initial state.
For example, use initial state L + s_0, where s_0 is the first symbol - it can be decoded when the bitstream has dried out.
More sophisticated (useful for rANS) is starting from state x = 1 and applying the rANS (or uABS) formulas a few times - until you get into I = {L, ..., bL-1}.

Alternatively, we can use this state as a checksum: fix the initial state while encoding, and if there was an error, the decoder will start wandering randomly - most probably ending in a different state, indicating the error.

2) Regarding the approximated binary case (which I usually refer to as ABS) - let's look at the CABAC M-coder from h.264 ( http://iphome.hhi.de/marpe/download/...arpe_et_al.pdf ):

R_LPS = RTAB[m][(R >> 6) & 3];
R_MPS = R - R_LPS;
if (V < (R_MPS << BitsLeft)) {
    R = R_MPS; value = valMPS;
    Rnorm = (R_MPS >> 8) XOR 1;
} else {
    V = V - (R_MPS << BitsLeft);
    R = R_LPS; value = !valMPS;
    Rnorm = RnormTAB[R_LPS >> 3];
}
R = R << Rnorm;
BitsLeft = BitsLeft - Rnorm;
if (BitsLeft <= 0) {
    V = (V << M_BITS) | read_bits(M_BITS);
    BitsLeft = BitsLeft + M_BITS;
}

In comparison, the decoding step for tABS is just (m is the quantized probability):
t = DecodingTable[m][x];
value = valMPS XOR t.symbol;

No branches, no bit-by-bit renormalization, no multiplication, a single-number state (e.g. for SIMD) ... and direct handling of a large alphabet - no binarization.

However, for extremely accurate applications like ultracompressors, indeed ABS probably does not bring any advantages.

3) Regarding accuracy for large alphabet.
For rANS you can get as good as you want using 64 bit.
For tANS it is mostly a matter of large enough L. Especially such that all symbols have probability above 1/(L sqrt(2)).
Regarding symbol spread, I have managed to improve the tuning: spread using not only the quantized probabilities (q_s ~ L * p_s) but also the actual ones (p_s) - see: https://github.com/JarekDuda/Asymmet...SystemsToolkit

4) Regarding the inconvenience of backward encoding in e.g. adaptive situations, it is just a matter of using a buffer storing a sequence of pairs (s_i, D_i), where s_i is the i-th symbol, coming from distribution number D_i.
Then encode this buffer in the backward direction (table[x] -> table[D_i][x]); decoding is now straightforward.
Here is JamesB's implementation of order-1 rANS: http://encode.su/threads/2017-LzTurb...ll=1#post39920

5) Regarding comparison with Huffman: tANS requires a bit more memory per L. However, a smaller L can bring a smaller deltaH thanks to handling fractional bits - see slide 5 of https://dl.dropboxusercontent.com/u/12405967/ANSsem.pdf
A big hardware advantage is that there is no need for costly tree building - initialization is much simpler and cheaper.

Best,
Jarek

20. Jarek,

thanks for the very subtle hint #1 !!

I implemented it, and it works well at recovering some of the left-over bytes for each block.
It's a little bit tricky with interleaving, though. I'll update the fsc code when i'm finished dealing
with corner cases...

/skal

21. In the course of trying to better understand ANS (and LaTeX), I used LaTeX to typeset some of the ANS formulas in concise form. Representing them in plain ASCII is terrible, and they can be hard to find in their respective papers. Seeing the formulas by themselves, properly typeset, is somewhat easier on the eyes and is good for reference (and interior decorating). I also played with different notations and variable names; you may like my choices, or you may not. I've attached uABS and rANS along with their respective LaTeX sources.

Please let me know promptly if you find any errors.

Edit:
Both are corrected.

22. ## Thanks (2):

Bulat Ziganshin (15th May 2016),Paul W. (14th November 2014)

23. I was wondering whether it is also possible to apply rANS to predictive bit-wise compression (like fpaq, not fpaqa), and whether the bits too have to be encoded backwards!

24. Originally Posted by Nania Francesco
( http://encode.su/threads/2078-List-o...mplementations ) I was wondering whether it is also possible to apply rANS to predictive bit-wise compression (like fpaq, not fpaqa), and whether the bits too have to be encoded backwards!
I am not certain I understand the question.
If you are asking whether rANS can be applied to a binary alphabet: sure, but there are probably no advantages (interleaving?) - here is Charles' comparison: http://pastebin.com/qqyLNXFA

Regarding getting rid of backward encoding: it is a very good question.
I was not able to do it for ANS, but maybe it can be done with a different approach ... ?
And yes, an entropy coding automaton can also be made by approximating arithmetic coding (keeping forward encoding) - giving e.g. so-called quasi-arithmetic coding: http://www.ittc.ku.edu/~jsv/Papers/HoV93.qtfull.pdf
However, as range coding requires two numbers to represent the current state, the number of states of such automata grows much faster than for ANS (~ L^2) if we want better precision or a larger alphabet.
That is why e.g. h.264 uses a different approach: the "M-coder" (see the listing above).

If you are asking about wasting bits for storing the final state, we can compensate it by storing some information in the initial state - see my #1 answer 2 posts above.

25. I meant that one would need a bit version [0 and 1 symbols] (as opposed to multi-symbol rANS) having the same range behavior but a comparable state x. The code might be this:
if ( x < (1<<32) )
{
    x <<= 32;
    x |= *dword_in_ptr++;
}

uint64 xm = x & 4095;

if ( xm < p0 )
{
    x = p0 * (x >> 12) + xm;
    p0 += (4096 - p0) >> 5;
    return 0;
}
else
{
    x = (4096 - p0) * (x >> 12) + xm - p0;
    p0 -= p0 >> 5;
    return 1;
}
But does anything change when compressing with this version; should I compress in reverse?

26. I have just linked you to something like that (rABS): http://pastebin.com/qqyLNXFA
And written that, alone, it probably does not bring any advantages ... (interleaving?)

However, it can indeed be useful for adding binary symbols to an rANS stream ... you can also use the uABS formulas instead (fpaqa/b/c).

27. Thanks; the only thing I did not understand is whether the encoding is asymmetric, as in rANS?

28. The "asymmetric" word of ANS refers to asymmetrization of standard numeral systems: which are optimal for uniform probability distribution of digits.
If you refer to reverse encoding, the only forward encoded ANS family I know are prefix codes (but I haven't spent much time on looking), which decoder can be seen as a special case of tANS decoder.

And generally there are lots of materials about ANS ... I have just added more of them to the list: http://encode.su/threads/2078-List-o...mplementations

29. You can reverse the stream either during compression or decompression. fpaqa/b/c and lpaq1a reverse it during compression.

30. About ANS and 1/x distribution:

i was reading Charles' blog http://cbloomrants.blogspot.fr/2014/...ing-ans-9.html,
and there's this question about where the 1/x invariant distribution comes from. I've seen it mentioned in Jarek's
paper too. Here are some hopefully helpful insights and a demonstration related to this:

There's a continuous formulation of Charles' initial system that helps in understanding the 1/x distribution
(these dynamical systems are known as 'piecewise affine dynamical systems').
The original code for the discrete version would be: http://pastebin.com/NPqmsR8L
The continuous version of it is here: http://pastebin.com/RVMv1nfq

It will show that, whatever slope 'alpha' you choose, the limiting distribution is always 1/(x ln 2).

I've written up the mathematical proof (along with some illustrations and explanation) in the two attached pages
(sorry, no LaTeX yet).

Basically, 1/x is the only distribution that solves the functional equation f(x) = a·f(a·x) for any slope a != 1

/skal

31. ## Thanks (2):

Bulat Ziganshin (15th May 2016),Cyan (18th November 2014)

32. Hi skal,
So the Pr(x) depends on the symbol spread.
For example, Huffman can be seen as a special case of tANS: when symbols are spread in ranges of length 2^k. In this case we get a uniform Pr(x) ~ const instead.
Generally there is always some divergence from the hyperbolic behavior, but it quickly drops with L for reasonable spreads.
I tried to find this divergence in the previous paper ( http://arxiv.org/abs/0910.2724 ), but failed - the details turn out to be really tough.
Also, bounding deltaH is hard: it is essential there that sum_s q_s = 1, so that the linear term vanishes in the expansion of the Kullback-Leibler divergence, but this turns out to hold only approximately and is tough to bound (fortunately the current estimates seem to work well). There is still a lot to do at the theoretical base of ANS ... I comfort myself with the fact that the situation for arithmetic coding doesn't look much better after these few decades ...

So currently I just use a simple intuitive explanation, like in your note (slide 14 of https://dl.dropboxusercontent.com/u/12405967/ANSsem.pdf ): the transition is
lg(x) -> ~ lg(x) + lg(1/p) mod lg(b)
with three sources of chaotic behavior: asymmetry, ergodicity, diffusivity (cryptographic applications?). The ergodicity itself strongly suggests that we should expect an approximately uniform distribution for lg(x), which translates into a hyperbolic distribution for x.

Another intuitive explanation is that x contains lg(x) bits of information,
while a situation of probability p contains lg(1/p) bits of information.
So we should have Pr(x) \propto 1/x

33. So, i'm trying to step back a little and understand why ANS has less precision than, say, arithmetic coding.
Let's take the binary symbol case for simplicity.
I'm considering the case of 16b precision for probabilities (so, L=16), along with 16b-word I/O (that is: one reads or write uint16 only).

The code is here: http://pastebin.com/FnNK4HDc

The relative entropy (deltaH = (H-H0)/H0, in percent) looks like the attached plot,
when generating 10^7-long binary messages with probability between 0 and 1:

And as one can see, even if relatively OK, the compression of bANS is still quite off compared to arithmetic coding
with similar precision.

Is there something i'm doing obviously wrong here? And if not, how can i fix it?
Is there any magic spread-table i should be using in this very particular case of binary coding?

/skal

34. Hi skal,
The variant in the code is rABS (the range variant of the asymmetric binary system; here is Charles' version: http://pastebin.com/qqyLNXFA ).
So while uABS (u for uniform) is extremely accurate (Matt's fpaqa/b/c), the range variants are not (which is why they were born so late) ... unless we operate on very large x (there are some estimates in the preprint).

I see you operate on x \in I = {2^16, ... , 2^32 -1} to work on 32 bit x and renormalize 16 bits at once.
Basically, there are three ways to improve your accuracy with the arithmetic formula and a binary alphabet:
- switch to much more accurate uABS formulas: http://mattmahoney.net/dc/dce.html#Section_33 , or
- switch to 64bit x, e.g. x \in I = {2^48, ... , 2^64 -1} for 16bit renormalization, or
- switch to 8bit renormalization, e.g. x \in I = {2^24, ..., 2^32 -1} for 32bit x.

My previous post was written in a hurry; here is an addition after some thought.
So the problem behind the big inaccuracy of your code is that the denominator of the quantization, M (p[s] ~ q[s] / M), and L, the lower end of I = {L, ..., bL-1} (renormalization), are the same (2^16) - like in the degenerate case here: http://encode.su/threads/2035-tANS-r...st-of-accuracy
The range variant is a very crude symbol distribution - into ranges instead of the expected uniform spread. However, it can reach as good an accuracy as we want by operating on a very high appearance number of the range ... but in your case it is just the lowest possible: the second appearance.

For range variants of ANS it is essential that the denominator for the quantization of probabilities, M (p[s] = q[s] / M), is much smaller than L (I = {L, ..., bL-1}) - which means that we work on a high appearance of the range.

To give some numbers: at the beginning of page 9 of the last preprint there is an upper bound for the rANS inaccuracy:
|epsilon| < M/x
where M is the denominator of the quantization and x >= L. Since
deltaH ~ sum_s epsilon^2 / p_s,
like on page 15 we should expect the general behavior
deltaH ~ M^2 / L^2 (* "alphabet size"^2)

You use a huge M = 2^16. Just reduce it a bit and you will get much better behavior (or use uABS instead, or increase L).
And just let me know if you run into disappointing behavior again - maybe it can be repaired ...

For a binary alphabet you can also use tABS: a set of coding tables for all quantized binary probabilities (e.g. k/32).
In the preprint there are such sets written out for 5 and 16 states - allowing work in the deltaH ~ 0.01 or 0.001 bits/symbol regions, respectively (unless probabilities are tiny - we need a large number of states to handle them well).
Storing the entire set of tables in the second case is only 16x16 = 256 bytes.
