Regarding compression rate, both tANS and rANS can get as close to the theoretical limit (Shannon entropy) as you want.

Regarding understanding, the basic ANS concept is really simple:

- spread symbols: assign a symbol to each natural number in a more or less uniform way, with density p[s] for symbol s (in practice over a finite range like {L, ..., 2L-1}),

- enumerate symbol appearances,

- now the encoding step is: to add the information from symbol s to the information in state/number x, go to the x-th appearance of s (in analogy to x -> 2x + s in the standard binary system).
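The three steps above can be sketched in a few lines of Python (my own toy illustration, not reference code; the function names `spread`, `encode_step`, `decode_step` are made up for this example):

```python
def spread(probs, n):
    """Assign a symbol to each natural number 0..n-1, roughly
    uniformly with density probs[s]: a simple greedy spread."""
    symbols, counts = [], {s: 0 for s in probs}
    for i in range(n):
        # pick the symbol that is most behind its target density
        s = max(probs, key=lambda t: (i + 1) * probs[t] - counts[t])
        symbols.append(s)
        counts[s] += 1
    return symbols

def encode_step(symbols, s, x):
    """Add symbol s to state x: new state = position of the
    x-th appearance of s (analogue of x -> 2x + s)."""
    seen = -1
    for pos, sym in enumerate(symbols):
        if sym == s:
            seen += 1
            if seen == x:
                return pos
    raise ValueError("spread too short for this state")

def decode_step(symbols, x):
    """Inverse: the symbol sitting at position x, plus the count of
    its earlier appearances, recovers (s, old state)."""
    s = symbols[x]
    return s, symbols[:x].count(s)

# With uniform binary densities the spread is 0,1,0,1,... and the
# encoding step reduces to x -> 2x + s, the standard binary system:
table = spread({0: 0.5, 1: 0.5}, 32)
assert encode_step(table, 1, 5) == 2 * 5 + 1
assert decode_step(table, 11) == (1, 5)
```

With a skewed density the spread simply contains the frequent symbol more often, so its appearances are closer together and the state grows more slowly — that is where the compression comes from.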

Now add renormalization to stay in the {L, ..., 2L-1} range and you have tANS, like Yann Collet's FSE.
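A minimal sketch of how renormalization turns this into a table-based coder (my own toy code under simplifying assumptions: quantized counts summing to a power-of-two L, the same greedy spread as above; real FSE is far more optimized):

```python
from collections import defaultdict

def build_tables(counts):
    """counts[s] = L_s with sum(counts) = L a power of two.
    Build the spread and the per-symbol encoding table."""
    L = sum(counts.values())
    spread, seen = [], defaultdict(int)
    for i in range(L):
        s = max(counts, key=lambda t: (i + 1) * counts[t] / L - seen[t])
        spread.append(s)
        seen[s] += 1
    # the x-th appearance of s (x numbered from counts[s]) sits at
    # position pos in the spread, i.e. at state L + pos
    enc, nxt = {s: {} for s in counts}, dict(counts)
    for pos, s in enumerate(spread):
        enc[s][nxt[s]] = L + pos
        nxt[s] += 1
    return L, spread, enc

def encode(msg, counts):
    L, spread, enc = build_tables(counts)
    X, bits = L, []                    # any start in {L, ..., 2L-1}
    for s in msg:
        while X >= 2 * counts[s]:      # renormalize: shift low bits out
            bits.append(X & 1)
            X >>= 1
        X = enc[s][X]                  # go to the X-th appearance of s
    return X, bits

def decode(X, bits, n, counts):
    L, spread, _ = build_tables(counts)
    bits, out = bits[:], []
    for _ in range(n):
        s = spread[X - L]              # table lookup gives the symbol
        X = counts[s] + spread[:X - L].count(s)   # appearance number
        while X < L:                   # renormalize: shift bits back in
            X = 2 * X + bits.pop()
        out.append(s)
    return out[::-1]                   # ANS decodes in reverse (LIFO)
```

Note the decoder consumes the bit stream last-in-first-out and emits symbols in reverse encoding order — that is inherent to ANS, not an artifact of this sketch.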

Mixing different probability distributions, including binary ones (with speed comparable to arithmetic coding), is not a problem.
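For the binary case the state update is a pair of closed-form formulas (the uABS variant from the ANS paper, as I understand it); a toy demo, using exact fractions only to dodge float rounding:

```python
from fractions import Fraction
from math import ceil, floor

def uabs_encode(x, s, p):
    """One binary ANS step; p is the probability of symbol 1."""
    if s:
        return floor(x / p)
    return ceil((x + 1) / (1 - p)) - 1

def uabs_decode(x, p):
    """Inverse step: recover (symbol, previous state)."""
    s = ceil((x + 1) * p) - ceil(x * p)
    if s:
        return 1, ceil(x * p)
    return 0, x - ceil(x * p)

p = Fraction(1, 4)              # skewed binary source
x = 10
for bit in (1, 0, 0, 1):        # push four bits into the state
    x = uabs_encode(x, bit, p)
for expected in (1, 0, 0, 1):   # pop them back, LIFO order
    bit, x = uabs_decode(x, p)
    assert bit == expected
assert x == 10
```

Since p can be different at every step (as long as encoder and decoder agree on the sequence), adaptive or context-mixed binary modeling drops in naturally.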

How much time does LZA spend on entropy decoding? You could easily divide it by 2 or 3 ... leaving LZHAM behind.