Page 1 of 2 12 LastLast
Results 1 to 30 of 33

Thread: Asymmetric numeral system toolkit and fast tuned symbol spread

  1. #1
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts

    Asymmetric numeral system toolkit and fast tuned symbol spread

    The choice of tANS L state entropy coding automaton consists of:
    - quantization of symbol probability distribution as p[s] ~ q[s]/L fractions
    - spreading these symbols is range [0, L-1], such that s appears q[s] times
    I have created a C++ toolkit to test different methods here, compare with Huffman: https://github.com/JarekDuda/AsymmetricNumeralSystemsToolkit

    A few improvements were made on the way - the most important is finally fast "tuned" symbol spread: using inequalities in q[s] ~ p[s]/L approximations.
    So in currently used implementations, the symbol spread usually uses only q, slowly getting some basic "tuning": shifting low probable single appearance symbols to the end of range (recent FSE).
    The new spread_tuned() uses all symbol probabilities in addition to q. For example for 4 symbol and L=16 states (binary(0.2,2)):
    p: 0.04 0.16 0.16 0.64
    q/L: 0.0625 0.1875 0.125 0.625
    this quantization itself has dH/H ~ 0.011
    spread_fast() gives 0233233133133133 and dH/H ~ 0.020
    spread_prec() gives 3313233103332133 and dH/H ~ 0.015 - generally it is close to quantization dH/H
    while spread_tuned(): 3233321333313310 and dH/H ~ 0.0046 - better than quantization dH/H due to using also p
    Tuning shifts symbols right if q[s]/L > p[s] and left otherwise, getting better agreement and so compression rate.
    This compression rate income is the most significant for low L, for example when we need to operate on large number of small distributions.

    The idea behind spread_tuned() is pretty simple: we know that probability distribution of states should be approximately P(x) \propto 1/x, so we choose expected symbol appearances to get this rule. It is especially important when switching between different coding tables, but also gives improvements while using single table.
    So P(x) \propto 1/x leads to ~1/ln(2) normalization factor (harmonic numbers). Encoding to some appearance of a symbol is made from a range of values: all x such that after k steps "x->floor(x/2)", we get i\in [q[s], 2q[s]-1). Probability of such range for our assumption turns out to be ~ ln(1+1/i)/ln(2). Multiplied by p[s], it is probability of obtained state x: should be Pr(x) ~ 1/x/ln(2), finally getting:

    preferred position for i \in [q[s], 2q[s]-1) appearance of symbol s is x = 1/(p[s] * ln(1+1/i))

    The spread_tuned() creates lists of preferred symbols for each position, then spreads symbols by reading these lists in order.
    It has linear complexity, but might be too slow some applications. For speedup, only low probable symbols can be treated this way (e.g. q=1, maybe 2), and fast spread can be used for the remaining ones.
    Single appearance at the end of table corresponds to p ~ 1/(L*2*ln(2)) ~ 0.72/L. it is the lowest probability we can represent well - if you have lots of lower probability symbols, consider grouping them into an escape symbol, decoded in eventual succeeding step (should be rare - nearly does not affect speed).

    Another new function is quantize_prec() which finds the best quantization accordingly to sum_s (p[s] - q[s]/L)^2/p[s] approximation of dH. Unfortunately it is n*log(n) complexity.
    There is also quantize_fast() which after q[s] = round(p[s]*L), just uses the most probable symbol to repair the balance - sometimes it does not work(!).
    There is also new fast method to propagate probability distribution of states (make_step()) - it is low memory and usually quickly converge to stationary probability distribution - required to get hANS (average number of bits/symbol we use). This convergence ("power method") could be replaced by a faster direct method to find the dominant eigenvector, but it would require storing the whole matrix.

    Feel free to add more probability distributions, better quantizers and spreads (and let me know if it crashes for some reasonable parameters).
    Another important thing to add is quantization/compression of probability distribution, which often has to be stored in the header.
    Last edited by Jarek; 27th July 2014 at 07:48.

  2. Thanks (3):

    Cyan (26th July 2014),nburns (27th July 2014),valdmann (11th August 2014)

  3. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts
    Does your improved distribution model (spread_tune)
    require some more information (within header) to be properly described, hence reproduced on the decoder side,
    which could, potentially, offset its DeltaH advantage ?

  4. #3
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Yes, it uses both found quantization (q) as the number of appearances, and the real probabilities (p) - at least some its approximation.
    Using p~q/L approximation we should get close to the precise initialization - there is a subtle difference between them:
    - tuned: the preferred position of i=q[s] .. 2q[s]-1 appearance of symbol s is: 1/(p*ln(i+1/i))
    - precise: in analogy, the preferred position can be i/p
    as ln(1+1/i) = 1/i - 1/(2i^2) + .. while expanding to the first term, both formulas are the same. As expected, the full logarithmic formula seems to behave a bit better (new spread_tuned_p() uses 1/(p*ln(i+1/i)) ~ i/p approximation).

    Knowing both q and p is e.g. for situations:
    - when probability distribution is fixed, e.g. we have binary variables, quantize their probabilities into e.g. 32 possibilities for the less probable symbol, and for each of them block e.g. 8 such binary variables of the same approximated probability - then we use 32 byte-wise entropy coders for each such 8 bit buffer,
    - when probability distribution is stored in the header - in this case we could store a bit more than q - it is the question of how to compress/quantize probability distributions ...

    ps. I have added spread_tuned_s() which uses sort instead of buckets - so it is slower, but should be better - surprisingly, it is often worse.
    There should be added some exhaustive search for the best symbol spread for given probability distribution.
    However, I am afraid that they may get out of expected Pr(x)~1/x stationary probability of states, what should be maintained for good statistical behavior while switching between different coding tables - spread_tuned() seems perfect for such applications.
    Last edited by Jarek; 28th July 2014 at 12:07.

  5. #4
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    I have added a simple scrambler to allow to simultaneously encrypt the compressed file nearly for free: disturbs the symbol spread a bit in a pseudorandom way - using PRNG initialized with cryptokey. It can be applied between spreading symbols and generating tables.
    It costs a tiny reduction of compression rate, completely negligible for larger L.

    scrambler0() in current toolkit is just (before optimizations):
    for(i = 1; i<L/2; i++) if(randombool()) switch symbols on position 2i-1 and 2i

    switching 2i and 2i+1 instead would have worse effect.
    I have managed to make the switch branch-less, what should improve performance (assumes <=256 size alphabet!):
    uint tm = ((*ps) | ((*ps)<<24) | ((*(ps+1))<<8) | (*(ps+1))<<16) >> ((t & 1) << 4); 
    *ps = tm & 255; *(ps+1) = (tm >> 8) & 255; // branchless switch of *ps and *(ps+1) if(t&1)


    The behavior of ANS hidden state x is very nonlinear lg(x) -> ~lg(x) - lg(p[s]) mod lg(L) (see slide 13 and 17 ):
    - asymmetry: different symbols (p[s]) change the state in very different ways,
    - ergodicity: lg(p) is usually irrational, making that even single symbol wants to cover all states,
    - diffusion: there is additional pseudorandom distribution from the expected behavior.
    Making fallowing the state without knowledge of the exact coding (symbol spread) practically impossible.

    So after:
    - replacing PRNG with some better one having at least 64 bit initializer,
    - using L >= 1024 (for large L the switch positions can be more sparse like ps+=4 instead of 2),
    - applying it after some LZ scheme,
    I believe it should be really safe (?)

    update: added scrambler1() - takes blocks of 4 symbols and rotate them cyclically:
    uint64_t tm = ((*ps) | ((*(ps+1))<<8) | ((*(ps+2))<<16) | ((*(ps+3))<<24)); 
    tm = (tm | (tm << 32)) >> ((t & 3) << 3);
    *ps = tm & 255; *(ps+1) = (tm >> 8) & 255; *(ps+2) = (tm >> 16) & 255;*(ps+3) = (tm >> 24) & 255;

    The first and third line can be optimized by just projecting into larger variable.
    It can be increased to 8 symbol blocks e.g. by using assembler cyclic shift.
    Example and general scheme:
    Click image for larger version. 

Name:	ANScrypt.png 
Views:	163 
Size:	36.2 KB 
ID:	3321
    Attached Thumbnails Attached Thumbnails Click image for larger version. 

Name:	ANSexample.png 
Views:	181 
Size:	116.7 KB 
ID:	3320  
    Last edited by Jarek; 17th December 2014 at 17:06. Reason: example

  6. #5
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Jarek View Post
    I believe it should be really safe (?)
    I don't necessarily see the win in coupling cryptography to compression. They have different goals, and, most likely, crypto algorithms are easier to verify when all they have to do is crypto. Crypto algorithms in current use derive their (believed) hardness from believed-hard problems like factoring, discrete logarithm, and other well-known number theory problems. Can you connect this to one of those problems?

    In principle, perfect compression is one step from perfect cryptography, because changing one bit at random should select a vastly-different message at random. So, in principle, you could make a case that ANS simplifies encryption down to choosing a small number of hard-to-recover bits (inasmuch as ANS has approximately that perfect-compression property), and that could allow it to replace a symmetric-key cipher. But it would seem more interesting to work on laying down that case in theory at this point.

    Edit:
    Thinking about perfect compression some more: I guess there's no ironclad reason why a small change should necessarily be vastly amplified in a (approximately-)perfect-compression scenario. It would depend on how the mapping from plaintext->compressed text was organized. For instance, if you were to enumerate the reverse mapping, would the plaintext message change smoothly or randomly as you smoothly varied the compressed text?

    P.S. I just noticed that it's useful to have terms like plaintext+ciphertext to express the same concept relative to compression: plaintext+compressedtext?
    Last edited by nburns; 3rd September 2014 at 00:25.

  7. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
    Paraphrasing someone else (on this forum?) - cryptographic strength of an algorithm can be measured by number of failed attempts of breaking it, assuming that all attempts failed. So if you persuade cryptography experts to try to break your algorithm and see them fail, then you can be proud of yourself :] The widespread usage would depend on experts' opinions anyway. And this forum isn't frequented by cryptography experts, I think.

  8. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Paraphrasing someone else (on this forum?) - cryptographic strength of an algorithm can be measured by number of failed attempts of breaking it, assuming that all attempts failed. So if you persuade cryptography experts to try to break your algorithm and see them fail, then you can be proud of yourself :] The widespread usage would depend on experts' opinions anyway. And this forum isn't frequented by cryptography experts, I think.
    That formula doesn't take into account the seriousness of the attempts. For algorithms that are considered safe for general use, the standard is more like the number of years that it has survived being a high-profile crypto algorithm. If it's high-profile in the crypto world, pretty much everyone on earth with the required skillset is trying, has tried, or would like to try to break it.

    Edit:
    & of course, before skilled people will even bother, there needs to be a very good story for why it might work.
    Last edited by nburns; 3rd September 2014 at 00:49.

  9. #8
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Quote Originally Posted by nburns View Post
    I don't necessarily see the win in coupling cryptography to compression. They have different goals, and, most likely, crypto algorithms are easier to verify when all they have to do is crypto. Crypto algorithms in current use derive their (believed) hardness from believed-hard problems like factoring, discrete logarithm, and other well-known number theory problems. Can you connect this to one of those problems?
    Sure RSA is based on factoring problem, which is believed (!) to be hard ... while, surprisingly, 12 years age primality test has turned out polynomial ...
    However, I am talking about symmetric-key cryptography like AES, which use XOR and permutations only - so many phases until "it seems safe" ...
    Like there are card tricks when you perform many simple operations to finally get a trivial result, there can also e.g. subtle hidden invariants in such AES-like logical puzzle - some shortcuts and supercomputers specialized in finding them.
    There is a simple way to prevent that - put some strongly nonlinear operation inside (chaos), and ANS has perfect properties for this purpose (asymmetry, ergodicity and diffusivity above) - e.g. could replace permutation S-box.
    In principle, perfect compression is one step from perfect cryptography, because changing one bit at random should select a vastly-different message at random.
    Indeed they perfectly come together: data compression removes redundancy - internal dependencies which could be used for attacks.

    Try to attack the simplest attempt: tANS with scrambler used as entropy coder for simple text.
    You get a sequence of bit - nearly white noise, which, in contrast to block ciphers like AES, you cannot even divide into bit blocks corresponding to succeeding steps - their sizes strongly depend on the current state, which behaves in chaotic way.
    If you don't have complete information, thanks to chaosity, you will nearly immediately lose all information about the state.

    Now add some additional mechanism to make eventual attack more difficult, like:
    - use some compression before to reduce redundancy which could be used for attacks,
    - each data block can use different PRNG initialization: e.g. concatenated cryptokey with the block number - making that you would have to attack each block separately, making hopeless any statistical approach,
    - add some simple additional phase(s), like final XOR with values generated from cryptokey (AES itself is multiple phases),
    - ...
    I am not saying that pure ANS is cryptographically safe, only that it suits as one of block (the nonlinear one) of cryptosystems safer than current based on only linear operations.

    And it is for practically free while compression - allowing for extremely fast archiving with protection included.

  10. #9
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Jarek View Post
    Sure RSA is based on factoring problem, which is believed (!) to be hard ... while, surprisingly, 12 years age primality test has turned out polynomial ...
    However, I am talking about symmetric-key cryptography like AES, which use XOR and permutations only - so many phases until "it seems safe" ...
    Like there are card tricks when you perform many simple operations to finally get a trivial result, there can also e.g. subtle hidden invariants in such AES-like logical puzzle - some shortcuts and supercomputers specialized in finding them.
    There is a simple way to prevent that - put some strongly nonlinear operation inside (chaos), and ANS has perfect properties for this purpose (asymmetry, ergodicity and diffusivity above) - e.g. could replace permutation S-box.
    I read that cryptographic hashes like SHA-1 are actually created by something like a lot of ad hoc stirring up of bits rather than being designed around proofs of their equivalence to hard problems. That surprised me. I'm not sure whether that applies to symmetric-key algorithms also. I could be in for another surprise.


    Indeed they perfectly come together: data compression removes redundancy - internal dependencies which could be used for attacks.

    Try to attack the simplest attempt: tANS with scrambler used as entropy coder for simple text.
    You get a sequence of bit - nearly white noise, which, in contrast to block ciphers like AES, you cannot even divide into bit blocks corresponding to succeeding steps - their sizes strongly depend on the current state, which behaves in chaotic way.
    If you don't have complete information, thanks to chaosity, you will nearly immediately lose all information about the state.

    Now add some additional mechanism to make eventual attack more difficult, like:
    - use some compression before to reduce redundancy which could be used for attacks,
    - each data block can use different PRNG initialization: e.g. concatenated cryptokey with the block number - making that you would have to attack each block separately, making hopeless any statistical approach,
    - add some simple additional phase(s), like final XOR with values generated from cryptokey (AES itself is multiple phases),
    - ...
    I am not saying that pure ANS is cryptographically safe, only that it suits as one of block (the nonlinear one) of cryptosystems safer than current based on only linear operations.

    And it is for practically free while compression - allowing for extremely fast archiving with protection included.
    Here's the thing: even if you were able to convince me that this is workable with hand-waving arguments, I, myself, would not bother trying to make the case to people that actually matter until I had something approaching a proof. What the proof should be of may require some creativity, but this needs to be made more formal before you should expect to be taken seriously.

  11. #10
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    There is probably only one cryptosystem which can be mathematically proven to be safe: http://en.wikipedia.org/wiki/One-time_pad
    The rest is belief e.g. that given number of phases does not longer seem easy to crack ...

    I am only saying not to make life of e.g. NSA people easier, and replace some linear operation inside, which often happen to have some invariants when combined, with a chaotic nonlinear one ...

  12. #11
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    When it comes to secret-key cryptography, two things come to mind.

    1: The permutation must be hard to break.
    2: Must be immune to timing attacks.

    I am not well versed in ANS but I can say that as long as the compression or cryptography stages do not run in constant time, there is a security problem. Most implementation of AES for example do not run in constant time. This was used to obtain a key over the network a while back. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf

    Is it possible to replace the "scrambler" with the sha512 permutation for example?

  13. #12
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    I think cryptography is different from compression because cryptography needs to know the correct(or approximate) value of data.And compression just need to focus in compress data . It's not good to mix these 2 things.Cryptography have a lot of complex factors.(timing,hardware and others).

    In my opinion ,the only thing random and arbitrary is big bang(nothing have created something).The rest have a lot of determinism and prediction.

  14. #13
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Mangix,
    While comparing with permutation, the advantage of tANS step is that it has memory: the state changing in chaotic (unpredictable) way, which determines the current behavior - also the varying number of bits to produce in this step.

    Regarding timing attacks, sure behavior of entropy coder depends on if you feed it with more of less frequent symbols. However, this shouldn't be an issue while creating password protected archives. And even if one would determine the symbol spread used, using safe PRNG to generate it means that he couldn't get any information about the used cryptokey from this spread. So using e.g. "cryptokey+block number" to initialize PRNG for succeeding data blocks (e.g. 32kB in compressors), would make them practically independent from attacker point of view.
    Example of a more sophisticated ANS-based encryption only (no compression) is: use PRNG initialized with key to choose some (intermediate) pseudorandom probability distribution e.g. on 256 size alphabet, then generate two scrambled tANS for this probability distribution. Now encoding step is: decoding step transforms some input bits into a symbol in this probability distribution, then immediately use encoding step to transform this symbol into a new output bit sequence. As the attacker does not even know the used intermediate probability distribution, I don't see a way for e.g. timing attack here(?)

    Regarding the choice of scrambler for tANS - it should permute symbols in a pseudorandom way.
    If using with fast spread (just randomly spreading the symbols), the permutation can be any (e.g. PRNG can be directly built in the spread).
    If using a more sophisticated spread (getting a bit smaller inaccuracy penalty, what can be also obtained by increasing the number of states), the random permutation should be rather local to maintain statistical structure - e.g. random cyclic shift in blocks of succeeding 4 symbols in scrambler1().
    Last edited by Jarek; 3rd September 2014 at 12:41.

  15. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    I can see right away that this encryption is not secure. In a chosen plaintext attack, I can swap the frequency distribution of two adjacent symbols, test whether the compressed size increases or decreases, and therefore quickly determine one bit of the key. Then I can repeat for the other key bits.

    You can argue that chosen plaintext attacks are not relevant in most situations. But every other cryptosystem is secure against them. Why should I use something weaker? And yes, I realize you can fix this using a more complicated scheme, and then I will have to spend more time to break it. But I'm not going to play that game, and neither will crypto experts who know a lot more about the subject than me. I suggest studying existing crypto before trying to invent your own.

  16. #15
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I can see right away that this encryption is not secure. In a chosen plaintext attack, I can swap the frequency distribution of two adjacent symbols, test whether the compressed size increases or decreases, and therefore quickly determine one bit of the key. Then I can repeat for the other key bits.
    Whow, I completely don't see how you get one bit of cryptokey this way??? There is an extremely complicated pseudorandom dependency between key and code length.
    What you could try by a chosen plaintext attack to pure tANS encoding, is trying to get the used symbol spread, but it would be much more complicated and we are not talking about pure tANS, but at least behind some LZ scheme.
    Assume that you have managed to do it: find the used symbol spread - assuming this symbol spread was generated using a safe PRNG generated with cryptokey key (K4 in BSI), this symbol spread still gives you absolutely no information about the cryptokey(!)
    So assuming that we use "cryptokey + block number" to initialize symbol spread for each e.g. 32kB data block, maybe using chosen plaintext attack you would be able to determine symbol spread for given block - which, using safe PRNG, would be completely independent from symbol spreads for other blocks.

    And we are talking about password protected archives application here - you give the password in the moment you want to compress or decompress something. It needs to be protected against ciphertext only attacks - that having only the archive, one cannot decrypt it without the password. Chosen plaintext attacks is for different applications ...

  17. #16
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Ok, let's start with trying to understand the ugliest abstract scenario:
    - pure tANS e.g. with fast Yann's spread and the attacker knows probabilities (the number of symbol appearances),
    - he can perform adaptive chosen plaintext attack: can feed such initialized tANS with any symbol sequence and can observe the output,
    - he knows the initial state (which in practice should be encrypted using the cryptokey!),
    - the encoder works on relatively large blocks (e.g. 32kB).
    Even in such completely unrealistic scenario, getting the exact symbol spread (which still doesn't give any information about cryptokey!) would require testing billions of hypotheses.
    E.g. looking at agreement of the last bits of the state with produced sequence, he gets some probability distribution of the number of bits used in this first step ... and each succeeding steps makes it more and more complicated.

    Can anyone present an attack for above idealized situation when the attacker does not know the initial state for decoding (it is encrypted)?

  18. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    If you make symbol swapping decisions based on a secure keystream rather than a fixed key, then it should be secure. But why would this be better than a simple XOR with the compressed output? XOR doesn't cost any more and doesn't increase the size at all.

  19. #18
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Indeed we could directly use secure PRNG initialized with cryptokey to generate a bit sequence to XOR with.
    One advantage of ANS here is that we need much less values: only to generate the coding function.
    Another, that we have simultaneously entropy coding.

    But it is also much better from security reason.
    In chosen plaintext attack you could just XOR with your plaintext to get the keystream there - in contrast, for ANS you would need much more effort to obtain the symbol spread this way.
    And if one don't have the initial state (was encrypted e.g. by XOR with keystream), finding the symbol spread already seems a really difficult task - still talking about extreme scenario: chosen plaintext for pure single ANS ... while AES needs many phases to "seem safe".

  20. #19
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Successful cryptanalysis involves very subtle and non-obvious attacks. Look up MD5 and WEP. It is not at all obvious how they could be attacked, and yet they were. You can't just argue that some system is secure because there is no such thing as provable security. And especially your arguments will look weak if you don't have a crypto background.

    Anyway I suggest reading some books on cryptography. "Practical Cryptography" by Ferguson and Schneier is a good introduction. "Cryptography - Theory and Practice" by Stinson goes into deeper mathematics. Then read up on the standards like AES, SHA-256, SHA-3.

  21. #20
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    I am not saying that tANS is some ultimate encryption method, only that it has very nice properties for this purpose (internal state evolving in chaotic way and various length bit blocks) - e.g. to be used as one of basic building blocks of cryptoystem - finally some strongly nonlinear one ... what seems to be poorly explored side of symmetric-key cryptography: usually based on XOR and permutations.

    Also, it is allows for extremely fast password protected compression, which seems really safe for practically the only available attacks here: ciphertext only.
    I imagine using it by running the compressor with some flag, like "-e pswd", while for e.g. chosen plaintext attack, one would need to have an access to compressor which was already fed with the cryptokey ...
    Last edited by Jarek; 4th September 2014 at 00:32.

  22. #21
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    I think the learning curve for crypto is probably high enough that Jarek would probably be better served by using that time in other ways -- unless he is dead set on breaking into crypto even if it means missing out on making discoveries in other fields.

  23. #22
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    So just try to find a successful attack and report if you succeed (or why you didn't) - even for the extreme scenario: chosen plaintext for pure single tANS to determine symbol spread only (determining cryptokey is impossible if using secure PRNG).

    Some remarks for practical applications (keystream = output of PRNG initialized with cryptokey):
    - large L, e.g. 2048 as default in FSE,
    - initial state chosen using keystream (works also as checksum, only if you know the key), final state stored encrypted, e.g. XOR with keystream,
    - don't use pure tANS: e.g. hide it behind some LZ, or e.g. XOR succeeding blocks of output with e.g. 64 bits fixed value from keystream,
    - for extreme protection: change initializer between blocks: e.g. initialize PRNG with "cryptokey+block number" or "cryptokey+random number+block number" where "random number" is e.g. 64bit random value is stored directly written at the beginning of the file (the probability distribution stored in the header works in similar way).
    Any more remarks?
    Last edited by Jarek; 4th September 2014 at 10:17.

  24. #23
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    It's easy to prove that an algorithm compresses; all you have to do it run it. Running a crypto algorithm, OTOH, doesn't prove that it provides security. Since security can't be proven, crypto algorithms require trust. This is not just a new algorithm, it's a completely new approach to encryption, so it's starting from zero in building trust. The first thing to do, I'd think, would be laying down a theoretical case for the general approach by writing theory-oriented papers and trying to get a journal to publish them.

    Getting published in a crypto journal as an outsider doesn't seem likely, so maybe you should try to find a coauthor within the field of cryptology.

    Personally, I'd rather work on other things than try to scale that mountain.

  25. #24
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    The peer review process is about asking a few random anonymous experts ... who don't even have to take an effort to understand the concept, not to mention deeply evaluating it. Especially when the concept is far for the mainstream, or making the previous work of given expert obsolete.
    This process is very far from the real world cryptography, which is based on the fact that millions have tried to break something and none has succeeded - so I am asking to try to develop an attack even on some extremely unrealistic case, to understand why it is far nontrivial ... and becomes virtually impossible if getting out of idealistic cases.
    And ANS has started replacing Huffman in applied compressors, so it is rather a matter of time when authors will want to add password protection option, without sacrificing speed - it is a good time to try to understand this possibility.

  26. #25
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    If you developed the theory of ANS's chaotic behavior, that would be interesting independent of whether it's ever useful for cryptography.

  27. Thanks:

    Cyan (4th September 2014)

  28. #26
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Accordingly to http://en.wikipedia.org/wiki/Chaos_theory , a system is classified as chaotic if:

    1. it must be sensitive to initial conditions;
    2. it must be topologically mixing; and
    3. it must have dense periodic orbits.

    For tANS, step is lg(x) -> ~ lg(x) - lg(1/p) mod lg(L) - it is similar to "x -> x + r mod 1 for irrational r" chaotic system.
    Click image for larger version. 

Name:	chaos.png 
Views:	185 
Size:	25.5 KB 
ID:	3122
    1) starting trajectories from two close points, they will quickly become independent,
    2) "Topological mixing (or topological transitivity) means that the system will evolve over time so that any given region or open set of its phase space will eventually overlap with any other given region." - analogously to 1
    3) feeding tANS with stochastic source, we get to ~1/x stationary probability of states - we densely cover the entire space.

    Why is it relevant for cryptography?
    Because the attacker wants to find the internal behavior of used coding - chaosity means that if he didn't get the exact one, he will get a completely different behavior.
    Beside chaotic behavior of internal state, another nice property of ANS as a building block of cryptosystem (like XOR and permutations), is varying number of produced bits - the attacker doesn't know how many bits correspond to given step.
    Last edited by Jarek; 4th September 2014 at 17:03.

  29. #27
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Chaos does not imply cryptographic strength. For example:

    x[i+1] = 4*x[i]*(1-x[i]);

    with key x[0] in (0,1) is chaotic, but useless as a keystream. Once I know x[i], I know the rest of the stream. In fact that is true of all discrete models of chaotic systems.

    This sci.crypt FAQ is ancient, but question 2.3 is still relevant. http://www.contrib.andrew.cmu.edu/~shadow/crypt.html

  30. #28
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    Sure chaotic does not automatically mean safe, but it is strongly connected with a basic requirement of cryptosystem: that any disturbance of the process should lead to useless noise.

    I have just googled "chaos cryptography" getting hundreds of books and articles, also using your logistic map, e.g. "Chaos-based cryptography" book, overview article with 544 citations, or large presentation ...
    Last edited by Jarek; 6th September 2014 at 01:25.

  31. #29
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 256 Times in 160 Posts
    If somebody was using quantize_prec() from the toolkit, Alex Converse (Google) has noticed that it has a bug:
    https://github.com/aconverse/Asymmet...2d40c24c498a64

    ps. paper about cheap compression+encryption with tANS, e.g. for Internet of Things, medical implants, RFIDs:
    https://dl.dropboxusercontent.com/u/...7/cryptANS.pdf
    [16]-[22] are other papers about including encryption in compression: using LZ, Huffman, AC, BWT ...

  32. #30
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    783
    Thanked 687 Times in 372 Posts
    Jarek, you understanding of cryptography is just the same as understanding of compression of the guy in the neighbor thread

Page 1 of 2 12 LastLast

Similar Threads

  1. Asymetric Numeral System
    By Cyan in forum Data Compression
    Replies: 250
    Last Post: 28th July 2020, 16:37
  2. .rec toolkit
    By Shelwien in forum Data Compression
    Replies: 0
    Last Post: 9th October 2011, 23:07
  3. Asymmetric PAQ
    By kampaster in forum Data Compression
    Replies: 11
    Last Post: 27th August 2010, 04:16
  4. SSE, TSE & higher symbol probability estimations
    By Dmitry Shkarin in forum Forum Archive
    Replies: 4
    Last Post: 6th March 2008, 18:47
  5. Symbol ranking compression
    By Matt Mahoney in forum Forum Archive
    Replies: 21
    Last Post: 29th October 2007, 16:36

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •