No, you have considered only one possible attack. To make sure that no attack is possible, you have to investigate all of them.
OK, so you think that in order to sacrifice security for performance I should make a formal security model, find all possible ways to attack it (maybe with a proof that there are no others) and examine them all?
That's not how it works; we all make security tradeoffs all the time based on assumptions that are much sketchier than the one above.
And in fact I do believe that the probability of my experiencing a collision *attack* is not much bigger than the risk of a random collision in a properly sized hash.
There is no such thing as provable security. The reason we think that AES, SHA-256, etc. are secure is that lots of people have been unable to attack them.
You can't treat non-crypto hashes as random permutations, even if you aren't worried about deliberate collisions. There is a simple variation of Li and Vitanyi's proof (linked in an earlier post) that over a universal distribution of inputs, average case = worst case for hash collisions, or any other failure mode or property of program inputs, not just time and space complexity. Consider any provably halting algorithm A and any property P. Then you can describe a string x as "the n'th string having property P on A" in |P| + |A| + log n + C bits, where |P| and |A| are the lengths of the descriptions of P and A in some language, and C is a constant that depends only on the language. If P and A have simple descriptions, then the problem is worse, because it means that the string x has a shorter description and is therefore more likely.
Li and Vitanyi give quicksort as an example. It gives worst case behavior on already sorted data and average case on random data. Suffix sorting would be another example. Simple implementations give worst case behavior on data that is easy to compress. Better implementations like divsufsort don't have this problem but are necessarily complex. |A| has to be large.
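To make the quicksort example concrete, here is a minimal sketch of my own (not from their book): a naive first-element-pivot quicksort that counts comparisons. Already sorted input drives it to the O(n^2) worst case, while random input stays near the average case.
Code:
#include <cstdio>
#include <vector>
#include <algorithm>
#include <random>

// Naive quicksort with the first element as pivot. Counts comparisons so the
// worst-case blowup on sorted input is visible without timing anything.
long long comparisons = 0;

void quicksort(std::vector<int>& v, int lo, int hi) {  // sorts v[lo..hi)
  if (hi - lo < 2) return;
  int pivot = v[lo], i = lo + 1;
  for (int j = lo + 1; j < hi; ++j) {
    ++comparisons;
    if (v[j] < pivot) std::swap(v[i++], v[j]);
  }
  std::swap(v[lo], v[i - 1]);
  quicksort(v, lo, i - 1);   // left part (< pivot)
  quicksort(v, i, hi);       // right part (>= pivot)
}

int main() {
  const int n = 5000;
  std::vector<int> sorted(n), shuffled;
  for (int i = 0; i < n; ++i) sorted[i] = i;
  shuffled = sorted;
  std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937(1));

  comparisons = 0; quicksort(shuffled, 0, n);
  printf("random input: %lld comparisons (roughly n log n)\n", comparisons);
  comparisons = 0; quicksort(sorted, 0, n);
  printf("sorted input: %lld comparisons (n(n-1)/2)\n", comparisons);
  return 0;
}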
The same problem occurs with hash collisions. You can describe "the first x,y such that H(x) = H(y)" in |H| + C bits, meaning that (x,y) has probability at least 2^-(|H| + C) in a universal distribution. This proves there are no hash functions that are both simple (less than 100-200 bits to describe) and collision resistant.
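A concrete way to read that: the colliding pair is completely determined by a short program, so it cannot be "random". Here is a minimal sketch of my own, with 32-bit FNV-1a standing in for a small non-crypto hash H, that finds "the first x,y such that H(x) = H(y)" over strings enumerated in a fixed order. The whole program is only a few hundred bytes, and that length is the |H| + C in the argument above.
Code:
#include <cstdio>
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in for a small non-crypto hash H: 32-bit FNV-1a.
uint32_t H(const std::string& s) {
  uint32_t h = 2166136261u;
  for (unsigned char c : s) { h ^= c; h *= 16777619u; }
  return h;
}

// Next string in shortlex order over the bytes 'a'..'z' (any fixed enumeration works).
std::string next(std::string s) {
  for (int i = s.size() - 1; i >= 0; --i) {
    if (s[i] < 'z') { ++s[i]; return s; }
    s[i] = 'a';
  }
  return std::string(s.size() + 1, 'a');
}

int main() {
  std::unordered_map<uint32_t, std::string> seen;
  std::string x = "a";
  while (true) {                     // enumerate strings in a fixed order
    uint32_t h = H(x);
    auto it = seen.find(h);
    if (it != seen.end()) {          // first pair (y, x) with H(y) == H(x)
      printf("collision: H(\"%s\") == H(\"%s\") == %08x\n",
             it->second.c_str(), x.c_str(), h);
      return 0;
    }
    seen.emplace(h, x);
    x = next(x);
  }
}

Whatever pair it prints is pinned down by a description no longer than this program, so under a universal distribution it gets at least the corresponding probability, no matter how "random" the pair looks.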
A universal distribution doesn't account for the time it takes to compute a description. In general, finding the shortest description of a string x is not computable. But you do know that if it has taken a long time to find a description of a certain length, then you are unlikely to do better. For example, you can be reasonably sure that "the first x,y such that SHA256(x) = SHA256(y)" has no possible description shorter than about 256 bits, but only because a lot of people have looked.
Some preliminary tests for stream encryption speed on a 100 MB file (2 GHz T3200, 32 bit Vista, 1 thread):
RC4 - 1 sec.
AES-256-CTR - 2 sec. (libtomcrypt)
Keccak b=1600 r=256 - 4 sec (Keccak-simple.c from http://keccak.noekeon.org/)
Keccak b=800 r=128 - 2 sec.
Keccak-1600 would probably be faster under a 64 bit OS. There is optimized code using SSE2 intrinsics that is surely faster and should work just as well in a 32 bit OS. The code is not well documented, so it will be a while before I test it. I actually tested the hash function, but since it is a sponge function, stream encryption should be the same speed.
One motivation for using Keccak is that I can use it for both key stretching and stream encryption, which simplifies the code. With AES I would have to include a hash function anyway. I already have SHA-1 implemented, but I should probably use something stronger.
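To make the "sponge does both" point concrete, here is a toy sketch of the pattern (my own illustration; the permutation is a placeholder mixer, NOT Keccak-f, and NOT secure in any way). Hashing is absorb-then-squeeze, and stream encryption is absorb the key and nonce, then squeeze keystream bytes and XOR them into the data, so both cost one permutation call per r bytes.
Code:
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <string>

// Toy sponge with a 200-byte state and a rate of r bytes. The permutation is a
// PLACEHOLDER mixer (not Keccak-f, not secure); it only shows the data flow.
struct ToySponge {
  static const int r = 16;          // rate in bytes; the rest is capacity
  uint64_t s[25];                   // 25 64-bit lanes = 200 bytes of state
  int ptr;                          // next byte of the rate part to use
  ToySponge() { memset(s, 0, sizeof(s)); ptr = 0; }

  void permute() {                  // placeholder permutation, NOT Keccak-f
    for (int round = 0; round < 24; ++round)
      for (int i = 0; i < 25; ++i) {
        s[i] += s[(i + 7) % 25] ^ ((uint64_t)round * 0x9e3779b97f4a7c15ULL);
        s[i] ^= s[i] >> 29;
        s[i] *= 0xbf58476d1ce4e5b9ULL;
      }
    ptr = 0;
  }
  void absorb(const void* p, size_t n) {  // XOR input into the rate part
    const uint8_t* b = (const uint8_t*)p;
    for (size_t i = 0; i < n; ++i) {
      ((uint8_t*)s)[ptr] ^= b[i];
      if (++ptr == r) permute();
    }
  }
  void finish() {                   // minimal pad10*1-style padding, then permute
    ((uint8_t*)s)[ptr] ^= 0x01;
    ((uint8_t*)s)[r - 1] ^= 0x80;
    permute();
  }
  uint8_t squeeze() {               // next output byte (hash or keystream)
    if (ptr == r) permute();
    return ((uint8_t*)s)[ptr++];
  }
};

int main() {
  const char* msg = "the quick brown fox";

  // Hashing: absorb the message, pad, squeeze a digest.
  ToySponge h;
  h.absorb(msg, strlen(msg));
  h.finish();
  printf("digest:     ");
  for (int i = 0; i < 16; ++i) printf("%02x", (unsigned)h.squeeze());
  printf("\n");

  // Stream encryption: absorb key and nonce, pad, squeeze keystream, XOR data.
  ToySponge ks;
  ks.absorb("secret key", 10);
  ks.absorb("nonce0001", 9);
  ks.finish();
  std::string ct(msg);
  for (size_t i = 0; i < ct.size(); ++i) ct[i] ^= ks.squeeze();
  printf("ciphertext: ");
  for (size_t i = 0; i < ct.size(); ++i) printf("%02x", (unsigned)(uint8_t)ct[i]);
  printf("\n");
  return 0;
}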
I will add this new SHA3 to my CHK for sure!
That will be nice. Currently I can't find any implementations like fsum or md5sum that support SHA3 against which I could test my code. The supplied code takes an array of bytes as input, but I had to modify it to take input from a file.
Here is a quick SHA3-256 hash program. It works on the test vectors in Wikipedia, but I can't say it works in all cases without more test vectors.
Code:
/* sha3 - Compute SHA3-256 hashes of files
   Written by Matt Mahoney. Released to public domain. */

#define _CRT_DISABLE_PERFCRIT_LOCKS // speed up getc() and putc() in VC++
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>

// Compute a SHA3 (Keccak[r,c]) hash with rate r, lane type T.
// To compute a hash, put() the input bytes and get() the result.
// T should be a {8,16,32,64} bit unsigned type for r+c={200,400,800,1600}.
// For SHA3-{224,256,384,512} set T=uint64_t and set r={1152,1088,832,576}.
template <typename T, int r>
class Keccak {
public:
  void put(int ch);  // Hash 1 byte
  int get();         // Retrieve next byte of hash
  void init();       // Reset state to compute a new hash
  Keccak() {init();}
private:
  T a[25];           // Keccak hash state
  unsigned ptr;      // next byte to get/put (range 0..r/8)
  enum {ABSORB, SQUEEZE} spongeState;  // putting or getting?
  void f();          // Keccak-F function
  T rol(T x, unsigned ro) {  // rotate left
    ro&=sizeof(T)*8-1;
    return x<<ro|x>>(sizeof(T)*8-ro);
  }
};

// Reset hash state to accept a new input message.
template <typename T, int r>
void Keccak<T, r>::init() {
  memset(a, 0, sizeof(a));
  ptr=0;
  spongeState=ABSORB;
}

// Hash 1 byte. Reset state if this is the first byte.
template <typename T, int r>
void Keccak<T, r>::put(int ch) {
  assert(ptr>=0 && ptr<r/8);
  assert(ch>=0 && ch<=255);
  assert(spongeState==ABSORB);
  a[ptr/sizeof(T)]^=T(ch)<<(ptr%sizeof(T)*8);
  if (++ptr==r/8) f();
}

// Get 1 byte of hash. If this is the first byte then pad the input.
template <typename T, int r>
int Keccak<T, r>::get() {
  if (spongeState==ABSORB) {  // first get()?
    assert(ptr>=0 && ptr<r/8);
    if (ptr==r/8-1) put(0x81);
    else {
      put(1);
      while (ptr<r/8-1) put(0);
      put(0x80);
    }
    spongeState=SQUEEZE;
    assert(ptr==0);
  }
  assert(ptr>=0 && ptr<=r/8);
  if (ptr==r/8) f();
  int ch=(a[ptr/sizeof(T)]>>(ptr%sizeof(T)*8))&255;
  ++ptr;
  return ch;
}

// Compute rounds. Derived from public domain code
// from http://keccak.noekeon.org/
template <typename T, int r>
void Keccak<T, r>::f() {
  assert(ptr==r/8);
  static const T KeccakF_RoundConstants[] = {
    (T)0x0000000000000001ULL, (T)0x0000000000008082ULL,
    (T)0x800000000000808aULL, (T)0x8000000080008000ULL,
    (T)0x000000000000808bULL, (T)0x0000000080000001ULL,
    (T)0x8000000080008081ULL, (T)0x8000000000008009ULL,
    (T)0x000000000000008aULL, (T)0x0000000000000088ULL,
    (T)0x0000000080008009ULL, (T)0x000000008000000aULL,
    (T)0x000000008000808bULL, (T)0x800000000000008bULL,
    (T)0x8000000000008089ULL, (T)0x8000000000008003ULL,
    (T)0x8000000000008002ULL, (T)0x8000000000000080ULL,
    (T)0x000000000000800aULL, (T)0x800000008000000aULL,
    (T)0x8000000080008081ULL, (T)0x8000000000008080ULL,
    (T)0x0000000080000001ULL, (T)0x8000000080008008ULL};

  T Da, De, Di, Do, Du;
  T BCa, BCe, BCi, BCo, BCu;
  T Eba, Ebe, Ebi, Ebo, Ebu;
  T Ega, Ege, Egi, Ego, Egu;
  T Eka, Eke, Eki, Eko, Eku;
  T Ema, Eme, Emi, Emo, Emu;
  T Esa, Ese, Esi, Eso, Esu;
  const int rounds=2*(9+(sizeof(T)>=2)+(sizeof(T)>=4)+(sizeof(T)==8));

  for (int round = 0; round < rounds; round += 2) {

    // prepareTheta
    BCa = a[0]^a[5]^a[10]^a[15]^a[20];
    BCe = a[1]^a[6]^a[11]^a[16]^a[21];
    BCi = a[2]^a[7]^a[12]^a[17]^a[22];
    BCo = a[3]^a[8]^a[13]^a[18]^a[23];
    BCu = a[4]^a[9]^a[14]^a[19]^a[24];

    // thetaRhoPiChiIotaPrepareTheta(round, A, E)
    Da = BCu^rol(BCe, 1);
    De = BCa^rol(BCi, 1);
    Di = BCe^rol(BCo, 1);
    Do = BCi^rol(BCu, 1);
    Du = BCo^rol(BCa, 1);

    a[0] ^= Da;  BCa = a[0];
    a[6] ^= De;  BCe = rol(a[6], 44);
    a[12] ^= Di; BCi = rol(a[12], 43);
    a[18] ^= Do; BCo = rol(a[18], 21);
    a[24] ^= Du; BCu = rol(a[24], 14);
    Eba = BCa ^((~BCe)& BCi );
    Eba ^= (T)KeccakF_RoundConstants[round];
    Ebe = BCe ^((~BCi)& BCo );
    Ebi = BCi ^((~BCo)& BCu );
    Ebo = BCo ^((~BCu)& BCa );
    Ebu = BCu ^((~BCa)& BCe );

    a[3] ^= Do;  BCa = rol(a[3], 28);
    a[9] ^= Du;  BCe = rol(a[9], 20);
    a[10] ^= Da; BCi = rol(a[10], 3);
    a[16] ^= De; BCo = rol(a[16], 45);
    a[22] ^= Di; BCu = rol(a[22], 61);
    Ega = BCa ^((~BCe)& BCi );
    Ege = BCe ^((~BCi)& BCo );
    Egi = BCi ^((~BCo)& BCu );
    Ego = BCo ^((~BCu)& BCa );
    Egu = BCu ^((~BCa)& BCe );

    a[1] ^= De;  BCa = rol(a[1], 1);
    a[7] ^= Di;  BCe = rol(a[7], 6);
    a[13] ^= Do; BCi = rol(a[13], 25);
    a[19] ^= Du; BCo = rol(a[19], 8);
    a[20] ^= Da; BCu = rol(a[20], 18);
    Eka = BCa ^((~BCe)& BCi );
    Eke = BCe ^((~BCi)& BCo );
    Eki = BCi ^((~BCo)& BCu );
    Eko = BCo ^((~BCu)& BCa );
    Eku = BCu ^((~BCa)& BCe );

    a[4] ^= Du;  BCa = rol(a[4], 27);
    a[5] ^= Da;  BCe = rol(a[5], 36);
    a[11] ^= De; BCi = rol(a[11], 10);
    a[17] ^= Di; BCo = rol(a[17], 15);
    a[23] ^= Do; BCu = rol(a[23], 56);
    Ema = BCa ^((~BCe)& BCi );
    Eme = BCe ^((~BCi)& BCo );
    Emi = BCi ^((~BCo)& BCu );
    Emo = BCo ^((~BCu)& BCa );
    Emu = BCu ^((~BCa)& BCe );

    a[2] ^= Di;  BCa = rol(a[2], 62);
    a[8] ^= Do;  BCe = rol(a[8], 55);
    a[14] ^= Du; BCi = rol(a[14], 39);
    a[15] ^= Da; BCo = rol(a[15], 41);
    a[21] ^= De; BCu = rol(a[21], 2);
    Esa = BCa ^((~BCe)& BCi );
    Ese = BCe ^((~BCi)& BCo );
    Esi = BCi ^((~BCo)& BCu );
    Eso = BCo ^((~BCu)& BCa );
    Esu = BCu ^((~BCa)& BCe );

    // prepareTheta
    BCa = Eba^Ega^Eka^Ema^Esa;
    BCe = Ebe^Ege^Eke^Eme^Ese;
    BCi = Ebi^Egi^Eki^Emi^Esi;
    BCo = Ebo^Ego^Eko^Emo^Eso;
    BCu = Ebu^Egu^Eku^Emu^Esu;

    // thetaRhoPiChiIotaPrepareTheta(round+1, E, A)
    Da = BCu^rol(BCe, 1);
    De = BCa^rol(BCi, 1);
    Di = BCe^rol(BCo, 1);
    Do = BCi^rol(BCu, 1);
    Du = BCo^rol(BCa, 1);

    Eba ^= Da;  BCa = Eba;
    Ege ^= De;  BCe = rol(Ege, 44);
    Eki ^= Di;  BCi = rol(Eki, 43);
    Emo ^= Do;  BCo = rol(Emo, 21);
    Esu ^= Du;  BCu = rol(Esu, 14);
    a[0] = BCa ^((~BCe)& BCi );
    a[0] ^= (T)KeccakF_RoundConstants[round+1];
    a[1] = BCe ^((~BCi)& BCo );
    a[2] = BCi ^((~BCo)& BCu );
    a[3] = BCo ^((~BCu)& BCa );
    a[4] = BCu ^((~BCa)& BCe );

    Ebo ^= Do;  BCa = rol(Ebo, 28);
    Egu ^= Du;  BCe = rol(Egu, 20);
    Eka ^= Da;  BCi = rol(Eka, 3);
    Eme ^= De;  BCo = rol(Eme, 45);
    Esi ^= Di;  BCu = rol(Esi, 61);
    a[5] = BCa ^((~BCe)& BCi );
    a[6] = BCe ^((~BCi)& BCo );
    a[7] = BCi ^((~BCo)& BCu );
    a[8] = BCo ^((~BCu)& BCa );
    a[9] = BCu ^((~BCa)& BCe );

    Ebe ^= De;  BCa = rol(Ebe, 1);
    Egi ^= Di;  BCe = rol(Egi, 6);
    Eko ^= Do;  BCi = rol(Eko, 25);
    Emu ^= Du;  BCo = rol(Emu, 8);
    Esa ^= Da;  BCu = rol(Esa, 18);
    a[10] = BCa ^((~BCe)& BCi );
    a[11] = BCe ^((~BCi)& BCo );
    a[12] = BCi ^((~BCo)& BCu );
    a[13] = BCo ^((~BCu)& BCa );
    a[14] = BCu ^((~BCa)& BCe );

    Ebu ^= Du;  BCa = rol(Ebu, 27);
    Ega ^= Da;  BCe = rol(Ega, 36);
    Eke ^= De;  BCi = rol(Eke, 10);
    Emi ^= Di;  BCo = rol(Emi, 15);
    Eso ^= Do;  BCu = rol(Eso, 56);
    a[15] = BCa ^((~BCe)& BCi );
    a[16] = BCe ^((~BCi)& BCo );
    a[17] = BCi ^((~BCo)& BCu );
    a[18] = BCo ^((~BCu)& BCa );
    a[19] = BCu ^((~BCa)& BCe );

    Ebi ^= Di;  BCa = rol(Ebi, 62);
    Ego ^= Do;  BCe = rol(Ego, 55);
    Eku ^= Du;  BCi = rol(Eku, 39);
    Ema ^= Da;  BCo = rol(Ema, 41);
    Ese ^= De;  BCu = rol(Ese, 2);
    a[20] = BCa ^((~BCe)& BCi );
    a[21] = BCe ^((~BCi)& BCo );
    a[22] = BCi ^((~BCo)& BCu );
    a[23] = BCo ^((~BCu)& BCa );
    a[24] = BCu ^((~BCa)& BCe );
  }
  ptr=0;
}

// Compute hash of each arg
int main(int argc, char** argv) {
  Keccak<uint64_t, 1088> keccak;
  for (int i=1; i<argc; ++i) {
    FILE* in=fopen(argv[i], "rb");
    if (!in) perror(argv[i]);
    else {
      keccak.init();
      for (int c; (c=getc(in))!=EOF; keccak.put(c));
      fclose(in);
      for (int j=0; j<32; ++j) printf("%02x", keccak.get());
      printf(" %s\n", argv[i]);
    }
  }
  return 0;
}
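In case anyone wants to try it: assuming the source above is saved as sha3.cpp, it should build and run with g++ like this (the _CRT_DISABLE_PERFCRIT_LOCKS define is only meaningful under VC++ and is ignored elsewhere):
Code:
g++ -O3 sha3.cpp -o sha3
./sha3 somefile.bin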
I've tested your code on some test vectors (http://keccak.noekeon.org/KeccakKAT-3.zip). Everything looks fine!
Just came here to post about it.
Got to add it to fsbench....
First results using a sse2 version:
Less than thrilling.
Code:
pcbsd-8973% ./fsbench -b131072 -t4 blake2bp blake2b skein512 bluemidnightwish512 edon-r512 blake512 ~/bench/scc.tar
memcpy: 655 ms, 211927552 bytes = 3085 MB/s
Codec                version     args       C.Size     (C.Ratio)  C.Speed    D.Speed    C.Eff.   D.Eff.
Blake2bp             20121223               212031040  (x 1.000)  1052 MB/s  1053 MB/s  8733e12  8747e12
Blake2b              20121223               212031040  (x 1.000)  1145 MB/s  1141 MB/s  9510e12  9478e12
Skein512             SHA3 Final  64bit opt  212031040  (x 1.000)  1892 MB/s  1899 MB/s  15e15    15e15
BlueMidnightWish512  SHA3 rnd 1  64bit opt  212031040  (x 1.000)  1943 MB/s  1934 MB/s  15e15    15e15
Edon-R512            v20                    212031040  (x 1.000)  2204 MB/s  2201 MB/s  17e15    17e15
Blake512             SHA3 Final  64bit opt  212031040  (x 1.000)  915 MB/s   912 MB/s   7601e12  7570e12
Codec                version     args       C.Size     (C.Ratio)  C.Speed    D.Speed    C.Eff.   D.Eff.
done... (3x10 iteration(s)).
Phenom 955@3200.
EDIT: found out that it actually uses SSE.
EDIT2: Added BLAKE1 to the comparison.
And note: BLAKE2s / BLAKE2sp fail on my PC when used exactly like b/bs.
Last edited by m^2; 28th December 2012 at 00:48.
Hello!
I would not like the idea of including SHA3 in future ZPAQ releases. Even optimized, SHA3 is very slow: about 6 s/GB on a 64-bit platform and as much as 30 s/GB on a 32-bit platform.
I would like to see blake2sp as the successor to SHA1. Blake2s is 4-streamed by itself and fits well into SSE2 registers. Blake2sp is an 8-streamed version of blake2s and can utilize up to 8 cores of a modern CPU. Although blake2sp may look like merely a tree hash, it has smart padding and smart flags, so it is as secure as blake2s. Blake2sp is already implemented in WinRAR 5 and achieves 0.5 s/GB on an i7. And, one believes, it has the strongest protection against collisions.
Note: to achieve maximum speed, pblendw is required, i.e. at least SSE4.1; otherwise there is about a 50% slowdown on plain SSE2.
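To illustrate what the streaming/tree structure looks like, here is a rough sketch of my own. This is NOT the real blake2sp: the leaf and root hashes are a non-cryptographic placeholder, and the lane-index/depth/final-node parameters that make blake2sp as strong as plain blake2s are omitted. It only shows the data flow that lets 8 lanes run independently.
Code:
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>
#include <algorithm>

// Placeholder leaf/root hash (64-bit FNV-1a): NOT cryptographic, only here to
// show the data flow of an 8-lane parallel (tree) hash.
uint64_t fnv1a(const uint8_t* p, size_t n) {
  uint64_t h = 1469598103934665603ULL;
  for (size_t i = 0; i < n; ++i) { h ^= p[i]; h *= 1099511628211ULL; }
  return h;
}

// Deal the input out in 64-byte blocks, round-robin, to 8 independent lanes;
// hash each lane on its own (so lanes can run on separate cores or SIMD
// lanes); then hash the concatenated lane digests with a root hash.
uint64_t parallel_hash(const std::vector<uint8_t>& in) {
  const int LANES = 8, BLOCK = 64;
  std::vector<std::vector<uint8_t> > lane(LANES);
  size_t block = 0;
  for (size_t off = 0; off < in.size(); off += BLOCK, ++block) {
    size_t n = std::min((size_t)BLOCK, in.size() - off);
    lane[block % LANES].insert(lane[block % LANES].end(),
                               in.begin() + off, in.begin() + off + n);
  }
  uint8_t leaves[8 * 8];                        // one 8-byte digest per lane
  for (int i = 0; i < LANES; ++i) {             // independent per-lane hashes
    uint64_t d = fnv1a(lane[i].data(), lane[i].size());
    memcpy(leaves + i * 8, &d, 8);
  }
  return fnv1a(leaves, sizeof(leaves));         // root hash over lane digests
}

int main() {
  std::string s = "example input long enough to be dealt out across several lanes";
  std::vector<uint8_t> data(s.begin(), s.end());
  printf("%016llx\n", (unsigned long long)parallel_hash(data));
  return 0;
}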
Last edited by persicum; 26th January 2014 at 16:11.
I have no plans to add SHA3 to ZPAQ. I use SHA1 for checksums and deduplication mainly for speed. I have already implemented SHA256 as part of Scrypt for key strengthening. So far, SHA1 is fast and good enough for what it is used for. If I learn that SHA1 collisions have been discovered then I will probably switch to something like the first 20 bytes of SHA256 for deduplication (replacing H blocks with G blocks).
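For illustration, the 20-byte ID would just be a truncation. A minimal sketch, using OpenSSL's one-shot SHA256() here purely as a stand-in (zpaq would of course use its own SHA256 code):
Code:
#include <cstdio>
#include <cstring>
#include <openssl/sha.h>   // stand-in only; zpaq has its own SHA256

// Deduplication ID: first 20 bytes of SHA-256 of a fragment, the same size as
// the SHA-1 hash it would replace.
void fragment_id(const unsigned char* frag, size_t n, unsigned char id[20]) {
  unsigned char full[32];
  SHA256(frag, n, full);   // 32-byte digest
  memcpy(id, full, 20);    // keep the first 20 bytes
}

int main() {
  const unsigned char frag[] = "example fragment";
  unsigned char id[20];
  fragment_id(frag, sizeof(frag) - 1, id);
  for (int i = 0; i < 20; ++i) printf("%02x", id[i]);
  printf("\n");            // build with -lcrypto
  return 0;
}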
Why not use blake2s or even blake2sp? They would be 3-10 times faster than SHA256. Moreover, they have special padding for any desired output length. You can easily set this to 20 bytes if you want.
SHA1 will be prohibited by Microsoft from 2016. Although real collision examples have not been found yet, the attack complexity drops to about 2^52 AFAIK.
As far as collisions in SHA1 go, you may want to look at this: https://www.schneier.com/blog/archiv...ill_we_se.html
Also, 3-10 times faster sounds like an overstatement. Fast implementations of BLAKE2 require SIMD. A SIMD implementation of SHA256 would also be very fast (zpaq does not use SIMD for SHA256 AFAIK).
edit: I actually think that Jesse Walker's estimates are very conservative. Modern GPUs (especially AMD ones) can compute SHA1 at an extremely fast rate, much faster than any CPU. Optimized kernels for brute-force can be even faster.
Last edited by Mangix; 2nd February 2014 at 06:36.
The so-called sequential blakes are in fact 4-streamed. And the so-called 8-streamed blake2sp is in fact 8*4 = 32-streamed.
Why can't a 32-streamed hash be 10 times faster?
Note, this is not the brute-force case of GPU speedup, because here just ONE hash value is being computed in parallel.
Mangix, the Intel site contains papers about using SSE for SHA1 - it only improves the speed of multiple parallel SHA1 computations. Intel's Skylake CPUs will have SHA1/SHA256 instructions, like they already do for AES/CRC.
I think that almost all fears about SHA1 collisions are simply excessive.
The current "collision attack" is almost useless, because the really useful one would be a chosen-prefix attack, as happened with MD5.
So, from my point of view, even if you could find a collision in 1 second, but only from "random" data, that is not a big deal at all.
The "affordable chosen-prefix" attack, AFAIK, has not yet been published for SHA1, in the sense that its estimated cost is about 2^16 times greater than in the less stringent cases (identical prefix and so on).
2^16 is not a small difference (we are talking about cost and time).
For now, at least.
---
With regard to blake2 (the multi-core version), from my own tests, performance-wise the computation time is effectively 1/2 to 1/3 (on a 4-core i7) of that of (highly) optimized MD5.
But mass-storage bandwidth is (currently) much lower, even with RAID-0 SSDs, so total running time is not CPU bound but I/O bound.
In other words: a very good and fast algorithm, if properly implemented with SIMD instructions, but currently, and AFAIK for the foreseeable future, not so useful.
If, and when, this bottleneck shrinks, then faster hash algorithms will dominate the playground.
Last edited by fcorbelli; 2nd February 2014 at 13:39.
Ahem... M$ is not exactly the leading authority on security.
In fact M$ accepted MD5-signed certificates, in ALL versions of Windows, for...
Windows Update.
And the Flame malware, faking Windows Update with a collision-crafted MD5 certificate,
infected a lot of CHOSEN systems all around the world (this may be the very first identified cyber weapon), despite antivirus software that... trusts code signed by M$
(for M$).
Here's M$'s own account:
http://blogs.technet.com/b/srd/archi...e-malware.aspx
---
As written, "chosed prefix collision" for SHA1 is stimated at 2^77, that is a big number indeed.
For now.
But there is "countertechnique", for detecting a fake prefixed certificate.
In other words if you use attack X, i can (hopefully...) detect the "strange" datas therefore discarding.
This, AFAIK, is the current "edge" of academic research.
I don't know Chinas, Iran, Russia and NSA works, of course.
Last edited by fcorbelli; 2nd February 2014 at 14:12.
AFAIK, Microsoft has thrown away all of their md5 certificates after FLAME.
@Bulat That doesn't change anything I said. CPUs are not as efficient as GPUs.
A driver can be certified as good software with SHA1. After that, an attacker can replace the random1 chunk with the random2 chunk. The overall SHA1 will not change, regardless of what the prefix and suffix were. The suffix can then check a couple of bytes of the patched area and switch to harmful behavior.
High end AMD GPUs have ~2000 processors on board. Doing multiple-SHA1 operations in parallel is very fast. See: http://hashcat.net/oclhashcat/
A GPU will not be able to compute a single SHA1 value at a faster than CPU rate. Even if it could, the PCI Express latency will slow it down.
My original comment had mainly to do with brute-forcing SHA1. SHA1 is badly designed in the sense that for bruteforce, you can use large amounts of precomputation. See: https://t.co/y2TFBlm5BZ
Now I am not sure how applicable Stevens' attack is to GPUs, as some other cryptographic attacks (e.g. DES or NFS on RSA) require large amounts of memory (this can defeat GPUs). However, if you are doing many hash operations in parallel, nothing beats GPUs yet.
Yes, I hope so.
Or so they state.
But only after, not before.
Regarding "CPUs are not as efficient as GPUs": this is true, but with a caveat.
A linear increase in computing units does not help much against an exponential search space.
So a 10x, 100x, or even 1,000x speedup in search time does not change the picture much: a 1,000x speedup shaves only about log2(1000), i.e. roughly 10 bits, off the effective strength.
For strong algorithms, of course.
AFAIK this is not currently possible, because we have not (yet) seen a SHA1 collision, nor a rogue certificate, as happened with MD5 back in (from my own memory) 2008.
Your statement will be true in the future (when? I don't think too soon), but for now it is not.
This is only my own opinion, of course.
GPUs have good PR managers. Those "cores" are actually just ALUs. Unfortunately, many algorithms like SHA1 cannot run faster on multiple ALUs, so unless you have dozens of data streams to hash, a GPU will be slower than a CPU.
Regarding "the PCI Express latency": you probably mean throughput - it's about 8 GB/s, i.e. much faster than the SHA1 computation speed.