CHK v1.10 has been released! Please enjoy new release!
http://encode.narod.ru/
Timing tests for enwik9 (1 GB) from cache on a 2.0 GHz T3200, 2 cores, 3 GB, Win32. Nice improvement from v1.03.
crc16 6.5 sec
crc32 6.5 sec
crc64 10.5 sec
md4 8.4-8.5 sec
md5 13.8-13.9 sec
sha1 15.3 sec
sha256 21.4 sec
sha512 48.9 sec
sha3 93.0 sec
Times are as reported by the program rounded to 0.1 seconds. I ran each test twice and reported both times if different. To make sure that enwik9 was all in disk cache I watched the disk light (off) and CPU usage in task manager (50%). I ended up having to run a program that allocates all memory until it runs out and exits in order to force other programs to page out enough memory to keep all of enwik9 in cache.
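To compare these timings across machines, it can help to convert them to throughput. The following is a minimal sketch, assuming enwik9 is exactly 10^9 bytes and that 1 MB means 10^6 bytes; the function names are made up for illustration and are not part of CHK.

```cpp
#include <cstdio>

// enwik9 is the first 10^9 bytes of an English Wikipedia dump.
constexpr double kEnwik9Bytes = 1e9;

// Convert an elapsed time in seconds to throughput in MB/s (1 MB = 10^6 bytes).
double throughput_mb_per_s(double bytes, double seconds) {
    return bytes / seconds / 1e6;
}

// Print one line of a throughput table for a hash timed on enwik9.
void print_throughput(const char* name, double seconds) {
    std::printf("%-7s %7.1f MB/s\n", name,
                throughput_mb_per_s(kEnwik9Bytes, seconds));
}
```

For example, the 6.5 sec crc32 time above works out to roughly 154 MB/s.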
Matt, you can install RAM drive software and increase the RAM disk size before running the test from it. At least the one from SuperSpeed LLC can do it without rebooting.
Ilya, 7-zip includes OSS crc32 asm code that runs about 2GB/sec on 2600k@4.6
Also note times for SlavaSoft fsum 2.51 on same hardware:
crc32 5.1 sec
md4 4.0 sec
md5 5.2 sec
sha1 5.6 sec
sha256 13.3 sec
sha512 141.6 sec
Compiled CHK's code using Visual C++ 2012.
Hardware:
CPU: Intel Core i7-3770K @ 4.7 GHz
RAM: 16 GB Corsair Dominator Platinum @ ~1900 MHz
SSD: 240 GB Corsair Force GT
SHA1, ENWIK9
CHK -> 4.4 sec
sha1sum -> 3.3 sec
CHK Command-line -> 3.1 sec
fsum-> 1.9 sec
Dunno how they did that! I think fsum uses SSE/AVX/GPU-based computation or very smart ASM optimizations...
I wonder too. fsum is twice as fast as what I could write.
Small improvement in libzpaq SHA1 enwik9, from 11.4 sec to 9.5 sec when compiled with g++ -O3 -msse2, or 9.9 sec with cl /O2 /arch:SSE2
This will go in the next version of libzpaq if I can't find any further improvements. The main improvement is in reducing w[80] to w[16] and scheduling as needed, and replacing the f expressions to use fewer operations as suggested in Wikipedia. About half the speedup is due to replacing getc() with fread(), so that won't have an impact on zpaq.

Code:

    // sha1.cpp - compute SHA1 hashes of filename arguments
    // Written by Matt Mahoney. Public domain.

    #define _CRT_DISABLE_PERFCRIT_LOCKS
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    typedef uint32_t U32;

    // For computing SHA-1 checksums
    class SHA1 {
    public:
      void put(int c) {  // hash 1 byte
        U32& r=w[len0>>5&15];
        r=(r<<8)|(unsigned char)c;
        if (!(len0+=8)) ++len1;
        if ((len0&511)==0) process();
      }
      double size() const {return len0/8+len1*536870912.0;} // size in bytes
      const char* result();  // get hash and reset
      SHA1() {init();}
    private:
      void init();      // reset, but don't clear hbuf
      U32 len0, len1;   // length in bits (low, high)
      U32 h[5];         // hash state
      U32 w[16];        // input buffer
      char hbuf[20];    // result
      void process();   // hash 1 block
    };

    // Start a new hash
    void SHA1::init() {
      len0=len1=0;
      h[0]=0x67452301;
      h[1]=0xEFCDAB89;
      h[2]=0x98BADCFE;
      h[3]=0x10325476;
      h[4]=0xC3D2E1F0;
      memset(w, 0, sizeof(w));
    }

    // Return old result and start a new hash
    const char* SHA1::result() {

      // pad and append length
      const U32 s1=len1, s0=len0;
      put(0x80);
      while ((len0&511)!=448)
        put(0);
      put(s1>>24);
      put(s1>>16);
      put(s1>>8);
      put(s1);
      put(s0>>24);
      put(s0>>16);
      put(s0>>8);
      put(s0);

      // copy h to hbuf
      for (int i=0; i<5; ++i) {
        hbuf[4*i]=h[i]>>24;
        hbuf[4*i+1]=h[i]>>16;
        hbuf[4*i+2]=h[i]>>8;
        hbuf[4*i+3]=h[i];
      }

      // return hash prior to clearing state
      init();
      return hbuf;
    }

    // Hash 1 block of 64 bytes
    void SHA1::process() {
      U32 a=h[0], b=h[1], c=h[2], d=h[3], e=h[4];
      static const U32 k[4]={0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6};
      #define f(a,b,c,d,e,i) \
        if (i>=16) \
          w[(i)&15]^=w[(i-3)&15]^w[(i-8)&15]^w[(i-14)&15], \
          w[(i)&15]=w[(i)&15]<<1|w[(i)&15]>>31; \
        e+=(a<<5|a>>27)+k[(i)/20]+w[(i)&15] \
          +((i)%40>=20 ? b^c^d : i>=40 ? (b&c)|(d&(b|c)) : d^(b&(c^d))); \
        b=b<<30|b>>2;
      #define r(i) f(a,b,c,d,e,i) f(e,a,b,c,d,i+1) f(d,e,a,b,c,i+2) \
                   f(c,d,e,a,b,i+3) f(b,c,d,e,a,i+4)
      r(0)  r(5)  r(10) r(15) r(20) r(25) r(30) r(35)
      r(40) r(45) r(50) r(55) r(60) r(65) r(70) r(75)
      #undef f
      #undef r
      h[0]+=a;
      h[1]+=b;
      h[2]+=c;
      h[3]+=d;
      h[4]+=e;
    }

    int main(int argc, char** argv) {
      SHA1 sha1;
      for (int i=1; i<argc; ++i) {
        FILE* in=fopen(argv[i], "rb");
        if (!in)
          perror(argv[i]);
        else {
          const int BUFSIZE=4096;
          int n;
          unsigned char buf[BUFSIZE];
          while ((n=fread(buf, 1, BUFSIZE, in))>0) {
            for (int i=0; i<n; ++i)
              sha1.put(buf[i]);
          }
          fclose(in);
          double sz=sha1.size();
          const char* p=sha1.result();
          for (int j=0; j<20; ++j)
            printf("%02x", p[j]&255);
          printf(" %1.0f %s\n", sz, argv[i]);
        }
      }
      return 0;
    }
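The reduced f expressions in the listing can be checked against the textbook forms. The sketch below is only an illustration: the names ch_ref, maj_ref, ch_fast, maj_fast and forms_agree are made up here and are not part of libzpaq. Because these functions operate on each bit independently, testing the 8 combinations of all-zeros and all-ones inputs covers every case.

```cpp
#include <cstdint>

// Textbook forms of the SHA-1 "choose" and "majority" functions.
uint32_t ch_ref(uint32_t b, uint32_t c, uint32_t d)  { return (b & c) | (~b & d); }
uint32_t maj_ref(uint32_t b, uint32_t c, uint32_t d) { return (b & c) | (b & d) | (c & d); }

// Reduced forms, as they appear in the listing's f macro.
uint32_t ch_fast(uint32_t b, uint32_t c, uint32_t d)  { return d ^ (b & (c ^ d)); }
uint32_t maj_fast(uint32_t b, uint32_t c, uint32_t d) { return (b & c) | (d & (b | c)); }

// Returns true if the reduced forms match the textbook forms for every
// combination of input bits. Each input is either all zeros or all ones,
// which exercises all 8 rows of the per-bit truth table.
bool forms_agree() {
    for (uint32_t v = 0; v < 8; ++v) {
        const uint32_t b = (v & 1) ? 0xFFFFFFFFu : 0u;
        const uint32_t c = (v & 2) ? 0xFFFFFFFFu : 0u;
        const uint32_t d = (v & 4) ? 0xFFFFFFFFu : 0u;
        if (ch_ref(b, c, d) != ch_fast(b, c, d)) return false;
        if (maj_ref(b, c, d) != maj_fast(b, c, d)) return false;
    }
    return true;
}
```

The choose function drops from four operations to three, and majority from five to four.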
I suspect that further gains would come from using SSE2 for scheduling, unrolled to w[32]. The main round function has to be done sequentially.
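For reference, the scheduling step in question can be isolated from the round function. The following scalar sketch (not SSE2, and not libzpaq code; the function names are made up for illustration) shows that a 16-word circular buffer updated on demand yields the same 80 schedule words as the full w[80] expansion.

```cpp
#include <cstdint>

// Rotate left by 1 bit.
static inline uint32_t rol1(uint32_t x) { return (x << 1) | (x >> 31); }

// Full expansion: compute and store all 80 schedule words.
void expand_full(const uint32_t block[16], uint32_t w[80]) {
    for (int i = 0; i < 16; ++i) w[i] = block[i];
    for (int i = 16; i < 80; ++i)
        w[i] = rol1(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]);
}

// Rolling expansion: keep only the last 16 words in a circular buffer.
// Slot i&15 still holds word i-16 when word i is computed, so it is
// updated in place. out[] records each word only so the two versions
// can be compared; a real hash would consume the word immediately.
void expand_rolling(const uint32_t block[16], uint32_t out[80]) {
    uint32_t w[16];
    for (int i = 0; i < 16; ++i) out[i] = w[i] = block[i];
    for (int i = 16; i < 80; ++i) {
        w[i & 15] ^= w[(i - 3) & 15] ^ w[(i - 8) & 15] ^ w[(i - 14) & 15];
        w[i & 15] = rol1(w[i & 15]);
        out[i] = w[i & 15];
    }
}
```

The schedule depends only on the input block, not on the a..e state, which is why it is a candidate for vectorizing while the rounds are not.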
New version will be released soon!
What's new:
+ Added ED2K hash support
+ UTF-8 output of hashes (Checksums.txt), changed output format to "hash *file"
+ Changed "Copy to clipboard" format to "file, hashtype: hash"
+ CHK will always highlight all equal hashes, not only when you "Sort By Hash"
+ Some small GUI fixes - more correct DPI scaling, corrected appearance under Windows XP
Will ED2K be multithreaded?
Not this time. For now I'm focused on more basic things...
As a note, CHK's output is 100% compatible with such hash tools as RHASH, including UTF-8 encoding.
Just save hash list as TXT file and run "rhash -c checksums.txt"
RHASH will correctly detect the hash type: CRC-32/ED2K/MD4/MD5/SHA-1/SHA-256/SHA-512
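A line in the "hash *file" format can be parsed, and its hash type narrowed down from the digest length, along these lines. This is a sketch under assumptions: how RHASH actually detects the type is not shown in this thread, and the names ChecksumLine, parse_checksum_line and guess_hash_type are hypothetical. Note that length alone cannot tell ED2K, MD4 and MD5 apart, since all three are 32 hex digits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// One parsed line of a checksum file (hypothetical helper type).
struct ChecksumLine {
    std::string hash;  // hex digits
    std::string file;  // file name after the " *" separator
};

// Parse a line of the form "hash *file". Returns false if the line
// is malformed (missing separator, non-hex hash, or empty file name).
bool parse_checksum_line(const std::string& line, ChecksumLine& out) {
    const std::string sep = " *";
    const std::string::size_type pos = line.find(sep);
    if (pos == std::string::npos || pos == 0) return false;
    for (std::string::size_type i = 0; i < pos; ++i)
        if (!std::isxdigit(static_cast<unsigned char>(line[i]))) return false;
    out.hash = line.substr(0, pos);
    out.file = line.substr(pos + sep.size());
    return !out.file.empty();
}

// Guess the hash family from the number of hex digits (assumption:
// detection by length only; 32 digits is ambiguous).
const char* guess_hash_type(std::size_t hex_digits) {
    switch (hex_digits) {
        case 8:   return "CRC-32";
        case 32:  return "ED2K/MD4/MD5";
        case 40:  return "SHA-1";
        case 64:  return "SHA-256";
        case 128: return "SHA-512";
        default:  return "unknown";
    }
}
```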
Added the Uppercase option, removed mostly useless List View mode
Just tested a 64-bit compile of CHK. First of all, the 64-bit executable is really fat - 11.7 MB vs 3.3 MB for the 32-bit compile. And it looks like UPX can't pack 64-bit executables.
Anyway, timings are (ENWIK9 again):
[image: 64-bit CHK timings on ENWIK9, not preserved]
7-zip 32-bit crc32 algo does it in 0.5 seconds (2600k@4.6)
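As a baseline for such comparisons, the plain byte-at-a-time table-driven CRC-32 looks roughly like this. It is a generic sketch, not 7-zip's or CHK's code (faster implementations process several bytes per step), and the name crc32_bytewise is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Build the 256-entry lookup table for the reflected CRC-32
// polynomial 0xEDB88320 (the one used by zip, gzip and PNG).
static void crc32_init(uint32_t table[256]) {
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        table[i] = c;
    }
}

// Compute the CRC-32 of a buffer, one byte per table lookup.
uint32_t crc32_bytewise(const void* data, size_t len) {
    static uint32_t table[256];
    static bool ready = false;
    if (!ready) { crc32_init(table); ready = true; }
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint32_t crc = 0xFFFFFFFFu;  // initial value
    for (size_t i = 0; i < len; ++i)
        crc = table[(crc ^ p[i]) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;    // final inversion
}
```

The standard check value for this CRC is 0xCBF43926 for the 9-byte ASCII string "123456789".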
Replaced an old putz input box with a proper hash-input dialog:
During the long CHK v1.12 evaluation and testing, I found a bug in RHASH's SHA-512 implementation...
As you can see, this time I decided to do extra testing. I can tell ya, this new version is just awesome - really easy to use with improved readability and user interface - I just really enjoy testing it...
The last SHA3 entry in your enwik9 performance table is interesting. Which implementation are you using?
SSE optimized Keccak 256 (part of the reference implementation) is 4x faster than Skein 256 on my laptop: Core i5 430M 2.27 GHz. It processes at 1 GB/s while Skein manages 243 MB/s. Of course the Skein implementation is optimized x64 assembly but does not use SSE.
For both cases I am using (in Pcompress) the optimized reference implementations which were part of the NIST submissions. Also Intel has a SSE/AVX optimized ASM version of the core SHA 256 block function here: http://download.intel.com/embedded/processor/whitepaper/327457.pdf
I am using this in Pcompress as well. It really shines on an AVX enabled processor.
Last edited by moinakg; 29th December 2012 at 22:10.
My previous claim about Keccak performance is wrong. I was still testing it with the test vectors and found an error. So it is now more realistic and as per the table above.
And an order-0 histogram for ENWIK9 - an idea for an upcoming CHK feature. Additionally, I have an idea about an order-1 histogram/graph/map.
Looks good, waiting to see order-1 histogram, and maybe also logarithmic scale switch.
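For readers unfamiliar with the terms: an order-0 histogram counts how often each byte value occurs, and an order-1 histogram counts each byte in the context of the byte before it. A minimal sketch of both, with made-up function names and no claim about how CHK computes or draws them:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Order-0: counts[v] is the number of bytes equal to v.
std::array<uint64_t, 256> order0_histogram(const unsigned char* data, size_t len) {
    std::array<uint64_t, 256> counts{};  // zero-initialized
    for (size_t i = 0; i < len; ++i)
        ++counts[data[i]];
    return counts;
}

// Order-1: counts[prev * 256 + cur] is the number of times byte cur
// directly follows byte prev. The first byte has no predecessor, so
// the counts sum to len - 1 (or 0 for empty input).
std::vector<uint64_t> order1_histogram(const unsigned char* data, size_t len) {
    std::vector<uint64_t> counts(256 * 256, 0);
    for (size_t i = 1; i < len; ++i)
        ++counts[static_cast<size_t>(data[i - 1]) * 256 + data[i]];
    return counts;
}
```

The order-1 table has 65536 cells, which is why it lends itself to a 256x256 map rather than a bar chart. Counts this skewed are also where a logarithmic scale helps.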