Thread: CHK wishlist

  1. #31
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Here's the 24x24 version. It's probably too big, but check it out anyway!
    Attached thumbnails: 24x24.png, 24x24list.png

  2. #32
    The Founder encode's Avatar
    Seriously optimized MD5. Note that MD5 will be the default hash, because:

    • MD5 is the most popular hash
    • MD5 is FAST (faster than SHA-1)
    • An MD5 hash is shorter than a SHA-1 hash (32 vs. 40 hex digits), so it is more readable


    Some timings on ENWIK9 (including I/O):

    Indy10 -> 18 sec (Shame!)
    RFC implementation -> 7 sec
    My MD5 -> 4 sec

    If the Indy10 component IdHashMessageDigest ran at a decent speed (say, no more than 1.5x slower than the reference implementation), I would never have considered writing my own MD5 (or SHA-1, SHA-256) implementation. But since it is nearly 5x slower, I can't afford such a price... So I worked seriously on hand-optimizing the C++ code and got something interesting. Now CHK is MUCH faster than many hash tools!

  3. #33
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,552
    Thanks
    767
    Thanked 685 Times in 371 Posts
    AFAIR, SHA-1 in srep runs at 300 MB/s and MD5 at 500 MB/s. It's code from the LibTomCrypt library, without any modifications.

  4. #34
    The Founder encode's Avatar
    Some timings on ENWIK9 (including I/O)

  5. #35
    The Founder encode's Avatar
    Attached thumbnail: chk103.png

  6. #36
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Where'd the timings go?
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  7. #37
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts

  8. #38
    Tester
    Black_Fox's Avatar
    It is, but it's still popular, and its absence would probably stop some people from using the program. An older post from this thread:
    Quote (originally posted by encode):
    * Most likely I should add MD5 computation - it's too popular to ignore. SHA2 is not that popular and slower - thus can be ignored, I guess. As soon as SHA3 will be approved (March 2012 or later, candidates are known already), I'll add it! Probably I should add a hash selection (CRC or MD5 or SHA1 - for speed), or, all hashes at the same time in different columns.

  9. #39
    Programmer Bulat Ziganshin's Avatar
    Quote (originally posted by encode):
    Some timings on ENWIK9 (including I/O)
    3 gb/s

  10. #40
    The Founder encode's Avatar
    MD5 is broken, I know. I even posted a ZIP file containing two different files sharing the same MD5! But browsing the web I see that MD5 is the most common hash.

    About speed: I can't claim that I have the fastest MD5, since I'm using Borland C++ Builder, a pretty slow compiler, and it's a GUI program, but... I squeezed everything out of this compiler. I tested a bunch of GUI MD5 tools and CHK is the fastest! I'm sure that if I compiled my code as a single-threaded command-line tool, it would be slightly faster.

    What I've done: I just collected all the optimization ideas for MD5 from the fastest MD5 implementations (OpenSSL, HashCat, MD5 papers...) and tested them. Some of them worked, some didn't. Simple.

    A few examples:

    Use a "union" instead of translating an array of bytes into an array of integers.
    Code:
    union
    {
      Byte Buf[64];
      UInt X[16];
    };
    It's the cheapest trick, but many MD5 implementations still load the array of ints using many operations like:
    Code:
      X[i]=Buf[j]|(Buf[j+1]<<8)|(Buf[j+2]<<16)|(Buf[j+3]<<24);
    Yes, that's needed for big-endian compatibility, but in most cases it doesn't matter, I guess. Even Macs are little-endian now (same Intel CPUs as PCs).
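
    A minimal sketch of the two loading strategies discussed above, for comparison. The function names (HostIsLittleEndian, LoadWordsPortable, LoadWordsFast) are made up for this illustration and are not from CHK. Note that reading the inactive member of a union is not guaranteed by ISO C++ (most compilers support it as an extension), so the fast path here uses memcpy, which is well defined and usually compiles to the same code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte first.
inline bool HostIsLittleEndian() {
  const std::uint32_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}

// Portable load: builds 16 little-endian 32-bit words from a 64-byte block.
// Gives the same result on any host byte order.
inline void LoadWordsPortable(const unsigned char* buf, std::uint32_t* x) {
  for (int i = 0; i < 16; ++i) {
    const unsigned char* p = buf + 4 * i;
    x[i] = static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
  }
}

// Fast path: a plain 64-byte copy. Only correct on little-endian hosts.
inline void LoadWordsFast(const unsigned char* buf, std::uint32_t* x) {
  assert(HostIsLittleEndian());
  std::memcpy(x, buf, 64);
}
```

    On a little-endian machine both functions fill X with identical values, so the fast path can be selected at build time and the portable one kept as a fallback.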

    Some things didn't work as they should. Many implementations use the "optimized" function from Wei Dai:
    Code:
      (((c^d)&b)^d) // Instead of ((b&c)|(~b&d))
    Somehow, the original code runs faster.

    I like the Round 3 optimization idea from the author of HashCat, though it didn't work for me:
    Code:
      a+=(b^c^d)+X[5]+0xfffa3942UL;
      a=Rol(a, 4)+b;
      d+=(a^b^c)+X[8]+0x8771f681UL;
      d=Rol(d, 11)+a;
    
      // These b^c (and later d^a) can be precomputed:
    
      UInt t=b^c; 
      a+=(t^d)+X[5]+0xfffa3942UL;
      a=Rol(a, 4)+b;
      d+=(a^t)+X[8]+0x8771f681UL;
      d=Rol(d, 11)+a;
      t=d^a;
      c+=(t^b)+X[11]+0x6d9d6122UL;
      c=Rol(c, 16)+d;
      b+=(c^t)+X[14]+0xfde5380cUL;
      b=Rol(b, 23)+c;
    
      // and so on
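
    A small sketch to check that the cached-XOR variant above computes exactly the same values as the plain form, using the four Round 3 steps quoted in the post. The State struct and the function names are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

using UInt = std::uint32_t;

// Rotate left by s bits (0 < s < 32).
inline UInt Rol(UInt v, int s) { return (v << s) | (v >> (32 - s)); }

struct State { UInt a, b, c, d; };

// Four Round 3 steps written the straightforward way.
inline State Round3Plain(State s, const UInt* X) {
  s.a += (s.b ^ s.c ^ s.d) + X[5] + 0xfffa3942u;   s.a = Rol(s.a, 4) + s.b;
  s.d += (s.a ^ s.b ^ s.c) + X[8] + 0x8771f681u;   s.d = Rol(s.d, 11) + s.a;
  s.c += (s.d ^ s.a ^ s.b) + X[11] + 0x6d9d6122u;  s.c = Rol(s.c, 16) + s.d;
  s.b += (s.c ^ s.d ^ s.a) + X[14] + 0xfde5380cu;  s.b = Rol(s.b, 23) + s.c;
  return s;
}

// Same steps, reusing the XOR of the two words that do not change
// between a pair of consecutive steps.
inline State Round3Cached(State s, const UInt* X) {
  UInt t = s.b ^ s.c;                              // b and c are stable for steps 1-2
  s.a += (t ^ s.d) + X[5] + 0xfffa3942u;           s.a = Rol(s.a, 4) + s.b;
  s.d += (s.a ^ t) + X[8] + 0x8771f681u;           s.d = Rol(s.d, 11) + s.a;
  t = s.d ^ s.a;                                   // d and a are stable for steps 3-4
  s.c += (t ^ s.b) + X[11] + 0x6d9d6122u;          s.c = Rol(s.c, 16) + s.d;
  s.b += (s.c ^ t) + X[14] + 0xfde5380cu;          s.b = Rol(s.b, 23) + s.c;
  return s;
}
```

    The saving is one XOR per step. Whether that shows up in timings depends on the compiler and CPU, which matches the mixed results reported above.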

  11. #41
    Programmer Bulat Ziganshin's Avatar
    And srep was compiled with 64-bit ICL.

  12. #42
    The Founder encode's Avatar
    Added CRC16, because LHA/LZH uses it.

    Now you can copy the hashes of multiple files. The result is:

    Automatic Backup[10].rbk, MD5: 6084AF67AC459E171654F2C5B3C9296B
    Automatic Backup[11].rbk, MD5: 2DD4750866DA3666A386BFD0BA511FF3
    PM-RegScan.bmp, MD5: C552AAE37D439096077085E223F5B186
    RMScrn.exe, MD5: B3A026B8D5DFBA292187576C95B511AF
    XRegistry.bin, MD5: 77D7200CC17366DF5A04248CF0FF2C4C
    nu.exe, MD5: 0769E2260F1F29CA92872EFA59000EB2

    etc.

    The new Refresh command updates the hashes of all files.
    Attached thumbnails: crc16.png, copyhash.png, refresh.png

  13. #43
    Expert
    Matt Mahoney's Avatar
    Quote (originally posted by encode):
    Some things didn't work as they should. Many implementations use the "optimized" function from Wei Dai:
    Code:
      (((c^d)&b)^d) // Instead of ((b&c)|(~b&d))
    Original code runs faster somehow.
    Because the original code is parallel, while the "optimization" has sequential dependencies.
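
    For reference, a short sketch showing that the two forms of the MD5 F function are interchangeable, so the choice between them is purely a speed question. F_ref, F_opt and FormsAgree are names invented for this example. Because both expressions are bitwise, checking the eight combinations of all-zero and all-one words covers every possible input.

```cpp
#include <cassert>
#include <cstdint>

// Reference form: takes bits of c where b is 1 and bits of d where b is 0.
// The two AND terms do not depend on each other.
inline std::uint32_t F_ref(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
  return (b & c) | (~b & d);
}

// Three-operation form: each operation consumes the previous result.
inline std::uint32_t F_opt(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
  return ((c ^ d) & b) ^ d;
}

// Exhaustive check over all per-bit input combinations.
inline bool FormsAgree() {
  for (unsigned bits = 0; bits < 8; ++bits) {
    const std::uint32_t b = (bits & 1) ? 0xFFFFFFFFu : 0u;
    const std::uint32_t c = (bits & 2) ? 0xFFFFFFFFu : 0u;
    const std::uint32_t d = (bits & 4) ? 0xFFFFFFFFu : 0u;
    if (F_ref(b, c, d) != F_opt(b, c, d)) return false;
  }
  return true;
}
```

    Which form is faster in a full MD5 round depends on the compiler and CPU, so it is worth timing both rather than assuming.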

  14. #44
    The Founder encode's Avatar
    Checked the "Magnet Links" concept. Well, it's yet another field where CHK can be used. So I will add MD4, since it's not really dead: the eD2K network (eMule and others) uses it. Thus you will be able to create eD2K Magnet Links for files that are 9500 KB or smaller. For larger files (more than one chunk), the ED2K hash is an MD4 of the hash list.
    Another idea is to add a hash list feature. For each file, instead of generating just one hash, we would generate a hash list: a list of hashes, one per file chunk/part. Say, as with ED2K, we would generate an MD4 for each 9500 KB chunk and output these hashes as hash1:hash2:hash3:hash4 and so on. With the complete hash list, you will be able to detect files with the same beginning, or detect which part was downloaded incorrectly.
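
    A rough sketch of the chunking and output side of such a hash list. The per-chunk hash itself (MD4 in the ED2K case) is left out, and the names Chunk, SplitIntoChunks and JoinHashList are hypothetical, not taken from CHK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 9500 KB, the chunk size mentioned above (9,728,000 bytes).
constexpr std::uint64_t kChunkSize = 9500ull * 1024;

struct Chunk {
  std::uint64_t offset;  // start of the chunk within the file
  std::uint64_t length;  // number of bytes in the chunk
};

// Splits a file of fileSize bytes into consecutive chunks.
// Only the last chunk may be shorter than chunkSize.
inline std::vector<Chunk> SplitIntoChunks(std::uint64_t fileSize,
                                          std::uint64_t chunkSize = kChunkSize) {
  assert(chunkSize > 0);
  std::vector<Chunk> chunks;
  for (std::uint64_t off = 0; off < fileSize; off += chunkSize) {
    const std::uint64_t left = fileSize - off;
    chunks.push_back({off, left < chunkSize ? left : chunkSize});
  }
  return chunks;
}

// Joins per-chunk hash strings as hash1:hash2:hash3 ...
inline std::string JoinHashList(const std::vector<std::string>& hashes) {
  std::string out;
  for (std::size_t i = 0; i < hashes.size(); ++i) {
    if (i != 0) out += ':';
    out += hashes[i];
  }
  return out;
}
```

    With the chunk offsets known, a mismatch at position N of two hash lists points directly at the byte range that differs.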
    Checked the KaZaa hash. Worst idea: MD5 with data skipping. We hash 300 KB of data, then skip 300 KB; next, we hash 600 KB and again skip the same 600 KB; next, read 1200 KB, skip 1200 KB, and so on. A hacker can modify a huge part of a file without being noticed, because the hash check is unable to detect changes in the skipped parts...
    Yet another idea is Base32 encoding for hashes. SHA1-Base32 is frequently used in P2P networks, being extremely readable. The SHA1-Base32 length is equal to that of MD5 in hex (32 characters)!
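
    To illustrate the length claim, here is a minimal Base32 encoder sketch using the RFC 4648 alphabet, without "=" padding (an assumption; 20 bytes divide evenly into 5-bit groups, so a SHA-1 digest needs no padding anyway). Base32Encode is a name chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes len bytes as Base32 (RFC 4648 alphabet), without '=' padding.
inline std::string Base32Encode(const unsigned char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  static const char kAlphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  std::string out;
  std::uint32_t acc = 0;  // bit accumulator; only the low `bits` bits are pending
  int bits = 0;
  for (std::size_t i = 0; i < len; ++i) {
    acc = (acc << 8) | data[i];
    bits += 8;
    while (bits >= 5) {  // emit every complete 5-bit group
      out += kAlphabet[(acc >> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  if (bits > 0) {  // left-align the remaining bits in a final group
    out += kAlphabet[(acc << (5 - bits)) & 31];
  }
  return out;
}
```

    A 20-byte SHA-1 digest is 160 bits, i.e. 160 / 5 = 32 Base32 characters, the same as the 32 hex digits of a 16-byte MD5.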

  15. #45
    Tester
    Black_Fox's Avatar
    There is a prospect of the Pirate Bay tracker using magnet links. Will those use the same ED2K hash or something else?

  16. #46
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,488
    Thanks
    26
    Thanked 130 Times in 100 Posts
    Wikipedia states the following about MD4: "generating a collision is now as cheap as verifying it (a few microseconds)".

    Magnet links are very flexible. TPB can use any hash algorithm to generate hashes, but I hope they'll not opt for something already broken.

  17. #47
    Expert
    Matt Mahoney's Avatar
    Yes, in the original paper from China that simultaneously broke MD4, MD5, HAVAL-128, and RIPEMD, they said that finding collisions for MD4 could be done by "hand calculation". But I suppose CRC-32 has its uses too.

    AFAIK, there are no known secure 128 bit hashes. But I suppose you could throw away half of the output of SHA-256 and be safe.

  18. #48
    The Founder encode's Avatar
    Development goes on, but slooowly, due to my extreme busyness at my main job.

    Anyway, I added MD4. Thinking about SHA1-Base32: a nice, secure, readable hash that is not too long. Probably this one should be the default, since too many users complain about MD5 weaknesses. But I guess SHA1-Base32 should be given a more compact name, say, G2 (Gnutella 2 uses it) or Magnet, or even CHK.

    Thinking about a new program name. HashFrog? Or some name that is unique and hash-related. Anyway, CHK is okay too, I guess.

    I really like that people visit my homepage even without news from my side! Even when I'm far from home (I've just returned from a Siberia super tour) I can find motivation to improve or create things... Thank you!

  19. #49
    The Founder encode's Avatar
    Added ED2K hash.
    Attached thumbnail: ed2k.png

  20. #50
    Programmer Bulat Ziganshin's Avatar
    CryptHashData() from the Win32 API seems to be the fastest: 636 MB/s for MD5 and 474 MB/s for SHA-1 on a 2600K @ 4.6 GHz
    Attached Files

  21. #51
    The Founder encode's Avatar
    Thanks for the link, anyway.

    Added a "Save as..." command to save/dump the file list to a TXT file. The layout is subject to change, but it looks like this:

    // Generated by CHK v1.03 (Optional)
    // 05.03.2012 21:15 or 03/05/2012

    // Generated by CHK v1.03 on 03/05/2012

    // Generated on Sunday March 4th, 2012 at 11:36:43

    C:\test.txt, MD5: xxxxxxxxxxxxxxxx

    Probably I should dump filesize as well:

    C:\test.txt (3,735 bytes), MD5: xxxxxxxxxxxxxxxx

    A timestamp is useful here. I'm not sure about the program name and version, but they can be useful too.

  22. #52
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Maybe a tab (\t) would be more useful as a separator between the filename, hash, and file size, because a script can then easily process such dump files.
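
    As a sketch of why tabs help: with a tab separator, one line of the dump can be split without any knowledge of what characters the file name contains. SplitTabs is a hypothetical helper, and the column order in the usage note is only an example, not CHK's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits one line of a dump into its tab-separated fields.
// Spaces, commas, and parentheses inside a field are left untouched.
inline std::vector<std::string> SplitTabs(const std::string& line) {
  std::vector<std::string> fields;
  std::size_t start = 0;
  for (;;) {
    const std::size_t tab = line.find('\t', start);
    if (tab == std::string::npos) {
      fields.push_back(line.substr(start));  // last field
      break;
    }
    fields.push_back(line.substr(start, tab - start));
    start = tab + 1;
  }
  return fields;
}
```

    For a line such as "C:\test (1), final.txt<TAB>3735<TAB>hash", this returns three fields, and the first one keeps its comma and parentheses intact.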
    BIT Archiver homepage: www.osmanturan.com

  23. #53
    Member Karhunen's Avatar
    Join Date
    Dec 2011
    Location
    USA
    Posts
    91
    Thanks
    2
    Thanked 1 Time in 1 Post
    Can you describe what the binaries do? Do they add a hash to the data buffer? Is there a better example for newbies of a console stdin-to-stdout program, i.e. one that "flushes" the buffer to stdout?

  24. #54
    The Founder encode's Avatar
    I didn't really understand the question...

  25. #55
    The Founder encode's Avatar
    Now CHK dumps the hash list in Unicode. Please check the attached file. If you have any issues with this file, let me know!
    Attached Files

  26. #56
    Programmer osmanturan's Avatar
    It's a bit hard to parse such notation with regular expressions, because parentheses can be used in file names too. Why don't you use \t or similar instead of cosmetic stuff that makes it harder to parse?

  27. #57
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,488
    Thanks
    26
    Thanked 130 Times in 100 Posts
    sha*sum, md5sum and probably other *sum programs, if they exist, use a format like:
    <hash> <filename>

    e.g.

    Code:
    piotrek@p5q-pro:~/Pobrane/corpora/enwik$ sha1sum *
    88e00330c706b76aac18e7c71e36b06578943d60  0enwik8
    d31fb012d587941d7d0c76eece89967455490be9  0enwik8.xwrt
    57b8363b814821dc9d47aa4d41f58733519076b2  enwik8
    57b8363b814821dc9d47aa4d41f58733519076b2  enwik8.dec
    0c65b4c7314408d0cefb6d28f62bdb5adf49dcbe  enwik8.lzp
    ff04a4d8231bd89a59893e50cf686155cee19ed8  enwik8.lzpccm
    7fff4c0cd40db0b0e6974e571f43f11b6d46aa7a  enwik8.lzpccm2
    46e10cd26cbcae8dfbe741cb0995e1efebfb2216  enwik8.sel
    fc4ca3271ee798c7b4c17cf4a93f594eae5334dd  enwik8.swi
    27fe85921f14de8a959bcc23b0da0e68a8726ab2  enwik8.xwrt
    2996e86fb978f93cca8f566cc56998923e7fe581  enwik9
    2996e86fb978f93cca8f566cc56998923e7fe581  enwik9.dec
    54191e1331c75ef0edeb50841e43dda96d8b60c5  enwik9.lzp
    1e43f981aa7e355dec89653a8a0a50696573a46b  enwik9.lzpccm
    d6ee95bc29c8be4cbe606f696f4018a6f825b1b6  table
    19790d9fa9bcc2a207126caa3da8f27ea2cf648b  _tables.out
    sha1sum: xwrt: Is a directory
    piotrek@p5q-pro:~/Pobrane/corpora/enwik$ sha2
    I've checked sha1sum and it silently ignores any invalid line when checking, only reporting that all lines are invalid if that is the case.

  28. #58
    The Founder encode's Avatar
    <Hash> <FileName>
    or
    <Hash> *<FileName>
    has one disadvantage: I can't see the hash type from such a listing!

    Anyway, I have an idea: add a comment with the hash type as a header:
    Code:
    # SHA1
    88e00330c706b76aac18e7c71e36b06578943d60  0enwik8
    d31fb012d587941d7d0c76eece89967455490be9  0enwik8.xwrt
    57b8363b814821dc9d47aa4d41f58733519076b2  enwik8
    57b8363b814821dc9d47aa4d41f58733519076b2  enwik8.dec
    0c65b4c7314408d0cefb6d28f62bdb5adf49dcbe  enwik8.lzp
    ff04a4d8231bd89a59893e50cf686155cee19ed8  enwik8.lzpccm
    7fff4c0cd40db0b0e6974e571f43f11b6d46aa7a  enwik8.lzpccm2
    46e10cd26cbcae8dfbe741cb0995e1efebfb2216  enwik8.sel
    fc4ca3271ee798c7b4c17cf4a93f594eae5334dd  enwik8.swi
    27fe85921f14de8a959bcc23b0da0e68a8726ab2  enwik8.xwrt
    2996e86fb978f93cca8f566cc56998923e7fe581  enwik9
    2996e86fb978f93cca8f566cc56998923e7fe581  enwik9.dec
    54191e1331c75ef0edeb50841e43dda96d8b60c5  enwik9.lzp
    1e43f981aa7e355dec89653a8a0a50696573a46b  enwik9.lzpccm
    d6ee95bc29c8be4cbe606f696f4018a6f825b1b6  table
    19790d9fa9bcc2a207126caa3da8f27ea2cf648b  _tables.out
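
    A minimal sketch of reading such a listing, assuming the "# SHA1" header line proposed above (it is this thread's idea, not part of the sha1sum format) and the two entry forms "<Hash>  <FileName>" and "<Hash> *<FileName>". Entry, ParseSumLine and ParseTypeHeader are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct Entry {
  std::string hash;  // hex digest as written in the file
  std::string name;  // file name, may contain spaces
  bool binary;       // true when the name was prefixed with '*'
};

// Parses "<hash>  <name>" or "<hash> *<name>".
// Returns false for comments, blank lines, and malformed lines.
inline bool ParseSumLine(const std::string& line, Entry& out) {
  if (line.empty() || line[0] == '#') return false;
  const std::size_t sp = line.find(' ');
  if (sp == 0 || sp == std::string::npos || sp + 2 >= line.size()) return false;
  for (std::size_t i = 0; i < sp; ++i) {
    if (!std::isxdigit(static_cast<unsigned char>(line[i]))) return false;
  }
  const char mode = line[sp + 1];  // ' ' for text mode, '*' for binary mode
  if (mode != ' ' && mode != '*') return false;
  out.hash = line.substr(0, sp);
  out.binary = (mode == '*');
  out.name = line.substr(sp + 2);
  return true;
}

// Returns the hash type from a "# SHA1" style header, or "" if not a header.
inline std::string ParseTypeHeader(const std::string& line) {
  if (line.size() > 2 && line[0] == '#' && line[1] == ' ') return line.substr(2);
  return "";
}
```

    Since the header is a comment, a reader that does not know about it can skip it as an invalid line, while a reader that does can pick the hash type from it.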
    I guess I will add CRC64, since 7-Zip already uses it. Probably I shouldn't keep ED2K, since it's basically just MD4 (for files of 9500 KB or less; for larger files it's the MD4 of the MD4 sums of all 9500 KB blocks).

  29. #59
    The Founder encode's Avatar
    Probably I shouldn't include CRC16 - it produces too many collisions. The only reason to include it is that LZH/LHA use CRC16 for file integrity checking.

    Tested CRC64 - really like it! It's probably the future standard. However, 64-bit arithmetic is somewhat slow on 32-bit machines/code, so I will rewrite it using 32-bit arithmetic only. With CRC that is easily possible.
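
    One way the 32-bit rewrite could look, as a bit-at-a-time sketch rather than a tuned table-driven version. It assumes the CRC-64 variant used by the xz format (ECMA-182 polynomial in reflected form, all-ones initial value and final XOR); the thread does not say which variant CHK uses. The Crc64 struct and function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 64-bit CRC register kept as two 32-bit halves.
struct Crc64 {
  std::uint32_t hi;
  std::uint32_t lo;
};

// Reflected ECMA-182 polynomial 0xC96C5795D7870F42, split in halves.
constexpr std::uint32_t kPolyHi = 0xC96C5795u;
constexpr std::uint32_t kPolyLo = 0xD7870F42u;

// Feeds len bytes into the register using only 32-bit operations.
inline Crc64 Crc64Update(Crc64 crc, const unsigned char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  for (std::size_t i = 0; i < len; ++i) {
    crc.lo ^= data[i];
    for (int k = 0; k < 8; ++k) {
      const std::uint32_t carry = crc.lo & 1u;   // bit shifted out of the register
      crc.lo = (crc.lo >> 1) | (crc.hi << 31);   // 64-bit shift right by one
      crc.hi >>= 1;
      if (carry) {
        crc.hi ^= kPolyHi;
        crc.lo ^= kPolyLo;
      }
    }
  }
  return crc;
}

// One-shot CRC of a buffer, including initial value and final XOR.
inline Crc64 Crc64Compute(const unsigned char* data, std::size_t len) {
  Crc64 crc = {0xFFFFFFFFu, 0xFFFFFFFFu};
  crc = Crc64Update(crc, data, len);
  crc.hi ^= 0xFFFFFFFFu;
  crc.lo ^= 0xFFFFFFFFu;
  return crc;
}
```

    For the standard check string "123456789" this variant gives 995DC9BB:DF1939FA. A table-driven version can use two 256-entry 32-bit tables (high and low halves) in the same way.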

  30. #60
    Programmer osmanturan's Avatar
    Did you consider implementing hardware-accelerated CRC32? IIRC, your CPU already supports it. But its output differs from the well-known variant (it uses the polynomial 0x1EDC6F41 instead of 0x04C11DB7). I have to note that it's really fast.
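
    To show how much the two polynomials differ in practice, here is a plain software sketch that computes both variants bit by bit. It does not use the hardware instruction; it only demonstrates that the results are incompatible. Crc32Bitwise and the constant names are chosen for this example; the constants are the bit-reflected forms of the two polynomials quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kCrc32Poly  = 0xEDB88320u;  // reflected 0x04C11DB7 (zip, PNG)
constexpr std::uint32_t kCrc32cPoly = 0x82F63B78u;  // reflected 0x1EDC6F41 (CRC-32C)

// Bit-at-a-time reflected CRC with all-ones initial value and final XOR.
inline std::uint32_t Crc32Bitwise(const unsigned char* data, std::size_t len,
                                  std::uint32_t reflectedPoly) {
  assert(data != nullptr || len == 0);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int k = 0; k < 8; ++k) {
      // 0u - (crc & 1) is all ones when the low bit is set, else zero.
      crc = (crc >> 1) ^ (reflectedPoly & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}
```

    For the check string "123456789" the classic CRC-32 is CBF43926, while CRC-32C is E3069283. So a hardware CRC-32C could be offered as a separate hash type, but it cannot replace the CRC-32 that archivers report.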
