Thread: Directory hash as one string

  1. #1
    Member FatBit
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts

    Directory hash as one string

    Dear forum members,

    what do you think about a directory hash computed as the hash of the (sub)directory's contents packed into a tar-like stream (or something similar, without actually writing an archive)? In my opinion the advantage is that a single string serves as the change flag. The next processing step can then continue with an "ordinary" file hash function.

    Sincerely yours,

    FatBit

  2. #2
    Member m^2
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Could you say some more? I have no idea what you are talking about.

  3. #3
    Member FatBit
    I apologise for my poor explanation. For example, when a Linux distribution ships its files, they are collected (and often compressed) into one file (e.g. an .iso image) and an MD5 check hash is calculated for it. My idea is the following: compute a check hash for a directory "without" the collection step. Internally it is probably necessary to perform a collection step, something like tar, but externally I get one "directory hash" string for the whole directory, not many hashes for all the files.

    Sincerely yours,

    FatBit

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    You can probably do that using pipes, i.e. something like:
    tar -cf - . | md5sum

  5. #5
    Member FatBit
    Yes, I agree. But I think the tar step at least consumes drive space and is unnecessary. It would be better "to integrate" the tar | md5sum pipeline into the file hash tool as a temporary calculation step. Or to skip the tar step entirely and calculate the hash value (not only MD5!) "on the fly" while reading the directory tree.

    Sincerely yours,

    FatBit
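
    A minimal Python sketch of the "on the fly" variant described in this post: walk the tree in a fixed order and feed each file's relative path and content into one hasher, with no tar step and no temporary file. The function name directory_hash, the length-prefixed record layout and the choice to skip symlinks and empty directories are assumptions made for illustration, not an established format.

```python
import hashlib
import os


def directory_hash(root, algorithm="sha256", chunk_size=1 << 20):
    """Return one hex digest covering every regular file under root."""
    digest = hashlib.new(algorithm)
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place fixes the order in which os.walk descends,
        # so the same tree always produces the same digest.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Assumption: symlinks and special files are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            rel_bytes = os.fsencode(rel)
            # Length prefixes keep record boundaries unambiguous.
            digest.update(len(rel_bytes).to_bytes(8, "big") + rel_bytes)
            digest.update(os.path.getsize(path).to_bytes(8, "big"))
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

    Any algorithm name accepted by hashlib.new can be passed, so the result is not tied to MD5. Unlike a tar stream, this sketch ignores timestamps, ownership and permissions, so only names and contents act as the change flag.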

  6. #6
    Member
    It should not use any temporary disk space, only a small in-RAM buffer.
    cat /dev/urandom | md5sum doesn't consume any disk space either.
    The drawback is that the above command produces only an MD5 hash; one would need to write a wrapper that reads data from stdin and passes it to multiple hashers, e.g. md5sum, sha256sum, sha1sum, etc. But that should be easy.
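
    A rough sketch of such a wrapper in Python, using the standard hashlib module rather than the external md5sum/sha1sum/sha256sum programs. The names hash_stream and main and the default algorithm list are illustrative assumptions.

```python
import hashlib
import sys


def hash_stream(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 16):
    """Read a binary stream once and return {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        # Every hasher sees the same chunk, so the input is read only once.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def main():
    """Hash standard input and print one line per algorithm."""
    for name, value in hash_stream(sys.stdin.buffer).items():
        print(f"{value}  {name}")
```

    Saved under a hypothetical name such as multihash.py with a final main() call, it could sit at the end of the pipe: tar -cf - . | python3 multihash.py. Only one chunk is held in memory at a time, so nothing is written to disk.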

  7. #7
    Member m^2
    I wonder whether there's a filesystem with such a capability. Tahoe-LAFS and ZFS both have quite similar features, but in the case of ZFS it's not quite the same, and in the case of LAFS I'm not sure, but it's probably not the same either. With Merkle trees and a log-structured FS it shouldn't be very expensive.
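
    To illustrate the Merkle tree idea in user space, here is a small Python sketch in which a directory's hash is built from the names and hashes of its children. It is not how ZFS or Tahoe-LAFS work internally; the function names, the "file"/"dir" prefixes and the skipping of symlinks are assumptions made for the example.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of one file, tagged as a file node."""
    digest = hashlib.sha256(b"file\0")
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def merkle_hash(path):
    """Return a digest for a file or, recursively, for a directory."""
    if not os.path.isdir(path):
        return file_hash(path)
    digest = hashlib.sha256(b"dir\0")
    with os.scandir(path) as iterator:
        entries = sorted(iterator, key=lambda entry: entry.name)
    for entry in entries:
        # Assumption: symlinks and special files are left out of the hash.
        if entry.is_symlink() or not (entry.is_dir() or entry.is_file()):
            continue
        name = os.fsencode(entry.name)
        # A directory node hashes each child's name and the child's digest.
        digest.update(len(name).to_bytes(8, "big") + name)
        digest.update(merkle_hash(entry.path))
    return digest.digest()
```

    This sketch recomputes everything on each call. The saving comes when per-directory digests are stored: after a file changes, only the digests on the path from that file up to the root need to be recomputed, while unchanged sibling subtrees keep their stored values.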
