I'm not sure this is the right place to post this, but I made a fast duplicate-file finder and wanted to share it. It takes its options from a configuration file and writes a report of all duplicates found. The configuration commands are as follows:
Code:
verbose={0,1,2} : Sets the level of console verbosity
recurse={true,false} : Sets the option to recurse into subdirectories
report={filename} : Sets the output filename for the report
filter={filter} : Sets the filter for finding files
alldirs={true,false} : Sets the option to search all directories, not just the ones that match the search filter
AddPath {path} : Add a path to search for duplicates. Multiple paths may be specified in the same configuration file
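For illustration, a configuration file using the commands above might look like this (the filenames and paths are made up, not from the actual program):

```
verbose=1
recurse=true
report=duplicates.txt
filter=*.*
alldirs=false
AddPath C:\Users\Me\Documents
AddPath D:\Backups
```

Note that `AddPath` appears twice, since multiple search paths may be specified in the same file.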
The report format is fairly simple. Finding duplicate files occurs in several stages. First, all files from the given paths are enumerated. They are then sorted by size and divided into groups of equal size; groups with only one item are discarded. Next, every remaining file over 256 KB has a 16-byte sample taken from its middle, and the files are sorted and grouped by sample just as they were grouped by size. Finally, the remaining candidates are hashed with SHA-1 and the hashes are sorted to find the actual duplicates.
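The staged winnowing described above can be sketched roughly as follows. This is a Python sketch of the same idea for illustration, not the author's VB.net code; the 256 KB threshold and 16-byte middle sample follow the description above, and all function names are my own.

```python
import hashlib
import os
from collections import defaultdict

SAMPLE_THRESHOLD = 256 * 1024  # files larger than 256 KB get a middle sample
SAMPLE_SIZE = 16               # 16-byte sample taken from the middle

def group_by(paths, key):
    """Group paths by key(path); keep only groups with more than one member."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]

def middle_sample(path):
    """Read a small sample from the middle of large files."""
    size = os.path.getsize(path)
    if size <= SAMPLE_THRESHOLD:
        return b""  # small files all share the empty sample, so they pass through
    with open(path, "rb") as f:
        f.seek((size - SAMPLE_SIZE) // 2)
        return f.read(SAMPLE_SIZE)

def sha1_of(path):
    """Full-content SHA-1, read in chunks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.digest()

def find_duplicates(paths):
    # Stage 1: group by file size, discarding singletons.
    candidates = group_by(paths, os.path.getsize)
    # Stage 2: within each size group, subdivide by the 16-byte middle sample.
    candidates = [g2 for g in candidates for g2 in group_by(g, middle_sample)]
    # Stage 3: confirm real duplicates with a full SHA-1 hash.
    return [g2 for g in candidates for g2 in group_by(g, sha1_of)]
```

The point of the first two stages is that size checks and 16-byte reads are cheap, so the expensive full-file hash only runs on files that already match on both.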
The program is written in VB.net, so the .NET Framework is required. File enumeration goes through the native API rather than the framework classes to increase speed. If you find any errors, please let me know.