This script is meant to be used interactively (I use it in the IPython console). It checks for duplicate files inside a given path by comparing their md5sum hashes.
It has the following main functions:
find_dup(path = '.', opt = 0)
Find duplicate files in a given path (or in the directory tree rooted at this path) by md5sum hash.
Options: If opt = 0 (default), it checks only the files directly in the path (not in its subdirectories). If
opt = 1, it checks all files in the directory tree rooted at the path.
Returns: Two dictionaries (md5_dict, dup_count). md5_dict has md5sums as keys and file name(s) (if opt = 0)
or file path(s) (if opt = 1) as values. dup_count has the md5sums of duplicated files as keys and the
number of redundant copies as values. So if a file occurs twice, its duplicate count is 1.
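The behaviour described above can be sketched roughly as follows. This is not the script's actual code, just a minimal illustration under some assumptions: the names find_dup_sketch and md5_of_file are hypothetical, the values of md5_dict are assumed to be lists, and files are assumed to be hashed in chunks with Python's hashlib.

```python
import hashlib
import os


def md5_of_file(file_path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_dup_sketch(path=".", opt=0):
    """Hypothetical stand-in for find_dup(); returns (md5_dict, dup_count)."""
    candidates = []  # (label stored in md5_dict, full path used for hashing)
    if opt == 0:
        # Only regular files directly inside `path`; labels are file names.
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isfile(full):
                candidates.append((name, full))
    else:
        # Every file in the tree rooted at `path`; labels are file paths.
        for root, _dirs, files in os.walk(path):
            for name in sorted(files):
                full = os.path.join(root, name)
                candidates.append((full, full))

    md5_dict = {}
    for label, full in candidates:
        md5_dict.setdefault(md5_of_file(full), []).append(label)

    # A hash seen n > 1 times has n - 1 redundant copies.
    dup_count = {md5sum: len(labels) - 1
                 for md5sum, labels in md5_dict.items() if len(labels) > 1}
    return md5_dict, dup_count
```

In an interactive session this would be called as md5_dict, dup_count = find_dup_sketch('.', 1), mirroring how find_dup() is described above.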
dup_summary(md5_dict, dup_count)
Print a summary of duplicate files, using the outputs from the find_dup() function.
dup_report(md5_dict, dup_count)
Print a detailed report of duplicate files, using the outputs from the find_dup() function.
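To show how the two dictionaries could be consumed, here is a small sketch of a summary and a report printer. The function names dup_summary_sketch and dup_report_sketch are hypothetical, the output format is an assumption (the real functions may print something different), and the hashes in the example data are placeholders rather than real md5sums.

```python
def dup_summary_sketch(md5_dict, dup_count):
    """Print overall totals (assumed format, not the script's real output)."""
    print(f"Unique contents: {len(md5_dict)}")
    print(f"Contents with duplicates: {len(dup_count)}")
    print(f"Redundant copies: {sum(dup_count.values())}")


def dup_report_sketch(md5_dict, dup_count):
    """Print each duplicated hash followed by the files that share it."""
    for md5sum in sorted(dup_count):
        print(f"{md5sum} ({dup_count[md5sum]} redundant)")
        for name in md5_dict[md5sum]:
            print(f"    {name}")


# Hand-built example data; "hash_a" and "hash_c" are placeholder keys.
example_md5_dict = {"hash_a": ["a.txt", "b.txt"], "hash_c": ["c.txt"]}
example_dup_count = {"hash_a": 1}

dup_summary_sketch(example_md5_dict, example_dup_count)
dup_report_sketch(example_md5_dict, example_dup_count)
```

With the actual script, the same two dictionaries would come straight from find_dup() and be passed to dup_summary() and dup_report().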