cmpfiles
Compares the set of files in DIR1 and DIR2 (by hashing) while ignoring the directory structure.
I have a lot of old files and stray old copies that often contain the same files in slightly different directory structures. Before removing the duplicates, I wanted to be sure that the files really were duplicates and were safe to delete.
$ tree
.
├── dir1
│   ├── a.txt
│   ├── b.txt
│   └── c.txt
└── dir2
    ├── dir3
    │   └── c.txt
    └── dir4
        ├── a.txt
        └── b.txt
5 directories, 6 files
The above example shows two directories with different file tree
structures. Manually checking whether it is safe to delete dir2 (i.e. whether
all files in dir2 are also in dir1) can be time-consuming. cmpfiles recursively
traverses both file trees, hashes all files, and then checks whether the trees
produced the same hashes by sorting and comparing the lists of hashes.
$ cmpfiles dir1/ dir2/
'dir1/c.txt' does not exist in 'dir2'
'dir2/dir3/c.txt' does not exist in 'dir1'
'dir1/a.txt' <=> 'dir2/dir4/a.txt'
'dir1/b.txt' <=> 'dir2/dir4/b.txt'
In this example, cmpfiles tells us that dir1/c.txt and dir2/dir3/c.txt are
not found in the other directory (because they have different hashes), so we
should not remove either file before investigating the discrepancy. One file
is probably an older version of the other.
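The hash-and-compare idea described above can be sketched in a few lines of Python. This is only an illustration and not the actual cmpfiles source: the names hash_file, hash_tree and safe_to_delete are made up for this example, and skipping symbolic links entirely is just one assumed way of not dereferencing them.

```python
import hashlib
import os


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root, algorithm="sha256"):
    """Map each content hash found under `root` to one path with that content.

    Assumption: symbolic links and non-regular files are skipped.
    """
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Keep the first path seen for each hash; later duplicates are ignored.
            hashes.setdefault(hash_file(path, algorithm), path)
    return hashes


def safe_to_delete(keep_dir, candidate_dir):
    """True if every file content in `candidate_dir` also exists in `keep_dir`."""
    return set(hash_tree(candidate_dir)) <= set(hash_tree(keep_dir))
```

With the directories from the example, safe_to_delete("dir1", "dir2") would return False as long as the two c.txt files differ, because the hash of dir2/dir3/c.txt does not appear anywhere in dir1.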
Help
Usage: cmpfiles [OPTION]... DIR1 DIR2
Compares the set of files in DIR1 to the files in DIR2.
Compares the files (but not the file structure) by hashing each
file in DIR1 and in DIR2, and then comparing them. It does not
dereference symbolic links.
With no options, it produces output in three sections. The first
section contains the files that are in DIR1 and not in DIR2. The
second section contains the files that are in DIR2 and not in
DIR1. The third section contains the files that are in both DIR1
and DIR2.
If the program encounters duplicate files inside DIR1 (or inside
DIR2) it ignores these and outputs warnings to stderr.
  -1          Suppress first section (files in DIR1 but not in DIR2)
  -2          Suppress second section (files in DIR2 but not in DIR1)
  -3          Suppress third section (files in both DIR1 and DIR2)
  --md5       Use MD5 as hash function instead of SHA256
  -h, --help  Display this help and exit
EXAMPLES
# Print only files in both dir1 and dir2.
$ cmpfiles -12 dir1 dir2
# Print files in dir1 and not in dir2, and vice versa.
$ cmpfiles -3 dir1 dir2
# Print files in dir1 that are not in dir2.
$ cmpfiles -23 dir1 dir2
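As a rough illustration of how the three output sections and the -1, -2 and -3 options fit together, here is a small Python sketch. It is not taken from cmpfiles itself: the function names unique_by_hash and report, the suppress parameter and the wording of the duplicate warning are all hypothetical, and the input is assumed to be a list of (hash, path) pairs that has already been computed for each directory.

```python
import sys


def unique_by_hash(pairs):
    """Map hash -> first path seen; warn on stderr about later duplicates."""
    table = {}
    for digest, path in pairs:
        if digest in table:
            # Hypothetical warning text; duplicates inside one directory are ignored.
            print(f"warning: '{path}' duplicates '{table[digest]}'", file=sys.stderr)
        else:
            table[digest] = path
    return table


def report(files1, files2, name1="DIR1", name2="DIR2", suppress=frozenset()):
    """Return the output lines; `suppress` holds section numbers to leave out."""
    first = unique_by_hash(files1)
    second = unique_by_hash(files2)
    lines = []
    if 1 not in suppress:  # files in DIR1 but not in DIR2
        only_first = sorted(first.keys() - second.keys(), key=first.get)
        lines += [f"'{first[h]}' does not exist in '{name2}'" for h in only_first]
    if 2 not in suppress:  # files in DIR2 but not in DIR1
        only_second = sorted(second.keys() - first.keys(), key=second.get)
        lines += [f"'{second[h]}' does not exist in '{name1}'" for h in only_second]
    if 3 not in suppress:  # files in both DIR1 and DIR2
        both = sorted(first.keys() & second.keys(), key=first.get)
        lines += [f"'{first[h]}' <=> '{second[h]}'" for h in both]
    return lines
```

Calling report(files1, files2, suppress={1, 2}) would correspond to cmpfiles -12 and return only the lines for files present in both directories.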
Somewhat inspired by comm(1).
Download
Download the repository using
git clone https://mvidell.se/repos/cmpfiles.git