dupmerge.c

This is a utility that scans a UNIX directory tree looking for pairs of distinct files with identical content. When it finds such files, it deletes one file to reclaim its disk space and then recreates its path name as a link to the other copy.
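
The delete-and-relink step described above can be sketched roughly as follows. This is a minimal illustration, not code from dupmerge.c: it assumes that "link" means a hard link made with link(2), and the function name merge_pair and its return codes are hypothetical.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from dupmerge.c): replace the path `dup`
 * with a hard link to `keep`. The caller is assumed to have already
 * verified that both files have identical content.
 *
 * Returns 0 on success, 1 if both names already refer to the same file,
 * and -1 on any error.
 */
int merge_pair(const char *keep, const char *dup)
{
    struct stat a, b;

    assert(keep != NULL && dup != NULL);

    if (lstat(keep, &a) != 0 || lstat(dup, &b) != 0)
        return -1;
    if (!S_ISREG(a.st_mode) || !S_ISREG(b.st_mode))
        return -1;              /* only handle regular files */
    if (a.st_dev == b.st_dev && a.st_ino == b.st_ino)
        return 1;               /* already linked; no space to reclaim */
    if (a.st_dev != b.st_dev)
        return -1;              /* hard links cannot cross file systems */

    if (unlink(dup) != 0)       /* delete one copy */
        return -1;
    if (link(keep, dup) != 0)   /* recreate its name as a link to the other */
        return -1;              /* the name `dup` is gone; data survives at `keep` */
    return 0;
}
```

Comparing st_dev and st_ino first is what keeps the sketch limited to "distinct files": two names that already share an inode have nothing left to merge.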

My first version of this program, circa 1993, worked by computing an MD5 hash of every file, sorting the hashes, and then looking for duplicates. This worked, but it was unnecessarily slow.

My second version, circa 1999, unlinked the duplicates as a side effect of the sort comparison function.

I have since rewritten it again from scratch. It now builds the complete list of files and fully sorts it by size before running through it looking for duplicates. This version also has some new options. Until I can update the manual page, read about them in the source file. Be wary of the -d (delete) flag; it has not been fully thought out or tested. The -0 and -f flags seem to work well.
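
The sort-by-size approach can be sketched as below. This is an illustration under stated assumptions, not the code in dupmerge.c: the struct entry type, the function names, and the byte-for-byte comparison of same-size files are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record: one path and its size, filled in by the caller. */
struct entry { const char *path; off_t size; };

static int by_size(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return (x->size > y->size) - (x->size < y->size);
}

/* Returns 1 if both files have identical content, 0 if not, -1 on error. */
static int same_content(const char *p, const char *q)
{
    FILE *f = fopen(p, "rb"), *g = fopen(q, "rb");
    int result = 1;

    if (f == NULL || g == NULL) {
        result = -1;
    } else {
        char a[65536], b[65536];
        size_t n, m;
        do {
            n = fread(a, 1, sizeof a, f);
            m = fread(b, 1, sizeof b, g);
            if (n != m || memcmp(a, b, n) != 0) { result = 0; break; }
        } while (n > 0);
        if (result == 1 && (ferror(f) || ferror(g)))
            result = -1;
    }
    if (f) fclose(f);
    if (g) fclose(g);
    return result;
}

/*
 * Sorts e[0..n-1] by size, then compares only files of equal size.
 * Each redundant copy is reported once through on_dup(keep, dup) and its
 * path is set to NULL so it is not compared again.
 * Returns the number of redundant copies found.
 */
size_t find_duplicates(struct entry *e, size_t n,
                       void (*on_dup)(const char *keep, const char *dup))
{
    size_t found = 0;

    assert(e != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(e, n, sizeof *e, by_size);

    for (size_t i = 0; i < n; i++) {
        if (e[i].path == NULL)
            continue;                       /* already matched earlier */
        for (size_t j = i + 1; j < n && e[j].size == e[i].size; j++) {
            if (e[j].path == NULL)
                continue;
            if (same_content(e[i].path, e[j].path) == 1) {
                found++;
                if (on_dup != NULL)
                    on_dup(e[i].path, e[j].path);
                e[j].path = NULL;
            }
        }
    }
    return found;
}
```

Because the list is sorted first, files whose sizes are unique are never opened at all, and no hash has to be computed for any file.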

dupmerge.c, current version

Last updated: 14 Mar 2009