apkellogg wrote: Is there a program to find duplicate files across multiple hard drives? Basically, I have a shared folder that too many people have had access to over the years, and I would like to see if there are duplicate files saved at multiple places in the folder under different file names. I am using Windows XP Pro/MCE 2005.

Thank you for any advice.

If they have different names, then they are different files.

You would probably need a program to search by size, but I am just guessing at this point.

(If your browser has wrapped the above code, note that the only line break is after the "#!/bin/bash"; the rest is all one long line.)

Just invoke the script, passing the names of one or more drives or directories on the command line. The script searches all of the listed drives/folders, and lists each group of duplicate files it finds.

It works by recursively walking all of the specified drives/folders, generating a 160-bit checksum for each file, then finding all groups of files which have matching checksums.
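For readers who want to see the idea spelled out, here is a minimal Python sketch of the same approach. It is not the bash script referred to above. It assumes the 160-bit checksum is SHA-1, and the names `file_digest` and `find_duplicates` are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest (160 bits) of the file at path."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Walk every root recursively and group file paths by checksum."""
    by_digest = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    by_digest[file_digest(path)].append(path)
                except OSError:
                    # Unreadable file (permissions, locked, vanished): skip it.
                    continue
    # Only digests shared by two or more files are duplicate groups.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(["d:/", "e:/"])` would return one list of paths per group of files with identical checksums, regardless of what the files are named.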

So, e.g. if you've saved it as a script named dupfiles, the command:
dupfiles d:/ e:/
would find all duplicate files on your D: and E: drives.

I love little scripting puzzles like this... and it is also an excellent illustration of why I install Cygwin on all of my Windows boxes, and why IMO everyone should learn how to use UNIX-style shell commands. You can accomplish a whole lot with very little code.

The years just pass like trains. I wave, but they don't slow down. -- Steven Wilson

I'm surprised that the tools give false positives; if the lengths of the files match, the tool should then do a byte-for-byte comparison of the contents to verify the match.
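As a rough illustration of that size-then-contents check, here is a short Python sketch. It is only an assumption about how such a tool could work, not a description of what ACDSee or any other app actually does, and the function name `verified_duplicates` is hypothetical.

```python
import filecmp
import os
from collections import defaultdict


def verified_duplicates(paths):
    """Group paths by size, then confirm matches byte for byte."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        # A file whose size is unique cannot have a duplicate.
        remaining = sorted(same_size)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            # shallow=False forces a comparison of the actual contents.
            matches = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if matches:
                groups.append([first] + matches)
            remaining = [p for p in rest if p not in matches]
    return groups
```

Because every reported group has passed a full content comparison, this approach cannot produce a false positive; the cost is reading same-sized files more than once.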

While false positives are theoretically possible with a checksum-based approach like the one I gave the script for above, the odds are so low (it's a 160-bit hash, so the chance of a collision is vanishingly small) that practically speaking you'll never see one.

Yup. I'm not sure what method ACDSee uses. It may only be a 32-bit checksum.
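To put rough numbers on the difference, here is a small Python sketch using the standard birthday-bound approximation. It assumes hash values are uniformly distributed and that the files are not deliberately crafted to collide; the count of 100,000 files is an arbitrary example, and the 32-bit case is just the guess above, not a known fact about ACDSee.

```python
import math


def collision_probability(num_files, hash_bits):
    """Estimate the chance that at least two of num_files distinct files
    share a hash value, assuming uniformly distributed hashes."""
    pairs = num_files * (num_files - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)


for bits in (32, 160):
    print(bits, collision_probability(100_000, bits))
```

With these assumptions, a 32-bit checksum gives roughly a 0.69 chance of at least one accidental match among 100,000 distinct files, while a 160-bit hash gives a chance on the order of 10^-39.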

The other app, the one that did give more than ACDSee did, I can't remember the name of now. I think it's just called "DupFinder" or something.

I love ACDSee's dup finder; it has great options for auto-deleting the dups it finds...very few other apps seem to have that ability.