With me, nope. For example, one copy of a book is fine, while the other has ugly footers about PDFCompression. So no results are found, even if you check 'Ignore Content', which I find strange.

It's not looking at the content; it's looking at the file size and checksums. It gets them from the file system directory, not from within the file. If just one byte in a file is changed, added or removed, then the size and checksums will change, and two otherwise identical files will not be regarded as duplicates.
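To make the point about a single changed byte concrete, here is a minimal sketch in Python. The function name `fingerprint` and the choice of SHA-256 are assumptions for illustration only; the thread does not say which checksum the duplicate finder actually uses.

```python
import hashlib


def fingerprint(data: bytes) -> tuple[int, str]:
    """Return (size, checksum) for a blob of bytes.

    SHA-256 is an assumption; the real tool may use a different checksum.
    """
    return len(data), hashlib.sha256(data).hexdigest()


# Two copies of the "same" book, one with an extra footer appended.
clean_copy = b"Chapter 1. It was a dark and stormy night."
footer_copy = clean_copy + b" [footer added by a PDF tool]"

# Identical bytes always give an identical fingerprint.
print(fingerprint(clean_copy) == fingerprint(bytes(clean_copy)))

# Any added, removed or changed byte gives a different fingerprint,
# so a size-and-checksum comparison no longer sees these as duplicates.
print(fingerprint(clean_copy) == fingerprint(footer_copy))
```

This is why a book with an added footer is not matched against the clean copy: the comparison never gets as far as asking whether the text is the same.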

Do you have .opf files for your 100,000 books? And what OS are you on - Windows, OS X or Linux? There may be a specialist product like the one I have for image files - but I wouldn't hold my breath.

If it were me I'd bite the bullet and load them into Calibre. I would do it in batches. Once Calibre has an author & title database, I think you could delete the format files, as I don't think they're needed by Find Duplicates... unless you're planning on doing a binary compare, which on 100,000 books could take quite a long time.

Then you search for lines in listofbooks.txt that have the same checksum entry - they are probably, but not definitely, duplicates. If you want, I can make this more robust and automated - it's what I'm going to have to do myself, but not for a week or so. I can post the script once I do - but it will be a Unix/Linux script.
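For anyone who would rather not wait, the "same checksum entry" search could be sketched in Python, which runs on Windows as well. This is only a rough sketch under one big assumption: that each line of listofbooks.txt looks like `md5sum` output, i.e. the checksum, a space, a one-character mode marker, then the path. If your list is laid out differently, the line parsing needs adjusting. The function names are made up for this example.

```python
from collections import defaultdict


def duplicate_groups(lines):
    """Group paths by checksum and keep only checksums seen more than once.

    Assumes md5sum-style lines: '<checksum> <mode char><path>'.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        checksum, _, rest = line.partition(" ")
        # rest[0] is the mode marker (' ' or '*'); the path follows it.
        groups[checksum].append(rest[1:])
    return {c: paths for c, paths in groups.items() if len(paths) > 1}


def duplicates_in_file(list_path="listofbooks.txt"):
    """Read the list file and return the probable-duplicate groups."""
    with open(list_path, encoding="utf-8") as handle:
        return duplicate_groups(handle)
```

Calling `duplicates_in_file()` returns a dictionary mapping each repeated checksum to the paths that share it. As said above, those are probable duplicates, not certain ones, so check them before deleting anything.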

There's probably a way to do it using PowerShell too, but my PowerShell skills aren't that good yet.

Hmmm... Thanks very much for the vital information, all of you. When I remove ebooks from Calibre, since it can get literally swamped, I choose 'Delete everything' after saving to disk... even the .opf files, sadly, which I'm guessing are needed for dupe finding.

If I were to delete everything EXCEPT the .opf files, would that clear my Calibre window of books and yet keep the library info for the next batch of loaded books so I can find the dupes? Or how would that work?

By the way, I have Windows 7 64-bit on both my laptop and my PC.

Also, I use Dupe Cleaner, but sometimes it won't work correctly, i.e., it won't find the proper files.

I mostly use Anti-Twin, which is a double-edged sword. While good, it has significant issues.

When searching for similar files, it gives you a ratio. Compare:

Ratio set to 100%

The Hollows 01 - Dead Witch Walking.epub
Dead Witch Walking.epub

Results: 0 Duplicates found.

The Hollows 01 - Dead Witch Walking
The Hollows 11 - Ever After

Ratio set to 90%

Duplicates found: 3

The Hollows 01
Dead Witch Walking
The Hollows 11

Yeah. See my issue? With this method, The Hollows 01 and The Hollows 11 count as similar books, while Dead Witch Walking.epub and The Hollows 01 - Dead Witch Walking.epub do not.
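One possible way around the filename problem is to strip the series prefix before comparing, so that only the title itself is matched. Below is a rough Python sketch of that idea. It is not how Anti-Twin works (its algorithm isn't described here); the names `title_key` and `group_by_title` are made up, and the pattern assumes names shaped like "Series NN - Title.ext", which will misfire on titles that themselves contain a number followed by a dash.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed naming scheme: "<series name> <number> - <title>".
SERIES_PREFIX = re.compile(r"^.+?\s+\d+\s*-\s*")


def title_key(filename):
    """Reduce a filename to a lowercase title with any series prefix removed."""
    stem = PurePath(filename).stem  # drop the extension
    stem = SERIES_PREFIX.sub("", stem, count=1)
    return " ".join(stem.lower().split())  # normalise case and spacing


def group_by_title(filenames):
    """Return only the title keys that more than one filename maps to."""
    groups = defaultdict(list)
    for name in filenames:
        groups[title_key(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


books = [
    "The Hollows 01 - Dead Witch Walking.epub",
    "Dead Witch Walking.epub",
    "The Hollows 11 - Ever After.epub",
]
for key, names in group_by_title(books).items():
    print(key, "->", names)
```

With the three names from the example, the two Dead Witch Walking files end up in one group and Ever After is left alone, which is the result the ratio search failed to give.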