Large collections of pictures frequently have a lot of duplicates of varying quality. In some of my folders I have found as many as 15 copies of the same image. My Jayne Mansfield folder had 10,000 images before I deleted duplicates and 5,000 afterward. When browsing a folder with a lot of duplicates you see the same image repeatedly, and some images you seldom see at all because you don't get that far into a large collection before you are done browsing.

When duplicates have the same file size or a similar name, listing the folder as thumbnails sorted by size or name places them next to each other, making them easy to spot. Deleting duplicates with varying sizes and names can be an almost impossibly time-consuming task. These are some techniques to make the task faster and more reliable.
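If you want a quick list instead of eyeballing thumbnails, the same-size trick can be sketched in a few lines of Python. This is only a rough sketch: the function name `group_by_size` is made up for illustration, and two files of the same size are only candidates, not confirmed duplicates.

```python
from collections import defaultdict
from pathlib import Path


def group_by_size(folder):
    """Map file size in bytes to the files of that size (only sizes shared by 2+ files)."""
    by_size = defaultdict(list)
    for path in Path(folder).iterdir():
        if path.is_file():
            by_size[path.stat().st_size].append(path)
    # Keep only sizes that occur more than once; these are duplicate candidates.
    return {size: sorted(paths) for size, paths in by_size.items() if len(paths) > 1}
```

You would still look at each group before deleting anything, since different images can happen to have the same byte count.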

The first thing is to sort the collection by broad, easy-to-spot criteria like bikinis, one-piece, costume, etc. If you have more than one or two hundred images under one criterion, divide it further, because it takes too long to scan through much more than that and you get fatigued and miss duplicates. You might divide the bikini collection into images with water in them and images with grass. You will probably put the bikini images back into the bikini folder after deleting the duplicates, but wait until you have worked through the whole folder. Make up more criteria, and if you invent some that work for you, share them here.

I found that some criteria don't work so well. You wouldn't use something like bracelets because they are too hard to see in thumbnails. You might think that sorting by how the picture is framed, such as bust, medium or full length, would work. It doesn't, because you may file a "bust" image as a medium image the next time you see it, and then you still have a duplicate. Or someone may have cropped an image down to a bust while you have a better-quality full version filed under medium. Also, you wouldn't want to sort into color and black and white, because you may have a great color image and not want to keep a black-and-white version of it.

So here's the technique using bikinis.

Go through your base folder and move all bikini images into a temporary folder. I put all bra shots in too, as it's too hard to see the difference between a bikini top and a bra in a thumbnail. Just clickity-click right through the folder as quickly as you can spot the bikinis. If you still have more than a couple of hundred bikini images in the temporary folder, that's too many to start deleting. For Raquel Welch I had 1,200. Move all the images with water in them to another folder; I call that one "sort". This got me down to 500 to 600 bikini images with water and without. Still too many. So I sorted into yet another folder I call "delete". Here I put images that are very specific, like a specific bikini or photo shoot, or a pose: standing, reclining, sitting or kneeling. Now you may find that you have few enough images, fewer than 150, to begin deleting.

As you do this preliminary sorting, the duplicates will become more and more visible. Resist deleting duplicates while there are more than 150 images in the folder. They are visible in a quick scan, but you still have to spot them individually among a lot of other images, and that slows the technique down.
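To check whether a folder is small enough yet without counting thumbnails by hand, something like the following could do. It is a minimal sketch: the names `count_images` and `ready_to_delete` and the list of extensions are assumptions for illustration, and the limit of 150 is just the rule of thumb from above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your collection holds.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def count_images(folder):
    """Count image files directly inside `folder` (sub-folders are not searched)."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def ready_to_delete(folder, limit=150):
    """True when the folder holds fewer than `limit` images."""
    return count_images(folder) < limit
```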

To delete:

I use ACDSee for all this. You may have to adjust your technique to fit your software.

Scan through the final delete folder and, while holding the Control key, click on the duplicates you spot. Select the ones you have a lot of copies of first to get them out of the way, so that later you can quickly spot the duplicates you have fewer copies of. Then view the selected files as a slideshow that you can advance image by image. Here you decide which ones you want to delete. I have found that I still keep some duplicates, because I may have a good-quality image that has a watermark or is cropped, and a lower-quality image that is clean or uncropped. I'll also keep artistically altered duplicates like colorized versions.

I'll mention here that you should check the file size, because sometimes two copies look identical but one is more heavily JPEG-compressed than the other, or one has a larger file size but the two still look identical, with no JPEG compression artifacts. In these cases, simply zooming in with zoom lock and switching from one to the other may let you see which is the better image. Sometimes you might want to copy the images into an image editor to compare.
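The file size check can be scripted too. The sketch below (the function name `compare_copies` is made up for illustration) reports both sizes and whether the two files are byte-for-byte identical. It cannot tell you which copy looks better, so the zoom comparison is still needed when the bytes differ.

```python
import filecmp
from pathlib import Path


def compare_copies(first, second):
    """Report the size of each file and whether their contents are byte-identical."""
    first, second = Path(first), Path(second)
    return {
        "first_bytes": first.stat().st_size,
        "second_bytes": second.stat().st_size,
        # shallow=False forces a comparison of the actual file contents.
        "identical": filecmp.cmp(first, second, shallow=False),
    }
```

If `identical` is true, either copy can go. If it is false, a larger file is only a hint that it may be the less compressed one.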

After deleting the duplicates, move the images you want to keep into the final bikini folder. This gets them out of the way so you can find more duplicates. Then go back to the delete folder and look for more duplicates.

Don't do the preliminary sorting into broad criteria all at once. Do one broad criterion, then sort that into a sub-criterion, then a sub-sub-criterion, then delete and work your way back up the criteria chain to the base folder. For example, sort into bikinis-water-standing-delete. Then do bikinis-water-sitting-delete. Then do bikinis-water-reclining-delete. After you delete all the duplicates among the bikinis-with-water images, sort for bikinis-grass-standing-delete, bikinis-grass-sitting-delete, and so on.

Well, that's all for now. If anyone thinks of additions, sorting criteria or some area where this essay could be improved or made clearer, please post.

Some additional criteria I use:

costumes
street clothes
indoor
outdoor
single
group
color of clothes

Also, when posting sub-criteria, mention the main criterion and/or sub-sub-criteria, formatted e.g.

No wonder you have so much work to do before you have the slightest chance of finding dupes.

The first thing I would do in your situation is organize the images properly. Having every image of the same model in a single folder isn't really helpful for keeping track of a collection. Sadly, this mess often comes from unorganized guys who share their random images in single posts / files. My recommendation, if you are a collector: do not download such posts entirely; maybe pick out the single ones you really want to have. For a model-based collection they're a total PITA.

Keep images together that belong together: same TV show, same gallery, same mag story, etc. Keep them together in a folder, create a shortcut to one of these images, and store only the shortcut in the Model XXX gallery folder. Then it's much easier to keep track of which shoots you already have, entirely or not.

That works well for professional models with several dozen images from any given shoot and several dozen shoots in total. There are many filing methods, and different ones work better for different purposes, such as finding a particular image or simply browsing.

But for Jayne Mansfield and my other favorite movie stars there are too many different "shoots" with too few images in each. It makes a mess of dozens, if not hundreds, of folders, which is worse than combining shoots of a few images into criteria folders of a couple hundred images. With movie stars in general, I only have a few images of most of them, and for most I wouldn't recognize the name or remember what is in the folder. It is more efficient to combine similar stars. For instance, I combine Yvonne De Carlo with Elvira, Tura Satana and Elizabeth Taylor. I combine Barbara Nichols with Diana Dors.

Also there's the problem of naming a multiplicity of folders and then remembering the name when looking for that folder, or what's in it. For a distinctive costume it's easy, like Raquel Welch's stars and stripes Myra Breckinridge, 1,000,000 BC or Magic Christian costumes. I put all 233 of the 1,000,000 BC costume images into one folder and the other 356 costume images into a single costumes folder. It's not so easy for street clothes, glamor wear, or all the different bikinis and swimsuits.

While filing methods and priorities are related to sorting and finding duplicates, this thread is focused merely on finding duplicates on your own hard drive, not at all on finding duplicates on VEF.

I usually upload my collections with an uploader tool named "Irada". Before you cry out that this is off-topic: yes, at first sight perhaps.

This stand-alone tool comes with a smart sub-function that scans folders for dupes. It detects them by computing MD5 checksums and lists all pics that share an identical checksum, which is the real proof of a true dupe.
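For anyone who can't use the tool, the checksum idea itself is easy to sketch in Python. To be clear, this is not how Irada is implemented (I have no knowledge of its internals); it is just a minimal illustration of grouping files by MD5, and the function names `md5_of` and `find_exact_duplicates` are made up for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder):
    """Return lists of files (searched recursively) that share the same MD5 checksum."""
    by_hash = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            by_hash[md5_of(path)].append(path)
    # Only checksums seen more than once point at duplicates.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

File names play no part here, so renamed copies are still found. Re-saved, resized or recompressed copies have different bytes and will not be.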

Unfortunately, it's a tool being developed in Germany, and so far the creators have not published their program in English as well.

But anyway, here is the link (as I said, no installation required and no registry entries; it can also be started from a USB stick, etc.):

bihe.berlios.de/page/?loc=irada

I thought there were many duplicate finders available. I use one called 'Easy Duplicate Finder', which is fast and lets you set convenient parameters, including min/max file sizes. It must have saved me hundreds of GB of wasted space over the years.

The trouble is, it only detects exact matches. It doesn't matter if the filenames are different, but the files themselves must be exactly the same. So if you have 2 pics/movies/files that show the same thing but are different sizes, it won't find them (unless there's some option I've overlooked).

I use this. It's free and very easy to use. It will detect the same image at a different file size, resolution or file name, and you can vary the level of similarity right down to at least 80%, beyond which even a blind man could tell the difference. You can tell it to look within just one folder or across several sub-folders within a main one. For a massive folder, just leave it running in the background. You can set it to auto-delete or to let you check each dupe before deleting. It saves me a hell of a lot of time.
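I don't know how that tool measures similarity, but one common way to match images across sizes and compression levels is a "difference hash". The sketch below is only an illustration of that idea, not the tool's method. It assumes each image has already been decoded, converted to grayscale and shrunk to a small grid (typically 9 columns by 8 rows) by an image library, which is not shown here. The function names are made up, and the 0.80 threshold simply borrows the 80% figure from above.

```python
def dhash_bits(gray_rows):
    """Difference hash: one bit per horizontally adjacent pixel pair in each row."""
    bits = []
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def similarity(bits_a, bits_b):
    """Fraction of matching bits between two hashes (1.0 means identical hashes)."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    matches = sum(a == b for a, b in zip(bits_a, bits_b))
    return matches / len(bits_a)


def looks_like_duplicate(gray_a, gray_b, threshold=0.80):
    """True when the two grayscale grids hash to at least `threshold` similarity."""
    return similarity(dhash_bits(gray_a), dhash_bits(gray_b)) >= threshold
```

Because the hash only records whether each pixel is brighter than its neighbor, a uniformly brighter or more compressed copy tends to give the same bits, while a different picture does not. Cropped copies will usually not match.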

Just an update with some statistics on duplicates. It's turning out that, on average, 1/3 to 1/2 of my files are duplicates. I have just under half a million image files. It's still a large task, but it is manageable now and I am making progress. And I get to see many files I haven't seen in years. I probably would never have seen them again and wouldn't have recognized them if I saw them on the internet, so I might have downloaded them again and got some more dupes.