dup is a small package I created for my Synology NAS. It was loaded with 500GB drives and I soon ran out of space, and I had additional drives lying about with more data on them, so I decided to fork out and upgrade the DS2411+ to twelve 3TB drives and move all the data I’ve collected over the last 10+ years into one central location.

While moving my data I noticed I had the same files and folders on different drives, but I copied them over anyway and thought I’d run a program afterwards to find the duplicates and remove them. I then noticed that this would take a long time, even with a gigabit connection to the system, and that a computer would also be required to run the software from. This is where dup came from!

I am (at the time of this post) working on another project called nzb (might be renamed), which I grabbed some basic code from. As my data is more important, I decided to complete the dup project first so I can start using it, and I thought I’d share it too. It’s not perfect by a long shot, but it’s quick and light and fulfills my needs. Below is some basic information about it and how to use it, including its features.

dup is written in PHP and requires an additional package to work on your Synology system; you can get it here.

dup also requires you to make some changes to the way PHP is configured on your Synology device.

The first change is adding /volume1:/volume2 (etc.) to your open_basedir.

This is required for dup to scan the folders for files. You should add any other volumes you might want to scan with dup.

safe_exec_dir must also be disabled (the underlying PHP directive is safe_mode_exec_dir).

This is required for dup to be able to use the ls -lRe command on your system to list the files and folders. It is also required for the unlink and rename functions, which dup uses for deleting and renaming/moving files.

Once you have set up the correct PHP settings and installed dup, your main screen will look like this until you have run the setup.

Here you need to run the setup script.

After running the Setup script you have 3 fields to enter.

Password : for user root on MySQL

Default path to start searching, eg /volume1/Films/

Default folder to move duplicates to, eg /volume1/Data/Moved/

Note, these 2 paths must exist! If they don’t, default paths similar to the above will be entered.

Once you’re set up, the basic interface is as below. Settings is always at the top; here you can make changes and add new profiles. Down the bottom are the home link and the search function.

Here are a few things to keep in mind when using Dup.

Don’t make the folders to be scanned too large. If a folder holds around 100,000 files I’d break it up, e.g. /Music/Rock and /Music/Pop instead of just /Music. Although dup can gather the file listing quickly enough, it’s the MySQL operations that take the longest, due to MySQL being limited to 24MB. (If you know a way to change this, like you can with some PHP options, please let me know.)
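To decide where to break a large folder up, it can help to count the files under each subfolder first. The following is only a rough sketch in Python (dup itself is PHP and does not include this); the function names and the 100,000 default are my own illustration of the rule of thumb above.

```python
import os

def count_files_per_subfolder(root):
    """Return {subfolder_path: file_count} for each immediate subfolder of root."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            # Count every file entry below this subfolder, recursively.
            counts[path] = sum(len(files) for _, _, files in os.walk(path))
    return counts

def folders_to_split(root, limit=100_000):
    """List the subfolders whose file count is at or above the limit."""
    return [p for p, n in count_files_per_subfolder(root).items() if n >= limit]
```

Calling folders_to_split("/volume1/Music") would list the subfolders that are probably worth scanning as separate profiles.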

I usually scan with CRC32 disabled first (check Scan path for files and save settings). The reason is that the matches will then be based on file size only, which is not very accurate but gives you an idea of how long the next step will take. Once you have your file list, check CRC32, leave Scan path for files unchecked and save settings; this will then only perform a CRC32 update on the found files that match in size. This can take up to 5 minutes for about 1000 matches (2000+ files to scan).
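The two-step idea (cheap size match first, checksum only for the files that share a size) can be sketched roughly like this. This is Python rather than dup’s PHP, it keeps everything in memory instead of MySQL, and the function names are made up for the example. It also uses a full-file CRC-32 as a stand-in checksum, which is not what dup computes.

```python
import os
import zlib
from collections import defaultdict

def group_by_size(paths):
    """Step 1: keep only the sizes shared by two or more files."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    return {s: ps for s, ps in by_size.items() if len(ps) > 1}

def full_crc32(path, block_size=1 << 20):
    """Stand-in checksum: CRC-32 over the whole file, read in 1 MiB blocks."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF

def confirm_with_checksum(size_groups, checksum=full_crc32):
    """Step 2: checksum only the size-matched files and regroup them."""
    confirmed = {}
    for size, paths in size_groups.items():
        by_sum = defaultdict(list)
        for p in paths:
            by_sum[(size, checksum(p))].append(p)
        for key, ps in by_sum.items():
            if len(ps) > 1:
                confirmed[key] = ps
    return confirmed
```

The point of the split is that step 1 only needs the file sizes, while step 2 has to open and read files, so it pays to run it on as few files as possible.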

You will always know when the script has completed, as the settings block will always disappear on a page load! There are no loading times or progress indicators at present, maybe something for an update though!

The CRC32 is not a complete CRC of the file, which is why it’s so fast to compute. It is made up of 3 stages: the first 150 KB of the file is scanned, the middle 150 KB is scanned and the last 150 KB is scanned (up to 2GB in size, PHP limit!). The 3 CRCs are then added together and that string is CRC’d again to give the final CRC32 result!
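As a rough illustration of the sampling idea, here is a sketch in Python. It is not the actual crc_file() code from dup (which is PHP), and it makes a few assumptions: “150 KB” is taken as 150 × 1024 bytes, the three CRCs are joined as 8-character hex strings before the final CRC, and the function name is made up.

```python
import os
import zlib

CHUNK = 150 * 1024  # assumption: "150 KB" means 150 * 1024 bytes

def partial_crc32(path, chunk=CHUNK):
    """CRC the first, middle and last chunk, then CRC the joined results."""
    size = os.path.getsize(path)
    # Start offsets of the three samples, clamped so small files still work.
    offsets = [0, max(0, size // 2 - chunk // 2), max(0, size - chunk)]
    parts = []
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            parts.append("%08x" % (zlib.crc32(f.read(chunk)) & 0xFFFFFFFF))
    # Join the three hex strings and CRC that string once more.
    combined = "".join(parts).encode("ascii")
    return "%08x" % (zlib.crc32(combined) & 0xFFFFFFFF)
```

Note the trade-off: only about 450 KB of each file is read, so two files of the same size that differ only outside the three sampled regions will get the same result.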

The minimum file size that can be scanned is 1MB. There’s no limit on the largest, but the PHP functions can only read 2GB into the file on 32-bit systems. I messed about and tried to get this to work for large files but decided to leave it as is. It will display large file sizes without any issues though.

There are default extensions that you can change, or you can set them to the extensions found after performing a scan (Update option). Setting this field blank and saving will load the defaults again. Only files of these types get entered into the database.

When you click Add profile it will copy the current profile settings, so you will need to select the new profile again after the page reloads. (I may change this to automatically select it for you in a later version.)

Down the bottom of the file list are 3 options: one to check all duplicates found (based on file size or CRC, depending on how you’re viewing the list in settings), plus a delete option and a move option.

They’re pretty self-explanatory, but note that the delete option removes the file for good and does not move it to the recycle bin (again, this may change in a newer version, if I can find out how to do it other than renaming the file into the bin).
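For the move option, the change notes further down mention that files get renamed if one with the same name is already in the Moved to folder. A minimal sketch of that idea follows, in Python rather than dup’s PHP. The numbered-suffix naming scheme and the function names are my assumption for the example, not necessarily what action.php does.

```python
import os
import shutil

def unique_destination(dest_dir, filename):
    """Pick a free name in dest_dir, adding _1, _2, ... before the extension."""
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(dest_dir, filename)
    n = 1
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, "%s_%d%s" % (base, n, ext))
        n += 1
    return candidate

def move_duplicate(path, dest_dir):
    """Move path into dest_dir without overwriting anything already there."""
    target = unique_destination(dest_dir, os.path.basename(path))
    shutil.move(path, target)
    return target
```

Unlike a delete, a move like this can be undone by hand, which is why it is the safer option to try first.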

You can search the current profile only or select the Global option to search all profiles. Results are displayed showing case matches and non-case matches, and are arranged by either size or CRC32, depending on the profile the search was performed from.
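The split between case matches and non-case matches could look something like the sketch below. It is Python over a plain list of names, whereas dup searches its MySQL database from PHP, so treat the function and its behaviour on an empty term as assumptions for illustration only.

```python
def search_names(names, term):
    """Return (case_matches, non_case_matches) for a search term."""
    if not term:
        # An empty term would match everything, so return no results instead.
        return [], []
    # Case matches contain the term exactly as typed.
    exact = [n for n in names if term in n]
    # Non-case matches contain it only when letter case is ignored.
    loose = [n for n in names
             if term.lower() in n.lower() and term not in n]
    return exact, loose
```

For example, searching for "holiday" would put holiday_2010.mkv in the first list and Holiday.avi in the second.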

Also note, there is no security check in dup, so if your NAS is used by lots of users it might not be a good idea to install it (they could delete the duplicate files found!). A security check will be added in a later version.

php vars limit code updated to display under 1000 items in total instead of limiting to 500 matches, as this could still go over and cause issues.
action.php updated to rename files if already in the Moved to folder; skips code if action var not sent.
setup.php code updated to check better that the set folders exist (without the ending slash).
die.php updated to also display PHP’s last error as well as MySQL’s.
search.php updated to display file size with commas.
Few code tweaks here and there.

4/12/2012 Updated to version .4

Updated the fsize(), crc_file() and UpdateCRC() functions in dp.php, as there was a possible chance of returning the wrong size for files. The hashing function was also updated to use hex values instead of unsigned ints, and the SQL statement for updating CRCs was updated.
Few display updates here and there.

Feedback

Comments

Thanks for your package. I installed it on my Synology and realized that it would actually move duplicate files, but I need to have a choice before moving them, so I removed the package before even using it. Is there a way to just show the duplicate files on my Synology and then decide whether to remove them or not?

By default dup will not move/delete anything; it just logs the data to a database and will then display it. When you display the data you will then have options to either move or delete selected files (bottom of the page to the left). Of course you don’t have to, it’s entirely up to you whether you want to do this. By default nothing is selected, in case you accidentally process a Move or Delete command.

I tried to run your script on a DS211j with DSM 4.0, but I’m getting a little confused about it. Searching without entering a search term seems not to work at all, ending up in an empty frame without even the headers and webpage/home links appearing. Searching with a search term ends up in an empty page with headers and links.

You stated “You will always know when the script has completed as the settings block will always disappear on a page load!”, but I don’t know what you really mean. I can’t see any results, and I can’t see anything running.

Would you perhaps mind giving us a walk-through tutorial? Maybe just screengrab the steps and put it on YouTube?

Yes, it seems an empty search string results in a blank page. I’ll fix this in the next release.

Note that if the search string is not found it will just display the column headers and the bottom links. I’ll update this as well so it includes a message about no results being found.

With regards to the settings: when you click Settings, a section expands to display more content without reloading the page. When a script is run the page can look like it’s frozen, and when the script finally completes it reloads the page, so the settings section will no longer be expanded. This is just to let you know that the script is complete. (Allow popups, for the report window.)

As for the tutorial, yes, I have thought about this but just don’t have the time at present. I’m planning on releasing a package called NZB and will then update dup to address the above issues and some others, and will at that time make some videos on its functions.

In the meantime, here are some steps and things to keep in mind when using dup.

Steps for now, video to follow:
When setting up dup, make sure the paths exist. This is very important!!
Also make sure your DSM PHP settings match the example images.

Entering /volume1 will cause issues so make sure you select a folder within /volume1, like /volume1/Films or /volume1/MovedFiles. It’s a permissions thing and I don’t want to start changing them.

In order to get a list of dup files, select Scan path for files and click save settings. The option will reset itself after the script has run. This option enters the files found into the database, or if there are already files in there, it deletes them and enters a fresh list.

Doing the above will display files based on size only. You can go back at any time, check CRC32 Search and click save settings, and it will update the files in your current list to only display files matched via hash. Note you can check CRC32 Search and Scan path for files at the same time, and when you click save settings it will only display files based on their hash by default.

When you run a scan, it will update the file types field on the page, but not in the database!! Clicking the Update option will update the file types in the database too, and these will be used for that profile from now on.

Searching for files only works once you have run at least one scan, so that there are files to search through.

What I have tried so far:
– Set the lowest possible security on web/dup and its content (owner guest and group users).
– Copied the admin in phpMyAdmin to a new username.
– Cleared its password(s).
– Tried on IE, Chrome and Firefox (with cleared cache/cookies). All return to the login screen without error.