I read through your goals, then re-read them... Doesn't SpiderOak.com already fulfill all of your bullet points? It does local compression and encryption, de-duplication (confined strictly to your own account), control over its backup scheduling (from immediate to whenever), and unlimited historical versioning.

It's cross-platform: I've been using it under Linux and Windoze for a few years now. I also set it up for a local small business to back up all the private data the Feds require them to keep safely.

It will also optionally synchronize directories with an unlimited number of other computers within your account.

It's the only service that handles the privacy issues correctly: local strong encryption where the keys stay with you. (Wuala might be a distant contender.)

That's different. They're selling a service. I did not want to purchase a service to manage my backups. I want to be in total control of my backup solution.

Pug is just the raw "backup engine". It has a module for storing backups in Amazon S3, but that module could be replaced to work with any cloud storage service or even a corporation's own cloud storage systems.
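To illustrate that pluggable design (this is my own sketch, not Pug's actual module API; all names here are hypothetical), the back end only needs a handful of operations, so an S3 module could be swapped for any other object store:

```python
import abc


class StorageBackend(abc.ABC):
    """Minimal storage interface a backup engine might target.

    The method names are illustrative; Pug's real module interface may differ.
    """

    @abc.abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store a blob under the given key."""

    @abc.abstractmethod
    def get(self, key: str) -> bytes:
        """Retrieve the blob stored under the given key."""


class InMemoryBackend(StorageBackend):
    """A stand-in for S3, a corporate object store, or local disk."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]
```

Any class satisfying that small interface could serve as the archive target, which is what makes replacing the S3 module straightforward.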

Pug might work cross-platform, but it was written explicitly for Linux (and other Unix-like systems) and can back up any disk storage that is accessible to the Linux machine. That might be local disk storage or remote storage mounted via CIFS, NFS, or any other mountable file system.

Pug is for the IT department that wants to completely manage their backup strategy. The software has lots of nerd knobs for controlling detection of new and modified files, as well as scheduling storage of those files. Pug can use a single database and cloud storage repository for the entire enterprise (or department), or it can be installed such that each installation uses its own database and back-end storage. There are pros and cons with each approach, but the flexibility is there.

I have a dedicated machine that runs Pug and backs up all files from multiple "locations". There's really not a lot of load on the backup server, but it scans mounted network locations, schedules files for archival, and then archives them. Since Pug never stores the same file twice, this approach helps to reduce cloud storage costs when archiving multiple NAS servers and when there are duplicate files. (I see that a lot.)
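The "never stores the same file twice" behavior is classic content-addressed de-duplication: key each file by a digest of its contents and archive only digests you haven't seen before. A rough sketch of the idea (my own illustration, not Pug's code):

```python
import hashlib


class DedupIndex:
    """Track which file contents have already been archived."""

    def __init__(self):
        self._seen = set()

    def should_archive(self, data: bytes) -> bool:
        """Return True only the first time this exact content appears."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

With an index like this shared across all scanned locations, identical files on two different NAS servers cost only one upload.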

In any case, Pug is not designed for the typical Windows or Mac consumer. Rather, Pug was designed for the experienced IT system administrator who wants total control of cloud archival of corporate data.

Pug could also be turned into a useful service where one sells a physical device with Pug installed on it to business customers. This box would need to have a GUI for the non-technical users, of course. It's a nice solution since one does not install software all over the place to perform backups. Rather, one utilizes an enterprise NAS server and this small device to perform backups. (This would be something exactly like what I use today, though I don't have a GUI.)

But I do have a problem that, I'd guess, isn't all that different from the maintenance problems a lot of people have with their files; and I need a solution.

Okay, so I have a really big directory: wide and very deep, with lots of duplicated and near-duplicate files. In fact, at one point I had a disk error of some kind (I use an encryption tool, so the actual disk platter is the proverbial "bag of bits"), and as a consequence of that disk error some elements of the directory tree are repeated maybe a dozen times.

And I want a program to go through it all and make wise choices with regard to clean-up.
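A first pass at that clean-up could at least find the exact duplicates automatically: group files by a digest of their contents and report every group with more than one member. Which copy to keep is still the "wise choice" a human has to make (and near-duplicates need fuzzier matching), but a sketch of the exact-match pass, assuming plain files under one root directory, might look like:

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Walk `root` and return {digest: [paths]} for duplicated contents only."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            groups[digest].append(path)
    # Keep only digests that appeared more than once.
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

For a truly huge tree you'd want to pre-filter by file size and hash in chunks rather than reading whole files into memory, but the grouping idea is the same.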