my parents’ desktop backing up photos/documents/etc over the internet to my linux box

my windows desktop backing up photos/records/documents to both my linux box (which it was sitting next to) and the two external hard drives (one of them kept off-site)

This worked ok, but it had issues. The first (and biggest) being that the upload through my parents’ cable modem could make my adsl connection unusable, so the first change was to pay for a subscription to have my parents back up to the cloud. The initial sync of around 100 GB took a while but then worked nicely. (works even nicer now that they have NBN…)

The only issue I had with the linux install was when it failed to update itself. I don’t know how often there were updates, but I know that four times it would get into a loop of downloading the update, failing to install, downloading the update, etc. This would continue until the 10GB root partition would fill up. Each time I would delete the update files, open the app and manually trigger an update.

The pair of external hard drives started out at 500GB, were replaced by a pair of 1TB drives, and then at the end of last year replaced again by a pair of 2TB drives. I keep taking photos, at 25MB per raw file it adds up.

I am also slowly getting around to replacing the current linux box, so I am getting rid of some things to simplify matters. The two external drives have been sufficient, the only trick being remembering to plug one of them in (usually after coming back from taking photos somewhere), letting it sync and then swapping it with the other one which I have been keeping at work.

So now I am being forced to change…

My parents’ desktop is going to be easy as a quick look around shows that backing up a single computer to the cloud is the common use case. Their current subscription expires in February, so I have six months to find an appropriate plan.

Backing up my data isn’t as straightforward, largely due to the quality of internet here in Australia, which is the reason that I have stuck with a pair of external drives.

My ADSL connection is better than it used to be (almost double the speed and reliability once a tech redid the connections in the junction box out on the pole…), these days around 7000kb/s down and 800kb/s up.

I have 1TB of photos, so over four months for the initial upload to a cloud service. When I take photos I come back with a lot of them; they get culled, but the backup happens before that. I can easily take 400 photos in a single day, which is 10GB, or more than a full day to back up. I can’t see how a cloud service works for me with ADSL.

(But what about NBN? It is available to all of the houses around me, but I’m in a unit which appears to be being left until later… even then I don’t know what tier I am prepared to pay for. I should be able to pay the same as now for 25/5 or $20 more each month for 100/40; these work out to be around 5 hours and 30 minutes respectively for that “day” of photos)
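As a rough check of the upload times quoted above, here is a minimal sketch of the arithmetic. It assumes the quoted speeds are in kilobits and megabits per second, uses decimal gigabytes, and ignores protocol overhead, so real transfers would take somewhat longer. The function name `upload_hours` is made up for illustration.

```python
def upload_hours(size_gb: float, uplink_mbps: float) -> float:
    """Hours needed to upload size_gb (decimal GB) at uplink_mbps (megabits/s)."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (uplink_mbps * 1e6)
    return seconds / 3600


# Uplink speeds from the post: ADSL at 800kb/s, NBN tiers at 5 and 40 Mb/s up.
day_of_photos_gb = 10
for label, mbps in [("ADSL", 0.8), ("NBN 25/5", 5), ("NBN 100/40", 40)]:
    print(f"{label}: {upload_hours(day_of_photos_gb, mbps):.1f} hours")

# The initial upload of the whole 1TB collection over ADSL, in days.
print(f"1TB over ADSL: {upload_hours(1000, 0.8) / 24:.0f} days")
```

On these assumptions the 10GB day comes out at roughly 28 hours on ADSL, about 4.4 hours on 25/5 and about half an hour on 100/40, and the full 1TB at around 116 days before any overhead.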

As I start to look for a replacement I have been thinking about what I want:

local non-cloud and free

specify folders to monitor

target is multiple external drives

service that detects when external drive is connected

revision history (not just a basic sync, can recover deleted files)

data deduplication (no duplicates when files moved around or renamed)
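To illustrate the deduplication item on that list, here is a minimal sketch of one common approach: identify files by a hash of their content rather than by name, so a moved or renamed photo is recognised as something already stored. This is an illustration of the idea only, not how CrashPlan or any particular tool does it; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's content, read in chunks so large raw files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to every path under root that has that content."""
    groups: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups.setdefault(file_digest(path), []).append(path)
    return groups
```

Any hash that maps to more than one path is a file that only needs to be stored once, however many times it has been copied or renamed.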

I have a year until CrashPlan stops working, I wonder what I will find…

I don’t often take photos on my phone but when I do they are automatically backed up to Google Photos. As well as it being a backup it comes in handy every so often as it is a way to access the photo without copying it off the phone.

Every so often I clear the photos off my phone, incorporating some of them (usually just for future reference) into my photo collection (but not the section that Lightroom looks at). While the photos are no longer in the DCIM folder they all still exist as the backups in Google Photos, which I don’t want. When I look at Google Photos I prefer to only see the photos that I have shared in Albums.

Annoyingly there doesn’t appear to be a way to see all photos not in an album (this is trivial in Flickr and I am not going to get into the future of Flickr now…), despite it being a feature requested by many people over the years.

I have found an alternative that is also the better solution for my actual problem, you search for the following:

#autobackup

This is an undocumented feature (one of probably many) that does what it says…

As I said at the time the primary purpose is as a file server. Both as a target for backups and for media. The media becoming more important than before as I will have a television in a lounge room to play it on.

The specific component of WHS that I wanted was called Drive Extender. This is a storage solution where you add the hard drives to a storage pool and you then define folders within that pool. A folder can be set to duplicate its files across multiple drives for redundancy, or to just keep a single copy. Individual folders don’t have limits, whatever free space is available in the storage pool will be used.

In contrast to the other options, where you had to preallocate space and at the beginning set the redundancy level, this is just so flexible. Running out of space? Add another drive. Case not really big enough for that extra drive? Mark one of the smaller ones for removal and the data is copied off it.

So, without WHS I am back to Linux. What I am now planning is an updated version of my current server. This means an LTS release of Ubuntu and I will continue to use LVM for the disks, but with a difference.

Currently I have two drives striped for media, then a partition on the primary drive for backups. The current sweet spot for drives is 2TB, so I will get two of them with 500GB of each set up in a mirror for backups, then the remaining space striped for 3TB of space for media. That should be more than enough. For now at least.

The aspect of this that I haven’t finalised is how I set up the mirror. I can partition the drives, RAID them and then set up LVM on top. Or I can just use LVM for the mirroring. Further investigation is required.

In addition to storage I also still intend to use this box for recording broadcast television. While I am quite impressed with DV Scheduler, it is no longer suitable as it runs under Windows. While I have yet to look into it, I suspect that MythTV backend will be the solution there.

The other feature of WHS that I was interested in was the ability to perform a complete workstation backup to it. I can continue using my robocopy based method as I know that it works, but that is only backing up data. If I have a drive failure I will need to spend a non-trivial amount of time to reinstall. But I have time to investigate other options (including the backup built into Windows 7).

I will continue to write about this (in between house stuff) but I actually need to act fairly soon as I have been out of space on the current server for a week now. It’s not good.

For a long time I have been using Microsoft’s SyncToy to backup data on my Windows boxes over the network to my Linux box. Every few weeks (in reality it was months) I would also use it to copy that same data to an external drive for the off-site backup.

Not any more.

When I first started using SyncToy I was satisfied that it was copying all files. Recently I discovered one of two things: back in the beginning I didn’t check properly, or the behaviour of SyncToy has changed since then.

So what is the problem?

The SyncToy setting I have been using (at least on the recent versions) is ‘Echo’ which is described as:

“New and updated files are copied left to right. Renames and deletes on the left are repeated on the right.”

At face value this is what I wanted, a mirror of the local files to a network share. Unfortunately I didn’t take this description literally enough, SyncToy will ONLY echo changes that are made on the ‘left’ side. What I need (for example when rotating through external hard drives) is a proper sync that analyses both source and destination to determine the differences that need to be copied (you know, like rsync).

So if for some reason files on the destination (‘right’ in SyncToy terminology) go missing or get corrupted, SyncToy doesn’t care. In the case where I am using a pair of identical external drives that I swap between home and work every couple of weeks, data that is copied to one drive is then not copied to the other drive a few weeks later.
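The difference is easier to see in code. The following is a minimal sketch of a two-sided comparison: it walks the source tree and checks every file against the destination, so anything missing or stale on the destination shows up regardless of what changed on the source. It is only in the spirit of what rsync does, not its actual algorithm, and the function name is hypothetical.

```python
from pathlib import Path


def files_needing_copy(source: Path, dest: Path) -> list[Path]:
    """Relative paths that are missing from dest, differ in size, or are older there."""
    pending = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = dest / rel
        if not dst.is_file():
            # Missing on the destination: an Echo-style sync that only
            # tracks changes on the source would never notice this.
            pending.append(rel)
            continue
        src_stat, dst_stat = src.stat(), dst.stat()
        if src_stat.st_size != dst_stat.st_size or src_stat.st_mtime > dst_stat.st_mtime:
            pending.append(rel)
    return pending
```

Run against each of the two rotating external drives in turn, a check like this would list the files that were copied to one drive but never made it to the other.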

What really confuses me is a step that the latest version of SyncToy no longer performs, which is how I noticed this (and then found that many others already knew). It used to be that when the sync ran (immediately after login) I could see it walking the destination file tree, both via network activity and in the samba logs. Why? If SyncToy doesn’t care about the destination, what is the point of this scan? Obviously they figured out that it was redundant and it was removed.

So what have I done?

Ideally I wanted a reliable win32 port of rsync that didn’t require me to install Cygwin. But without that I started looking into alternatives and I settled on Robocopy. Yes, another tool from Microsoft. For XP it is obtained from the Windows Server 2003 Resource Kit, but it is standard for Vista and 7.

Robocopy is a command line tool (there is a GUI available) which is fine with me as I want to script it. Which I have done and I now have two scripts. One to run at login which backs up local data across the network, and a second script which backs up the same data to an external hard drive. This second script also pulls other data (such as my email, etc) from the Linux box to the external hard drive.

One important option that I need to specify is /FFT, which tells it to ‘assume FAT File Times’, as apparently FAT file times are not as accurate as you would expect. I’m copying from NTFS to ext3, so neither FAT nor FAT32 is involved, but in between those two file systems is Samba, whose SMB implementation has similar time accuracy problems to FAT.
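FAT stores modification times at two-second resolution, which is the granularity that /FFT allows for. Here is a minimal sketch of why that matters: an exact timestamp comparison treats a rounded time as a change and recopies the file, while a comparison with a two-second tolerance does not. This illustrates the idea only and is not Robocopy’s actual implementation; the function names and timestamps are made up.

```python
FAT_GRANULARITY_SECONDS = 2.0


def changed_exact(src_mtime: float, dst_mtime: float) -> bool:
    """Treat any difference in modification time as a change."""
    return src_mtime != dst_mtime


def changed_with_tolerance(src_mtime: float, dst_mtime: float,
                           tolerance: float = FAT_GRANULARITY_SECONDS) -> bool:
    """Only treat the file as changed if the times differ by more than the tolerance."""
    return abs(src_mtime - dst_mtime) > tolerance


# Hypothetical example: the source keeps sub-second precision,
# the destination has rounded the same time to a coarser value.
src_mtime = 1262304001.37
dst_mtime = 1262304002.0

print(changed_exact(src_mtime, dst_mtime))           # True: file would be recopied
print(changed_with_tolerance(src_mtime, dst_mtime))  # False: treated as unchanged
```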

It has now been a week and the backups are working correctly. Hopefully it stays that way.

In light of the failure experienced by two prominent technical bloggers I am glad that over the past few weeks I have been gradually improving my situation in regard to backups by finally crossing a few items off my Todo list.

So what have I done?

Firstly I now have a daily backup of my hosted sites (this one, plus those of some friends). Although I assume that Dreamhost have some form of backup and redundancy, Phil and Jeff learned the hard way that you can’t necessarily trust your host. So in addition to a daily rsync that pulls down all of the hosted files (mostly wordpress files, but also any new images) I now have a server side cron job that dumps each mysql database to a date based file. These mysql dumps are included in the rsync.
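The date based dump could look something like the following minimal sketch. The database names and dump directory are hypothetical (the post does not list them), and it assumes credentials come from a `~/.my.cnf` file on the server. It only builds and prints the `mysqldump` commands rather than running them.

```python
from datetime import date
from pathlib import Path

# Hypothetical names: the post does not say which databases or paths are used.
DATABASES = ["blog_db", "friends_db"]
DUMP_DIR = Path("backups/mysql")


def dump_path(database: str, day: date, dump_dir: Path = DUMP_DIR) -> Path:
    """Date based file name, e.g. backups/mysql/blog_db-2009-12-14.sql."""
    return dump_dir / f"{database}-{day.isoformat()}.sql"


def dump_command(database: str, target: Path) -> list[str]:
    """mysqldump invocation that writes one database to the target file."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={target}",
        database,
    ]


for db in DATABASES:
    target = dump_path(db, date.today())
    print(" ".join(dump_command(db, target)))
```

Because each file name includes the date, the nightly rsync picks up a new dump every day without overwriting the earlier ones, so old dumps need pruning at some point.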

It was only last night that I tested these files. From the starting point of a generic Apache with an empty htdocs and an empty mysql database, I was able to copy in the files and import the database. It all worked.

However this is only bringing the backup of my site onto a system that I control. What about the backups of that system?

This is where a pair of external USB drives comes in. The plan is to alternate these between home and work every couple of weeks (at the most). What I have been working on is an automated method to get the data onto one of these drives when it is connected to my windows desktop.

Why the windows desktop?

Because the bulk of what I am backing up is 110GB of photos. While these are incrementally synced to my linux box, it is faster to sync them straight from the source. But this is causing some issues with backing up the linux data.

My mail is stored in Maildir format, but when that is copied over windows doesn’t like the file names so they get garbled. So technically I should still have the message content, but I wasn’t sure. So instead I am going to create some archives (tar.gz or possibly rar so I don’t end up with gigabyte sized files) that are then copied over the network.
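Maildir file names contain characters such as the colon (for example a name ending in `:2,S`) that Windows does not allow in file names. Wrapping the directory in an archive sidesteps this, because the names live inside the archive rather than on the Windows file system. Here is a minimal sketch using tar.gz; the function names are hypothetical and it does not attempt to split the result into smaller pieces.

```python
import tarfile
from pathlib import Path


def archive_maildir(maildir: Path, archive: Path) -> None:
    """Pack a Maildir tree into a gzip-compressed tar file."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the Maildir itself.
        tar.add(maildir, arcname=maildir.name)


def list_archive(archive: Path) -> list[str]:
    """Names stored in the archive, to confirm the original file names survived."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

Creating one archive per mail folder rather than one for the whole Maildir would be one way to keep the individual files smaller.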

As this is still a work in progress I expect that the details will change.

I have all sorts of data that ranges from private data I need to keep (emails, documents, financial records) to public data that I don’t care about (dents and tweets). In between is data that I care about, both private (family photos) and public (photos for competitions or that I have up on Flickr).

I have two rules:

If the data is private I try to store it at home (with appropriate backups) instead of on a remote service.

If I care about the data I make sure that it is stored at home, or if stored in the cloud I have a backup.

The first rule is why I still run my own IMAP server instead of shifting it out of the country to Google or similar. The second rule is why I still have all the originals for my photos that are on Flickr and why I have nightly cron jobs to backup this site, my delicious bookmarks, etc.

Just before I went to OSDC I moved the contents of my Inbox to a new folder so during the conference I only had to worry about anything new that had come in. My first attempt at applying the concept I have seen referred to as ‘process to empty’.

This worked well and I ended up using it as a place to store anything that came in during the conference that I wanted to deal with when I returned home. Which I did.

However, once I had dealt with all of the recent items I accidentally deleted the folder. This meant that a couple of emails that had been hanging around in my Inbox for a long time were gone. And I still needed them.

Six months ago I had burnt a backup of my photos and documents that I was storing off-site (aka at work) so today it was a simple matter of grabbing the appropriate disc, extracting the archive of my home directory, and picking out my Inbox from that. Then when I got home it was a matter of dumping the files in the appropriate directory (under Maildir) and looking at the messages in Thunderbird.

Now, the data that I lost wasn’t particularly important, but I do need it in order to follow some things up so I was thankful.

Something I need to improve is the interval. The off-site backup is 6 months old, while my other backup is a nightly rsync that only goes back up to 24 hours. I have been meaning to use an external hard drive, which could give intervals of a few weeks.

Now, thanks to Zazz! I have found an external case with a built in power supply that actually seems to be available: a Sarotech Hardbox.

However, there is an interesting issue. The price.

Zazz has the drive case and a 400GB Samsung hard drive for a total of AU$169.90 (and AU$12.90 postage).

At the local computer parts places a 400GB drive currently goes for around AU$130.

So that would be another AU$40 for the case which is what I have seen at (the now defunct) swap meets for the ‘one touch backup’ external cases.

However a quick search online for places in Australia selling the Sarotech Hardbox brings up prices of at least AU$90. It actually makes the Zazz! deal tempting, although I do prefer Seagate or Western Digital over Samsung…

At least I now know that there is something available. But I probably wouldn’t get one unless the price is below AU$50.

Update: Further searching turned up the case for AU$47.50 and AU$10 shipping. I should get onto a friend and see if he can get it wholesale…

A few weeks ago Thomas Hawk posted about using external hard drives to back up photos. The post and the comments that followed provide a lot of good ideas and advice, but none of them address a fundamental issue I have with external USB drives:

They use an external power supply.

I have problems with this:

The power supply is an additional part that must be carried with the drive. This reduces the convenience of the drive unless there is a power supply at each location the drive is to be used.

The pins on the power connector are too fragile. Between myself and people I know there are at least a half dozen times where a drive has become useless because the connector or the socket became faulty.

The power supply adds to the clutter if the drive needs to be connected for an extended period of time.

A few years ago, before USB, the option for external drives was SCSI and those cases came with internal power supplies. Simply connect an IEC power lead and the SCSI cable and the drive was ready to go.

Why can’t that be the case for USB cases? You could transport a single item which could be used anywhere that had a standard power cable and a standard USB cable.

I can think of two possible solutions which both involve sacrificing a USB drive case:

Fit the hard drive, USB interface and the (previously) external power adapter inside another case.

Fit the USB interface inside a SCSI hard drive case in place of the SCSI connector.

For now I’m just going to keep my eye out for cheap SCSI cases on eBay.

Since Monday evening (my time so at least 48 hours ago) this site has been unavailable.

Q: Why?

The server doing the hosting seemed to drop off the internet.

Q: Why?

Good question. As the site for my hosting provider was also down I decided to take the optimistic view that the situation was being rectified and it would be back up shortly.

Q: Was it?

No. I tried again later that evening and the situation had not changed. I was getting more concerned but I decided to stay optimistic and see if it was back up the next morning. That turned into the next afternoon and I was then kicking myself as I realised that it had been a couple of months since I had last backed up these blog posts and there were settings that I had never backed up.

Q: So what now?

I jumped ship. Although the hosting had been ticking along without any issues I had considered it strange that they didn’t seem to care that they had not billed me since October 2005, ie more than six months ago. What company doesn’t care if its customers pay or not? Because of this I had looked into other web hosting providers a month or so earlier so it was simple to sign up with the first name on the list after I couldn’t even access the current provider by phone (voice mailbox full!). It was then a waiting game while the new DNS settings for my domain propagated.

Q: What about the data?

Initially I thought that I had only lost a little bit of data but as time flowed along I realised that I had lost enough data to be inconvenienced.

I run a copy of the site on my box at home so that I can test any changes before releasing them so I knew that all of the files were intact. It would be a straightforward matter of uploading them to the new host. This meant that my computer collection area was intact as that is contained within files or brought in from del.icio.us (which I backup via a cron script that uses the API to grab a dump of all my links once a day). The photos section was also ok as it pulls the set details from Flickr.

This blog was a different story as these posts are primarily stored in a MySQL database. Every so often I copy the database over to my local MySQL instance but the last time I had done that had been at the end of March. Almost three months ago! Fortunately Google’s cache came to the rescue and I was able to obtain the text for all of the posts I had made since that time. One item on my list is to setup a mechanism to automatically backup the database, a quick search showed that there was at least one WordPress plugin that could periodically email an export of the database.

Losing all of my email forwarding settings means that my spam strategy has taken a big hit as I will have to regenerate the list of valid addresses and again monitor my gmail account which will be the target of the catchall rule.

Q: When will the status-quo be restored?

Not until after the upcoming weekend. So far I have uploaded the files to the new web host and the DNS settings have propagated. Until I sort out some differences in the configuration of this host to my old host all I have running is this blog (how else can it be read?)…

On Thursday night, midway through a long overdue backup of my personal files onto DVD, my Pioneer burner decided to just stop responding. It can read and burn CDs fine but doesn’t even want to detect that a DVD (of any type) has been inserted. It is ironic that I had had the drive for one year and a day…

That said I went out today (after my golf lesson) and picked up a new one, this time a Pioneer DVR-109. At the same time I picked up some new RAM to take my main windows box up to 1GB.