Topic: Backup Strategy: "The Threes"

I'm trying to get back into good backup habits. I thought it might be worth documenting a new approach I'm taking.

FIRST, A WORD ABOUT RAID: Many people use a RAID approach to keeping their data safe; I have not adopted RAID yet and still have some uneasy feelings about it, but it may well be a good alternative solution to part of the goals I am outlining here. HOWEVER -- while RAID does a good job of protecting your data against a hard drive hardware failure, that is only one of the goals of backing up data. The other main goal is to keep backups against accidental modification, deletion, overwriting, or corruption of files, which often happens without notice and is only discovered days, weeks, or months after it occurs.

Three "tiers" of backup software:

1. Full disk imaging (software example: Macrium Reflect); back up full drives once per month; scope: all content on all drives; keep a few old images, deleting the oldest to make room.

2. Broad document backup (software example: Backup4All); performed weekly or every couple of days (perhaps even daily); scope: all user data documents, even large databases; incremental if space allows, mirrored to a spare HD if not; can delete old copies if space is limited.

3. Instant versioned/incremental backup (software example: FileHamster); keeps an instant copy of every version; scope: a small set of important documents that are frequently modified by the user.

Justification:

The full disk imaging is the comprehensive approach that backs up everything, but it takes too long and occupies too much space to perform very frequently.

The instant versioned/incremental backups are too CPU- and space-demanding to run on EVERY file that you might modify (for example, if you have large databases that don't version well).
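For what it's worth, the idea behind tier 3 can be sketched in a few lines of Python. This is not how FileHamster actually works internally -- it's just an illustrative stand-alone sketch of the concept: each time a watched file's modification time advances, copy it aside under a timestamped name.

```python
import shutil
import time
from pathlib import Path

def snapshot_if_changed(src, versions_dir, last_mtime=None):
    """Copy src into versions_dir with a timestamp suffix if it changed.

    Returns the file's current mtime so the caller can pass it back in
    on the next poll. The names and layout here are illustrative only.
    """
    src = Path(src)
    versions_dir = Path(versions_dir)
    versions_dir.mkdir(parents=True, exist_ok=True)

    mtime = src.stat().st_mtime
    if last_mtime is None or mtime > last_mtime:
        # e.g. "report.doc.20100315-174502"
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(mtime))
        shutil.copy2(src, versions_dir / f"{src.name}.{stamp}")
    return mtime
```

A real tool would subscribe to filesystem change events rather than polling mtimes, and would skip duplicate saves of identical content, but the core "every save becomes a version" behavior is the same.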

The broad document backup may be best performed by mirroring directories onto a spare drive, which can be done quickly at the end of each day, occupying little extra space and using little CPU.
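That end-of-day mirroring step is essentially what `robocopy /MIR` or `rsync --delete` do. Purely as an illustration (not production code), a one-way mirror can be sketched with the Python standard library like this:

```python
import filecmp
import shutil
from pathlib import Path

def mirror(src_dir, dst_dir):
    """One-way mirror: make dst_dir match src_dir.

    Copies new/changed files and removes files that no longer exist in
    the source -- the same spirit as `robocopy /MIR` or `rsync --delete`.
    Illustrative sketch only; no error handling or logging.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    # Copy anything new or changed (shallow compare: type, size, mtime).
    for src in src_dir.rglob("*"):
        dst = dst_dir / src.relative_to(src_dir)
        if src.is_dir():
            dst.mkdir(exist_ok=True)
        elif not dst.exists() or not filecmp.cmp(src, dst, shallow=True):
            shutil.copy2(src, dst)

    # Delete anything in the mirror that is gone from the source,
    # deepest paths first so directories empty out before rmdir.
    for dst in sorted(dst_dir.rglob("*"), reverse=True):
        if not (src_dir / dst.relative_to(dst_dir)).exists():
            if dst.is_dir():
                dst.rmdir()
            else:
                dst.unlink()
```

Note the trade-off in the delete pass: a mirror faithfully propagates deletions, which is exactly why tier 3 versioning is still needed on top of it.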

Where to back up: 3 internal drives. I think a 3-internal-drive PC is the best way to go here:

1. A super-fast system C drive (10k RPM, or a solid state drive); 100GB or so.

2. A fast user document drive (10k RPM preferably).

3. A large backup drive (2TB preferably; speed not too important, but 7200 RPM is nice).

All backups go from the system and document drives to the internal backup drive, so they should be very fast to perform.

Alternatives: You could use an external USB drive for backups, but you want the connection to be fast if you are performing instant versioned backups; an alternative would be to back up from the system drive to the user drive, and vice versa, which is fine if you have enough space.

Lastly, I think it makes sense to keep copies of the latest full drive images on a removable hard drive that is kept outside of the house. Using a hard drive dock lets you buy 2 HDs and swap them each time with the one kept offsite. Or you could upload to an online backup space. This will protect from fire/theft.

My routine's pretty much like that. There's redundancy but that's okay by me. It took me many years to take this seriously but I really do believe that it's essential for anyone with material that matters to them, for whatever reason.

Weekly imaging to a rotating set of alternating external HDDs (overnight), one of which is always moved off-site.

When I'm working (seldom, thank goodness) it's to air-date deadlines, so this has to save my bacon only once to make the whole thing worthwhile. So far it's saved me more often than that.

Afterthought: docks for 'bare' external drives are very cheap and very useful. The few that I have provide an on-board choice between SATA and USB connections (BlacX). USB3 models are already on the market. Many of us have old drives lying around and such docks put these to good use cheaply and with a minimum of fuss. They might as well be lying around preserving data as lying around doing nothing useful.

Sounds like you are doing things right, Chris -- I wish I was as disciplined about keeping offsite copies; it's my one weakness.

Another thing that has changed in recent years is that drives are now cheap enough that I think it makes sense to treat them as basically write-once permanent backup media. That is, buy a new 1 or 2 TB drive each year to use as a backup and download repository, never erase anything, simply place it on the shelf (or offsite) when it gets filled up, and then buy a new one, leaving the old full drives for emergency restoration should something really go wrong.


Too true, I just bought a couple of bare 2TB drives for a laughably small sum.

Probably a good idea to spin up those full repository drives now and again.

If you're going to just keep everything, having a couple of (or a few) drives and a minimal plan (drive A, code; drive B, photos; etc.) will save hunting through multiple drives later, unless you are super-diligent with your indexing.

I think both of the strategies described are good, while noting the issue mouser mentioned of not having off-site copies. That being said, I do want to bring up a few issues which I think are responsible for many people *not* backing up (including myself for a long time).

First, in regards to both of your strategies, I see complexity being a potential deterrent for the "average user". This ties into my second point, which is that people often spend a lot of time seeking and/or planning the perfect backup strategy before implementing one. Doing this will only result in you not having a backup strategy for a long time, if ever, which is what happened to me. Having *any* backup system in place (that is *not* RAID, which is not a "backup" strategy), even an imperfect one, is better than none at all. In fact, there is no perfect backup system. So focus first on just having *a* backup strategy, even if it's only a simple sync to an external drive (or an additional internal drive). You can improve your backup system easily, but waiting until you have the perfect system designed just puts your data at risk for longer.

Having made those points, here's my system in brief (molded by my own unique needs): I have 1 "workstation" machine where I do most of my work. I have 1 "server" machine that holds all my media files and is connected to my TV/stereo, and from which I want access to some of my media files.

I sync all document files to my server from my workstation over the network; since this is done daily, the regular data transfer amount is not large and doesn't really become an issue even though it's only over 100Mbit Ethernet - I plan to upgrade to gigabit at some point. Syncing allows my server to have access to all of my workstation's files, particularly photos which I sometimes want to display on the TV. It is also a first line of defense.

Next I use CrashPlan running on my workstation to do regular (at least daily, usually every 4 hours) backups both to a couple of external drives *and* to the "cloud" (remote data storage service). CrashPlan offers a "seed" service to get an initial large backup started by shipping a drive to you and back. Then you only need to send changes over the wire which are usually minimal and are compressed and encrypted. CrashPlan gives me off-site storage, some amount of versioning (enough for my needs), and local backup as well, all in one package. As I've written elsewhere I'm not entirely happy with CrashPlan due to its high memory use, but I now have 15GB of memory in this machine so it's largely a non-issue at this point. For others with large volumes of files it may be an issue.
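The "only send changes over the wire" part rests on change detection. CrashPlan actually works at the block level and adds de-duplication, compression, and encryption; purely to illustrate the simplest, file-level form of the idea, here is a sketch that compares content hashes against a manifest of what was last backed up:

```python
import hashlib
from pathlib import Path

def changed_files(root, manifest):
    """Return relative paths under root whose content changed.

    `manifest` maps relative path -> hex SHA-256 of the last backed-up
    copy, and is updated in place. A real backup client (CrashPlan
    included) is far more sophisticated; this only shows the
    change-detection idea.
    """
    root = Path(root)
    changed = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = str(path.relative_to(root))
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if manifest.get(rel) != digest:
            changed.append(rel)
            manifest[rel] = digest
    return changed
```

On the first run everything is "changed" (the initial seed); afterwards only modified files show up, which is why daily incremental uploads stay small even on a slow uplink.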

The advantages I see in my system are that it's extremely low maintenance, both the sync program and the backup program run automatically without my intervention. The backup app notifies me regularly via email of backup status so I don't even really need to check it. If I don't get an email then I know something is up. I also have off-site backup without the hassle of actually taking a physical unit off-site regularly. Finally, I have direct access to all my files on my server as well as my workstation *with* local-file speeds (an issue when you're trying to enjoy slide shows of huge RAW images for example - gigabit networking might eliminate the need for this though).

The disadvantages are that I need two pieces of software, a sync system and a backup system. If CrashPlan could do both, I would be very happy and would no longer mind the memory use. CrashPlan always backs up to its own proprietary file format, however, so this is not currently an option. The off-site backup also requires reasonable (though not extreme) outgoing bandwidth. I have a bonded ADSL2+ line from Sonic.net that gives me a theoretical 44/2 connection (22/4 with AnnexM, which I do use) and in practice I get about 3mbit/s upload speeds, which is fine for backing up my data remotely. Most consumer cable connections provide at least 2mbit upload these days, so for many people this is also an option. Bandwidth caps may be an issue for some (do they affect outgoing bandwidth, only incoming, or total bi-directional?).

I also don't have system images in my backup scheme, but I don't really need them and find making them more hassle than it's worth. I'm OK doing a fresh install of Windows if I happen to have a system meltdown; in that event it's likely my system itself has died anyway, in which case a system image may not even be ideal (although I know some imaging tools allow you to "retarget" an image to new hardware).

In the future when I upgrade to gigabit (just the router needs the upgrade, both machines already have gbit ports of course), I may not continue to do the syncing. I feel that 1 local backup and 1 remote is probably good enough, and it would simplify my system.

Although I think mouser's approach with all internal drives for backup is ok, it's important to consider that simple household disasters can quickly destroy a single computer system, including all hardware in it (think plumbing leak on the floor above you hitting your computer while it's on and you're not home). Something as simple as backing up to an external drive (eSATA or USB 3 are plenty fast enough), or backing up to another system in the house over gigabit LAN, provides a reasonable level of redundancy without the potential hassle of off-site. Granted if your house burns down or there's a flood you're not protected, but it does address the likely more common smaller household disasters like the one I just mentioned. Another option to consider is an in-home fire and flood proof safe which can largely replace a proper off-site backup too.

I still intend to write a follow-up blog post to my original backup post that talks about all this in more detail.

Some very comprehensive backup "solutions" here. I am one of the many who just can't get around to organising backups. I have maybe 1.25TB of video, music, pics & text docs stored on two internal 1TB HDs & one external (USB) 320GB HD. Windows system backup (providing system revision to an earlier time...?) itself continually runs out of space in the 250GB partition assigned. Every time I think about an Outlook 2003 email backup I have to figure out how to do it (can't find an automatic procedure that works), so backup is for me something I generally subconsciously avoid, because it takes too long & occupies too much space, & 1TB HDs still cost around $100, which is not cheap if you need several.

What I do agree with absolutely is that for people with mission-critical installations, you must have a sound, regular, reliable & TESTED backup routine, even if it involves a significant investment. A quality home fireproof safe, well sited (to avoid water incidents), is I think essential. But there are many options intermediate between mission-critical & "family PC" setups. Good luck to anyone with the time to sort it out.

The way I keep backups of files is quite simple (at least as I see it):

I have a 32GB SSD with Windows + a couple games on it, and a 1.5TB WD Caviar Black HD with games, large programs, and personal folders (Desktop/Docs/Pictures/Videos/Music/Downloads) in this machine. Running on it is a copy of Jungle Disk (for file sharing between machines, file archival, and the like), Syncplicity (which I am currently searching out a suitable replacement for), SpiderOak (what I was hoping to use as the Syncplicity replacement, alas I cannot get it to work how I like), and the built-in Windows 7 Backup tool.

Jungle Disk is used to keep files that I will want later on, that I want to access easily on remote machines (via the web interface), etc.

Syncplicity is used to sync files (Desktop/Documents/Pictures/Music) between all my machines + the Syncplicity servers. It also does file versioning for 30 days, and deleted file recovery for 30 days (longer times are available for paid accounts, but I'm cheap). This keeps me safe from a drive failure, because the data will be accessible on every machine, plus on the web. After replacing a drive, I re-add the computer to the account, and the files download back automatically.

SpiderOak was theoretically going to replace Syncplicity, however when trying to add my netbook's folders to my SO account, it says that I am over-quota (even though the identical data is already in the SO account because it was uploaded on this machine). Hence it isn't being "used" at this point.

Windows Backup is set to back up Documents, Desktop, Pictures, and Music, and make a full system image of the SSD, every Sunday at 4AM (right after Task Scheduler runs through its nightly cleanup + defrag). I realize I am backing up files onto the same drive, but this is being done for reinstallation purposes, not for catastrophe protection (that would be mostly Syncplicity's part).

Everything else is replaceable or somewhere out on the net (1and1 manages my email, Google has my Contacts, etc.).

I never liked external drives, and I like the convenience of online backup. My connection's upload speed tops out at ~70kilobytes/sec, but it gets the job done. If I could just find an encrypted service like what Syncplicity offers, for what I'm paying now ($0.00), I'd be set.

I've heard this mentioned a few times here on DC and am really curious, so would someone who is knowledgeable on the subject please explain why RAID is not (or should not be) considered a backup strategy?

So if I understand you correctly, as long as you have a proper backup strategy (and if what you do isn't so critical it can't wait a few days to replace failed hardware) then there is pretty much no reason for the average/home user to implement a RAID setup.

...Unless you want the speed of a RAID-0 setup. And I would see RAID-1 as logical because when one drive fails, you go out and buy a new drive and then just re-duplicate the data and move on (no restoration process, fastest system possible). Makes drive failure as convenient as possible.

The main reason RAID is so attractive as a backup solution is that it is 100% automatic after it's set up.

So while it's only going to save you from a hard drive hardware failure, it does that job better than any other solution, instantaneously and with absolutely zero user intervention or maintenance. No other software backup solution is even close to it in terms of comprehensiveness, minimal resource use, and minimal effort.

And I would see RAID-1 as logical because when one drive fails, you go out and buy a new drive and then just re-duplicate the data and move on (no restoration process, fastest system possible). Makes drive failure as convenient as possible.

Depends on the RAID controller, I believe. Some have built-in tools to rebuild the array; others require you to destroy the array, and sometimes even wipe the drives and start again (I don't think that applies to RAID-1 though, just 0).

For RAID-1, you might have to use something like Easeus Disk Copy to copy the data, then re-configure the array. I've never dealt with RAID as far as maintaining or fixing one goes, so my knowledge here is limited. And running with just one drive in the array also depends on the controller, AFAIK (you might have to destroy the array and rebuild it later to continue using the machine). Maybe someone who knows can chime in and confirm?

RAID 1 (mirrored) - One drive dies = You have, and can run on, the second copy just fine until the replacement (which you should order immediately) arrives. With low-end controllers you have to tell them to rebuild/re-sync the mirror. Commercial ones will usually start the process automatically (especially if you have a "hot spare").

The point of RAID is to either make your data access faster, make your data more failure-proof, or both.

The entire point of RAID 1 is data redundancy. While yes, RAID 1 does effectively give you half the storage space of the drives used, if one drive dies, your data is still intact on the second drive. Thus, if you have a hard drive die, it's not so big of a deal. You can just replace the dead drive, and go on with whatever you were doing.

If, however, you're willing to run the risk of a dying hard drive, there are other RAID options with more storage space. RAID 0, for instance, stripes everything you store on the array across both drives in alternating chunks. Thus, you get the full capacity of both drives and a speed increase, because every write is split between two drives working in parallel. However, should a hard drive in a RAID 0 array fail, all the data in the array is lost. There's really nothing that can be done.

There are also more complex RAID options that give both more speed and redundancy. My personal favorite is RAID 5, which uses one disk's worth of capacity for parity (distributed across all the disks), so out of a three-disk array you get the storage space of two of the three disks.
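The parity trick that makes RAID 5 work is just XOR: the parity stripe is the XOR of the data stripes, so any one lost stripe can be rebuilt from the survivors. A toy, single-stripe demonstration in Python (real arrays rotate parity across all the disks and work on fixed-size blocks):

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length 'stripes'."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two data stripes plus one parity stripe -- a simplified view of one
# RAID 5 stripe, with each bytes object standing in for a disk.
data1 = b"hello---"
data2 = b"world!!!"
parity = xor_bytes(data1, data2)

# If the "disk" holding data2 dies, XOR-ing the survivors rebuilds it:
rebuilt = xor_bytes(data1, parity)
assert rebuilt == data2
```

The same property is why losing any single disk is survivable but losing two is not: with two stripes missing, the XOR equation no longer has a unique solution.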

Most of the backup solutions mentioned here will work fine -- nice and easy. There are many ways to go about it, and they will all be robust enough. Here's where the problem starts:

Amount of data. When your data exceeds 2TB (or 3TB now), it gets complicated, because now we're talking about spanning data across drives. Then we start talking about imaging, multiple redundancy, versioning... The hardest part in all this is running out of room on the drive. That's why I've been having a hard time adapting my backup strategy to the amount of data I have now, which exceeds 2TB and is going to grow pretty rapidly.

It's at this point that you start scrambling. You start changing your backup strategy, you start cleaning things out, you start deleting things you don't need, you burn things to DVD that you feel you need but could live without losing. So most people would just buy an external hard drive, or another spare drive, but that gets weird too, especially if you already have a lot of drives connected to your PC. Then what? Then you need to re-configure your backup software to work with the additional drives. It becomes very complicated when your data exceeds 2-3 TB.

Which is why I am now going to get a legit server setup. I'll have a RAID-5 array for the "working" data. Then I may have another RAID-5 array to back up the working set. Then I may also have a couple of non-RAID hard drives there that also hold copies of the data. The purpose of those is that I can pull one out and use it anywhere, without having to worry about the RAID setup and everything -- i.e., I can just stick it in a USB external enclosure if necessary.

This setup is beyond a normal desktop, so I'm getting a server for it. It's overkill to some, but I'm sick of my shoehorned backup solution. I want to be liberated from my lack of space and do it right. Once it's set up and I have plenty of space, I can do all the backups I want: RAID, images, file syncing, versioning... the works.

So here's another question: If you have a RAID setup, would Windows show the individual drives in My Computer, or would they all appear as a single drive?

Hardware RAID will show the array as a single disk. Most software RAIDs will do the same. The exception is if you do an in-Windows Dynamic Disk-type RAID configuration; then the disks will be listed separately, but have a red partition header that is shared between them.