Posted by timothy on Saturday October 10, 2009 @04:53PM
from the straw-for-the-ocean dept.

storagedude points to this article at Enterprise Storage Forum which argues that cloud-based storage options have fatal limitations for both businesses and individuals: "The article makes the argument that high volumes of data and bandwidth limitations make external cloud storage all but useless for enterprises because it could take months to restore the data in a disaster. It also appears to be a consumer problem — the author spent three months replicating 1TB of home data via cable modem to an online backup service." Seems like those off-site incremental storage firms could dispatch a station wagon full of tapes, for enough money. Update: Here's another reason, for Sidekick users: reader 1ini was one of several to point out an alert from T-Mobile that "...personal information stored on your device — such as contacts, calendar entries, to-do lists or photos — that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger."

I see cloud computing as a bunch of servers owned by somebody who has run out of ideas for making money, and/or who has a nose for snooping into other people's data (I bet the government likes that - the snooping part).

That's why I rsync my approx 12GB of data, stuff that changes all the time, nightly to another machine here in the house and to a USB drive. Then, once a week, I do an incremental of the second machine's copy to Amazon S3 using Jungledisk. All that costs what I paid for Jungledisk ($20 one-time) plus the recurring costs to Amazon (usually under $2.00/mo, depending on how much more I've uploaded and the transfer/request charges). That way, if I lose the hard drive on my main machine, the most I've lost is one day, and if the house goes up in smoke, the most I've lost is one week. Jungledisk/Amazon S3 beats the hell out of Mozy/MozyPro/Carbonite, none of which can run on Linux (Jungledisk *can*).

In my experience, if somebody has to do backups, then backups will not be done with any regularity. It's just a fact of life.

That's why I rsync my approx 12GB of data, stuff that changes all the time, nightly to another machine here in the house and to a USB drive. Then, once a week, I do an incremental of the second machine's copy to Amazon S3 using Jungledisk. All that costs what I paid for Jungledisk ($20 one-time) plus the recurring costs to Amazon (usually under $2.00/mo, depending on how much more I've uploaded and the transfer/request charges). That way, if I lose the hard drive on my main machine, the most I've lost is one day, and if the house goes up in smoke, the most I've lost is one week. Jungledisk/Amazon S3 beats the hell out of Mozy/MozyPro/Carbonite, none of which can run on Linux (Jungledisk *can*).

Spoken like somebody who is truly tech-savvy. So every day you back it up nightly to another machine in the house, and to a USB drive. For 12 GB, even locally, this takes anywhere from 5 to 30 minutes depending on the average file size. So that means that about 200 days/year (if you are PERFECT) you are backing up this data. At 15 minutes per day, that adds up to 50 hours/year of time spent... backing up data.

That's okay, it's not like he has to stand there turning a crank while the bits are being moved. Even Windows has the possibility of scheduling scripted events, which most likely is the method applied here.
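For anyone wondering what such a scheduled job might look like, here is a minimal Python sketch of the nightly rsync step described upthread. The host name and paths are hypothetical, and the flag choice (`-a --delete`) is an assumption about what the poster wants, not what they actually run.

```python
import shlex
import subprocess


def build_rsync_command(source: str, destination: str, dry_run: bool = False) -> list[str]:
    """Return an rsync argument list that mirrors source into destination."""
    # -a recurses and preserves permissions/times; --delete removes files
    # from the destination that no longer exist in the source.
    command = ["rsync", "-a", "--delete"]
    if dry_run:
        command.append("--dry-run")  # report what would change, copy nothing
    # A trailing slash on the source copies its contents, not the directory itself.
    command += [source.rstrip("/") + "/", destination]
    return command


def run_backup(source: str, destinations: list[str]) -> None:
    """Mirror source to each destination in turn, stopping on the first failure."""
    for destination in destinations:
        subprocess.run(build_rsync_command(source, destination), check=True)


if __name__ == "__main__":
    # Hypothetical targets: a second machine reached over SSH and a mounted USB drive.
    for target in ["backuphost:/srv/backup/home/", "/mnt/usb/backup/home/"]:
        print(shlex.join(build_rsync_command("/home/me", target, dry_run=True)))
```

Run as-is it only prints the commands. Calling `run_backup` from cron or the Windows task scheduler would do the actual copy with nobody turning a crank.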

So you have to physically open the safe every time you want to back up your data? That sounds like quite a bit of a hassle to me, and I can imagine you start skipping days because it is too much work to open the black box, take out the hard drive, connect it, run the backup, put the drive back in the safe, and lock it back up... With an online backup, your backups can take place automatically every couple of hours without having to physically move anything.

And we have this magical thing called front-loading hot-swappable hard drives - they've been around for a decade or so, the tech is cheap and the amount I need to move is minimal. Plug in designated diff backup drive, make diff, unplug and put in safe - 5 minutes if that, no tools required.

I even have a little reminder set to tell me when backup time is getting near.

What 'airline grade black box' do you have? Because the FDR in a commercial airplane is not an empty box in which a person could put a hard drive to survive a house fire, it is purpose built. They essentially start with memory floating in fire gel and build layers of insulation around it.

General data storage safes can be had. UL or Omega rated gun safes will do as well. A safe builder near me will supply them with a 120v outlet and an RJ45 jack, so your in-safe NAS is always up to date.

That's an interesting idea, but what do they do about dissipation of heat from the NAS? It would seem your only options are making the safe thinner, or adding vents, which would tend to compromise its integrity.

The inside of the safe will easily reach temperatures that will destroy your media, but not destroy paper records. Some types of fire safe also contain ablative material on the inside of the safe that is designed to melt onto the papers contained inside to encase and protect them during a fire, which is also extremely bad for digital media. Unless your safe is specifically rated to handle digital storage media in a fire, it most likely will let you down.

I think the best solution would be apps and backup data on the cloud while the working data is local, in an open format like XML or something. So, if cloud company XYZ goes out of business, at least you could write some scripts to retrieve the data from the local machine and convert it to another app.

Never, never, NEVER put the only copy of your data in one place. And a cloud service is just "one place."

The cloud is good for having an additional remote backup for things small enough to restore quickly (after heavily encrypting of course). Don't forget you should have offsite backups of things you really want/need to keep, in case your place gets robbed, burned down, flooded, etc.

A solution that I've heard of is storing a backup in a safe deposit box in a bank. If your data is stolen from a bank safe deposit box, you've got more problems than the missing data. I suppose you could only really store weekly backups there, unless you want to go to the bank every day. Put two hard drives in the box. When you put one in with your weekly backup, take out the one for the previous week. Nightly backups could be stored locally.

With storage stupid cheap, and computers continuing to increase in power, I just never saw the advantage to cloud storage. It requires web access. It's slow.

I just bought a terabyte drive for $100 to back up the other terabyte drive I bought several months ago for $160. Now everything is backed up in multiple. And I can access it without getting online. And I don't have to worry about my cloud storage company going out of business and taking all my data with it.

I would add to that a fire-resistant safe within another fire-resistant safe for CDs, DVDs, hard copies, etc with everything double-ziplock bagged. Then line the whole thing with tin foil. Can't hurt to be overly paranoid, can it?

I just bought a terabyte drive for $100 to back up the other terabyte drive I bought several months ago for $160. Now everything is backed up in multiple. And I can access it without getting online. And I don't have to worry about my cloud storage company going out of business and taking all my data with it.

And if your house burns down, you're screwed.

I want a way to get cheap, fully-automated, redundant, off-site backups.

I want it badly enough, that I'm building a solution myself [github.com], based on the allmydata.org Tahoe distributed file system.

Backups over the typical home user cable modem or ADSL line are guaranteed to be very time-consuming. As a partial solution, my system will do incremental rsync-style deltas (the infrastructure is in place now, but I want to build more confidence in the non-differential

I want it badly enough, that I'm building a solution myself, based on the allmydata.org Tahoe distributed file system.

Forgot to mention that the distributed file system is a "friendnet". All of the data is stored on the hard drives of friends' and family's machines in their homes. It uses Reed-Solomon encoding so even if some of the machines in the friendnet die, I won't lose any files. And all of the shares are encrypted for security. I don't really care about that; the people whose machines I'm storing my data on would be welcome to look at anything they like, but the privacy assurance is in place for those who need
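To put a number on the "even if some of the machines die" claim, here is a rough Python sketch of k-of-n share survival. The parameters (10 machines, any 3 shares rebuild a file, each machine up 80% of the time, independent failures, one share per machine) are illustrative assumptions, not figures from the poster's setup.

```python
from math import comb


def survival_probability(needed: int, total: int, p_machine_up: float) -> float:
    """Probability that at least `needed` of `total` shares are still reachable.

    Assumes one share per machine and independent failures.
    """
    return sum(
        comb(total, alive) * p_machine_up**alive * (1 - p_machine_up) ** (total - alive)
        for alive in range(needed, total + 1)
    )


if __name__ == "__main__":
    # Hypothetical friendnet: 10 machines, any 3 shares rebuild the file.
    print(f"3-of-10 erasure coding: {survival_probability(3, 10, 0.8):.6f}")
    # Plain replication for comparison: 3 full copies, any 1 suffices.
    print(f"3 full copies:          {survival_probability(1, 3, 0.8):.6f}")
```

Under these assumptions the coded layout tolerates seven lost machines while storing about 3.3 times the original data, whereas three full copies tolerate two losses at 3 times. Real friends' machines do not fail independently (same power outage, same flood), so treat the output as an upper bound.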

I just bought a terabyte drive for $100 to back up the other terabyte drive I bought several months ago for $160. Now everything is backed up in multiple. And I can access it without getting online. And I don't have to worry about my cloud storage company going out of business and taking all my data with it.

And if your house burns down, you're screwed.

Seems to me that if his house burns down, he's screwed even if his terabyte of pr0n is backed up "in the cloud somewhere."

Why? He'd just restore it from where it is. Might take a little while, but better than losing it (assuming it's something that matters, not pr0n).

Just buy a few HDDs, rotate them out, drop them off at a friend's, or if you're really paranoid, a safety deposit box. Cheap, off-site, and better redundancy.

Been there, done that, doesn't work.

Anything that requires manual steps like shuffling drives around probably won't get done, and certainly won't get done very often.

And the redundancy of such a solution would be very inferior to what Tahoe provides.

Also, since the backups are hours old instead of months old, they're actually going to be useful.

Nothing worse than restoring from old data.

That's not an issue with my solution. The backup and upload processes are separated so you can do daily backups

Seems to me that if his house burns down, he's screwed even if his terabyte of pr0n is backed up "in the cloud somewhere."

Why? He'd just restore it from where it is. Might take a little while, but better than losing it (assuming it's something that matters, not pr0n).

He's GOT NO FUCKING HOUSE! How is that *not* screwed?

Or is he going to restore his house "from the cloud?"

The cloud is a dumb idea. It was originally supposed to be everyone's computer, as a distributed system, not some client-server shit that these companies are trying to intermediate themselves into as a substitute for coming up with something better.

In other words, your computer and thousands of others would devote some bandwidth and storage to backing up chunks of each other's data, sharing where appropriate, making available to the world+dog where appropriate. Files that you want backed up would be broken up into redundant little pieces, and distributed among your peers, and in return, you'd do the same for others.

When it comes time to restore, you'd restore from the various chunks out there, and since there's lots of redundancy, and lots of bandwidth (since each box is only contributing a small chunk), restores would be as fast as your downlink.

Instead, the cloud has been taken from its natural setting by companies who want to be for-profit gate-keepers, even though, by their very nature, they will do a worse job (less redundancy, not geographically spread out, etc.)

The web really should become read/write, like it was supposed to be in its original design.

The "whole web" is not just S3 - it's your computer, and every other one on the internet. The original plan was to have all computers serve as both clients and servers in a true peer-to-peer network. Unfortunately, most users weren't up to it at the time, connections were intermittent, disk storage was expensive, etc.

The "whole web" is not just S3 - it's your computer, and every other one on the internet. The original plan was to have all computers serve as both clients and servers in a true peer-to-peer network.

The original plan for the web also assumed that everyone was a good person, a scientist working at a major lab or university, and that they had a (relatively) powerful computer on their desk. It's not exactly worked out that way! It's turned out that making all machines fully addressable isn't possible (well, not with IPv4), and that too many machine owners don't have the skills to keep all their systems secure enough to stop malicious people from causing damage. Inevitably, this leads to the current client

In other words, your computer and thousands of others would devote some bandwidth and storage to backing up chunks of each other's data, sharing where appropriate, making available to the world+dog where appropriate. Files that you want backed up would be broken up into redundant little pieces, and distributed among your peers, and in return, you'd do the same for others.

When it comes time to restore, you'd restore from the various chunks out there, and since there's lots of redundancy, and lots of bandwidth (since each box is only contributing a small chunk), restores would be as fast as your downlink.

That is what I'm trying to build. The "thousands of computers" introduces lots of challenges, perhaps the largest ones non-technical. So I'm starting by trying to build tools to make groups of friends and family able to provide these services to one another.

More precisely, the Tahoe project is trying to provide the tools for distributed file systems across small to medium-sized groups of machines. I'm just trying to provide an effective backup solution on top of it.

Guess what else won't work? Your house just got burned down so your data lines are GONE.

Bah. Data lines are easy to find. Hell, nearly every hotel in the country has free Wifi. I buy a laptop, I install some software, I type in my key, and I have instant access to my files.

Better to have physical copies in a safe fire-proof place that is easy to access.

Well, if last month's version of the data is good enough...

Keep in mind that there is no such thing as a fireproof safe. Safes offer varying degrees of fire resistance, rated in terms of time at temperature. Should the temperature be higher, or last longer... you're screwed.

You need to separate yourself (as a /. reader) from the other 99.8% of the population. Backing things up locally is economical, practical, logical, and (here's the kicker) requires some knowledge and dedication.

What is the draw of the online backup service? Do you remember the chicken roaster that Ron Popeil (sp?) used to sell? It wasn't the machine that made the sale. It was his tagline: "Set it, and forget it!"

Most average people aren't going to set up RAID arrays or Syncback or install add

People living in the year 4000, will know less about us than we know about ancient Rome.
I totally and completely agree. Our civilisation is Atlantis: we will disappear and the neolithic survivors of the coming die off will spin myths about our vaunted abilities.

Yeah, but what if, god forbid, lightning strikes and blows all your electronics, a hurricane, tornado, or tree fall strikes your office, it burns down, etc. etc. I see a benefit to off-site storage, and the easiest way to do that is electronically. You may not want to use a cloud service as your way of creating and accessing business data, but you do want *some* kind of cloud storage.

I see a benefit to off-site storage, and the easiest way to do that is electronically.

The second easiest way is to sync to an external hard drive and stash that in a safe deposit box or other off-site location. This can even be faster and cheaper for large data sets before the artificial restrictions on last-mile bandwidth disappear, and it avoids the problem of a backup provider going out of business.

100% agree....and if it was on the "cloud" you wouldn't have access everywhere. Only where the net access wasn't filtered to disallow it.

Plus forget about companies closing down, you'd be at the mercy of the company that now owns your data anyway. If they decided to hike up their rates before you could remove it all, you'd have two choices. Pay up, or lose your data.

Get a 3rd drive though and store a copy of your data off site, updating periodically (maybe once every month or two, or if something you reall

Cloud computing makes sense for email and for off-site critical backup of your most important files.

But if you do a lot of sophisticated work at your computer, it's best to have localized storage, because access to data is a LOT faster and you don't have to worry about the Internet suddenly going down and losing access to your data "in the cloud."

the author spent three months replicating 1TB of home data via cable modem to an online backup service.

Surely the $100 the author "saved" by doing that could not have been worth the three months it took? That's about 140 kB/s (roughly 1.1 Mbit/s)... You could buy yourself a $100 TB drive and have a local system set to back up and restore your data whenever you need, and it won't take 3 months for the data to get there and back. *And* you have control over your data and its security. *And* it would probably be cheaper anyway in the end.
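As a sanity check on that rate, here is a quick back-of-the-envelope calculation in Python. It assumes "1TB" means 2**40 bytes and "three months" means 90 days of continuous uploading. With those assumptions, a figure of about 140 only works out in kilobytes per second, not kilobits.

```python
def average_throughput(total_bytes: float, days: float) -> tuple[float, float]:
    """Return (kilobytes per second, megabits per second) for a transfer."""
    seconds = days * 24 * 60 * 60
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1000, bytes_per_second * 8 / 1_000_000


if __name__ == "__main__":
    # Assumptions: "1TB" is 2**40 bytes and "three months" is 90 days.
    kb_per_s, mbit_per_s = average_throughput(2**40, 90)
    print(f"{kb_per_s:.0f} kB/s, about {mbit_per_s:.2f} Mbit/s")
```

That is in the range of a typical cable modem uplink of the time, which is why the upload took a whole season.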

Need to transfer 1TB of data? Mail Amazon the data on a drive, they load it, and they send you the device back. Sure beats uploading for 3 months with a cable modem. Have more data than that? You can send them up to an 8U drive enclosure, and more than that if you make special arrangements.

And that's a solution that solves the grandparent's problems, specifically that cable modems really aren't that fast - not when compared to enterprise bandwidth. But there's still huge demand for off-site storage for enterprise in the cloud.

At Rackspace Email, we use Amazon S3 for data backup (link to blog [rackspace.com]). Depending on what step you're at in an email's life, and whether or not you count raid, we've got between "a few" and "a bunch" of copies of your email in our datacenters; but just in case, we also sh

Well...maybe. As a consumer, I don't care if it takes a few days to get my data back.

If my house burns down and I lose a terabyte of pr0n, I'll have enough other problems to worry about while I wait for a download to finish up or for a metaphorical station wagon full of tapes to arrive.

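Meanwhile, though, S3's storage is pretty expensive for that sort of data on a consumer level, at $150 per month for 1TB of storage. For those prices, on any sort of lengthy term, I can easily justify the time and expense of putting together my own network backup solution (parking a cheap NAS box over at a friend's house, for instance), and still have enough cash left over to build a second one so that the same friend can back his stuff up to a NAS box at my house.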

Meanwhile, though, S3's storage is pretty expensive for that sort of data on a consumer level, at $150 per month for 1TB of storage. For those prices, on any sort of lengthy term, I can easily justify the time and expense of putting together my own network backup solution (parking a cheap NAS box over at a friend's house, for instance), and still have enough cash left over to build a second one so that the same friend can back his stuff up to a NAS box at my house.
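The break-even point in that argument is easy to sketch. The $150 per month comes from the comment; the $300 price per NAS box is a made-up placeholder, and the sketch ignores electricity, bandwidth, and drive replacement.

```python
import math


def months_to_break_even(one_time_cost: float, monthly_fee: float) -> int:
    """Whole months of hosted fees needed to equal or exceed a one-time purchase."""
    return math.ceil(one_time_cost / monthly_fee)


if __name__ == "__main__":
    s3_monthly = 150.0         # from the comment: $150 per month for 1TB
    nas_pair_cost = 2 * 300.0  # hypothetical: two cheap NAS boxes at $300 each
    months = months_to_break_even(nas_pair_cost, s3_monthly)
    print(f"Two NAS boxes pay for themselves after {months} months of S3 fees")
```

Plug in your own hardware prices; the conclusion only flips if the hosted fee drops a lot or the data set is much smaller than 1TB.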

So don't buy storage space on S3. Simple. End of story.

Of course, if you're making use of more of S3's functionality (e.g., the data's online and so accessible from anywhere) then the price starts to look a lot better, and the fact that its a replicated geographically-distributed data store so you don't have a huge worry about the data becoming inaccessible when Bad Things Happen... that's when it goes from looking expensive to cheap and easy. But not everyone needs that, and it is up to you to make your ow

So I've gone to the trouble of putting my business data on a drive ready to ship to Amazon. AKA it's a backup now.

Why don't I just ship the drive to a physical storage facility that I pay next to nothing for? Why do I need this in the cloud?

If my "shop" burns down, aka local disaster, I've got issues that are going to be higher priority than having access to my 3 week old data now. Cause there is zero chance Amazon's cloud is going to have an updated copy of my hard drive on-line over night.

Boeing and Airbus are the world's largest suppliers of cloud computing and have proven to be very reliable. Crashes are infrequent, and while they can be disastrous for those directly involved, they affect a very small fraction of all customers. Generally replacements are on line the next day.

Granted, if one has in production a data store of 1PB, and is relying on cloud storage as the backup medium, a restore of that 1PB of data will take a frightfully long time in a DR scenario. Not that there aren't many, many shops with that much data (and more) in use every day, but I'd suggest that they are the exception. I know we are. We deal with less than a TB in live production data, at most. Much of that we could live without while it is restored, because our architecture is designed with that filer-I

You don't have to choose one or the other. I don't understand why so many presumably smart people here (well, ok...) pick on a problem of some backup method or other and then conclude that it is therefore not a choice. If you really care, you have multiple backup methods - not just multiple copies, but multiple methods. They then compensate for each other's weaknesses.

Well, security issues can be another matter, as having multiple methods doesn't help your security if one of them "leaks". But I'm talking ab

If the data is processed and lives in the cloud then bandwidth is no longer a major issue. As an example:

In one world you could have the Exchange server's backups pushed out to a cloud provider. This would mean many hours to get the data out there, plus the challenge of restoring it in the event of a problem, as the OP indicated.

Or...

Push the Exchange server and its data into a "cloud" provider. Now the clients access the data from the Exchange server in the "cloud" and the "cloud" provider provides DR

Really? How did I end up with an S3 account then? I guess Jeff must have slipped something in my coke when I wasn't looking.

My current backup there costs me about 10 cents per month. It includes almost everything I did in college, as well as my current programming and other projects, sans the final renders of stuff. I'm planning to go through my photo collection to pick up the good ones (burst mode is great, but results in many more mostly redundant photos to wade through) and when I upload those, I'm still

"the author spent three months replicating 1TB of home data via cable modem to an online backup service."

What a waste of time and effort. There's a simpler way, but it depends on your provider.

All the author had to do was to set up DRBD on his VM. DRBD supports "truck mode" (as in never underestimate the bandwidth of a truck full of tapes - or USB keys, for you young ones).

Just have the cloud provider set up a USB key, and sync it up with DRBD. Then have the cloud provider Fed-Ex the USB key. Amazon will do this; I don't know about other providers. Once you have the USB key, just sync it back up with DRBD.

It absolutely amazes me how many of the bright people who are using cloud services (including PhDs doing research) overlook this simple method.

Save your bandwidth for the updates. Do the heavy lifting with the tools that are out there.

That's not a bad idea in itself, but I can imagine some employers having suspicions about an employee shuttling 2TB drives in and out of their workplace. The term "corporate espionage" springs to mind.

We have our backups offsite too. They sit on externally hosted servers that we directly control in a heavily security-vetted DC (some of our clients are banks who would demand nothing less, even though the backups in question contain none of their operational data aside from emails containing project/spec/contract documents and such) rather than in a "cloud" arrangement, but it would still take quite some time to draw the whole lot down over the connection we currently have.

How long will your choice of media last, even in the pitch dark? If the data is really worth backing up, it is worth restoring from on a regular basis to check for validity. Even long-term storage media like mainframe tape is only warranted for 12 months, and then you need a re-write to 'clean' media. Most businesses satisfy themselves with a 'best-effort' and then just live with the loss. Only those places mandated by strict law or those with a huge potential financial loss ever really deal with the situation.

It was an email special. When drive space is that cheap you can have complete redundant backups and store one off site.

I don't see a problem with using a cloud storage provider for redundant off site backup. At least you'd have the data, even if it took a week to restore. If you could prioritize the restore, important and active customers first, everything else later, that might not be all that bad.

For home backup, I can't see anything being better than keeping a spare hard drive somewhere. You can even get a USB-pluggable box, so no excuses for lamers. If you're COLOing or have a dedicated server, the question is: do you pay for a backup box at your hosting provider, or do you back up to another remote location? For COLOs you've got a lot more bandwidth than a home user. With my 4Gb/s provider, the example 1TB restore would only take about 40 minutes, which is easy. And if the backup storage is $10 pe

They charge $2.50/GB if you go over your monthly transfer limit. If I lost my data and needed to replace it quickly (assuming I for some reason chose to back up multimedia in the cloud and then suddenly needed all my DVDs at once) it would cost considerably more than buying a highly redundant RAID array.

to anyone who can out code MS...
From modem using UFO hunters to Russians with adsl, to grandmas with FTTH.
MS failed with your desktop, failed with the net, failed in London, and now you want to trust them with your personal data on the net ???

I just signed up for Mozy for a measly $54/year. I have almost 9GB of data backed up to their servers that took about a week to completely upload from my laptop when I was occasionally connected to the internet and not using it. I have a very small consulting business and I don't have time to juggle hard drives, run to the bank to keep a secure offsite backup or spend time worrying about my data.

If I don't pay my bill, the data does disappear. So What? I probably moved to a different service or a local back

1) Speed of recovery. You have instantaneous access to data backed-up to the cloud. Getting access to your securely-stored hard drive will take longer.

2) Ability to backup-and-forget. Backups to the cloud can be done automatically. You need to physically make and transport manual backups. This is tedious, uninteresting work. People hate doing this kind of thing, so they typically stop trying after a time.

Anybody who accepts and uses the term "cloud" is past redemption. You see there is this property of clouds that applies neatly to the current meaning. Clouds quite often dissipate naturally and quickly leaving nothing but clear blue sky. Not somewhere to keep things safe IMHO.

When Microsoft loses the personal data of its Danger customers, it's no more an indictment of cloud storage than Microsoft Windows blue-screening and losing your data is an indictment of desktop computing.

Point to the server room, tell them "we've got our own cloud in there and we've already paid for it". Then start talking about the expense of change. They can then go back to their own pointier-haired boss and tell them that your company is leading in the cloud and fully buzzword compliant. It's funny, a few days ago I was pondering how best to speed things up for least expense with solid state disks when a clueless salesperson rang to try to sell me some cloud stuff over a horribly slow link instead which

Look at the reasons the author of the article gives for avoiding cloud computing. It isn't that you can buy a 1TB drive cheaply, as everyone seems to be discussing.

That cloud computing is like AJAX - a good way to bash new stuff without thinking too much and putting it down as a hype, giving one thousand reasons what it does and why it is wrong, and not mentioning its true function once. May I NOT jump on your bandwagon? I don't even use cloud computing because I don

It seems that every customer I run into with a glitchy backup environment wants to do "online" backup because it requires less investment and on the presumption that their data is "safer" offsite. Our occasionally braindead sales team often jumps on this bandwagon, and I get the virtual equivalent of kicks under the table when I ask about versioning, disaster recovery, data formats, on-site data delivery (i.e., all data at once), Active Directory, Exchange, SQL, and metadata recovery. I don't even get into security.

Anything above the Economy package has WAY more than enough bandwidth.

Storage networking assumes a symmetric bandwidth pipe. One half of that symmetric pipe uses the bandwidth listed as the maximum possible upload speed -- the number after the /. For cloud-based storage to work for a large portion of the connected systems, the Ignorant Lame Egotistical Carriers have to provide significant symmetric bandwidth at an affordable price. I don't see anything symmetric or affordable in what you listed.