Posted by Roblimo on Tuesday February 18, 2014 @04:32PM
from the prepare-for-the-worst-days-and-the-best-days-will-take-care-of-themselves dept.

This is a conversation with Jeff Whitehead and Lou Montulli, respectively Vice President of Technical Operations/CTO and Chief Scientist for Zetta.net, a company that specializes in online backup and disaster recovery services. Also, while this interview was arranged without his help, in the interest of full disclosure we'd like to tell you that Zetta's CEO is Ali Jenab, who used to be CEO of Slashdot's parent company. This discussion isn't about Ali or Zetta.net, though. It's about data backup, and which methods are best and most cost-effective for companies ranging from home-based businesses up to enterprise operations with thousands of employees. Among other things, we discussed the importance of multiple-site storage for important data, a factor that was drilled into us yesterday by an article titled "Another Iron Mountain Fire Points Up Shortcomings of Physical Storage" by long-time tech journalist Sharon Fisher. And never forget: You don't know how effective your backup and data storage arrangements are until you try to retrieve your data -- and if you don't try to retrieve data until you need it, and things don't work, you are in big trouble.

Robin Miller: I am Robin Miller for Slashdot. Looking at Lou and
Jeff, the titling tells you more about them. They work for a
company called Zetta, and we’re talking about what you do when
you back up, how you should back up, and the difference between
archiving and smart backups for things you need right away. So let’s
start with Lou. Lou, just give us some idea of the different sizes of
business you might think about as far as backup goes.

Lou Montulli: So we generally look at three different business size
segments. We’ve got the very small SOHO businesses, which might be a
dry cleaner, anywhere up to several tens of people, who don’t
have that much data. You’ve got some medium-size folks who
might have between 2 and 50 terabytes of data and range anywhere from
as small as 10 people up to a few hundred people. And then you have
the large enterprise folks, who have hundreds of terabytes in
general and range from hundreds of people up to tens of thousands of
people.

And the needs
of each of these companies are usually defined by how much data
they have, because when you have more data, the kinds of problems
that you have dealing with the data size and how to offsite it are
different. In very small companies, there are lots of tools
available, anywhere from just buying a USB drive and taking it home
with you up to very small tape systems. The medium-size business is
what we tend to address, which is the 2 to 50 terabyte range, and we
feel that’s a perfect size to employ Internet-based backup.

It’s
small enough that it can travel over the wire efficiently, and it’s
big enough that it’s really important data. Not that any data
isn’t important, but it’s big enough that the problems of
backing it up are actually reasonably substantial. So you want a real
company that understands enterprise IT helping you do it. And the
other segment, which we don’t address, is the very large
enterprise, which tends to deal with multiple datacenters and have
massive robotic tape libraries and/or massive on-disk backup and
other highly complex systems.

Robin Miller: Okay. Jeff, say you’ve just realized, with your small
but growing business, that you have to do some data backup or else.
I live in Florida, and we haven’t had a hurricane hit us for a while,
but one could any time. So what should I do with my small but growing
business as far as data backup?

Jeff Whitehead: Basically what you are describing is a geographic
risk that’s specific to Florida, so what you would like to
do is make sure that your data is offsite, so that if a disaster
occurs in one location, it’s very unlikely to happen in another
location. Fires can happen anywhere, but it’s very unlikely
that two or perhaps three sites, depending on how many copies
of your data you make based on its sensitivity, burn down all on the
same day. That’s just not going to happen.

Robin Miller: What about the difference between data you need now
and archive data? Lou, what about the difference, do we store them
differently?

Lou Montulli: Well, that’s a great question. I would say
that all data is important and you never want to lose any of your
data. Obviously if you put it in the archive, there is a reason why
you’re keeping it. So it’s not really a case where you’d
say, I want to increase my risk on archive data. Generally what
you want to do is say, I am willing to take a penalty in terms of
access speed in order to gain a better price on my archived data.

So you can
look at different types of media or different types of
lower-performance spinning disk to gain advantages in price on
archived data. But I definitely don’t recommend that people ever take
a chance in terms of data integrity on any of their data. That should
never be something you sacrifice.

Jeff Whitehead: I think there really are two different kinds of
archive data. One is where you have the data and could possibly
reconstruct it, say off of a tape drive, or off of disk for computers
that are sort of spread out, or data that’s been crypt down and
you could re-crypt that in some way. The other is archived data where
you transfer it off to some place because that was the only copy of
the data, and it’s got to be really protected and stay there forever.

Robin Miller: Okay, yeah, I was thinking actually in terms
of [inaudible 4:54]. I haven’t had an active business for
some years, but my wife sold art, and we still have credit card
receipts, which we’re supposed to hold for seven years, and we have
bank safe deposits. That’s all you need, I think, for a small
business. What’s the next stage, the electronic stage, beyond
that?

Lou Montulli: Well, a lot of people are scanning them now and
putting them into some sort of database or just putting them
on a file system and off-siting them somewhere. With the type of
data you’re talking about, it’s actually really important to be
able to get to that data and find it and search it, because it becomes
a needle-in-the-haystack problem. But the most important thing is
that you have one or more copies of it somewhere and are
able to get to it when you need to.

Robin Miller: Okay. So how do we search for it?

Lou Montulli: Well, searching is a complicated thing. It’s
very much dependent on – it’s a good question for Google.
It very much depends on the media type, right? Obviously it’s
very difficult to search photos, but it’s really easy to
search within a text document. So I think it very much
depends on your particular application type, and you make a
decision based on the type of data.

Robin Miller: So we get up into terabytes, yottabytes, and
zettabytes, so how do we search that?

Jeff Whitehead: It’s tricky. I think that in many cases,
it’s sort of an old tradition of: the data I’m looking for
was on Server 23, and it was on the C drive under My Documents, and
people kind of have a recollection of that. If you’re a very small
business, it’s often fairly simple: you know that your credit
card receipts were in a given folder, and you go and look in that
place. If you’ve got a very large data set that all looks the same,
then you really need some sort of specialized application that will
give you an index and some way of searching those things. A great
example for photos is the Picasas of the world or the Windows
thumbnails, so you have a way of looking through those things.
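
To illustrate what "an index and some way of searching" can mean for text files, here is a minimal sketch of an inverted index in Python. It is only an illustration, not anything described in the interview: the function names `build_index` and `search` are hypothetical, and the sketch assumes the documents have already been read into memory as plain text.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build an inverted index from {file path: text}.

    Returns a mapping of lowercase word -> set of paths containing it.
    """
    index = defaultdict(set)
    for path, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the sorted paths that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Start from the first word's matches and narrow with each further word.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

A lookup then touches only the index rather than every file, which is the point of indexing a large data set. Photos and other non-text media would need extracted metadata or thumbnails instead of words.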

Robin Miller: What else should I know that I don’t know?

Jeff Whitehead: Not all backups are of the same quality. I like
to tell people a backup isn’t a backup until you have restored
it. I personally have run into tapes that I thought were good and
then turned out to be corrupted, or tried to restore a system that
relied on a particular type of RAID card being in the system
[inaudible 7:29] after you restored it. So really you need to think
about what could happen to your data. If it’s a Word document,
there’s not a whole lot of risk there. You can get versions of Word
that go way back and open that up. If it’s an application, there is
the whole – all the pieces of the application stack you need to
protect so you can restore and reinstall.
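
One simple way to act on "a backup isn’t a backup until you have restored it" is to restore into a scratch directory and compare checksums against the originals. The sketch below is an assumption-laden illustration, not a tool mentioned in the interview: `hash_tree` and `verify_restore` are hypothetical names, and it only checks file contents, not permissions, application state, or hardware dependencies such as the RAID card mentioned above.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch,
            # but large files should be hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def verify_restore(source_root, restored_root):
    """Return (relative_path, problem) pairs for files that did not restore intact."""
    source = hash_tree(source_root)
    restored = hash_tree(restored_root)
    problems = []
    for rel, digest in source.items():
        if rel not in restored:
            problems.append((rel, "missing"))
        elif restored[rel] != digest:
            problems.append((rel, "corrupted"))
    return problems
```

An empty list from `verify_restore` means every source file came back byte-for-byte identical. Running a check like this on a schedule, rather than on the day of a disaster, is the practice the quote argues for.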

Lou Montulli: I had a few items as well. I think especially in
the world of Internet-based backups, there are specific problems
related to going over the Internet that are not there if you’re
backing up over the LAN within your enterprise. And they get harder
and harder as the data sizes get bigger. One of them is just the
reliability and security of the Internet, so make sure that you
choose a vendor that is well versed in security and is always using
encryption technologies, so that the untrusted WAN
connections are always encrypted.

The other one
is the ability to get large amounts of data over the Internet. It’s
still a very difficult process, especially when dealing with terabytes
and petabytes. It’s a particular question that we have focused a lot
of our time on: how to make high-bandwidth connections really
efficient for very large file transfers. And then one other item is
not transferring all your data all the time. In LAN-based backups,
it’s common to take full copies of your server every day or at
least every week and maybe do incrementals in between.

If you’re
doing full copies of your entire dataset over the Internet, you’d
quickly find that you need massive amounts of bandwidth. So having
an “incremental forever” technology is really, really
important. And then the reverse of that: if you are “incremental
forever”, how long will your restores take? So we always
recommend “incremental forever” with reverse incremental
technology, so you also have full backups available for quick
restores.
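
The bandwidth argument is easy to check with back-of-the-envelope arithmetic. The sketch below is illustrative only: `transfer_hours` is a hypothetical helper, decimal units are assumed (1 TB = 10^12 bytes, 1 Mbit/s = 10^6 bits per second), and the 10 TB dataset, 100 Mbit/s link, and 1% daily change rate are made-up figures, not numbers from the interview.

```python
def transfer_hours(data_terabytes, link_megabits_per_sec, efficiency=1.0):
    """Estimate hours needed to move a dataset over a network link.

    efficiency is the fraction of the nominal link speed actually achieved
    (1.0 means the link is fully and perfectly used, which is optimistic).
    """
    bits = data_terabytes * 1e12 * 8
    usable_bits_per_sec = link_megabits_per_sec * 1e6 * efficiency
    return bits / usable_bits_per_sec / 3600


# Hypothetical example: a 10 TB dataset on a 100 Mbit/s uplink.
full_copy = transfer_hours(10, 100)             # every byte, every time
daily_incremental = transfer_hours(10 * 0.01, 100)  # only the ~1% that changed
```

Under these assumptions a full copy takes roughly 222 hours (more than nine days), so it could never finish within a daily backup window, while the incremental takes a little over two hours. That gap is why sending only changes matters so much over a WAN.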

Part of any
good data integrity strategy and disaster coverage strategy is, one,
getting your data off-site, out of your enterprise, and two,
making sure it’s far enough away from where you are located
that any particular regional disaster zone is not going to affect
all of your copies of data.

Ideally, if
you really want full data protection such that you can sleep at night
and never have to worry about data losses, we recommend that
customers not just make one off-site copy, but make multiple off-site
copies, either across the country or in very different regional
zones. And of course each of those copies ought to have fantastic
data protection, such that you’re not worried about standard
failures like single disk failures or network failures or other
things like that causing the entire copy to get corrupted, because if
you go down to a single copy, then you are again not sleeping great
at night.

So the entire
chain ought to be: you are getting your data backed up at least every
day, if not more often than that; it’s off-site; it’s in
multiple locations; and it’s with a data provider that’s
providing absolutely top-tier data protection and data integrity,
along with all the other security and ease-of-use concerns that go
along with it. We could talk about an entirely different subject,
which is that backups have been notoriously difficult to keep running
on a regular basis.

Robin Miller: Let’s

Lou Montulli: Jeff, you want to take that one, or you want me to?

Jeff Whitehead: Sure. Backup reliability is another case where
different solutions have wildly different performance
characteristics. And for a lot of IT administrators, backups are a
job they really just don’t like. They get up in the morning, they
look at their backup status, and there are 70 exceptions they’ve got
to run down, and meanwhile the pager starts going off and the printer
won’t print and the CEO needs his new laptop. So backups
tend to be sort of at the bottom of the pile, because they’re
important, but they’re not pressing in the same way as a
customer darkening your doorway and needing some help or a solution
right now.

So really it
is important, and it’s tricky to get this data ahead of time,
because everyone says, yes, our backups are great, never having
issues. But it’s really important to talk to the user
communities of an existing product and find out: what is your daily
life driving this backup solution like? And it’s got to be
like-for-like in terms of hardware and software and the whole
solution, because what may work for someone that’s got a very
high-end data center and high-performance storage or network may
not work for someone with a Windows Small Business Server. It’s
got an entirely different set of characteristics.

Lou Montulli: Cloud, I love the cloud. The cloud has the
potential to make everyone’s job easier and make things
cheaper and better. Now, that’s potential; not every cloud
provider actually delivers. The potential there is that you can have
a complete end-to-end service that actually makes your life as an IT
administrator easier, because there are either single-vendor
or multiple-vendor integrated solutions that solve one thing very
well, they have a support staff behind them, and ideally they work
virtually all the time without any problem. And when you do have a
problem, you have one number to call, and they can solve it because
they’re an end-to-end service. So it’s like having the
world’s best expert hired on to your team just managing your
particular system. And that’s what you should look for in a
cloud vendor: the absolute best in that particular space,
specialized to do what you want it to do. And
always try it before you buy it.

Robin Miller: What about backing up your cloud, or is it
inherently backed up?

Lou Montulli: That depends on your cloud vendor. Many vendors
provide their own backup solution, whether it
be [inaudible 13:32] specialized within their own type of
cloud or they layer on another kind of backup product. Now, some
customers do choose not to trust in a single vendor and layer on an
additional layer and back it up to another cloud or bring it back
into their enterprise. That’s kind of a popular thing to do: to
say, hey, I’m going to trust this cloud vendor to do this one
thing for me, but I also always want to have a copy within my
enterprise if anything happens to that vendor or if I just want to
move my data out of [inaudible 14:04] different vendors. So
having it within your own enterprise can be useful.

Jeff Whitehead: I think there is a distinction. With the
application-as-a-service vendors like the Salesforces, Office 365s,
and Google Apps of the world, people tend to not want to back those
up. Infrastructure as a service, like Amazon EC2, Microsoft Azure,
the Rackspace offerings, I think those you need to back up. And
I’d also go out on a limb and say that while the
application-as-a-service providers are a new and better thing and
have made IT easier and better for everyone, the
infrastructure-as-a-service guys are a little bit newer, and I think
they’re not quite to where straight hosting was or is today.
So it’s really hosting with dynamic characteristics that
are bolted on to it, but you still have to do all the things you do
with traditional hosting, which includes [inaudible 15:00] taking
your own backups, hopefully through different providers.

Robin Miller: I have learned personally, I won’t say the
hard way, but I do know that even when you are using a “cloud
service provider,” you should have some backups. There is one,
I will not use their name, it starts with G, ‘Giggle’ or
something, and last week I was conducting an interview just like this
on their Hangout service and it stopped. Now, since it’s their
Hangout service, that meant that some of my information for writing a
story on Google Drive was not accessible to me. I’m just one little
freelance writer in Florida cursing them; how many millions of people
were shut off? So yes, I had a copy of the story I was working on
on a USB hard drive. So, should we not, even with cloud, have a
backup of our stuff on Salesforce or whatever?

Jeff Whitehead: Well, I think that’s a good example, and
again it depends on the application. With a Word
document or an article you’re writing, you can open it in some
kind of editor. If you did have a backup of Salesforce, I’d
sort of question what you would do with it. Do you have a way of
standing up your own Salesforce stack? I think most people don’t.

Robin Miller: So basically, if I use the application service
provider, I’m placing full faith and trust in them?

Jeff Whitehead: That’s true.

Robin Miller: That’s a good thing to talk about another
time. Right now that’s, I think, enough food for thought. Do you
have anything you’d like to throw in here, Lou?

Lou Montulli: Yes, I do. One more topic would be: to appliance or
not to appliance?

Robin Miller: Okay.

Lou Montulli: We find some customers have it in their mind that
they want an appliance, and a lot of customers come to the door
saying, I don’t want any appliance, and it’s an interesting
question. Often it comes down to pure functionality. If you had
the choice and you could do the same thing with or without an
appliance, I think most of us would choose not to buy the appliance,
because if the functionality is the same, why would you want to
manage yet another server in your infrastructure, and why would you
pay for that hardware if you don’t have to? There are a few
cases where an appliance makes life easier, but it’s great if
you can have a service infrastructure provider who can do it all
entirely in software. It just makes the process of upgrading easier,
it makes the process of handling multiple office deployments a lot
easier, and it removes one more device from your data centers.

Robin Miller: You guys, could you provide an appliance if I
wanted?

Lou Montulli: We could provide an appliance-like experience, but
we don’t sell any appliances. We have a full software stack,
and if you want to run something that looks like an appliance, like a
backup appliance, and export your data to that appliance, then we work
quite well in that environment. But we don’t require, nor do we
sell, any specific hardware that is labeled as an appliance.

Robin Miller: Could we not in fact get a generic server and then
get a plaque that says appliance and put it on the front?

Lou Montulli: Exactly, which is exactly what a lot of folks do:
they are buying a generic server and throwing their software on
it and then marking it up 5x and selling it to you. We consider
ourselves more of a pure software play in our service infrastructure,
and we find the convenience there is tremendous, and we can bring
customers on the same day. We don’t have to wait for an
appliance to be delivered and installed.

Rob, lots of genuine, honest respect here. But with the Dice acquisition and beta debacle, a lot of effort needs to be made by the editors here to avoid any appearance of using the readers as targeted customers. This interview doesn't help in that regard.