You've heard of 'trust but verify', right? Well, remember 'trust but protect' when mulling building a hybrid cloud

You broke it, you pay for it, one way or another

Comment Trust the hybrid cloud, service providers tell us – they are, apparently, the experts. But when outages occur, and when data or virtual instances are lost or become unavailable, the impact is profound.

Few businesses can trade for any length of time without access to their data or infrastructure.

Trust the hybrid cloud? Trust but protect, I say.

One of the big mistakes while implementing your own hybrid cloud system – particularly with backup and recovery setups on and off premises – is to assume this stuff is easy. Yes, it’s not stupidly hard to get a cloud world running, but that doesn’t mean you don’t have to bother designing it properly. As we’ve said before, it’s a breeze to configure your AWS storage incorrectly and leave your private files facing the open internet. It’s a mistake, therefore, to assume that because storage is available, and backup and recovery features are provided with your cloud setup, you’re covered to the degree you need without doing more than clicking a few check boxes.

Backup and recovery are more important now than at any time in the history of enterprise IT infrastructure. We’ve gone from a world of known knowns – you run your own servers behind the relative safety of a firewall, under the software patching and security regime of your IT team – to the unknowns of cloud. We’re talking somebody else’s servers, patching and procedures, with data and applications crisscrossing networks and with some services no longer even physical but virtual.

Lose all or part of that, and you’re in trouble.

You need good data protection, meaning backup and recovery, in place for those moments when data or virtual machines get lost or corrupted, users make mistakes, a server or disk crashes, or you get hacked or hit by ransomware.

Back to basics

Before you can install any of that, though, it’s important to understand the fundamentals first – what are your objectives and what are your recovery systems capable of? Too many folks don’t think about these. Requirements? We need to back stuff up and be able to restore it. Simple, right?

Wrong. The fundamental starting point is to define your recovery point and recovery time objectives (how much recent data you can afford to lose, and how long you can afford to be down), and to work out what capabilities and performance you need from a bunch of “what if” scenarios. Here’s a practical illustration: how long would it take to restore the half-million customer files on your server, or that of your cloud service provider, if it just got seized and is now being held for ransom? Thinking hard about requirements may seem old-fashioned in an era of agile, iterative development, and of fail fast, but you can’t build an answer if you don’t know the question.
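To put a rough number on that "what if", here is a minimal Python sketch of a restore-time estimate checked against a recovery time objective. The average file size, throughput, per-file overhead and four-hour objective are all made-up assumptions for illustration; plug in figures measured from your own kit.

```python
def restore_hours(file_count, avg_file_mb, throughput_mb_per_s, per_file_overhead_s=0.0):
    """Estimate the hours needed to restore file_count files.

    All inputs are assumptions you must measure yourself: the sustained
    restore throughput and the fixed cost paid per file (metadata, API calls).
    """
    transfer_s = file_count * avg_file_mb / throughput_mb_per_s
    overhead_s = file_count * per_file_overhead_s
    return (transfer_s + overhead_s) / 3600


def meets_rto(estimated_hours, rto_hours):
    """True if the estimated restore fits inside the recovery time objective."""
    return estimated_hours <= rto_hours


# Hypothetical figures: 500,000 files of 2 MB each, 50 MB/s, 50 ms overhead per file.
estimate = restore_hours(500_000, avg_file_mb=2.0,
                         throughput_mb_per_s=50.0, per_file_overhead_s=0.05)
print(f"Estimated restore time: {estimate:.1f} hours")
print("Meets a 4-hour objective:", meets_rto(estimate, rto_hours=4))
```

With these invented numbers the per-file overhead costs more time than the data transfer itself, which is the kind of thing you only find out by asking the question up front.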

Pick your backup strategy

Having considered your objectives in detail, which I’m going to leave to you, it’s time to start thinking about the kind of system you want for hybrid cloud backup and recovery.

Replication has emerged as a popular option in today’s discussion of backup and recovery precisely thanks to the growth of cloud. The thinking runs like this: you let the infrastructure do all the hard work of maintaining a real-time (or, realistically, a near-real-time) replica of the data. In the event of a problem, you rely on data that’s been synchronized and held at a secondary site.

It’s no surprise that replication has become so popular, given that most cloud services either provide replication for you by default or make replication a relative doddle to configure.

And replication is a really, really good thing. Both on-premises and cloud-based storage can be replicated in near-real time between systems – even if you’re replicating cloud data between regions or across a hybrid setup between the cloud and your local arrays. So, retaining duplicates of your data to protect against individual kit or site failures is a fairly idiot-proof approach.

The benefits of replication are therefore great. But get hit with malware, and you’ll end up with replicated copies of infected or scrambled files. Likewise, if an application or a virtual machine starts misbehaving, or if someone in IT suffers from fat-finger – such as typing “DROP DATABASE” into the command line or prompt – then you’re just as stuffed, as all these problems will be, yes, replicated.

One way around all of these is, of course, to consider point-in-time snapshots of the data stores. You can take them at the hypervisor level, whether in the cloud or on on-premises virtualised systems, or you can work at the SAN level if you have access to that infrastructure – which means you are probably hosting the kit yourself. The aim is to take snapshots of known good data.

Here, de-duplication is the order of the day so that you’re not using zillibytes of storage, and snapshots are easy and simple to use. But if you can detect I’m approaching a downside, here it is: in many cases you can only revert the whole filestore to a previous version, not pick a file out from a particular date and time.
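The trade-off between replication and snapshots can be sketched in a few lines of Python. This is a toy in-memory model, not any vendor's API: the ReplicatedStore class and its method names are invented for illustration, and real replication is asynchronous and far messier.

```python
import copy


class ReplicatedStore:
    """Toy model: a primary store, a mirrored replica, and labelled snapshots."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = {}

    def write(self, name, data):
        self.primary[name] = data
        self.replica[name] = data  # replication mirrors every change, good or bad

    def delete(self, name):
        self.primary.pop(name, None)
        self.replica.pop(name, None)  # the fat-finger mistake is replicated too

    def snapshot(self, label):
        # Point-in-time copy of the whole store
        self.snapshots[label] = copy.deepcopy(self.primary)

    def revert(self, label):
        # Whole-store revert: anything written after the snapshot is lost
        self.primary = copy.deepcopy(self.snapshots[label])
        self.replica = copy.deepcopy(self.snapshots[label])


store = ReplicatedStore()
store.write("customers.db", "good data")
store.snapshot("monday")
store.write("notes.txt", "written on tuesday")
store.delete("customers.db")  # the accident

replica_lost_it = "customers.db" not in store.replica  # the replica is no help
store.revert("monday")
recovered = store.primary.get("customers.db")           # back from the snapshot
tuesday_work_kept = "notes.txt" in store.primary        # but Tuesday's file is gone
```

The replica faithfully copies the deletion, the snapshot brings the database back, and the whole-store revert takes Tuesday's unrelated work with it.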

Back me up, Scotty

Backup, like replication, does what its name implies, only in this case the data is stored until it’s called upon, according to some Service Level Agreement, when an event such as a data center hardware failure, a user error, a ransomware outbreak, or some other incident strikes.

Backups can be very basic and traditional: full yearly, monthly, and weekly with incrementals taken one or more times per day. Those incrementals will be a log of changes to your data since the last full version was archived – which means you only need two types of backup media to do a restore: one containing your latest full backup, and one holding all the incremental changes since. The last full version can be combined with the log of changes to rebuild your file systems and databases to the last incremental.
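As a minimal sketch of that rebuild, assume a full backup is a mapping of file path to content and each incremental is a timestamped set of changes, with None marking a deletion. That data model is an assumption made for illustration; real backup products use their own formats.

```python
def restore(full, incrementals, up_to=None):
    """Rebuild state from a full backup plus incrementals, oldest first.

    incrementals is a list of (timestamp, delta) pairs. If up_to is given,
    incrementals taken after that timestamp are ignored, which gives a
    point-in-time restore.
    """
    state = dict(full)
    for taken_at, delta in sorted(incrementals, key=lambda item: item[0]):
        if up_to is not None and taken_at > up_to:
            break
        for path, content in delta.items():
            if content is None:
                state.pop(path, None)  # file was deleted in this incremental
            else:
                state[path] = content  # file was created or changed
    return state


full = {"orders.csv": "v1", "users.csv": "v1"}
incrementals = [
    (1, {"orders.csv": "v2"}),
    (2, {"users.csv": None, "audit.log": "v1"}),
]

latest = restore(full, incrementals)
before_deletion = restore(full, incrementals, up_to=1)
```

Here latest reflects every change, while before_deletion stops at the first incremental and so still contains users.csv.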

This particular approach is handy in the world of a hybrid that spans public cloud and on-premises equipment. That’s because only your first ever backup will be a full copy with the rest being smaller updates, an approach that means you don’t have to endure the expense and effort uploading a big backup from site to site lots and lots of times based on whatever schedule you’ve settled on. It’s a case of one big one and done. Or should be.

However, and it’s an important “however”: when the time comes, you won’t want an eons-long restoration process plodding through a full backup that spans thousands of differences. Your backup system should be one that gives you the ability to define “synthetic” full backups: where full and incremental copies are combined as one, in order to provide a single virtual entity that’s much faster to restore from.

In other words, don't just take a full archive once, and rely on rebuilding from incremental changes: one bad incremental backup, and you could start losing data. Instead, find a happy medium with regular virtual or physical full backups, and increments to bring you back to the latest day, hour, or minute, depending on how your business runs.
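A synthetic full can be pictured as folding the existing chain into one new baseline, so that a restore only has to replay whatever came afterwards. The sketch below is an illustration under the assumption that backups are simple path-to-content mappings with None marking a deletion; the function names are invented, and real products build synthetic fulls on the backup server in their own formats.

```python
def apply_delta(state, delta):
    """Return a new state with one incremental applied (None means deleted)."""
    merged = dict(state)
    for path, content in delta.items():
        if content is None:
            merged.pop(path, None)
        else:
            merged[path] = content
    return merged


def synthesize_full(full, incrementals):
    """Fold a full backup and its incrementals into one synthetic full."""
    synthetic = dict(full)
    for delta in incrementals:
        synthetic = apply_delta(synthetic, delta)
    return synthetic


full = {"a.txt": "v1", "b.txt": "v1"}
incrementals = [{"a.txt": "v2"}, {"b.txt": None}, {"c.txt": "v1"}]

# Fold the first two incrementals into a new baseline...
synthetic_full = synthesize_full(full, incrementals[:2])
# ...so a later restore only replays what came after it.
remaining = incrementals[2:]
latest = synthesize_full(synthetic_full, remaining)
```

The restore from the synthetic full replays one incremental instead of three and lands on the same data, and a damaged early incremental can no longer poison everything that follows it.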

De-dupe this

This brings us to de-duplication. If you have storage somewhere in your infrastructure – whether it’s a low-level SAN or the storage your backup system writes to – you’ve got de-duplication technology, too. De-duplication shrinks the size of backup files to, among other things, help reduce network traffic during a recovery procedure; it’s not uncommon for de-duping to cut the volume of backup copies by more than 90 per cent. It works by not keeping, say, 10 instances of the same chunk of data but, rather, one copy and nine small “shortcut”-style pointers to it.
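The one-copy-plus-pointers idea can be sketched with fixed-size chunks and content hashes. This is a simplified illustration, assuming tiny four-byte chunks and SHA-256 digests as the pointers; production systems typically use much larger, often variable-sized, chunks.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns (store, recipe): store maps digest -> chunk, and recipe is the
    ordered list of digests (the "pointers") needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of each chunk
        recipe.append(digest)
    return store, recipe


def rehydrate(store, recipe):
    """Rebuild the original data by following the pointers in order."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCD" * 10            # ten instances of the same chunk
store, recipe = dedupe(data)
unique_chunks = len(store)     # one stored copy
pointers = len(recipe)         # ten pointers to it
restored = rehydrate(store, recipe)
```

Ten identical chunks collapse to a single stored copy, and following the ten pointers rebuilds the original bytes exactly.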

Synthetic backup is only made practical in the hybrid world by de-duplication. Cloud is a world where you’re paying for every byte of network transfer and storage, which means you need to watch what you are moving around the network and what you’re putting in storage buckets. De-duplication can also cut backup and recovery times, especially in the wide-area world of hybrid. It is critical when you’re flinging data between your installations to provide multiple copies and to protect it.

Reliable, to a point…

Technology is really reliable today. The risk assessment on corporate data comes out with an ever lower score. Your IT kit is more reliable than ever before – for example, SSDs have no moving parts and so, while not invulnerable to bit errors, at least will never physically seize up like an HDD – while resilience has ceased to be rocket science.

In part you can thank cloud, as tech vendors want you to rely on their infrastructure, and not employ your own. But cloud has also given backup and recovery new significance.

Not only can “they” – your service provider – mess things up in a data center over which you have no control, but hackers in a hybrid world can infiltrate your servers, networks and storage on and off-premises, wreak havoc, and cover their tracks – blowing away the replicas and snapshots you thought you could rely on.

If you architect “easy” or take the default settings in hybrid, you stand a strong chance of failing to protect information, and of failing in your duty as the person responsible for the backup and recovery fabric.

Fortunately, backup and recovery for hybrid isn’t an insurmountable challenge: it just requires a firm grasp of the basics and an understanding of your goals and outcomes – what data or virtual instances to pull back, how quickly, and from what point in time, among many other factors.