An IT Pro Rant – Azure Storage GRS is NOT VM Disaster Recovery

Here’s a topic that comes up a lot with people who are tasked with designing and implementing disaster recovery in Azure, but don’t actually understand the Cloud model or environment.

Scenario

A customer wants to design disaster recovery for the Virtual Machines they run in Azure. They assume that because they have chosen Geo-Redundant Storage (GRS) accounts, they can “simply” use the VHDs from the geo-replicated storage to build the Disaster Recovery (DR) version of their Virtual Machines.

Azure Storage

With Azure Storage, we have multiple replication options to choose from. Currently, we have the following options available to us:

Locally-Redundant Storage (LRS)

Zone-Redundant Storage (ZRS)

Geo-Redundant Storage (GRS)

Read-Access Geo-Redundant Storage (RA-GRS)

Since this scenario revolves around the geo-redundant option, that’s what we’ll concentrate on here.

It is important to note that the Geo-Redundant Storage (GRS) option is designed specifically for “Cross-regional replication to protect against region-wide unavailability.”

This means that if something were to happen to the primary Azure region (e.g. a natural disaster), and you (the owner of the data and the Storage account) have chosen Geo-Redundant Storage (GRS), “that data is available to be read only if Microsoft initiates a failover from the primary to secondary region.” So, if Microsoft cannot recover the original primary region, THEY will initiate a geo-failover, and then, and ONLY THEN, will the replica copy of your Azure Storage account become read-write enabled.

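Even if you use the Read-Access Geo-Redundant Storage (RA-GRS) option for your Azure Storage account, the replicated copy is only readable. RA-GRS lets you read from the secondary region at any time, but you still cannot write to it until Microsoft performs that failover.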
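To make these access rules concrete, here is a minimal Python sketch that models them as a simple lookup. It is not an Azure SDK call: the names `Replication`, `secondary_access`, and `secondary_blob_endpoint` are hypothetical, and the model only covers the Microsoft-initiated failover described above. The one real detail it encodes is that RA-GRS exposes its read-only replica on an endpoint whose account name carries a `-secondary` suffix.

```python
from enum import Enum


class Replication(Enum):
    """Replication options discussed in this post (hypothetical model)."""
    LRS = "LRS"
    ZRS = "ZRS"
    GRS = "GRS"
    RA_GRS = "RA-GRS"


def secondary_access(option: Replication, microsoft_failover: bool) -> str:
    """Return the access you have to the copy in the secondary region.

    Simplified sketch: only models a Microsoft-initiated geo-failover.
    """
    if option in (Replication.LRS, Replication.ZRS):
        return "none"        # no copy in a second region at all
    if microsoft_failover:
        return "read-write"  # the secondary has been promoted to primary
    if option is Replication.RA_GRS:
        return "read-only"   # readable via the -secondary endpoint
    return "none"            # plain GRS: the replica exists but is not accessible


def secondary_blob_endpoint(account: str) -> str:
    """Build the RA-GRS read-only blob endpoint for a storage account."""
    return f"https://{account}-secondary.blob.core.windows.net"


if __name__ == "__main__":
    for opt in Replication:
        print(opt.value, "->", secondary_access(opt, microsoft_failover=False))
    print(secondary_blob_endpoint("mystorageacct"))  # hypothetical account name
```

Note that no combination returns "read-write" without the failover flag, which is exactly why a geo-replicated VHD cannot be attached to a DR Virtual Machine on your own schedule.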


VM Disks

When you create an Azure Virtual Machine, the VHD files are stored as Page Blobs in the target Azure Storage account. These disks are designed for 99.999% availability.

Now, there is a difference between Managed and Unmanaged Disks. This scenario assumes Unmanaged Disks are being used, because with Managed Disks you don’t manage or maintain the underlying Storage account at all.

Unmanaged Disks

With Unmanaged Disks, we are responsible for creating our own Azure Storage account(s) and specifying which Storage account to use when we create the Virtual Machine.

The hidden challenge here is making sure you don’t place too many VHDs in the same Storage account. Each Storage account has an IOPS limit, and if the disks in it together exceed that limit, performance on the VMs suffers. You have to work out, on your own, the optimal number of VHDs per Storage account based on the performance your Virtual Machines need, and in a large environment that can be very challenging.
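The balancing act above is basically a division problem, sketched here in Python. The limits are assumptions for illustration only: 20,000 IOPS per standard Storage account and 500 IOPS per busy standard disk are the figures Microsoft published for standard accounts at the time, so check the current scalability targets before relying on them. The function names are hypothetical.

```python
def max_disks_per_account(account_iops_limit: int, per_disk_iops: int) -> int:
    """Largest number of fully busy disks that fit under the account IOPS cap."""
    if per_disk_iops <= 0:
        raise ValueError("per_disk_iops must be positive")
    return account_iops_limit // per_disk_iops


def accounts_needed(disk_count: int, account_iops_limit: int, per_disk_iops: int) -> int:
    """How many Storage accounts are needed to spread disk_count busy disks."""
    per_account = max_disks_per_account(account_iops_limit, per_disk_iops)
    if per_account == 0:
        raise ValueError("a single disk exceeds the account IOPS limit")
    return -(-disk_count // per_account)  # ceiling division


if __name__ == "__main__":
    ACCOUNT_IOPS = 20_000  # assumed standard Storage account IOPS target
    DISK_IOPS = 500        # assumed maximum IOPS per standard disk
    print(max_disks_per_account(ACCOUNT_IOPS, DISK_IOPS))
    print(accounts_needed(1_000, ACCOUNT_IOPS, DISK_IOPS))
```

Under these assumed limits, each account holds 40 busy disks, so 1,000 busy disks (one per VM) would have to be spread across 25 Storage accounts, and that is before any of those VMs needs more than one data disk.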

Managed Disks

Just so that we cover both aspects: with Managed Disks we don’t need to worry about balancing and managing the Storage account, because that’s done in the background for us. That means we could create 1,000 Virtual Machines and never have to ask, “Am I placing too many VHDs in this Storage account?”

Conclusion

So, the moral of this story is that you cannot use the Azure Storage replication options to provide full disaster recovery for your Azure Virtual Machines. The replicated copy of the Storage account is not writeable (and, with plain GRS, not even readable) until Microsoft initiates a failover because of a region-wide outage.

The correct solution for Azure Virtual Machine disaster recovery is Azure Site Recovery (ASR). I’ve written a lot of articles on ASR, which you can check out here.