Building a Live Mount Across New and Old Backups with Rubrik

Chris Wahl · Posted on 2017-10-16, updated 2020-05-05

The idea of bringing back data quickly is attractive when dealing with a data protection solution. Unfortunately, most of the conversations I’ve had in the past dealt with ingest speeds and storage nerd knobs. While it’s important to be able to protect applications at a frequency that meets the desired Recovery Point Objective (RPO), I’d wager that knowing exactly how quickly data and applications can be returned to an operational state, or spun up for answering “what if” questions, is top of mind for many folks.

In today’s post, I’ll dig a bit deeper into Rubrik’s instant recovery capabilities by talking about the Live Mount feature. In a nutshell, this allows for a full virtual machine to be restored from a backup (we call it a snapshot) in a matter of seconds. This can be done across any number of first full / incremental forever snapshots without concern for how many backups exist. Most of the customers I work with leverage this for regression testing, application upgrade validation, service schema or configuration testing, and disaster recovery models. These are the sort of “what if” scenarios that many of us would like to tinker with without applying any pressure (in terms of either capacity or performance) on the production storage array(s).

To demonstrate this, I’ll use my trusty “SE-CWAHL-WIN” virtual machine that runs on Windows Server 2012 R2. This workload has been running in the Rubrik engineering lab for years. My, how time flies! I’m not much of a GUI person, so I’m going to grab information via PowerShell. It also makes measuring the time between requesting a Live Mount and seeing it completed a bit easier. For fairness, though, I’ll include a few screenshots and short animations from the GUI. 🙂

Gathering Snapshot Information

After loading the Rubrik PowerShell module and connecting to one of my engineering clusters, I’ll begin by pulling down information on the snapshots and storing the results to $Snapshot. After that, I’ll display some of the results.

This takes only a moment. To avoid making this post super long, a snipped list of snapshot details is below. Data for the past month is held locally on the cluster, while data beyond 30 days has begun to transition into the public cloud as per the SLA Policy assigned to my virtual machine.
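
As a side note, the 30-day split between local and cloud data can be illustrated with a few lines of Python. This is only a sketch of the rule as described here, not how Rubrik decides placement: the `snapshot_location` function, the `LOCAL_RETENTION` constant, and the sample ages are all made up for the example, and the real behavior is governed by the SLA Policy on the cluster.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 30 days of local retention, matching the SLA Policy described above.
LOCAL_RETENTION = timedelta(days=30)


def snapshot_location(taken_at: datetime, now: datetime) -> str:
    """Classify a snapshot by age (hypothetical helper, not a Rubrik API).

    Returns "local" while the snapshot is within local retention and
    "archive" once it is old enough to have begun moving to the cloud.
    """
    age = now - taken_at
    return "local" if age <= LOCAL_RETENTION else "archive"


# Hypothetical snapshot ages, measured back from the Live Mount request time.
now = datetime(2017, 10, 13, 19, 12, 30, tzinfo=timezone.utc)
for days_old in (0, 7, 30, 40):
    taken_at = now - timedelta(days=days_old)
    print(f"{days_old:>2} days old -> {snapshot_location(taken_at, now)}")
```

A snapshot flagged as "archive" here may still have a local copy while the transition is in progress, so treat the label as "eligible for archive" rather than a statement of where the bits live.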

Live Mount Validation

For fun, the results can be viewed in the Live Mounts section of the GUI. It’s worth noting that once the status transitions to “Mounting…” it means that the immutable image of the backup has been exposed by Rubrik via NFS to the target ESXi host(s). Since I didn’t specify a host, Rubrik chooses one on my behalf.

The remaining time is spent waiting for the vSphere environment to add the virtual machine to inventory. The Live Mounts complete at roughly the same time, which is groovy considering I’m using snapshots from today, a week ago, and a month ago. The net result is that Rubrik’s Atlas file system isn’t impacted by how long it has been since a backup was taken. Otherwise, it would be a poor design. 🙂

The vSphere Perspective on Live Mounts

For further verification, the vSphere HTML5 interface shows the original virtual machine along with 3 Live Mounts.

Let’s pick on the oldest snapshot from September. I requested this particular Live Mount at 2017-10-13T19:12:30Z (GMT), which is 12:12:30 Pacific (my local time). Here’s the reply to the request to showcase the start time.

The bottom of the vSphere log shows the “Creating VM on host” event at 12:12:31. This is one second later. One second!

The remaining 8 seconds is spent adding the virtual machine to inventory. Considering that this particular snapshot was taken 40 backups ago, I don’t see how we could make it any faster than one second.
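
For anyone who wants to double-check the arithmetic, here’s a small Python sketch of the timeline. It uses only the timestamps quoted in this post. The fixed UTC-7 offset is an assumption (Pacific Daylight Time was in effect on that date), and the variable names are mine, not anything returned by Rubrik or vSphere.

```python
from datetime import datetime, timedelta, timezone

# Request time as quoted in the post (UTC).
requested_utc = datetime(2017, 10, 13, 19, 12, 30, tzinfo=timezone.utc)

# Assumption: Pacific Daylight Time (UTC-7) applied on 2017-10-13.
pacific = timezone(timedelta(hours=-7), "PDT")
requested_local = requested_utc.astimezone(pacific)

# vSphere logged "Creating VM on host" one second after the request.
vm_create_local = requested_local + timedelta(seconds=1)

# The remaining 8 seconds went to adding the VM to inventory.
in_inventory_local = vm_create_local + timedelta(seconds=8)

time_to_expose = (vm_create_local - requested_local).total_seconds()
total = (in_inventory_local - requested_local).total_seconds()

print("Requested (Pacific):", requested_local.strftime("%H:%M:%S"))  # 12:12:30
print("Seconds until VM creation began:", time_to_expose)            # 1.0
print("Seconds until VM was in inventory:", total)                   # 9.0
```

Read this way, the whole Live Mount took about nine seconds end to end, with only the first second attributable to making the snapshot data available.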

Thoughts

The ability to bring back data and applications in a quick manner should be of paramount importance to anyone responsible for service delivery within an organization. This is especially worth digging into when considering the amount of data that is being ingested and the technology a vendor leverages to maintain that data, such as legacy snapshot “chains” that require rebuilding the master image versus an intelligent and content-aware file system designed to make data available both globally and instantly.

While this post covers Live Mounts, which are a form of clone that is built using backup data to solve what-if use cases, there’s another feature named Instant Recovery that is targeted at recovering in the face of failure. The main difference is that Live Mount presents a clone of the original workload with the network disconnected. Instant Recovery puts the workload back into its original working order – same vSphere and network personality, active network connection, and so forth – with an optional Storage vMotion towards the end to place the workload back onto the production storage array. Both options are based on the same underlying technology when it comes to making snapshot data available in a quick and efficient manner.