How Microsoft Can Improve Hyper-V Checkpoints

One of the major new features that Microsoft introduced in the Windows Server 2016 version of Hyper-V was production checkpoints. For those who may not be familiar with the feature, production checkpoints added a critical capability to Hyper-V's existing checkpoint feature: application awareness.

Prior to the release of Windows Server 2016 and Windows 10, Hyper-V checkpoints were based on a simple saved-state mechanism. When an administrator created a checkpoint, write operations for the virtual machine were redirected to a differencing disk. If the virtual machine was running at the time that the checkpoint was created, then the contents of the virtual machine's memory were written to a .BIN file. Hyper-V also created a state file with a .VSV extension.

Production checkpoints are different from the checkpoints that were previously created by Hyper-V. Rather than producing a simple saved state as it has done in the past, Hyper-V uses a mechanism that is similar to what is used for backups. In the case of Windows virtual machines, for example, Hyper-V creates production checkpoints by using the Volume Shadow Copy Service (VSS) inside the guest. In the case of Linux virtual machines, the checkpoint process flushes the file system buffers prior to the checkpoint being created.

The Windows 10 version of Hyper-V actually gives you a choice of whether to create production checkpoints or legacy checkpoints. To make the selection, open Hyper-V Manager, right-click a virtual machine, and select the Settings option from the shortcut menu. Next, select the Checkpoints container, and then choose either Production checkpoints or Standard checkpoints, as shown in Figure 1. The same setting is also available in Windows Server 2016.

Figure 1. The Windows 10 version of Hyper-V allows you to choose between standard and production checkpoints.
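
The checkpoint type can also be set from PowerShell rather than through Hyper-V Manager. The following is a minimal sketch, assuming the Hyper-V PowerShell module is installed and the session is elevated. The VM name LAB-EX01 is a hypothetical placeholder, not something taken from a real environment.

```powershell
# 'LAB-EX01' is a hypothetical VM name; substitute one of your own.
$VMName = 'LAB-EX01'

# Configure the VM to use production checkpoints.
# Other accepted values include Standard, ProductionOnly, and Disabled.
Set-VM -Name $VMName -CheckpointType Production

# Display the VM's current checkpoint type to confirm the change.
Get-VM -Name $VMName | Select-Object Name, CheckpointType
```

With the Production setting, Hyper-V falls back to a standard checkpoint if a production checkpoint cannot be created. The ProductionOnly setting disables that fallback.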

Needless to say, production checkpoints are a very welcome feature in the latest version of Hyper-V. Now, however, Microsoft needs to take its checkpoint feature to the next level with a group checkpoint feature.

Although production checkpoints do add application awareness, they only partially get the job done. Consider an Exchange Server, for example. Creating a production checkpoint of an Exchange Server would give you the ability to safely revert that server to a previous state. The problem is that, like many other applications, Exchange Server often utilizes multiple servers. Rolling back one Exchange Server without rolling back the others could potentially cause problems, depending on how the servers are configured.

Even if Exchange is configured in a single-server deployment, nearly all of the Exchange Server configuration information resides in Active Directory. Rolling back a mailbox server without also rolling back a domain controller could cause Active Directory to continue to believe that certain users are mail enabled, even if rolling back the Exchange Server has caused those users' mailboxes to become non-existent.

If Microsoft introduces a group checkpoint feature, it could give administrators the ability to checkpoint and roll back servers as a group rather than individually. This idea is not entirely unprecedented. Some of the Disaster Recovery as a Service (DRaaS) vendors offer a protection group feature that allows a group of machines to be rolled back collectively to a specific point in time.

Admittedly, such a feature would be of limited use in production environments because rolling back an application would presumably result in at least some data loss. Besides, most organizations probably aren't going to deploy a dedicated Active Directory forest for each application. Think back to my previous example for a moment, when I talked about using a checkpoint to roll back Exchange Server. If production checkpoints of Exchange Servers and domain controllers were created simultaneously, then it would theoretically be possible to safely roll back Exchange to a previous state. However, rolling back a domain controller just to enable an Exchange Server rollback isn't a practical option because there are other applications that depend on Active Directory.

Even though this type of rollback might not be practical in production, it would be great for DevOps environments. I just finished creating a series of 32 virtual machines for an educational video series that I am about to record. In doing so, I had to checkpoint each VM individually so that I could test each technique that I will be talking about in the videos, and then roll back the lab environment before I start recording. The technique that I am using works, but life would be a lot easier if there were a group checkpoint feature so that I didn't have to manage each VM's checkpoints individually.

Of course, it is possible to write a PowerShell script to handle checkpoint-related tasks for multiple Hyper-V virtual machines. Simply create a variable containing the names of the VMs that you want to checkpoint, and then use a ForEach loop to checkpoint each VM listed within the variable.
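
The loop described above might look something like the following sketch. It assumes the Hyper-V PowerShell module and an elevated session. The VM names and the checkpoint naming scheme are hypothetical placeholders chosen for illustration.

```powershell
# Hypothetical VM names; replace these with the VMs in your own lab.
$VMNames = 'LAB-DC01', 'LAB-EX01', 'LAB-EX02'

# Use one shared checkpoint name so the group can be rolled back together later.
# The 'Group-' prefix is an arbitrary naming convention for this sketch.
$CheckpointName = 'Group-' + (Get-Date -Format 'yyyyMMdd-HHmmss')

# Create a checkpoint for each VM listed in the variable.
foreach ($VMName in $VMNames) {
    # Checkpoint-VM uses whichever checkpoint type is configured on the VM.
    Checkpoint-VM -Name $VMName -SnapshotName $CheckpointName
}

# Later, to roll every VM in the group back to that checkpoint:
foreach ($VMName in $VMNames) {
    # -Confirm:$false suppresses the per-VM confirmation prompt.
    Restore-VMSnapshot -VMName $VMName -Name $CheckpointName -Confirm:$false
}
```

Keep in mind that the loop checkpoints the VMs one after another rather than at the same instant, so the checkpoints are not guaranteed to be consistent with each other the way a true group checkpoint would be.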

About the Author

Brien Posey is a 16-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.