vSphere 6.5 Encrypted vMotion Architecture and Performance

With the rise in popularity of hybrid cloud computing, where sensitive VM data leaves the traditional IT environment and traverses public networks, IT administrators and architects need a simple and secure way to protect critical VM data as it moves across clouds and over long distances.

The Encrypted vMotion feature available in VMware vSphere® 6.5 addresses this challenge with a software approach that provides end-to-end encryption for vMotion network traffic. The feature encrypts all vMotion data inside the vmkernel using the widely adopted AES-GCM encryption standard, thereby providing data confidentiality, integrity, and authenticity even when vMotion traffic traverses untrusted network links.

A new white paper, “VMware vSphere 6.5 Encrypted vMotion Architecture, Performance and Best Practices”, is now available. In that paper, we describe the vSphere 6.5 Encrypted vMotion architecture and provide a comprehensive look at the performance of live migrating virtual machines running typical Tier 1 applications with vSphere 6.5 Encrypted vMotion. Tests measure characteristics such as total migration time and application performance during live migration. In addition, we examine vSphere 6.5 Encrypted vMotion performance over a high-latency network, such as that found in a long-distance migration. Finally, we describe several best practices to follow when using vSphere 6.5 Encrypted vMotion.

In this blog, we give a brief overview of vSphere 6.5 Encrypted vMotion technology and share some of the performance highlights from the paper.

Brief Overview of Encrypted vMotion Architecture and Workflow

vMotion uses TCP as the transport protocol for migrating VM data. To secure VM migration, vSphere 6.5 encrypts all vMotion traffic, including the TCP payload and vMotion metadata, using AES-GCM encryption algorithms provided by the FIPS-certified vmkcrypto module in the vmkernel.

Encrypted vMotion does not rely on Secure Sockets Layer (SSL) or Internet Protocol Security (IPsec) technologies for securing vMotion traffic. Instead, it implements a custom encrypted protocol above the TCP layer. This is done primarily for performance, but also for usability reasons explained in the paper.

As shown in Figure 1, vCenter Server prepares a migration specification that consists of a 256-bit encryption key and a 64-bit nonce, then passes the migration specification to both the source and destination ESXi hosts of the intended vMotion. The two ESXi hosts then communicate over the vMotion network using the key provided by vCenter Server. Key management is simple: vCenter Server generates a new key for each vMotion, and the key is discarded at the end of the vMotion. Encryption happens inside the vmkernel, so there is no need for specialized hardware.

A Brief Look at Encrypted vMotion Performance

Encrypted vMotion Duration

The figure below shows the vMotion duration in several test scenarios in which we varied vCPU and memory sizes. The figure shows identical performance across all scenarios, with and without encryption enabled on vMotion traffic.

Encrypted vMotion CPU Overhead

The figures below show the CPU overhead of encrypting vMotion traffic on source and destination hosts, respectively. The CPU usage is plotted in terms of the CPU cores required by vMotion.

The above figures show that the CPU requirements of encrypted vMotion are very moderate. For every 10 Gb/s of vMotion traffic, encrypted vMotion requires less than one core on the source host and less than half a core on the destination host to handle all encryption-related overhead.

Encrypted vMotion Performance Over Long Distance

The figure below plots the performance of a SQL Server virtual machine in orders processed per second at a given time—before, during, and after encrypted vMotion on a 150ms round-trip latency network.

As shown in the figure, the impact on SQL Server throughput was minimal during encrypted vMotion. The only noticeable dip in performance occurred during the switch-over phase (in the range of 1 second) from the source to the destination host, and it took less than a few seconds for SQL Server to resume its normal level of performance.