1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure

Architectural Decision

Set the Power Policy for all Linked Clone desktop pools to “Power Off”

Justification

1. Using disposable disks can save storage space by slowing the growth of linked clones and reducing the space used by powered off virtual machines.
2. Using the “Power Off” policy for the pool means at user logoff (or shutdown) the disposable disk will be refreshed, therefore reducing the capacity usage at the storage layer.3. “Powered Off” VMs do not have a Virtual Machine SWAP file which will also reduce storage consumption.

Implications

1. Setting the policy to “Power Off” will result in more frequent power operations which may impact the performance of the storage and vCenter.
2. When a user attempts to login to a desktop which has been powered off, there will be a delay while the VM is powered on and booting up before the user will be logged in.3. The peak concurrency rate of users will need to be understood to allow accurate storage planning for the VSWAP file.

Alternatives

1. Increase the frequency of Recompose / Refresh / Rebalance operations2. Set the Policy to “Take no power action” and schedule an Administrator task to periodically change the Power Policy to “Powered Off” during a maintenance window.
3. Set the Policy to “Ensure desktops are always powered on” and schedule an Administrator task to periodically change the Power Policy to “Powered Off” during a maintenance window.
4. Set the Policy to “Suspend” and schedule an Administrator task to periodically change the Power Policy to “Powered Off” during a maintenance window, however this will consume extra storage for the Suspend File.
5. Use Memory Reservations to reduce storage requirements for vSwap and leave Power Policy to “Always On”.

Related Articles:

The example architectural decision was contributed to by Travis Wood (@vTravWood) and was inspired by the following article:

The following is a register of all Example Architectural Decisions related to Transparent Page Sharing on VMware ESXi following the announcement from VMware that TPS will be disabled by default in future patches and versions.

The goal of this series is to give the pros and cons for multiple options for the configuration of TPS for a wide range of virtual workloads from VDI, to Server, Business Critical Apps , Test/Dev and QA/Pre-Production.

In a VMware vSphere environment, with future releases of ESXi disabling Transparent Page Sharing by default, what is the most suitable TPS configuration for a Virtual Desktop environment?

Assumptions

1. TPS is disabled by default
2. Storage is expensive
3. Two Socket ESXi Hosts have been chosen to align with a scale out methodology.
4. Average VDI user is Task Worker with 1vCPU and 2GB Ram.
5. Memory is the first compute level constraint.
6. HA Admission Control policy used is “Percentage of Cluster Resources reserved for HA”
7. vSphere 5.5 or earlier

Requirements

1. VDI environment costs must be minimized

Motivation

1. Reduce complexity where possible.
2. Maximize the efficiency of the infrastructure

Architectural Decision

Enable TPS and disable Large Memory pages

Justification

1. Disabling Large pages is essential to maximizing the benefits of TPS
2. Not disabling large pages would likely result in minimal TPS savings
3. With Kiosk and Task worker VDI profiles, the percentage of memory which is likely to be shared is higher than for Power users.
4. Existing shared storage has plenty of spare Tier 1 capacity to vSwap files

Implications

1. Sufficient capacity for VM swap files must be catered for.
2. VDI & Storage performance may be impacted significantly in the event of memory contention.
3. Decreased memory costs may result in increased storage costs.
4. During patching, and operational verification that non default settings have not been reverted by the patching of ESXi.
5. Additional CPU overhead on ESXi from enabling TPS.
6. HA admission control will calculate fail-over requirements (when using Percentage of cluster resources reserved for HA) so that performance will be approximately the same in the event of a fail-over due to reserving the full RAM reserved for every VM,
6. HA admission control (when configured to Percentage of Cluster resources reserved for HA) will only calculate fail-over capacity based on 0MB + VM overhead for each VM which can lead to significantly degraded performance in a HA event.
7. Higher core count (and higher cost) CPUs may be desired to drive overcommitment ratios as RAM will be less likely to be a point of contention.