Disabling TPS in vSphere – Impact on Critical Applications

Starting with update releases in December 2014, VMware vSphere will default to a new configuration for the Transparent Page Sharing (TPS) feature. Unlike in prior versions of vSphere, TPS will be DISABLED by default, and it will remain disabled by default in all future versions of vSphere.

In the interim, VMware has released a Patch for vSphere 5.5 which changes the behavior of (and provides additional configuration options for) TPS. Similar patches will also be released for prior versions at a later date.

Why are we doing this?

In a nutshell, independent research indicates that TPS can be abused to gain unauthorized access to data under certain highly controlled conditions. In line with its “secure by default” security posture, VMware has opted to change the default behavior of TPS and provide customers with a configurable option for selectively and more securely enabling TPS in their environment. Please read “Security considerations and disallowing inter-Virtual Machine Transparent Page Sharing (2080735)” for more detailed discussion of the security issues and VMware’s response.

What does this mean for your Business Critical Applications?

One of our standard recommendations for optimizing a vSphere infrastructure for a critical workload is to leave TPS enabled in the infrastructure. Another is that customers should not over-subscribe the physical resources in a vSphere cluster supporting business critical applications. Where over-commitment of resources is unavoidable, we have prescribed that the resources allocated to the VMs hosting critical, IO-intensive workloads be reserved at all times to avoid contention.

Another common piece of prescriptive guidance for critical applications workloads is the use of large pages. Most modern operating systems and enterprise applications support large pages, and we recommend enabling them for critical applications.

In a vSphere cluster free of resource over-commitment and contention, the TPS feature is not used. It should also be noted that, on modern hardware (with hardware-assisted memory virtualization capabilities), virtualized workloads leverage the efficiency of large pages before vSphere resorts to TPS. In short, in a vSphere environment built to prescription, TPS is unlikely to be actively in use (or used to any significant degree) for a business critical applications workload.
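If you want to confirm how much sharing TPS is actually providing on a given host before the behavior changes, one way (a sketch, assuming SSH access to the ESXi host) is esxtop's memory view:

```shell
# Start esxtop over SSH, then press 'm' to switch to the memory view.
# The PSHARE/MB line reports: shared (guest pages participating in TPS),
# common (machine pages backing them), and saving (memory reclaimed).
# On a host built to the prescription above, 'saving' should be near zero.
esxtop
```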

Nevertheless, the planned change in TPS behavior should be of interest and we are providing the following guidance to help our customers understand the implications and steps that could be taken to avoid any negative impact from the change.

Starting with the Patch mentioned above, customers will be able to disable TPS on their ESXi hosts and selectively enable TPS for one or more VMs.

Starting with the Update, TPS will be disabled on ESXi hosts by default. Customers will be able to selectively enable TPS for one or more VMs.

What if you want to disable TPS now – before the Patch or Update?

Although VMware believes that the reported possible information disclosure in TPS can only be abused in very limited configuration scenarios, VMware advises customers who are sufficiently concerned about this possibility to proactively disable TPS on their ESXi hosts. Customers do not have to wait for either the Patch or the Update releases to do this.

To do this for ESXi 5.x, perform the following steps:

1. Log in to ESX/ESXi or vCenter Server using the vSphere Client.

2. If connected to vCenter Server, select the relevant ESX/ESXi host.

3. On the Configuration tab, click Advanced Settings under the Software section.

4. In the Advanced Settings window, click Mem.

5. Look for Mem.ShareScanGHz and set the value to 0.

6. Click OK.

7. Perform one of the following to make the TPS change effective immediately:

   - Migrate all the virtual machines to another host in the cluster and back to the original host.

   - Shut down and power on the virtual machines.

NOTE: If you use this option to disable TPS, you MUST manually reconfigure this option (by setting Mem.ShareScanGHz back to its default of “4”) after applying the Patch or Update before you can enable salting on the ESXi host.
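The vSphere Client steps above can also be performed from the ESXi shell. A minimal sketch, assuming SSH access to an ESXi 5.x host:

```shell
# Disable TPS page scanning by setting the scan rate to 0
# (the command-line equivalent of Mem.ShareScanGHz = 0 in Advanced Settings).
esxcli system settings advanced set -o /Mem/ShareScanGHz -i 0

# Verify the change (the Int Value field should now read 0).
esxcli system settings advanced list -o /Mem/ShareScanGHz
```

As with the GUI method, the change only takes full effect for a VM after it is vMotioned away and back, or power-cycled.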

What if you want to continue using TPS after the Patch/Update?

The Patch introduces a couple of new Advanced Configuration options (and these will be preserved by the Update as well):

Mem.ShareForceSalting: This is a host-level configuration option, and it is what enables or disables TPS on an ESXi host. If it is set to “0”, TPS is STILL enabled on the host. If it is set to “1”, TPS has been disabled on the host, and salting is required in order for TPS to work on any VM located on that host.

sched.mem.pshare.salt: This per-VM option enables customers to selectively enable page sharing between specific VMs. When ShareForceSalting is set to “1” on an ESXi host, the only way for two or more VMs to share a page is for both their salt values and the contents of the page to be identical. The salt is the value specified by customers for this per-VM Advanced Configuration option, and it must be identical on all the VMs for which you intend to enable page sharing.

If ShareForceSalting is set to “1” and sched.mem.pshare.salt is not set on a VM, the VM’s vc.uuid is substituted for the salt value instead. Because the vc.uuid is unique to a VM, that VM will only be able to share pages with itself – effectively, no inter-VM sharing for that VM.
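Putting the two options together, a sketch of re-enabling sharing for a group of VMs after the Patch/Update (the datastore path and the salt value "webfarm01" are hypothetical examples):

```shell
# Host level: require salting for page sharing. TPS is now disabled between
# any VMs that do not carry a matching salt.
esxcli system settings advanced set -o /Mem/ShareForceSalting -i 1

# Per VM: give every VM that should share pages the same salt. With the VM
# powered off, append the option to its .vmx file (or set it in the vSphere
# Client under Configuration Parameters). "webfarm01" is an arbitrary salt.
echo 'sched.mem.pshare.salt = "webfarm01"' >> /vmfs/volumes/datastore1/web01/web01.vmx
```

Only VMs carrying the identical salt string (on the same host) will share pages with one another; all other VMs fall back to the vc.uuid behavior described above.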

What should you watch out for?

If you are currently over-provisioning resources in your business critical applications cluster, you should review and re-evaluate your usage and consider adding resources to the vSphere cluster or re-sizing your VMs. Because TPS is purely a memory-reclamation feature, adding more memory to the ESXi hosts, or reducing the memory allocated to the VMs, is sufficient remediation.

Without adequate re-sizing or additional resources, the immediate side effect of this change will be increased memory contention, which will trigger ballooning and swapping and, consequently, performance degradation.

If you are not over-committing resources, or if you are reserving memory for your critical applications workloads as recommended, your workloads will not be negatively impacted by this change in TPS behavior.

Comments

4 comments have been added so far

Cyno

January 28th, 2015

This is a Terrible idea to implement by default, when companies have built infrastructure strategies upon hardware over-subscription as one of the fundamental reasons for choosing VMware ESX. It should be moved to the Lockdown Function. In fact, the TPS setting to activate a lower physical memory utilization should be exposed. What’s the point of alarming at 90% for high memory, and only turning on TPS at 92%? MOST systems are not business critical, and should not automatically be handled as such. This action only goes to help generate more license revenue for both VMware (via more licensed hosts to accommodate more RAM) and for Memory manufacturers. Did I mention this is a TERRIBLE idea to implement by default?

Nabeel Sayegh

March 8th, 2015

I would have to agree….this was not the smartest move. After a recent round of updates, the number of hosts we have that started alerting on high memory was flat-out ridiculous. In addition, customers complained about poor performance due to excessive vMotions and, in some cases, ballooning and compression kicking in. Having spent tons of time trying to figure out why….I came across this. Needless to say, I was not happy….at all. Re-enable TPS….life is good again.

Deji

March 8th, 2015

@Nabeel Sayegh – I’m sure that you can see the point of taking the “secure by default” route, if you think more about it. If VMware had done the opposite, many people would have found that irresponsible. This new behavior has been documented and propagated extensively over several months now. I’m glad that you found this post useful and that you were able to return to stability.

Ravindra M

October 13th, 2015

Hi,

What is the impact if the values are as mentioned below? Is intra-VM TPS enabled? And we did not change any config at the VM level.