How To "Pause" (Not Suspend) A Virtual Machine In ESXi?

Last week I received a very interesting question from a fellow blogger asking whether it was possible to "pause" (not suspend) a virtual machine running on ESXi. Today, ESXi only supports the suspend operation, which saves the current memory state of a virtual machine to disk. With a "pause" operation, the memory state of the virtual machine is not saved to disk; it stays in the physical memory of the ESXi host. Because the allocated memory is never released, a paused virtual machine can be resumed almost instantly, at the cost of holding onto that physical memory.

The use case for this particular request was also quite interesting. The user had an NFS server that housed about 200 virtual machines and needed to be restarted, and the goal was to minimize the impact to his virtual machines as much as possible. He decided against suspending the virtual machines, as it would have taken too long, and chose a more creative solution. He filled up the remaining capacity on the datastore, which in effect caused all virtual machines to halt their I/O operations. Though not an ideal solution IMHO, this allowed him to restart the NFS server and then run a script for the virtual machines to retry their I/O operations once the NFS server was available again.

Based on the above scenario, he asked if it was possible to "pause" the virtual machines similar to a capability Hyper-V provides today which would have provided him a quicker way to resume the virtual machines. Thinking about the question for a bit, a virtual machine is just a VMX process running in ESXi and I wondered if this process could be paused like a UNIX/Linux process using the "kill" command. Well, it turns out, it can be!

Disclaimer: This is not officially supported by VMware, use at your own risk.

Using the kill command, you can pause the VMX process by sending the STOP signal and to resume the VMX process, you can send the CONT signal. Before getting started, you will need to identify the PID (Process ID) for the virtual machine's VMX process.

There are two methods of identifying the parent VMX PID. The easiest is the following ESXCLI command:

esxcli vm process list

The PID for the virtual machine is listed under "VMX Cartel ID". In this example, I have a virtual machine called vcenter51-1, and on the right I am pinging the system to verify it is up and running. An alternative way of identifying the PID is to use "ps" by running the following command:

ps -c | grep -v grep | grep [vmname]

Note: Make sure you identify the parent PID of the virtual machine if you are using the above command as you will see multiple entries for the different VMX sub-processes.
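
If you want to script the lookup, here is a minimal sketch that pulls the VMX Cartel ID for a given display name out of the esxcli listing. The function name vmx_cartel_id is made up for illustration, and the parsing assumes each record starts with an unindented line holding the display name, followed by indented "Field: value" lines. Check that against the output on your own host before relying on it.

```shell
# Hypothetical helper: print the VMX Cartel ID for one VM.
# Usage: esxcli vm process list | vmx_cartel_id "vcenter51-1"
vmx_cartel_id() {
    awk -v name="$1" '
        # Assumption: an unindented, non-empty line starts a new VM record
        # and holds the display name of that VM.
        /^[^ \t]/ { current = $0 }
        # Print the last field of the Cartel ID line for the matching record.
        /VMX Cartel ID:/ && current == name { print $NF; exit }
    '
}
```

The function reads the listing from stdin rather than calling esxcli itself, so you can try it against saved output first. It prints nothing if no record matches the name.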

To pause the VMX process, run the following command (substitute your PID):

kill -STOP [pid]

To resume the VMX process, run the following command:

kill -CONT [pid]

Here is a screenshot of pausing and then resuming the virtual machine. You can also see where the pings stop as the virtual machine is paused and then resumed. Once the virtual machine was resumed, it operated exactly where it left off with no issues as far as I can tell.
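
For repeated use, the two kill commands can be wrapped in small shell functions that first check the PID still exists. This is only a sketch: pid_exists, pause_pid and resume_pid are names made up for illustration, and it assumes the kill on your host accepts signal 0 as an existence check. Like the commands themselves, it is not supported by VMware.

```shell
# Hypothetical wrappers around the kill -STOP / kill -CONT commands.

# Signal 0 delivers nothing; it only reports whether the PID can be signalled.
pid_exists() {
    kill -0 "$1" 2>/dev/null
}

# Pause the process with the given PID, or complain if it is gone.
pause_pid() {
    pid_exists "$1" || { echo "no such process: $1" >&2; return 1; }
    kill -STOP "$1"
}

# Resume a previously paused process.
resume_pid() {
    pid_exists "$1" || { echo "no such process: $1" >&2; return 1; }
    kill -CONT "$1"
}
```

You would call these as pause_pid [pid] and resume_pid [pid], with the parent VMX PID identified earlier.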

Note: I have found that if you have VM monitoring enabled, there may be issues resuming the virtual machine. This should only be done if you have VM monitoring disabled, as it may not be aware that the VMX process is being paused on purpose.

Though it is possible to pause a virtual machine, I am not sure I see too many valid use cases for this feature. If there are use cases where this feature would actually be beneficial, feel free to leave a comment. For now, this is just another neat "notsupported" trick 😉

Comments

Awesome! I found this trick similar to the vMotion ‘stun’ operation, where the source VM is ‘paused’ while QuickResume transmits the remaining memory pages to the destination VM, which is now live. Could it be that the ‘stun’ operation is just a kill -STOP followed by a kill when vMotion is completed!?

BTW, you can get rid of the grep -v grep pipe by enclosing the first letter (or first few even) of your vmname in square brackets like this:

ps -c | grep [v]mname

It’s hard to explain why this works (it has to do with character sets etc.), but it does work, even in the ash shell in ESXi. There isn’t a massive performance difference or anything of course, but it does get rid of one pipe and the second grep invocation.
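
A rough way to see why the bracket trick works, without touching a real process list: the pattern [v]mname is a regular expression that matches the text vmname, but the grep command line itself contains the literal text [v]mname, which does not contain vmname. The two input lines below are made up to stand in for ps output, and the pattern is quoted so the shell cannot expand it as a filename glob.

```shell
# Made-up stand-in for two lines of ps output:
# the VM process and the grep command itself.
fake_ps() {
    printf '%s\n' 'vmx vmname' 'grep [v]mname'
}

# Only the line containing the plain text "vmname" survives the filter.
fake_ps | grep '[v]mname'
```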

Instead of writing those complicated commands every single time you can simply put even more complicated one-liners in /etc/profile.local and forget all that gibberish. Then you just use ‘pausevm vmname’ or ‘unpausevm vmname’ or even ‘checkpausevm vmname’.

What is somewhat cool in my solution is that you can use several VMs as the arguments, e.g. ‘pausevm vmname1 vmname2 vmname3’, and if a VM’s name has spaces you just double-quote it, like ‘pausevm vmname1 “vmname with spaces” “another vmname”’ etc.

Good to know, thanks. My boss, who uses Xen, is always talking about a quick “pause” of his VMs, whereas my VMware suspend takes ages to execute and another long time to bring the VM up again. This pausing is useful for storage or network maintenance, as we do not have vMotion to quickly get a VM out of the way.

“He filled up the remainder capacity on the datastore which in effect caused all virtual machines to halt their I/O operations”. I believe this situation should actually suspend his VMs and he should not be able to ‘Pause’ his VMs. What’s your opinion, William?

Would a use case like this be an appropriate use of the pause VM methodology? We are swapping in a bypass router to allow for some fairly significant network router maintenance. A lot of our VMs reside on NetApp CDOT NFS storage. Our concern is whether the NFS timeout value (90 seconds in our case) will be enough to handle the hot swap of the routers. It has been proposed that we select any VMs with storage on that CDOT cluster and suspend them, but suspend would seem to take way too long. The pause methodology described above would seem to provide a way to quickly stop VMs before the hot swap and then resume them?

Sorry but this is handiwork… In the first place, if you need to restart your NFS server then you cleanly power off your VMs, period. What’s next? I’m going to call BMW and explain to them that I want to change my rear left wheel but I have to find a way of doing it while the vehicle is moving <_< Such ways of thinking/proceeding are nonsense in my opinion.

Author

William Lam is a Staff Solutions Architect working in the VMware Cloud on AWS team within the Cloud Platform Business Unit (CPBU) at VMware. He focuses on Automation, Integration and Operation of the VMware Software Defined Datacenter (SDDC).