
VM Swapfile (.vswp) placement with SRM

VMware Site Recovery Manager (SRM) is being implemented more and more often nowadays, and, like vSphere, it needs a good architectural design before you start.

One of the design considerations is the placement of the Virtual Machine Swap File (.vswp), which I want to discuss in more detail in this article.

Let’s first take a look at the VM Swap File (.vswp) itself. By default, this file is placed in the VM “working directory”, which also contains all the other VM files. The .vswp file is created every time the VM is powered on, and its size equals the unreserved memory of the VM: configured memory minus memory reservation. If the VM is configured with 2 GB of memory and the memory reservation is set to 0 MB (the default), the VM Swap File will be 2 GB. If the memory reservation in this example were 1 GB, the .vswp file would be 2 GB - 1 GB = 1 GB.
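The sizing rule above is a single subtraction. A minimal Python sketch that only models that rule (sizes in MB; the function name vswp_size_mb is chosen here for illustration):

```python
def vswp_size_mb(configured_mb: int, reservation_mb: int = 0) -> int:
    """Size of the .vswp file: configured memory minus the memory reservation."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_mb - reservation_mb


# The two examples from the text: 2 GB configured, with 0 MB and 1 GB reserved.
print(vswp_size_mb(2048))        # 2048 MB = 2 GB
print(vswp_size_mb(2048, 1024))  # 1024 MB = 1 GB
```

A fully reserved VM (reservation equal to configured memory) ends up with a 0 MB swap file under this rule.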

The design choice comes down to two options:

Keeping the .vswp file in its default “working directory”;

Placing the .vswp on a separate, non-replicated datastore.

Keeping the .vswp file in its default “working directory” means that the .vswp file is replicated to the recovery site along with the other VM files, as indicated in the following overview:

Pros:

Ease of management: all VM files are kept together, and it is the default configuration;

Cons:

More replication bandwidth is needed for files (.vswp) that aren’t used at the recovery site;

Cost is higher since more replicated storage space is used;

Slows down recovery for both test and real failovers, because SRM explicitly deletes the unneeded .vswp files at the recovery site before starting the VMs.
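The extra replicated capacity is simply the sum of the .vswp sizes of all protected VMs. A minimal Python sketch, assuming a hypothetical inventory (the VM record type, VM names and sizes are invented for illustration):

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    memory_mb: int
    reservation_mb: int = 0


def replicated_swap_gb(vms: list[VM]) -> float:
    """Total .vswp space (GB) replicated when swap files stay in the working directory."""
    total_mb = sum(vm.memory_mb - vm.reservation_mb for vm in vms)
    return total_mb / 1024


inventory = [
    VM("web01", 2048),
    VM("db01", 8192, reservation_mb=4096),
    VM("app01", 4096),
]
print(replicated_swap_gb(inventory))  # 2 + 4 + 4 = 10.0
```

Raising memory reservations shrinks this total, but only by locking that memory on the host.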

Placing the .vswp on a separate, non-replicated datastore involves some manual work, possibly even reconfiguring all existing virtual machines. The following overview shows the configuration:

Pros:

Does not consume replication bandwidth for swap files that are not used at the recovery site;

Uses less replicated storage, which could be more expensive than non-replicated storage.

Cons:

Can be more difficult to manage because some parts of a virtual machine reside on a separate datastore.

Requires additional configuration and management processes within SRM, because SRM detects that the VM uses a non-replicated datastore and consequently removes the VM from its protection group.
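The condition that triggers this is a VM with files on a datastore outside the replicated set. The hypothetical pre-flight check below illustrates that condition; it is not SRM's actual logic. The datastore names and file paths are invented, and only the "[datastore] folder/file" path notation follows vSphere convention:

```python
from collections.abc import Iterable


def datastore_of(path: str) -> str:
    """Extract the datastore name from a '[datastore] folder/file' path."""
    return path.split("]", 1)[0].lstrip("[").strip()


def non_replicated_datastores(vm_files: Iterable[str], replicated: set[str]) -> list[str]:
    """Return the datastores used by a VM that are not in the replicated set."""
    used = {datastore_of(p) for p in vm_files}
    return sorted(used - replicated)


# Hypothetical VM whose .vswp was moved to a dedicated, non-replicated datastore.
web01_files = [
    "[DS_REPL_01] web01/web01.vmx",
    "[DS_REPL_01] web01/web01.vmdk",
    "[DS_SWAP_01] web01/web01.vswp",
]
print(non_replicated_datastores(web01_files, {"DS_REPL_01"}))  # ['DS_SWAP_01']
```

In practice the file list would come from the vCenter inventory rather than being hard-coded.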

An important note concerns NFS storage. As mentioned, one of the drawbacks of keeping the .vswp file in the “working directory” is that it slows down recovery, because SRM deletes the replicated, useless .vswp file before starting the VM.

Deleting the .vswp file from a newly recovered NFS datastore can take some time, because ESX has to wait for the replicated file lock to expire (35 seconds by default). A quote from the Best Practices on NAS whitepaper:

Once a lock file is created, VMware periodically (every NFS.DiskFileLockUpdateFreq seconds) send updates to the lock file to let other ESX hosts know that the lock is still active. Changing any of the NFS locking parameters will change how long it takes to recover stale locks. The following formula can be used to calculate how long it takes to recover a stale NFS lock:

If any of these parameters are modified, it’s very important that all ESX hosts in the cluster use identical settings. Having inconsistent NFS lock settings across ESX hosts can result in data corruption!
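The quote refers to a formula for the stale-lock recovery time. A commonly cited form is (NFS.DiskFileLockUpdateFreq × NFS.LockRenewMaxFailureNumber) + NFS.LockUpdateTimeout, with defaults of 10 seconds, 3 failures and 5 seconds. Treat both the formula and these defaults as assumptions to verify against the whitepaper and your ESX version. They do reproduce the 35-second default mentioned earlier. The Python sketch below only performs the arithmetic:

```python
def stale_lock_recovery_seconds(
    disk_file_lock_update_freq: int = 10,    # NFS.DiskFileLockUpdateFreq (assumed default)
    lock_renew_max_failure_number: int = 3,  # NFS.LockRenewMaxFailureNumber (assumed default)
    lock_update_timeout: int = 5,            # NFS.LockUpdateTimeout (assumed default)
) -> int:
    """Assumed formula: UpdateFreq * MaxFailureNumber + LockUpdateTimeout."""
    return disk_file_lock_update_freq * lock_renew_max_failure_number + lock_update_timeout


print(stale_lock_recovery_seconds())                              # 35
print(stale_lock_recovery_seconds(disk_file_lock_update_freq=5))  # 20
```

Lowering the update frequency shortens the wait. As the quote warns, every ESX host in the cluster must then use identical settings.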

This timeout does not apply to VMFS datastores, because the auto-resignaturing process drops the file locks automatically.

As with every design decision, it’s all about knowing the pros and cons of the available options and selecting the best one for your environment.

Hi
Moving all those vswp files to a different, non-replicated location will often mean one or two datastores dedicated to just the vswp files, which would create a huge single point of failure.

Is the time needed to delete the vswp file at the recovery site really an issue? SRM does not perform automated failover (and never should), and before a failover is started, a manager has to be called out of bed to approve it. Now, I do admit that with 1000 VMs, the difference between 1 second and 10 seconds to delete a .vswp can be an issue. Do you have any information on how long deleting a vswp file takes?

As far as the replication traffic goes… as long as your VMs are not actively using the vswp, there will be hardly any extra sync traffic. Only the creation of the vswp when the VM starts will generate sync traffic.

Is the extra storage needed for the vswp an issue? Maybe… with big numbers like 1000 VMs again, it could easily take 2 TB or more for all your vswp files. Moving them to different storage won’t be cheap either, though, because you surely don’t want them on poorly performing storage. So I’m not sure there are a lot of savings in that.