OpenStack / Cloud / Virtualization / Linux

Troubleshooting Performance Issues on Multiple vCPU Virtual Machines

I’ve been doing a bit of research on performance issues with virtual machines that have multiple vCPUs. In my case, several 4-vCPU machines often seem to perform poorly. Specifically, these are Windows 2003 virtual machines running ColdFusion.

“An ESX server has to provide ALL processors at the time a VM
requests it, it can’t give 1 CPU then the other, they both have to be
available. As your ESX server becomes busier, it has less time to
allocate to VM’s that need more CPU’s, thus you get less and less time
slice, which causes your VM to wait for CPU cycles, thus causing the
programs inside the VM to miss a few cycles, and then it causes delays
which causes slower performance. So it does get slower, depending on the number of multi-CPU’s you have
and the CPU of the physical cores on the ESX host. Also Adding More
CPU’s isn’t necessary unless you have applications that can use it.
Just because a VM shows both CPU in use, don’t assume that to mean that
meaningful work is being performed.”
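To get a feel for why this hurts more as the host gets busier, here is a rough back-of-the-envelope sketch. It is a toy model, not how the ESX scheduler actually works: it assumes every physical core is busy independently with the same probability, and it asks how likely it is that enough cores are idle at the same instant to run all of a VM's vCPUs together. The function name, the 8-core host, and the busy percentages are all made up for illustration.

```python
from math import comb


def prob_can_schedule(total_cores: int, vcpus: int, busy_prob: float) -> float:
    """Chance that at least `vcpus` of `total_cores` cores are idle at once.

    Toy assumption: each core is busy independently with probability
    `busy_prob`. Real schedulers are far more sophisticated than this.
    """
    idle_prob = 1.0 - busy_prob
    # Sum the binomial probabilities of exactly `idle` idle cores,
    # for every count that is large enough to fit the whole VM.
    return sum(
        comb(total_cores, idle) * idle_prob**idle * busy_prob ** (total_cores - idle)
        for idle in range(vcpus, total_cores + 1)
    )


if __name__ == "__main__":
    # Hypothetical 8-core host at three load levels.
    for busy in (0.25, 0.50, 0.75):
        row = ", ".join(
            f"{n} vCPU: {prob_can_schedule(8, n, busy):.2f}" for n in (1, 2, 4)
        )
        print(f"host {busy:.0%} busy -> {row}")
```

Under these assumptions the 4-vCPU case falls off much faster than the 1-vCPU case as load rises, which matches the intuition in the quote: the wider the VM, the harder it is to find a slot where everything it needs is free.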

I also found this guidance on how to test whether your multiple-vCPU machines are shooting themselves in the foot.

On the CPU screen, check the %CSTP value. If this number is higher than 100, the performance issues may be caused by the vCPU count. Try lowering the vCPU count of the virtual machine by 1.

Note: The %CSTP value represents the amount of time a virtual machine with multiple virtual CPUs is waiting to be scheduled on multiple cores on the physical host. The higher the value, the longer it waits and the worse its performance. Lowering the number of vCPUs reduces the scheduling wait time.
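If you have more than a handful of VMs to look at, the check above is easy to script once you have the numbers. The following is a minimal sketch that assumes you have already copied each VM's %CSTP value from the CPU screen by hand into a dictionary; it does not collect anything from the host itself. The VM names and readings are hypothetical, and the threshold of 100 is simply the figure from the guidance quoted above.

```python
CSTP_THRESHOLD = 100.0  # figure taken from the quoted guidance, not measured


def flag_costop(readings: dict[str, float], threshold: float = CSTP_THRESHOLD) -> list[str]:
    """Return the names of VMs whose %CSTP exceeds `threshold`, worst first."""
    flagged = [name for name, cstp in readings.items() if cstp > threshold]
    return sorted(flagged, key=lambda name: readings[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical readings typed in by hand from the CPU screen.
    sample = {"cf-web-01": 140.0, "cf-web-02": 35.5, "db-01": 0.0}
    for vm in flag_costop(sample):
        print(
            f"{vm}: %CSTP {sample[vm]:.1f} is above {CSTP_THRESHOLD:.0f}, "
            "consider lowering its vCPU count by 1"
        )
```

Sorting worst first is just a convenience so the VM most likely to benefit from losing a vCPU shows up at the top.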

Note that my issues are occurring on ESX 3.5. Apparently many improvements have been made to the CPU scheduler in ESX 4, but I have not yet had a chance to see this for myself.

“In ESX 4, many improvements have been introduced in CPU scheduler. This includes further relaxed co-scheduling, lower lock-contention, and multi-core aware load balancing. Co-scheduling overhead has been further reduced by the accurate measurement of the co-scheduling skew, and by allowing more scheduling choices. Lower lock-contention is achieved by replacing scheduler cell-lock with finer-grained locks. By eliminating the scheduler-cell, a virtual machine can get higher aggregated cache capacity and memory bandwidth. Lastly, multi-core aware load balancing achieves high CPU utilization while minimizing the cost of migrations.”