
Performance Troubleshooting VMware vSphere – Memory

Introduction

As memory prices continue to drop and the industry increasingly adopts the x64 architecture, memory demands keep rising. Only a few years ago, 1-2 GB virtual machines were the norm, and 95% of them ran 32-bit operating systems. In my experience, the norm has since shifted to 2-4 GB, with higher-performing virtual machines consuming anywhere from 4-16 GB of memory. VMware has answered this demand: vSphere now supports up to 1 TB of addressable memory per physical host and up to 255 GB per virtual machine.

With processors now more powerful than ever, the limiting resource for virtual machines is shifting from compute to memory. This is reflected in the industry today: traditional servers (Intel Nehalem) are shipping with larger memory footprints, and vendors such as Cisco are introducing extended memory technology that can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and I was impressed with what I saw. The extended memory technology is quite unique: not only does it allow you to scale out your memory configuration, it uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage of having this many DIMM sockets is that you can reach the same total memory with lower-capacity DIMMs (2 GB or 4 GB), where a standard server would require 8 GB DIMMs.
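To make the DIMM arithmetic concrete, here is a small Python sketch. The slot counts (12 for a standard server, 48 for an extended-memory server), the 96 GB target, and the helper names are hypothetical values chosen for illustration, not vendor specifications.

```python
import math
from typing import Optional


def dimms_needed(target_gb: int, dimm_gb: int) -> int:
    """Number of DIMMs of size dimm_gb needed to reach at least target_gb."""
    return math.ceil(target_gb / dimm_gb)


def smallest_dimm_that_fits(target_gb: int, slots: int,
                            sizes: tuple = (2, 4, 8)) -> Optional[int]:
    """Smallest DIMM size (GB) that reaches target_gb within the slot count, or None."""
    for size in sorted(sizes):
        if dimms_needed(target_gb, size) <= slots:
            return size
    return None  # target cannot be reached with the given slots and sizes


# Hypothetical example: a 96 GB target on a 12-slot vs. a 48-slot server.
for label, slots in (("12 slots", 12), ("48 slots", 48)):
    print(label, smallest_dimm_that_fits(96, slots), "GB DIMMs")
```

With more sockets, the same capacity can be built from smaller DIMMs, which is the trade-off described above.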

Memory Technologies in VMware vSphere

Virtualization brings some major benefits when it comes to memory. VMware implements several sophisticated techniques for making the most of physical memory within an ESX host, and all of them work out of the box with no advanced configuration necessary. To understand problems that might occur in your environment, you need to be familiar with these basic memory concepts.

Transparent Page Sharing – The VMkernel compares physical memory pages to find duplicates, frees the redundant copies, and maps each duplicate to a single shared copy (marked copy-on-write, so a virtual machine that later modifies the page gets its own private copy again). If several virtual machines on one physical host run the same operating system, why keep identical pages in memory several times over? Think of this as the same data de-duplication process found in many backup solutions in the industry.
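A minimal Python sketch of content-based page sharing, assuming pages are plain byte strings: each page is hashed, a hash match is confirmed with a full byte comparison, and identical pages are mapped to one stored frame. This is a simplified model for illustration, not VMkernel code; the function name is hypothetical, and copy-on-write handling of later writes is not modeled.

```python
import hashlib


def share_pages(pages: list[bytes]) -> tuple[list[int], int]:
    """Map each guest page to a physical frame, reusing frames for identical content.

    Returns (frame id for each page, number of physical frames in use).
    """
    frames: list[bytes] = []               # physical frames actually stored
    by_hash: dict[bytes, list[int]] = {}   # content hash -> candidate frame ids
    mapping: list[int] = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        frame_id = None
        for candidate in by_hash.get(digest, []):
            if frames[candidate] == page:  # full compare guards against hash collisions
                frame_id = candidate
                break
        if frame_id is None:               # no identical page yet: store a new frame
            frames.append(page)
            frame_id = len(frames) - 1
            by_hash.setdefault(digest, []).append(frame_id)
        mapping.append(frame_id)
    return mapping, len(frames)


# Three hypothetical VMs, each with a zero page, a common OS page, and one unique page.
zero, os_page = bytes(16), b"kernel-code-page"
pages = [zero, os_page, b"vm1-data", zero, os_page, b"vm2-data", zero, os_page, b"vm3-data"]
mapping, frames_used = share_pages(pages)
print(len(pages), "guest pages ->", frames_used, "physical frames")
```

Nine guest pages collapse to five physical frames because the zero page and the OS page are stored only once.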

Memory Overcommitment – Assigning more memory to powered-on virtual machines than the physical server actually has. This lets virtual machines with heavier memory demands use memory that underutilized virtual machines are not actively using.
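One simple way to track this is an overcommitment ratio: total memory configured on powered-on virtual machines divided by the host's physical memory. The sketch below assumes you already have those numbers in MB; the function name and the 32 GB host are hypothetical. It ignores virtualization overhead, so treat the result as a rough indicator.

```python
def overcommit_ratio(configured_vm_mb: list[int], host_physical_mb: int) -> float:
    """Memory granted to powered-on VMs divided by physical host memory.

    A value above 1.0 means the host is overcommitted.
    """
    if host_physical_mb <= 0:
        raise ValueError("host_physical_mb must be positive")
    return sum(configured_vm_mb) / host_physical_mb


# Hypothetical host: 32 GB of physical memory, ten 4 GB VMs powered on.
ratio = overcommit_ratio([4096] * 10, 32 * 1024)
print(f"{ratio:.2f}")  # 1.25
```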

Memory Overhead – When a virtual machine is powered on, the ESX host reserves additional memory, beyond what the virtual machine is configured with, for the virtualization layer's own operations. This overhead memory is reserved for the system and cannot be reclaimed through swapping or ballooning.

Memory Balloon Driver – When VMware Tools is installed in a virtual machine, it provides device drivers that let the guest operating system communicate with the host virtualization layer. One of the components it installs is the balloon driver, "vmmemctl", which you can observe inside the guest. The balloon driver works with the hypervisor to reclaim guest memory that the operating system no longer needs. If the physical ESX server begins to run low on memory, it inflates the balloon to reclaim memory from the guest. This reduces the chance that the physical ESX host will begin to swap, which would cause performance degradation. Here is an illustration of ballooning in ESX:
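The following Python sketch is a toy model of the idea, not the actual ESX reclamation policy (which also takes shares and an idle memory tax into account): when host free memory drops below a target, the shortfall is split across virtual machines in proportion to their idle memory, and no virtual machine is asked for more than it has idle. All names and numbers are hypothetical.

```python
def balloon_targets(host_free_mb: float, min_free_mb: float,
                    vms: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Split the host's memory shortfall across VMs in proportion to their idle memory.

    vms maps VM name -> (granted_mb, active_mb). Returns MB to reclaim per VM.
    """
    shortfall = max(0.0, min_free_mb - host_free_mb)
    idle = {name: max(0.0, granted - active) for name, (granted, active) in vms.items()}
    total_idle = sum(idle.values())
    if shortfall == 0 or total_idle == 0:
        return {name: 0.0 for name in vms}
    # Cap each request at the VM's idle memory; anything still missing after
    # ballooning would have to come from host swapping.
    return {name: min(idle[name], shortfall * idle[name] / total_idle) for name in vms}


# Hypothetical host with 256 MB free against a 1024 MB free-memory target.
targets = balloon_targets(256, 1024, {"web": (4096, 1024), "db": (8192, 7168)})
print(targets)  # {'web': 576.0, 'db': 192.0}
```

The mostly idle web server gives up most of the memory, while the busy database server is asked for relatively little.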

What to look for

Check ESX host swapping. If you are overcommitting memory on the physical ESX host, you can run into a situation where every virtual machine needs the full amount of memory it has been granted. When the host runs out of physical memory, it will begin to swap virtual machine memory out to disk. Keep an eye on the oversubscription rates of your physical hosts, and make sure your DRS clusters have enough memory resources for DRS to balance the load effectively. Swapping will occur when the following formula is met:

Check for virtual machine swapping. Make sure your virtual machines have enough memory for the application workloads they support. If virtual machines start swapping, the extra paging activity can put a strain on the disk subsystem.

Check that VMware Tools is installed and up to date. Besides providing drivers between the guest and the hypervisor, VMware Tools installs the balloon driver, which the ESX host relies on for proper memory management.

Check memory reservation settings. By default, VMware ESX dynamically reclaims memory that is not being used. There are situations where you might choose to use memory reservations, but be aware that reserved memory is guaranteed to the virtual machine and cannot be reallocated to other virtual machines, even when it is not being used. Don't sell the balloon driver short: many third-party application vendors over-specify their memory requirements to be safe, and ballooning can help reclaim some of that wasted "fluff factor".
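To see how reservations limit what the host can reclaim, here is a small Python sketch: ballooning and host swapping can only target memory above a virtual machine's reservation, which is also why a virtual machine's swap file is sized at configured memory minus reservation. The function name, VM names, and sizes are hypothetical.

```python
def reclaimable_mb(configured_mb: int, reservation_mb: int) -> int:
    """Memory above the reservation, i.e. what ballooning or host swapping could reclaim."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and configured memory")
    return configured_mb - reservation_mb


# Hypothetical VMs: name -> (configured MB, reservation MB)
vms = {"web01": (4096, 0), "sql01": (8192, 8192), "app01": (2048, 1024)}
total = sum(reclaimable_mb(cfg, res) for cfg, res in vms.values())
print(total)  # 5120
```

Of 14 GB configured, only 5 GB remains reclaimable; the fully reserved sql01 contributes nothing even when it sits idle.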

Monitoring with Virtual Center

The first place I would start when checking memory configurations is Virtual Center. Virtual Center provides excellent reporting and gives you granular control over which metrics you report against. VMware vSphere now includes a nice graphical summary on the Performance tab of the physical host, giving you a quick dashboard-type view of the overall health of the system over a 24-hour period. Here are some memory samples:

Check your overall % usage (lower is better)

Check your ballooning (lower is better)

Selecting the Advanced tab gives you a much more granular view of performance data. At first glance this might look like overkill, but with a little fine-tuning you can make it report some great historical information. Here is a snapshot of memory utilization showing many of the metrics discussed above, a great picture of what's going on (this host looks healthy):

Check your various metrics, mainly for swapping activity

By default, the Virtual Center performance charts display the past hour of statistics, giving a more detailed view of what is currently happening on your host. Select "Chart Options" to change settings such as the time/date range and the counters you would like to display.

Virtual Center alarms are an excellent tool that is often overlooked. While they are more of a proactive tool than a reactive or troubleshooting one, I thought they were worth mentioning. Set up memory alarms so you are notified via e-mail when a problem starts to manifest itself. Here is an alarm configured to trigger if physical host memory usage is above 90% for 5 minutes or longer. Many of these alarms are built into Virtual Center, so you don't have to do much configuration up front. You do need to make sure you set up the e-mail notifications on the "Actions" tab.
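The "above 90% for 5 minutes" condition combines a threshold with a duration. Here is a Python sketch of that logic over a series of usage samples, assuming (timestamp in seconds, usage %) pairs such as Virtual Center's 20-second real-time samples. It illustrates the condition only; it is not how Virtual Center evaluates alarms internally, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def alarm_triggers(samples: Iterable[tuple[int, float]],
                   threshold_pct: float = 90.0,
                   duration_s: int = 300) -> list[int]:
    """Timestamps at which usage has stayed above threshold_pct for at least duration_s.

    samples: (timestamp in seconds, usage %) pairs in time order.
    Fires once per continuous excursion above the threshold.
    """
    fired: list[int] = []
    above_since: Optional[int] = None
    already_fired = False
    for ts, usage in samples:
        if usage > threshold_pct:
            if above_since is None:
                above_since = ts               # start of a new excursion
            if not already_fired and ts - above_since >= duration_s:
                fired.append(ts)
                already_fired = True
        else:
            above_since = None                 # excursion ended; reset
            already_fired = False
    return fired


# Hypothetical 20-second samples: 95% usage sustained for a little over 5 minutes.
samples = [(t, 95.0) for t in range(0, 321, 20)]
print(alarm_triggers(samples))  # [300]
```

The duration requirement is what keeps a brief spike from paging someone at 2 a.m.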

Monitoring with ESXTOP

Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux "top" command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and run the command "esxtop". The default screen is the CPU screen; to monitor memory, press the "m" key. Esxtop gives you great real-time information and can also log data over a longer period in batch mode; try "esxtop -a -b > performance.csv". Check your total physical memory here, and make sure you aren't overcommitting to the point of swapping. Then examine what your virtual machines are doing; to display only the virtual machine worlds, press the "V" key.
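Batch-mode output is a wide CSV file with one column per counter, which is easier to summarize with a script than to read by hand. The Python sketch below is meant to run on an admin workstation after copying the file off the host. It reports the minimum and maximum of every column whose header contains "\Memory\". The header format in the sample data is an assumption modeled on esxtop's perfmon-style CSV, so check the actual column names in your own capture. To bound the capture, esxtop's -d (delay in seconds) and -n (number of iterations) options can be added, for example "esxtop -a -b -d 5 -n 120 > performance.csv" for roughly 10 minutes of data.

```python
import csv
import io
from typing import TextIO


def memory_column_ranges(stream: TextIO,
                         keyword: str = "\\Memory\\") -> dict[str, tuple[float, float]]:
    """(min, max) of every numeric CSV column whose header contains keyword."""
    reader = csv.reader(stream)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if keyword in name]
    ranges: dict[str, tuple[float, float]] = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or non-numeric cells
            lo, hi = ranges.get(header[i], (value, value))
            ranges[header[i]] = (min(lo, value), max(hi, value))
    return ranges


# Tiny hypothetical sample; in practice: open("performance.csv", newline="")
sample = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Memory\Free MBytes","\\esx01\Memory\Swap Used MBytes"',
    '"10/01/2010 12:00:00","2048","0"',
    '"10/01/2010 12:00:05","1024","12"',
])
for name, (lo, hi) in memory_column_ranges(io.StringIO(sample)).items():
    print(name, lo, hi)
```

Reporting both ends of the range matters: for free memory the minimum is what counts, while for swap counters the maximum tells you whether the host ever swapped.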

Monitor inside the Virtual Machine

A great feature VMware introduced for Windows virtual machines is the integration of VMware performance counters directly into the Performance Monitor ("perfmon") tool. If you're running vSphere 4 Update 1, make sure you read this post first, as a bug in VMware Tools will prevent the counters from showing up. You can monitor the same metrics found in Virtual Center and esxtop here. It's just another way of getting at the data, especially if you have a Microsoft Windows background and are familiar with perfmon.

Monitoring with PowerCLI

Another great place to look for potential memory problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, together with a PowerPack from Alan Renouf. If you're not a command-line guru, don't let this discourage you: PowerGUI is a Windows application that lets you run predefined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find out what your ESX host service console memory is set to? How about which virtual machines have memory reservations, shares, or limits configured? You can pull all of this information using Alan's PowerPack.
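The same kind of report can be scripted in any language once the settings are exported. To keep to one language for the examples here, the Python sketch below flags virtual machines whose memory reservation, limit, or shares differ from the defaults. The record class and its fields are hypothetical (you might fill them from a CSV exported with PowerCLI); the convention that a limit of -1 means unlimited follows the vSphere API.

```python
from dataclasses import dataclass


@dataclass
class VmMemorySettings:
    """Hypothetical record of one VM's memory resource settings."""
    name: str
    reservation_mb: int = 0
    limit_mb: int = -1            # -1 = unlimited, as in the vSphere API
    shares_level: str = "normal"  # e.g. "low", "normal", "high", "custom"


def non_default_memory_settings(vms: list[VmMemorySettings]) -> dict[str, list[str]]:
    """Map VM name -> list of memory settings that differ from the defaults."""
    findings: dict[str, list[str]] = {}
    for vm in vms:
        notes = []
        if vm.reservation_mb > 0:
            notes.append(f"reservation {vm.reservation_mb} MB")
        if vm.limit_mb != -1:
            notes.append(f"limit {vm.limit_mb} MB")
        if vm.shares_level != "normal":
            notes.append(f"shares {vm.shares_level}")
        if notes:
            findings[vm.name] = notes
    return findings


inventory = [
    VmMemorySettings("web01"),
    VmMemorySettings("sql01", reservation_mb=8192),
    VmMemorySettings("test01", limit_mb=1024, shares_level="low"),
]
for name, notes in non_default_memory_settings(inventory).items():
    print(name, ", ".join(notes))
```

Limits deserve particular attention: a limit below the configured memory forces ballooning or swapping even when the host has memory to spare.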

Conclusion

If you're using VMware vSphere, there are many different ways to monitor for memory problems. The Virtual Center database is the first place you should start. Check your physical host memory conditions, then work your way down the stack to any virtual machines that might be indicating a problem. Take a look at esxtop and check the key metrics discussed above.

Look for the outliers in your environment. If something doesn't look right, it probably isn't. Scratch away at the surface and see if something pops up. Use every tool available to you, such as PowerCLI; approaching a problem from a different perspective will sometimes shed light on a situation you weren't aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for new technical problems that VMware support had never encountered before.

About Scott Sauer

I'm a Senior Systems Engineer for Tintri in Cincinnati, Ohio. I am married to a wonderful woman (Alison) and have the privilege of raising two boys with her. I have over 16 years of experience in the Information Technology field, with a background in virtualization, systems architecture, disaster recovery/business continuity, storage area networking, and data center operations.

Another great post, Scott! I'm really enjoying this series, and I can't wait for the storage one.

scottsauer

Thanks, Hany! That one is going to be difficult, especially since there are so many diverse storage solutions out there.

Fusecode

When not running virtualized, is the OS able to use all the installed RAM (x86)? I keep reading about allocation, and I'm wondering whether that means permanent allocation or only once virtualization is started. Any help with this question would be appreciated.

If you are interested in these performance counters from within a Linux guest, good news.

I recently implemented python-vmguestlib, a wrapper that ships with its own tool, vmguest-stats, for displaying those performance counters. I have also added three new plugins to Dstat specifically for those VMGuestLib SDK counters.

So correlating these counters with other performance data is as simple as: