Verify each VM uses a configuration from the Cisco-provided OVA download file of the application version you are running.

To be TAC supported, you must use a Cisco-provided OVA to build the VM for the initial install. For example, see the instructions for Unified Communications Manager here: http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/virtual/CUCM_BK_CA526319_00_cucm-on-virtualized-servers_chapter_00.html#CUCM_TK_D1CB01EA_00

Introduction

A virtual environment brings new considerations to troubleshooting and performance monitoring. Those considerations are discussed in this section.

General Guidelines

Performance indicators are still valid from within virtual machines. For the UC applications that support it, use RTMT or the perfmon data to analyze the performance of the UC application. Data from these tools provides a view of guest performance: disk, CPU, memory, and other details.

Move to the VMware infrastructure when there is a need to get the perspective from the ESXi host. Use the vSphere Client to view data:

If vCenter is available, historical data is available through the client

If vCenter is not available, live data from the host is available through the client

VMware and VM Configuration

Verify your virtualization configuration matches the requirements/restrictions for each application.

For example, if you are running Unified Communications Manager 9.1(2), new deployments must use the OVA download file for 9.1(2). If you are upgrading from an older version, see the readme for the 9.1(2) OVA on how to handle the VM configurations from the old version.

Matching the specs of the virtual hardware is not enough. The Cisco-provided OVAs include virtual disk drives whose partitions are aligned to 64K boundaries to optimize storage performance. You must use the Cisco-provided OVA to create the virtual machine, or you risk application issues due to non-optimized storage performance.

The storage / partition / filesystem alignment is set up at install time through the OVA file, and is not changed by subsequent upgrades.

If the VM is manually created without use of the OVA, and alignment is not configured, it can only be resolved after the fact via the following procedure:

1. Back up

2. Deploy a VM configuration from the Cisco-provided OVA file of the application version

3. Reinstall application

4. Restore from backup

If Cisco TAC detects unaligned partitions and deems it necessary in order to provide effective support, you will be required to correct the misalignment before further troubleshooting can occur.

Some application versions will generate an alert if they detect unaligned partitions. For example, Unified Communications Manager 9.1(2) or higher will generate an alert similar to the following:

You may also notice from the above alert that there is a second problem: the VM was created with 2x146GB vDisks which does not match any of the supported VM configurations in the UCM OVA download file.
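To make the 64K alignment requirement concrete, the following is a minimal sketch of the arithmetic behind an alignment check. It is not a Cisco tool; the function name and the 512-byte sector size are assumptions for illustration. A partition is aligned when its starting byte offset is an exact multiple of the boundary.

```python
SECTOR_BYTES = 512          # assumed logical sector size (not stated in this document)
BOUNDARY_BYTES = 64 * 1024  # the 64K boundary named in the text


def is_aligned(start_sector: int,
               sector_bytes: int = SECTOR_BYTES,
               boundary_bytes: int = BOUNDARY_BYTES) -> bool:
    """Return True if the partition's starting byte offset is a multiple of the boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


# Sector 63 is a common legacy starting sector: 63 * 512 = 32256 bytes,
# which is not a multiple of 65536, so the partition is unaligned.
print(is_aligned(63))    # False
# Sector 128 starts at 128 * 512 = 65536 bytes, exactly one 64K boundary.
print(is_aligned(128))   # True
```

The starting sector itself would come from a partition listing taken inside the guest; how that listing is obtained depends on the application and is outside this sketch.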

vCenter Settings

Note:

Recall that VMware vCenter is mandatory for UC on UCS Specs-based and HP/IBM Specs-based, as described here. VMware vCenter is optional for UC on UCS TRC deployments.

Just like some of the UC applications, vCenter can be configured to save more performance data. The more historical data is saved, the more disk space the vCenter database needs. Note that this is one of the main areas where you need vCenter rather than going directly to the ESXi host for performance data: vCenter can save historical data that the ESXi host does not keep.

The settings that control the amount of historical data saved by vCenter are located in the vSphere client under Administration > Server Settings. The statistics level can be set for each interval duration and save time. The statistics levels range from 1 to 4, with level 4 containing the most data. View the data size estimates to ensure there is enough space to keep all statistics.

For a UC on UCS Specs-based or HP/IBM Specs-based deployment, Statistics Level 4 is required on all statistics. Configuring VMware vCenter to capture detailed logs, as shown in Figure 1 below, is strongly recommended. If not configured by default, Cisco TAC may request enabling these settings in order to troubleshoot problems.

Figure 1

VMware Performance Indicators

The following table lists the performance indicators to monitor and view from a VMware perspective when a virtual machine is having suboptimal (or bad) performance. Most counters are from the ESXi host, which can give a perspective of VM interactions and overall host and data store utilization.

| Performance Area | Object | Counter | Acceptable range |
|---|---|---|---|
| CPU | Host | Usage | Less than 80% |
| CPU | Virtual Machine | Ready | Less than 3% |
| Memory | Host | Consumed | General trend is stable |
| Memory | Host | Balloon/Swap used | 0 KB |
| Disk | Specific datastore | Kernel command latency | Less than 3 ms |
| Disk | Specific datastore | Physical device command latency | Less than 20 ms |
| Disk | Specific datastore | Average commands issued per second | Less than LUN capacity |
| Network | Host | Receive packets dropped/Transmit packets dropped | 0 packets |
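The acceptable ranges in the table above can be applied mechanically to a set of collected counter values. The sketch below shows one way to do that. The dictionary keys are hypothetical names chosen for this example, not vSphere counter identifiers, and the sample values are invented.

```python
from typing import Dict, List

# Upper limits copied from the table above; key names are hypothetical.
LIMITS = {
    "cpu_host_usage_pct": 80.0,
    "cpu_vm_ready_pct": 3.0,
    "disk_kernel_latency_ms": 3.0,
    "disk_device_latency_ms": 20.0,
}

# Counters that the table expects to stay at zero.
MUST_BE_ZERO = ("mem_balloon_kb", "mem_swap_used_kb",
                "net_rx_dropped", "net_tx_dropped")


def out_of_range(sample: Dict[str, float]) -> List[str]:
    """Return the names of counters in `sample` that violate the table's ranges."""
    problems = []
    for name, limit in LIMITS.items():
        # "Less than X" means a value equal to or above X is out of range.
        if name in sample and sample[name] >= limit:
            problems.append(name)
    for name in MUST_BE_ZERO:
        if sample.get(name, 0) != 0:
            problems.append(name)
    return problems


# Invented sample: CPU ready is too high and packets were dropped on receive.
sample = {
    "cpu_host_usage_pct": 72.5,
    "cpu_vm_ready_pct": 4.1,
    "mem_balloon_kb": 0,
    "net_rx_dropped": 12,
}
print(out_of_range(sample))  # ['cpu_vm_ready_pct', 'net_rx_dropped']
```

The "general trend is stable" and "less than LUN capacity" rows are left out because they need historical data or storage-specific limits that a single sample cannot provide.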

Physical Hardware Serviceability Items

| Area | Top Items | View at | Alerted How? |
|---|---|---|---|
| CPU | Temperature; Utilization/status; Thresholds with events; Condition & events for abnormal state | ESXi Host or vCenter | SNMP/Email (via vCenter) |
| Memory | Utilization/status; Errors/condition | ESXi Host or vCenter | SNMP/Email (via vCenter) |
| Hard Drives | Utilization/status; Disk failure alerting | ESXi Host or vCenter | SNMP/Email (via vCenter) |
| RAID Controller | State (defunct, rebuilding, etc.); Cache/battery status; Thresholds with events | ESXi Host or vCenter (DAS only) | SNMP/Email (via vCenter) |
| NIC | Port failure events | vCenter | SNMP/Email (via vCenter) |
| Power Supply | Voltage; Redundancy status; Thresholds with events | ESXi Host or vCenter; UCS Manager (B-series)(2) | SNMP/Email (via vCenter) |
| Fans | Status/Speed; Thresholds with events | ESXi Host or vCenter (C-series); UCS Manager (B-series) | SNMP/Email (via vCenter) |
| IO Controller | | ESXi Host or vCenter (DAS only) | SNMP/Email (via vCenter) |

Note:

The vSphere client can be used to view the data and alarms. vCenter is required for any automatic notification.

CPU Troubleshooting

High CPU usage could be due to a small number of VMs taking all of the resources, or to too many VMs running on the host. If too many VMs are running, look at the VMs on the host and see whether CPU reservations are in use (see oversubscription section). To isolate a CPU issue for a particular VM, consider moving it to another ESXi host.

To view the CPU performance indicators, go to the ESXi host's performance tab and select the Advanced button. Under Chart options, select CPU, timeframe, and then only the host (not individual cores) to view overall CPU usage on the host. You can view each VM's CPU usage from the Virtual Machines tab on the host.

To get a view of the reservations set by all of the VMs, use the Resource Allocation tab of the cluster.

Note:

The "Resource Allocation" tab is only available via vCenter.
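The 3% limit for the CPU Ready counter is a percentage, while performance charts commonly report ready time as a millisecond total per sampling interval. The sketch below converts one to the other. It rests on two assumptions that this document does not state: that the value read from the chart is a millisecond summation, and that the real-time chart interval is 20 seconds. The function name is hypothetical.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a ready-time summation (ms) over one interval into a percentage.

    Assumes `ready_ms` is the total ready time accumulated during a single
    sampling interval of `interval_s` seconds.
    """
    return ready_ms * 100.0 / (interval_s * 1000.0)


# 400 ms of ready time in a 20-second interval is within the 3% limit.
print(cpu_ready_percent(400))    # 2.0
# 1000 ms in the same interval exceeds it.
print(cpu_ready_percent(1000))   # 5.0
```

If the chart interval differs (for example, when viewing historical rather than real-time data), pass the matching `interval_s` value instead of relying on the default.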

Memory Troubleshooting

Our guidelines do not support memory sharing between VMs. To verify that no sharing is taking place, check the following performance indicators and make sure the swapping and ballooning counters are zero. If a given VM does not have enough memory and there are no memory issues on the specific host, consider increasing the VM's memory.

To view the memory performance indicators, go to the ESXi host's Performance tab and select the Advanced button. Under Chart options, select Memory and Timeframe, then select the following counters:

Used memory (to view general trends)

Swap used

Balloon

Swap and Balloon should always be zero. Any other value means memory sharing is in use, which should not be the case.

Disk Troubleshooting

Bad disk performance often shows up as high CPU usage. IOPS data can provide information on how hard the application/VM is working the disks. Specific activities can cause spikes in IOPS: upgrades and DB maintenance are two examples. If VMs running on the same datastore are all doing these activities at the same time, the disks might not be able to keep up. IOPS data can be seen from vCenter or the SAN. Disk latency (response time) is a good indicator of disk performance.

To view the disk performance indicators, go to the ESXi host's performance tab and select the advanced button. The appropriate datastore needs to be selected, which can be found on the datastore page (see below). Under chart options, select disk and timeframe, then select the following counters:

Physical device command latency

Kernel command latency

Average commands issued per second

The kernel counter should not be greater than 2-3 ms. The physical device counter should not be greater than 15-20 ms. The "average commands issued per second" counter can be used if IOPS are not available from the SAN. Consider IOPS if the datastore appears to be overloaded. This IOPS data is viewable from the host and from each VM. Note that for NFS datastores, the physical and kernel latency data is not available. Starting with VMware 4.0 Update 2, the esxtop command (see below) can be used to view NFS counters, in particular the guest latency (called GAVG in esxtop). The guest latency is the sum of the physical device and kernel latencies.
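As a worked example of the relationship described above, the sketch below adds the physical device and kernel latencies to get the guest latency, and compares each component with the upper end of the ranges given in the text (3 ms and 20 ms). The function names and sample values are hypothetical.

```python
from typing import List


def guest_latency_ms(device_ms: float, kernel_ms: float) -> float:
    """Guest latency is the sum of physical device and kernel latency."""
    return device_ms + kernel_ms


def latency_findings(device_ms: float, kernel_ms: float,
                     kernel_limit_ms: float = 3.0,
                     device_limit_ms: float = 20.0) -> List[str]:
    """Return a list of components whose latency exceeds its limit."""
    findings = []
    if kernel_ms > kernel_limit_ms:
        findings.append("kernel")
    if device_ms > device_limit_ms:
        findings.append("device")
    return findings


# Healthy sample: 12.0 ms at the device plus 0.5 ms in the kernel.
print(guest_latency_ms(12.0, 0.5))   # 12.5
print(latency_findings(12.0, 0.5))   # []
# Unhealthy sample: both components are above their limits.
print(latency_findings(25.0, 4.0))   # ['kernel', 'device']
```

Splitting the check by component is useful because a high guest latency on its own does not show whether the delay is in the ESXi kernel or at the storage device.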

On the C-series UCS servers there have been issues with the write cache battery backup. If this battery is not operating correctly, performance will suffer. Use a tool like wbemcli to verify the battery is ok. An example of using the wbemcli:

Network Troubleshooting

Generally, network performance issues show up as dropped packets. If an ESXi host reports dropped packets, investigate the network infrastructure, which might include a virtualized switch (Nexus 1000V). In ESXi 4.1, issues have been seen with large file transfers (for example, SFTP/FTP transfers). To address this issue, disable the Large Receive Offload (LRO) options on the ESXi host. The settings are found on the host's Configuration tab -> Advanced Settings -> Net.*. Note that there are several LRO settings on this page, and all of them need to be disabled. If a VM has been cloned and uses static MAC addresses, verify there are no duplicate MAC addresses in the network. LRO settings:

To view the network performance indicators, go to the ESXi host's performance tab and select the advanced button. Under chart options, select Network, timeframe, then select the following counters:

Receive packets dropped

Transmit packets dropped

The main thing to check is that no packets are getting dropped in the network.

Note:

Advanced network debugging and configuration can be done on Nexus 1000v (if used, which requires vCenter and Enterprise Plus licensing).

Alternate Access to Performance Data

If vCenter and/or the vSphere client are not available, some real-time data can be pulled using command line tools. If you have a vMA VM, the resxtop tool can be used; it is a remote version of the esxtop tool. Otherwise, the esxtop tool can be used directly on the ESXi host (root access must be enabled). See http://communities.vmware.com/docs/DOC-11812 for details on esxtop.
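When esxtop or resxtop output has been captured to a CSV file (esxtop offers a batch mode for this), the data can be summarized offline. The following is a minimal sketch that finds the peak value of every column whose header contains a given text fragment. The column headers in the sample are invented for illustration; real esxtop headers are longer and differ by host, so the fragment to search for is an assumption you would adjust after inspecting your own capture.

```python
import csv
import io
from typing import Dict


def peak_by_column(csv_text: str, header_fragment: str) -> Dict[str, float]:
    """Return the maximum value seen in each column whose header contains the fragment."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if header_fragment in name]
    peaks: Dict[str, float] = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (IndexError, ValueError):
                continue  # skip short rows and non-numeric cells
            name = header[i]
            peaks[name] = max(peaks.get(name, value), value)
    return peaks


# Synthetic capture with hypothetical column names.
sample_csv = (
    '"Time","datastore1 GAVG ms","datastore2 GAVG ms","cpu used"\n'
    '"t0","4.0","11.5","30"\n'
    '"t1","9.2","7.0","35"\n'
)
print(peak_by_column(sample_csv, "GAVG"))
# {'datastore1 GAVG ms': 9.2, 'datastore2 GAVG ms': 11.5}
```

The peaks can then be compared with the acceptable ranges listed earlier in this section.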