Real customers setting up GRID in the GTC 2016 hands-on lab; the following week, the SA team tried it out on colleagues, including some who were new to GRID!

I had some fun at NVIDIA GTC 2016 taking part in a hands-on lab run by the SA (Solution Architecture) organisation, of which I am a part. These labs are proving really useful for walking users new to GRID through key operations on both the VMware and Citrix stacks. The team running it mooted adding more on monitoring once you are set up, and I sort of volunteered to have a crack at a bonus chapter for the hands-on covering monitoring on Citrix.

Having worked at Citrix, this was an easy one. In the future I may well set myself the bigger challenge of getting more familiar with VMware metrics, if this proves useful and depending on feedback from you, the reader!

I often get asked how many vCPUs, how much RAM, etc. a VM should be provisioned with for all manner of applications, and even when I am familiar with an application, different users use it so differently that it's hard to say. For example, AutoCAD and SolidWorks have a vast range of functions, from 2D to 3D to rendering.

However, users can tell for themselves whether they have a provisioning problem by developing a little knowledge of the XenServer/XenCenter metrics, especially those that are not on by default! I've included some information below that I hope will guide you in working out whether the number of vCPUs allocated is causing a problem. Let's see how it goes, and do look out for those hands-on labs at GTC and other NVIDIA events.

Problem

Customers aren't always aware of all the metrics available on XenServer and within XenCenter, particularly those that help them assess whether they have provisioned resources such as vCPUs and RAM optimally for the applications they use.

Solution

This article aims to help new users become more familiar with the available metrics and how to view them in XenCenter. I'm hoping it can be incorporated into a hands-on lab or user guide, so please suggest improvements.

XenServer Monitoring

Citrix XenServer offers a large number of metrics, which can be accessed from a command prompt on the hypervisor or from within the XenCenter management console. Many metrics are off by default to avoid unnecessary system load where they would not normally be needed. Chapter 9 of the XenServer Administrator's Guide gives very detailed information on which metrics are available, how to configure alert thresholds and how to trigger email alerts. Always consult the version of the guide that matches the version of XenServer you are using, e.g. for XS6.5, the Citrix XenServer® 6.5 Administrator's Guide.

Class | Name | Description | Enabled by default | Condition for existence
… | … | Proportion of time over the past sample period during which one or more kernels was executing on this GPU (fraction) | No | A supported GPU is installed on the host
Host | gpu_utilisation_memory_io_<pci-bus-id> | Proportion of time over the past sample period during which global (device) memory was being read or written on this GPU (fraction) | No | A supported GPU is installed on the host

Note: GPU metrics are available in XenCenter for GPU pass-through VMs, but because of the nature of PCIe pass-through, the hypervisor has no access to the actual data (with pass-through, only the VM can see and access the GPU), so these graphs and metrics will read zero.

If you are troubleshooting a performance issue, it is important to identify which resource is the bottleneck; often it is not the GPU. Metrics particularly worth checking include those relating to CPU usage on the host:

Class | Name | Description | Condition for existence | XenCenter Name
Host | cpu<cpu>-C<cstate> | Time CPU <cpu> spent in C-state <cstate>, in milliseconds | C-state exists on CPU | CPU <cpu> C-state <cstate>
Host | cpu<cpu>-P<pstate> | Time CPU <cpu> spent in P-state <pstate>, in milliseconds | P-state exists on CPU | CPU <cpu> P-state <pstate>
Host | cpu<cpu> | Utilisation of physical CPU <cpu> (fraction). Enabled by default | pCPU <cpu> exists | CPU <cpu>
Host | cpu_avg | Mean utilisation of physical CPUs (fraction). Enabled by default | None | Average CPU
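If you pull these metrics out programmatically rather than through XenCenter, the four name forms in the table are regular enough to classify mechanically. The Python sketch below assumes only the name patterns shown in the table; `parse_cpu_metric`, `xencenter_label` and the `CpuMetric` type are hypothetical helpers for illustration, not part of any XenServer API.

```python
import re
from typing import NamedTuple, Optional

# Name patterns from the table above; <cpu>, <cstate> and <pstate> are integers.
_STATE_RE = re.compile(r"^cpu(?P<cpu>\d+)-(?P<kind>[CP])(?P<state>\d+)$")
_PCPU_RE = re.compile(r"^cpu(?P<cpu>\d+)$")


class CpuMetric(NamedTuple):
    kind: str             # "C", "P", "util" (cpu<cpu>) or "avg" (cpu_avg)
    cpu: Optional[int]    # physical CPU index; None for cpu_avg
    state: Optional[int]  # C-/P-state number; None for utilisation metrics


def parse_cpu_metric(name: str) -> Optional[CpuMetric]:
    """Classify a host CPU metric name; return None if it matches none of the four forms."""
    if name == "cpu_avg":
        return CpuMetric("avg", None, None)
    m = _STATE_RE.match(name)
    if m:
        return CpuMetric(m["kind"], int(m["cpu"]), int(m["state"]))
    m = _PCPU_RE.match(name)
    if m:
        return CpuMetric("util", int(m["cpu"]), None)
    return None


def xencenter_label(metric: CpuMetric) -> str:
    """Rebuild the XenCenter display name in the form listed in the table."""
    if metric.kind == "avg":
        return "Average CPU"
    if metric.kind == "util":
        return f"CPU {metric.cpu}"
    return f"CPU {metric.cpu} {metric.kind}-state {metric.state}"
```

For example, `parse_cpu_metric("cpu3-P0")` classifies the name as a P-state metric for CPU 3, and `xencenter_label` turns it back into "CPU 3 P-state 0", which is the datasource name to look for in the New Graph dialog.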

C-state and P-state information is particularly insightful for bursty applications (as CAD applications often are), where peak and average usage can differ widely. Many servers are shipped configured for power saving rather than for maximum performance; this needs to be changed in the BIOS so that the hypervisor, and hence the application, can use the full range of P-states and C-states. I wrote a guide to C-states and P-states a long time ago: http://xenserver.org/partners/developing-products-for-xenserver/19-dev-help/138-xs-dev-perf-turbo.html I'm not sure whether the XenServer commands it gives for configuring a system optimally are still correct, but the monitoring instructions should be.

Many CAD/3D applications are highly single-threaded and benefit from turbo mode; Catia has often been one such application. The highest P-state (P0) is traditionally used to indicate whether turbo is in use, but if you use XenCenter you must be careful to note the convention that, when turbo is enabled, P0 is the turbo mode and P1 is the highest non-turbo mode. Turbo mode is labelled with a frequency 1 MHz above the normal maximum frequency, so XenCenter does not show the true turbo frequency, and users may conclude that turbo mode is not occurring. For example, on a 3400 MHz Intel system, P0 will be logged as 3401 MHz, while the highest non-turbo mode, P1, is 3400 MHz.
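The +1 MHz convention is easy to misread, so it can help to spell the rule out. This is a minimal Python sketch assuming you already have a mapping from P-state number to the frequency XenCenter reports for it; the function names are hypothetical, and the 2800 MHz P2 value in the usage note is made up for illustration.

```python
from typing import Dict, Optional


def find_turbo_pstate(pstate_mhz: Dict[int, int]) -> Optional[int]:
    """Return 0 if P0 is the turbo label, otherwise None.

    Assumes the convention described above: with turbo enabled, P0 is
    reported at (highest non-turbo frequency + 1 MHz) and P1 is the
    highest non-turbo state.
    """
    if 0 not in pstate_mhz or 1 not in pstate_mhz:
        return None
    return 0 if pstate_mhz[0] == pstate_mhz[1] + 1 else None


def describe_pstate(pstate: int, pstate_mhz: Dict[int, int]) -> str:
    """Human-readable description of one P-state, flagging the turbo label."""
    mhz = pstate_mhz[pstate]
    if find_turbo_pstate(pstate_mhz) == pstate:
        # The +1 MHz is only a label; the true turbo frequency is not reported.
        return f"P{pstate}: turbo (labelled {mhz} MHz, true turbo frequency not reported)"
    return f"P{pstate}: {mhz} MHz"
```

With `states = {0: 3401, 1: 3400, 2: 2800}`, `describe_pstate(0, states)` flags P0 as turbo, while `describe_pstate(1, states)` gives "P1: 3400 MHz". Time logged against P0 is therefore time spent in turbo, even though the label suggests only a 1 MHz gain.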

To measure vCPU overprovisioning from the host's point of view, use the host's cpu_avg metric and check whether it is too close to 1.0 (rather than around 0.8, i.e. 80%). To measure it from the point of view of a specific VM, use the VM's runstate_* metrics, especially those measuring runnable time, which should be below about 0.01. These metrics can be investigated via the command line or in XenCenter.
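The two rules of thumb in this paragraph can be written down as a pair of checks. This is a minimal Python sketch assuming you have already exported the samples (via the command line or by reading them off XenCenter graphs); the function names are hypothetical, and the 0.8 and 0.01 thresholds are the rough guides quoted above, not hard limits.

```python
from statistics import mean
from typing import Sequence

# Rule-of-thumb thresholds from the paragraph above; tune for your own workloads.
HOST_CPU_AVG_LIMIT = 0.8   # host cpu_avg much above ~80% suggests overprovisioning
VM_RUNNABLE_LIMIT = 0.01   # a VM's runnable fraction should stay below ~0.01


def host_overprovisioned(cpu_avg_samples: Sequence[float]) -> bool:
    """True if the mean of the host's cpu_avg samples exceeds the limit."""
    return mean(cpu_avg_samples) > HOST_CPU_AVG_LIMIT


def vm_contended(runnable_samples: Sequence[float]) -> bool:
    """True if the VM's vCPUs spent too long runnable (waiting for a pCPU).

    runnable_samples: fractions taken from whichever runstate_* metric(s)
    measure runnable time for this VM.
    """
    return mean(runnable_samples) > VM_RUNNABLE_LIMIT
```

Averaging over a window is a deliberate choice here: a single 5-second spike in cpu_avg is not overprovisioning, but a sustained mean near 1.0 is. For bursty CAD workloads you may also want to look at peaks separately.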

XenServer stores metrics in round robin databases (RRDs), which keep storage bounded by progressively reducing the granularity of older data. For example, the last 10 minutes of data are available at the 5-second interval at which they were collected, while older data is consolidated into coarser bins and so becomes increasingly averaged. This means the graphs in XenCenter become smoother for older periods, and data on short-lived events is lost. Each archive in the database samples its metric at a specified granularity:

Every 5 seconds for the duration of 10 minutes

Every minute for the past two hours

Every hour for the past week

Every day for the past year
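The archive list above also tells you how fine-grained the data can be for any point in the past, and why short spikes vanish. This Python sketch assumes the four archives listed and assumes simple averaging when samples are consolidated; `finest_resolution` and `consolidate` are hypothetical helpers for illustration, not XenServer functions.

```python
from typing import List, Optional, Sequence, Tuple

MINUTE, HOUR, DAY = 60, 3600, 86400

# (sample interval in seconds, how far back the archive reaches in seconds),
# finest first, as listed above.
ARCHIVES: List[Tuple[int, int]] = [
    (5, 10 * MINUTE),
    (MINUTE, 2 * HOUR),
    (HOUR, 7 * DAY),
    (DAY, 365 * DAY),
]


def finest_resolution(age_seconds: int) -> Optional[int]:
    """Finest sample interval still available for data this old, or None if expired."""
    for interval, span in ARCHIVES:
        if age_seconds <= span:
            return interval
    return None


def consolidate(samples: Sequence[float], factor: int) -> List[float]:
    """Average consecutive groups of `factor` samples, as a coarser archive would."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


# A single 5 s spike to 1.0 inside a minute otherwise at 0.25:
minute = [0.25] * 11 + [1.0]
print(consolidate(minute, 12))  # [0.3125]: the spike is smoothed away
```

So an event from yesterday can only be seen at 1-hour granularity, and a brief burst to full utilisation may show up in the per-minute archive as barely above the background level. If you need to catch short bursts, look at the graphs within 10 minutes of the event.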

XenCenter provides a very generic interface to metric data, so any available metric can be graphed. Once you know which GPU metrics exist, the Administrator's Guide shows you how to add them to XenCenter graphs.

Exercise: Adding P-state graphs to XenCenter

Find the section "Configuring Performance Graphs" within the XenServer Administrator's Guide and follow these steps:

To Add A New Graph

On the Performance tab, click Actions and then New Graph. The New Graph dialog box will be displayed.

In the Name field, enter a name for the graph.

From the list of Datasources, select the check boxes for the datasources you want to include in the graph, i.e. those with the format CPU<cpu>P-state<pstate>:

Add all available P-states for the first CPU

What C-states are available?

Click Save.

Now view the graph:

Is Turbo Boost in use? Can you tell? (Hover over the graph.)

Exercise: Check whether vCPU contention is occurring using XenCenter

Hint: you may need to add a graph for certain runstate_* metrics.

Hint: you may also need to check a CPU metric. Which one?

Checking your GPU configuration

The XenServer CLI (command line interface) offers many commands to probe your XenServer environment. Again, these are documented in the Administrator's Guide, in an appendix subsection titled "GPU Commands". The CLI has good, if esoteric, tab completion.

Exercise: Check which vGPU types are used on each pGPU (physical GPU) in the system

Use

xe pgpu-list

to get a list of the pGPUs, then use the output from this as input to the xe command:

4 thoughts on “Monitoring NVIDIA vGPU for Citrix XenServer including with XenCenter”

As with all your articles this is all very insightful and reflects my findings in the field spot on!

A fun fact about CAD/CAM is that the GPU is far less important than one would think. Most applications benefit more from high CPU clock speeds than from a bigger NVIDIA GRID profile.

One thing I'd like to add is that, although turbo mode is excellent, one should not size for turbo mode because it's not guaranteed to kick in. The way I advise it is to size properly and treat turbo mode as icing on the cake.

It's great that XenServer offers these metrics to troubleshoot or assess performance. I always start with Lakeside SysTrack to make sure I advise my clients on the proper CPUs.

One thing I also learned is that IOPS are very important for engineers. Autodesk, for example, benefits enormously from fast I/O access and low-latency storage. The faster the better, so I prefer Atlantis over SSD.

IOPS is indeed critical. With an SDS solution, things are massively better and we get around 90% cache read rates for our XenDesktop VMs, which is fantastic. Interestingly, I rarely see turbo mode kicking in much, even though in recent releases of XenServer it's the recommended setting. I've just got a couple of Dell 730 servers with the new Broadwell v4 Xeon CPUs, so I'll see how they perform.
As to GPU vs. CPU, a lot depends. I often see four CPUs kicking in for a GPU passthrough session, so in this case, the CPU is still getting a lot of the load.
Assuming the GPU is going to take on the brunt of the work is indeed erroneous, so you're spot on with that point, Barry. And even with vGPU, the load is split in many cases; you can't have one without the other.

Great and thorough article, Rachel! Anyone who is about to build their GRID-enabled XenServer should pay close attention, and read the previous articles too. It's important to get your BIOS CPU and power settings right for GRID.

When I did an investigation into XenServer vCPU contention, I focused on the following metrics:

As you’ve noted above, the “vCPU Concurrency Hazard” metric is defined as the “fraction of time that some VCPUs are running and some are runnable.” If you think about this definition, 1 core could be runnable or 10 cores could be runnable. Both fit the definition of “some” cores. I hope that this metric will become more accurate in the future to understand what percentage of cores are runnable.

I’d also like to see GPU contention metrics in the future to monitor the time-slicing of the cores.

Richard,
The time-slice metric would indeed be interesting, but you'd of course have to do some sort of averaging, as time-slicing happens so fast and XenCenter, I believe, only updates every 5 seconds. Nevertheless, this would be very useful for seeing how things are time-sliced, at least trend-wise, or whether a GPU is overloaded.
-=Tobias