Re: TUAM and VMware's CPU seconds vs Mhz metrics

‏2009-09-02T15:24:29Z

This is the accepted answer.

Question: How to use TUAM's VMware collector to determine how much resources have been used? The client is not interested in how much CPU capacity or how many processors have been allocated to a VM guest, but rather how much CPU time, memory, etc. have been used over a given time period.

TUAM's VMware collector gathers several types of CPU and memory usage information by querying VMware's underlying Virtual Center web services API. These metrics include CPU usage, measured as a cumulative number of seconds over a given interval. These cumulative CPU seconds are ideal for resource accounting in the VMware environment, as they directly show how much CPU the virtual guests consume. In any given interval, the sum of all the virtual guests' CPU consumption will equal the total CPU consumed on the underlying host hardware (less an allowance for hypervisor overhead and so forth). Over a larger time period (such as an accounting month), the cumulative CPU consumed may be used in a resource accounting process to allocate the costs of the VMware host and environment.

Other metrics such as the number of processors allocated to a virtual guest may not accurately show how much CPU has been consumed, as the processors allocated metric would only provide the maximum amount of CPU that could be used in a given interval. Also, VMware environments may dynamically increase or decrease the allocated processors to a virtual guest, further complicating the data collection process. Reporting and determining a charge would be made more difficult, as the allocation metric must be scaled by the time factor (for instance, 2 CPUs allocated for one hour, then 1 CPU for 15 minutes, then 3 CPUs for two hours, etc.).
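To make the time-scaling problem concrete, here is a minimal Python sketch that time-weights the allocation history used as the example above. The function name and the list layout are illustrative assumptions, not anything TUAM or VMware provides. The result is only a ceiling on what the guest could have used, not what it actually consumed.

```python
# Hypothetical allocation history for one guest: (allocated CPUs, hours held).
# Values mirror the example in the text: 2 CPUs for 1 h, 1 CPU for 15 min, 3 CPUs for 2 h.
allocation_history = [(2, 1.0), (1, 0.25), (3, 2.0)]


def allocated_cpu_hours(history):
    """Time-weight each allocation: CPUs multiplied by the hours they were held."""
    return sum(cpus * hours for cpus, hours in history)


ceiling = allocated_cpu_hours(allocation_history)
print(f"Allocated CPU-hours (upper bound on usage): {ceiling}")
```

Every change in allocation adds another row that has to be captured with its exact duration, which is the bookkeeping burden a consumption metric avoids.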

VMware's CPU seconds consumed metric solves that problem -- by providing the actual CPU seconds consumed, not an allocation metric. Each virtual guest's consumption is maintained in Virtual Center's statistics database over time, so TUAM can query those statistics for inclusion into a resource accounting or chargeback methodology.

A similar discussion could be made around memory, network, and disk I/O metrics -- TUAM collects statistics from VMware on actual consumption, additive over time. It is these cumulative metrics that enable a resource (or usage-based) accounting methodology, as opposed to a configuration-based methodology.

Question: How to convert VMware's "CPU seconds consumed" metric to a "Megahertz" metric? The client has existing contractual requirements to report CPU consumption in MHz, not CPU seconds/minutes/hours, etc.

There is no definitive relationship between a "CPU second" and "megahertz" -- the former is a metric showing how long a task occupied a CPU (therefore making that scarce resource unavailable to other requestors), whereas the latter is an indicator of "processor speed" or how many instructions (i.e., how much work) a single processor can accomplish in a period of time. To relate the two requires that the underlying time dimensions be equated.

The difficulty may be shown by a simple example. Assume a single-CPU system running at 100 MHz -- if a task were to consume 100% of that CPU for one hour, it would consume "100 MHz" for that hour. What if that task were to consume 100% for two hours -- what is the appropriate MHz metric? One might say "100 MHz" for two hours -- it is probably not appropriate to say "200 MHz". Now what if that task consumed 25% of the single CPU over a four-hour window -- is the metric "25 MHz for four hours" or "100 MHz over one hour" -- and so forth.

CPU seconds/minutes/hours address that difficulty, since they have an inherent time dimension built in; the time factor in a CPU second is "1 second".

To relate CPU seconds to MHz, you have to decide over what time interval your MHz spans. Let's say you perform resource billing on a monthly basis. Then you'd want to collect CPU seconds, aggregating them for the month. The total possible CPU seconds would be 60 seconds/min * 60 minutes/hr * 24 hours/day * 31 days/mo (for August) = 2,678,400 seconds. If a given VMware guest consumes, say, 500,000 CPU seconds in that month, then it would have consumed about 18.7% of the theoretical maximum CPU seconds for the month. One might then say that virtual guest also consumed 18.7% of the megahertz for that month. If this were a 3.2 GHz processor, then that guest would have consumed 18.7% * 3.2 GHz = about 0.60 GHz (roughly 597 MHz, since 1 GHz = 1000 MHz).

Note that the same calculation may be done for other time intervals -- you simply calculate the total possible CPU seconds based on the desired time interval. Note also that you shouldn't calculate MHz used for the interim time periods (i.e., shouldn't do this during daily TUAM processing but only at month end), as those MHz metrics are not cumulative -- they must be calculated as total CPU consumed / total CPU possible * MHz rating.
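The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name is made up, the two daily CPU-second figures are hypothetical, and the decimal factor 1 GHz = 1000 MHz is assumed. The second half shows why interim MHz values should not be added together.

```python
def average_mhz(cpu_seconds_used, interval_seconds, rated_mhz):
    """Average MHz over one interval: CPU consumed / CPU possible * MHz rating."""
    return cpu_seconds_used / interval_seconds * rated_mhz


SECONDS_PER_DAY = 60 * 60 * 24
august_seconds = SECONDS_PER_DAY * 31  # 2,678,400 possible CPU seconds on one CPU

# Monthly figure from the example: 500,000 CPU seconds on a 3.2 GHz (3200 MHz) CPU.
monthly = average_mhz(500_000, august_seconds, rated_mhz=3200)
print(f"{500_000 / august_seconds:.1%} of capacity, about {monthly:.0f} MHz")

# Hypothetical daily CPU seconds for two days, to show the non-additivity.
day1, day2 = 10_000, 30_000
sum_of_daily = (average_mhz(day1, SECONDS_PER_DAY, 3200)
                + average_mhz(day2, SECONDS_PER_DAY, 3200))
two_day = average_mhz(day1 + day2, 2 * SECONDS_PER_DAY, 3200)
print(f"sum of daily MHz: {sum_of_daily:.1f}, true two-day average: {two_day:.1f}")
```

Adding the daily values overstates the result; accumulating CPU seconds and dividing once at period end gives the correct average.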

Question: When is it appropriate to use VMware's Virtual Center data to measure a VM guest's resource usage, vs. using an OS or sub-system specific collector (such as Windows process accounting or IIS logging, etc.)?

TUAM's VMware collector shows usage statistics for the virtual guest as a single entity -- there is no additional detail provided regarding what types of applications may be running in the virtual guest. If the guest is "owned" by or dedicated to a single client or application, then these VMware usage metrics may be appropriate for accounting for that guest.

However, a single virtual guest may not be dedicated to a single client. For instance, a database server may be running the single application SQL Server, yet host multiple databases; or a web server may run the single application IIS, yet host multiple websites. In cases like these, it's often necessary to obtain a different level of usage statistics to accurately assign the virtual guest's costs to the responsible client.

TUAM also offers a variety of collectors which provide that additional level of detail. TUAM's Microsoft IIS collector, for instance, gathers usage metrics such as bytes sent/received -- these metrics may be used to split the costs of the IIS server across multiple clients whose websites are hosted on the common IIS server. Similarly, TUAM's SQL Server (or other database) collector gathers usage metrics such as CPU time, reads, writes, etc. -- which also may be used to split the shared SQL Server costs across multiple clients' databases.

In this case, some TUAM clients directly calculate a rate for IIS or SQL resources by building a cost pool containing only those costs. The TUAM invoice would then report, for instance, the number of IIS bytes sent at a given rate. Other TUAM clients, however, might use the IIS metrics to prorate the underlying virtual guest's CPU usage, so the invoice would report VMware CPU time consumed, but split by an IIS proration based on actual IIS usage metrics.
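As a rough illustration of the second approach, the Python sketch below splits a guest's VMware CPU seconds across clients in proportion to their IIS bytes sent. The client names, byte counts, and the `prorate` helper are all hypothetical; in TUAM itself the proration would be configured in the product rather than coded like this.

```python
def prorate(total, usage_by_client):
    """Split `total` across clients in proportion to each client's usage."""
    overall = sum(usage_by_client.values())
    if overall == 0:
        raise ValueError("no usage recorded; nothing to prorate against")
    return {client: total * used / overall
            for client, used in usage_by_client.items()}


# Hypothetical month: CPU seconds for the whole guest (from the VMware collector)
# and IIS bytes sent per client (from the IIS collector).
guest_cpu_seconds = 500_000
iis_bytes_sent = {"client_a": 6_000_000, "client_b": 3_000_000, "client_c": 1_000_000}

shares = prorate(guest_cpu_seconds, iis_bytes_sent)
for client, cpu_seconds in shares.items():
    print(f"{client}: {cpu_seconds:.0f} CPU seconds")
```

The shares always add back up to the guest's total, so the full cost of the virtual guest is recovered across the clients.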

Re: TUAM and VMware's CPU seconds vs Mhz metrics

Is there an explanation of the VMware CPU and memory resource values collected, beyond what is in the product documentation, such as a worked example of the values using a provided CurrentCSR.txt file from the VMware collection? I have a VMCPUPCT value which is supposed to be a percentage, but it is instead a large integer. I need to provide the figures below, but do not know how to manipulate the data I have to satisfy the request.

Avg Monthly Usage - CPU: CPU usage should be reflected as a percent of the processor's full capacity
Avg Monthly Usage - Memory: This may need to be broken down by virtual memory and physical memory
Avg Monthly Usage - Disk: This could be the peak usage in order to determine if the peak usage exceeds the purchased disk space.

Re: TUAM and VMware's CPU seconds vs Mhz metrics

I think your question regarding "percent used over a month" is quite similar to the second question I posted above (i.e., converting CPU seconds to Megahertz). To get "percent used over a period" you really need "amount consumed" and "amount available" for that period, then do a simple division.

TUAM is geared primarily to accumulating resource metrics; that's why you'll see bigger and bigger numbers in reports as a month's data are loaded. However, some metrics retrieved from the VMware (and other) collectors are not truly usage metrics, but rather are pre-calculated statistics, valid for the particular collection interval. VMCPUPCT is one of those calculated values, and it's not appropriate to add those values over time (TUAM's doc at http://publib.boulder.ibm.com/infocenter/tivihelp/v3r1/topic/com.ibm.ituam.doc_7.1.3/admin_win_dc/c_identifiers_and_resources_collected_by_the_vmware_collector.htm suggests you might not want to keep VMCPUPCT in the regular Summary/Detail files, but rather relegate it to the Resource table). Since you're probably gathering VMware metrics at <n> minute intervals, it's not surprising that your VMCPUPCT values for the day are simply large integer values.
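A small Python sketch may help show why summed interval percentages turn into meaningless large numbers, and how "consumed / available" over the whole period gives a real percentage. The four 15-minute samples are invented for illustration and do not reflect actual collector output or CSR record layout.

```python
# Hypothetical 15-minute samples for one guest:
# (CPU seconds consumed, CPU seconds available in the interval).
samples = [(450, 900), (810, 900), (630, 900), (270, 900)]

# Per-interval percentages, similar in spirit to a pre-calculated statistic.
interval_pcts = [100 * used / avail for used, avail in samples]

# Accumulating them, as a usage metric would be accumulated, keeps growing
# with every interval loaded and is no longer a percentage.
summed_pct = sum(interval_pcts)

# Accumulate the raw amounts instead, then divide once for the period.
total_used = sum(used for used, _ in samples)
total_avail = sum(avail for _, avail in samples)
period_pct = 100 * total_used / total_avail

print(f"sum of interval percentages: {summed_pct:.0f}")
print(f"percent used over the period: {period_pct:.0f}%")
```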

You could use TUAM to accumulate the "consumed" and "available" metrics for the month, then use some technique to calculate the percentage. I can see at least two options:

1) Use Integrator's Database collector to extract at month-end the relevant consumed/available metrics from the standard CIMSSummary table; compute the desired statistics in Integrator stages, or as part of the custom database query; write metrics to a once-a-month CSR record and process into the database. Note that this new, custom calculated statistic will have the same "you shouldn't accumulate this metric" status as VMCPUPCT.

2) Write a customized report that selects the relevant metrics, doing the calculations on the fly.

Be aware also that any such "percent used" metric depends on which level of the Account Code you use for the base metrics. That is, L1 percentages obviously should be thought of as the "sum" of L2 percentages (but aren't actually calculated that way). That's an argument for doing the calculations at report time, so you don't have the percentages maintained at all levels.

Of course, your particular situation may dictate that you need a "percent use" metric which is retained in the database as part of the Summary table, as an auditable, billing, or chargeback metric. That's an argument for option 1.

Re: TUAM and VMware's CPU seconds vs Mhz metrics

Hello Scott!
Thanks for this, very interesting. I'm trying to do something similar: I need to convert CPU seconds in AIX into MHz to calculate burst capacity.
The formula you use,
CPU consumed / total CPU possible * MHz rating
makes sense to me in your example with one CPU; however, I'm not sure how to use it in the following scenario.
I want to calculate the burst capacity on an hourly basis using the formula above. I do have CPU seconds from the AIX Advanced Accounting collector; however, I'm not sure what to do with "total CPU possible" and "MHz rating". In my understanding, an uncapped LPAR can use any available core above its entitlement, up to the number of virtual processors. In this case, would it be enough to multiply the "total CPU possible" and "MHz rating" by the number of virtual processors?
Also, another question: can I use the vmstat metric VSUCPUPT (Percent User CPU)? Would that give me the same result as (CPU consumed / total CPU possible)?
Thanks!

Re: TUAM and VMware's CPU seconds vs Mhz metrics

Hi again Scott,
I just noticed that in 7.3 there is the new HMC collector. It collects capped and uncapped cycles, from which it should be fairly straightforward to calculate capacity in terms of MHz. Thanks.

Re: TUAM and VMware's CPU seconds vs Mhz metrics

You beat me to the punch! I was going to suggest TUAM's PowerVM HMC collector if you're working with virtual images on Power hardware. You're able to gather usage metrics from the frame's managing HMC, encompassing all virtual LPARs at once. Since the HMC gathers the metrics from the frame's hypervisor, it provides information about any guest OS (whether AIX, Linux, or other); much simpler than using sar, vmstat, or even AIX Advanced Accounting from within each LPAR.

The latest collector version provides capped and uncapped cycles and CPU hours for each LPAR, along with frame-wide metrics for configurable/available processing units and memory (very RMF type 70-like information, if you're familiar with PR/SM LPAR metrics from the z/OS environment).

Be aware you still have to deal with the underlying time interval when you determine MHz, just like you'd do with VMware's metrics (or if trying to calculate MIPS for a z box).
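As a sketch of that time-interval dependence, the Python below turns a cycle count into an average MHz for a chosen interval. It assumes the capped and uncapped values are raw processor cycles accumulated over the interval; that is an assumption about the counters, not something confirmed here, so check the collector documentation for the actual identifiers and units. The cycle counts themselves are invented.

```python
def average_mhz_from_cycles(cycles, interval_seconds):
    """Average MHz over an interval: cycles per second, scaled to megahertz."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return cycles / interval_seconds / 1_000_000


HOUR = 3600

# Hypothetical cycle counts for one LPAR over one hour (assumed raw cycles).
capped_cycles = 5_400_000_000_000
uncapped_cycles = 1_800_000_000_000

capped_mhz = average_mhz_from_cycles(capped_cycles, HOUR)
uncapped_mhz = average_mhz_from_cycles(uncapped_cycles, HOUR)
print(f"capped: {capped_mhz:.0f} MHz, uncapped (burst): {uncapped_mhz:.0f} MHz")
```

The same cycle counts spread over a two-hour interval would give half these MHz values, which is why the interval has to be fixed before any MHz figure is reported.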
