cDOT Performance Monitoring Using PowerShell

Performance monitoring is a complex topic, but it is vital to the successful implementation and maintenance of any system. In the past I’ve written several posts about using Perl to gather performance statistics from a 7-Mode system (using ONTAP 7.3.x, which is quite old at this point), so I thought it might be a good time for an update.

I originally documented some of this information in a response on the NetApp Community site. This post expands on that a bit and documents it externally.

The NetApp PowerShell Toolkit has three cmdlets which we can use to determine what objects, counters, and instances are available, and a fourth cmdlet to actually collect the data.

Finding the Right Performance Object

Performance reporting in the clustered Data ONTAP API is broken out by two things: Object and Counter. In order to monitor something, for example aggregate performance, we need to find the object which pertains to that “something”. We do this using the Get-NcPerfObject cmdlet.

Throughout the rest of this post I will be using the example of aggregate monitoring, specifically how many reads and writes are being done against an aggregate.

PS C:\> Get-NcPerfObject

Name          PrivilegeLevel
----          --------------
affinity      diag
affiperclass  diag
affiperqid    diag
affitotal     diag
aggregate     admin
...

On my cDOT 8.3 cluster this returned 358 items, which is a lot of different categories of monitoring! We can narrow the list by using the PrivilegeLevel property. The most commonly monitored objects are at either the admin or advanced privilege level, whereas diag is used for very detailed, infrequently needed counters. To view only the non-diag objects, we change the command slightly.

Get-NcPerfObject | ?{ $_.PrivilegeLevel -ne "diag" }

PS C:\Users\Andrew> Get-NcPerfObject | ?{ $_.PrivilegeLevel -ne "diag" }

Name              PrivilegeLevel
----              --------------
aggregate         admin
audit_ng          admin
audit_ng:vserver  admin
cifs              admin
cifs:node         admin
cifs:vserver      admin
client            admin
client:vserver    admin
cluster_peer      admin
cpx               admin
cpx_op            advanced
disk              admin
disk:constituent  admin
disk:raid_group   admin
ext_cache         admin
ext_cache_obj     admin

This results in just 113 objects returned, a much shorter list to consider. This privilege level also indicates how much permission on the cluster the user collecting the information will need. A user with diag privileges is going to have considerably more permission on the cluster than one with only admin or advanced.

Finding the Counters

The objects give us a categorical view of what’s available. To find out which counters are collected for each one, we use the Get-NcPerfCounter cmdlet. Using the aggregate object as an example, we see the following:
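A minimal sketch of that kind of query, not the original listing. It assumes the NetApp PowerShell Toolkit is loaded, that you are already connected with Connect-NcController, and that the counter objects expose Name, Properties, and PrivilegeLevel properties (worth confirming with Get-Member on your toolkit version). The helper name Select-NonDiagCounter is made up for illustration.

```powershell
# Hypothetical helper: drop diag-level entries and keep the fields that
# describe how each counter should be read.
function Select-NonDiagCounter {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Counter
    )
    process {
        if ($Counter.PrivilegeLevel -ne 'diag') {
            # Property names here are assumptions about the toolkit's output.
            $Counter | Select-Object Name, Properties, PrivilegeLevel
        }
    }
}

# Against a live cluster (after Connect-NcController):
#   Get-NcPerfCounter -Name aggregate | Select-NonDiagCounter
```

Because the helper only looks at the PrivilegeLevel property, the same filter should also work on the output of Get-NcPerfObject.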

Notice that, once again, I removed the counters which are at the diag level. You may want to look at them, but for the most part they are things that only infrequently need to be monitored because they are very low level details.

I included the properties field because it’s very important…it tells us how to read the counter. From the API documentation:

raw: single counter value is used

delta: change in counter value between two samples is used

rate: delta divided by the time in seconds between samples is used

average: delta divided by the delta of a base counter is used

percent: 100*average is used
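As a rough illustration of those definitions, the arithmetic can be written as a small function. This is only a sketch: Convert-PerfSample and its parameter names are hypothetical, it takes a single property keyword, and a real Properties value may carry additional comma-separated flags that you would need to strip first.

```powershell
# Hypothetical helper: turn two samples of a counter into a reportable value,
# following the property definitions listed above.
function Convert-PerfSample {
    param(
        [Parameter(Mandatory = $true)][string] $Property,  # raw, delta, rate, average, or percent
        [Parameter(Mandatory = $true)][double] $First,     # counter value at the first sample
        [Parameter(Mandatory = $true)][double] $Second,    # counter value at the second sample
        [double] $Seconds = 0,                             # elapsed seconds, needed for rate
        [double] $BaseFirst = 0,                           # base counter at the first sample
        [double] $BaseSecond = 0                           # base counter at the second sample
    )

    $delta = $Second - $First
    $baseDelta = $BaseSecond - $BaseFirst

    switch ($Property) {
        'raw'     { return $Second }   # a single value is used; take the latest sample
        'delta'   { return $delta }
        'rate'    {
            if ($Seconds -le 0) { throw 'rate needs a positive -Seconds value' }
            return $delta / $Seconds
        }
        'average' {
            # Design choice for this sketch: no base activity reports as 0.
            if ($baseDelta -eq 0) { return 0 }
            return $delta / $baseDelta
        }
        'percent' {
            if ($baseDelta -eq 0) { return 0 }
            return 100 * $delta / $baseDelta
        }
        default   { throw "Unknown counter property: $Property" }
    }
}

# Example: a rate counter that moved from 1000 to 1510 over 5 seconds.
Convert-PerfSample -Property rate -First 1000 -Second 1510 -Seconds 5
```

Returning 0 when the base counter has not moved is a convenience for reporting; you may prefer to return $null so that idle intervals are distinguishable from genuine zeros.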

Looking at the descriptions, it appears that we want to look at the user_reads, user_writes, and total_transfers counters to determine how much activity is happening on our aggregate. Each of these is a rate counter, which means we need to measure it once, wait some known amount of time (e.g. 5 seconds), then measure again and divide by the number of seconds.

Instances of the Object

Now that we know the objects and counters, and we’ve determined what we want to monitor, we need to find the instances. To do that we use the Get-NcPerfInstance cmdlet.
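Putting the pieces together might look something like the sketch below. It is not the original script. It assumes Get-NcPerfData (the toolkit's data-collection cmdlet) accepts -Name, -Instance, and -Counter, and that each result carries a Counters collection whose items expose Name and Value; check both against your toolkit version. The three function names are hypothetical.

```powershell
# Hypothetical helper: collapse one performance-data result into a
# name -> numeric value table. Assumes a Counters collection of Name/Value items.
function ConvertTo-CounterTable {
    param([Parameter(Mandatory = $true)] $InstanceData)
    $table = @{}
    foreach ($c in $InstanceData.Counters) {
        $table[$c.Name] = [double] $c.Value
    }
    return $table
}

# Hypothetical helper: per-second rates from two counter tables taken $Seconds apart.
function Get-CounterRate {
    param(
        [Parameter(Mandatory = $true)][hashtable] $First,
        [Parameter(Mandatory = $true)][hashtable] $Second,
        [Parameter(Mandatory = $true)][double] $Seconds
    )
    if ($Seconds -le 0) { throw 'Seconds must be positive' }
    $row = [ordered]@{}
    foreach ($name in ($Second.Keys | Sort-Object)) {
        $row[$name] = [math]::Round(($Second[$name] - $First[$name]) / $Seconds)
    }
    return [pscustomobject] $row
}

# Sketch of a collector. It is only defined here, not called, because it
# needs a connected cluster (Connect-NcController).
function Measure-AggregateRate {
    param(
        [Parameter(Mandatory = $true)][string] $Aggregate,
        [int] $Seconds = 5
    )
    $counters = 'user_reads', 'user_writes', 'total_transfers'
    $first = ConvertTo-CounterTable (Get-NcPerfData -Name aggregate -Instance $Aggregate -Counter $counters)
    Start-Sleep -Seconds $Seconds
    $second = ConvertTo-CounterTable (Get-NcPerfData -Name aggregate -Instance $Aggregate -Counter $counters)
    Get-CounterRate -First $first -Second $second -Seconds $Seconds
}

# Against a live cluster, one row per aggregate instance:
#   Get-NcPerfInstance -Name aggregate | ForEach-Object { Measure-AggregateRate -Aggregate $_.Name }
```

Keeping the arithmetic in Get-CounterRate separate from the cluster calls means it can be checked with made-up samples before pointing it at a real system. Dividing by the requested sleep time is an approximation; measuring the actual elapsed time between samples would be more accurate.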

The result is an easy-to-read, per-second output of the number of reads, writes, and total transfers for our aggregate:

user_reads user_writes total_transfers
---------- ----------- ---------------
1020102
000
101
000
72689
14058

Performance Monitoring is Fun!

This has been just a short introduction to performance monitoring of a cDOT system using the PowerShell Toolkit. There are a huge number of things that can be monitored, and you can choose to display the information however you like: a real-time report of performance for troubleshooting, intermittent collection to go into a summary report, or collection at regular intervals to feed into a trend analysis tool.

Please reach out to me using the comments below or the NetApp Community site with any questions about how to collect performance information from your systems.

5 thoughts on “cDOT Performance Monitoring Using PowerShell”

Hey Andrew, great writeup! I’m “Magyk” on the NetApp support forum, the guy who posed the original question. I was talking to the guys at the local NetApp office and they said they knew you and that you were a good guy. I had a question:

For the -name parameter you’re using “aggregates” as an example. What other options are available for that parameter, or better yet, is there a way to get a list of options?

Thanks for reading! I hope that this response, and the one in the communities, has been helpful.

The “-Name” parameter comes from the performance object. Use “Get-NcPerfObject” to view a list. There are 358 returned from my cDOT 8.3 system, so it’s quite a few to sort through. To make it a bit easier, show the description property:

Get-NcPerfObject | Select Name,PrivilegeLevel,Description

You can also view them from the ClusterShell:

set -privilege advanced -confirmations off
statistics catalog object show

Remember that the user you are connected to the cluster with must have permissions to the object, and just like ClusterShell there are three privilege levels: admin, advanced, and diag.