Automating VNX Storage Processor Percent Utilization Alerts

Note: The original post describes a method that requires EMC Control Center and Performance Manager, which EMC has since deprecated in favor of ViPR SRM. You can still gather SP CPU information for use in bash scripts with the Navisphere CLI. The command to get busy/idle ticks for the storage processors is naviseccli -h <SP_IP> getcontrol -cbt, where <SP_IP> stands for the IP address or hostname of the SP you want to query. I don’t have script examples that use this command, but if anyone needs help, send me a comment and I’ll help.

The output looks like this:

Controller busy ticks: 1639432
Controller idle ticks: 1773844

The counters in this output are cumulative since the last reset and are averaged across all the cores of the SP’s processors. Getting the actual point-in-time SP CPU utilization requires a calculation: poll twice, compute the delta for each counter by subtracting the earlier value from the later one, and apply this formula to the deltas:

Utilization = Busy Ticks / (Busy Ticks + Idle Ticks)
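As a rough sketch of the two-poll calculation (not the original naviseccli script, which is no longer available), the helpers below parse the two counters and compute a rounded percentage from the deltas. The function names, the 60-second interval, and the <SP_IP> placeholder are assumptions for illustration, and a counter reset between polls is only reported as an error rather than handled.

```shell
#!/bin/bash
# Sketch only: helper names and the polling interval are illustrative assumptions.

# Print the busy or idle counter from "getcontrol -cbt" style output on stdin.
# $1 is "busy" or "idle".
get_ticks() {
    awk -F': *' -v kind="$1" '
        tolower($0) ~ ("controller " kind " ticks") {
            gsub(/[^0-9]/, "", $2)   # drop stray characters such as a trailing CR
            print $2
        }'
}

# Print integer percent utilization between two polls.
# Arguments: busy1 idle1 busy2 idle2 (earlier sample first).
sp_util_pct() {
    busy_delta=$(( $3 - $1 ))
    idle_delta=$(( $4 - $2 ))
    total=$(( busy_delta + idle_delta ))
    if [ "$busy_delta" -lt 0 ] || [ "$idle_delta" -lt 0 ] || [ "$total" -le 0 ]; then
        echo "counters did not advance (reset between polls?)" >&2
        return 1
    fi
    # Integer math, rounded to the nearest percent.
    echo $(( (100 * busy_delta + total / 2) / total ))
}

# Example wiring (commented out because it needs a reachable SP):
# first=$(naviseccli -h <SP_IP> getcontrol -cbt); sleep 60
# second=$(naviseccli -h <SP_IP> getcontrol -cbt)
# sp_util_pct "$(echo "$first"  | get_ticks busy)" "$(echo "$first"  | get_ticks idle)" \
#             "$(echo "$second" | get_ticks busy)" "$(echo "$second" | get_ticks idle)"
```

For example, a busy delta of 850 ticks and an idle delta of 150 ticks gives 850 / (850 + 150), so sp_util_pct prints 85.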

What follows is the original method I posted that requires EMC Control Center.

I was tasked with coming up with a way to get email alerts whenever our SP utilization breaks a certain threshold. Since none of the monitoring tools we own can do that right now, I had to build it with custom scripts. This is my second post on the subject; I removed yesterday’s post because it didn’t work as I intended. This time I used EMC’s Performance Manager rather than pulling data from the SP with the Navisphere CLI.

First, I’m running all of my bash scripts on a Windows server using Cygwin. They should run fine on any Linux box as well. Because I don’t have a native sendmail configuration set up on the Windows server, I’m using the control station on the Celerra to compare the utilization numbers in the text files and email out an alert. The Celerra control station automatically pulls the files via FTP from the Windows server every 30 minutes and sends an email alert if the numbers cross the threshold. A description of each script and the schedule is below.

Windows Server:

Export.cmd:

This first windows batch script runs an export (with pmcli) from EMC Performance Manager that does a dump of all the performance stats for the current day.

This cygwin/bash script manipulates the exported file from above and ultimately creates two text files (one for SPA and one for SPB), each containing a single numerical value: the most recent SP utilization. A few extra steps at the beginning of the script are irrelevant to SP utilization; they’re there for other purposes.

# This will pull only the timestamp line from the top
grep -m 1 "/" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/timestamp.csv

# This will pull out only the "disk utilization" line.
grep -i "^% Utilization" /home/scripts/sputil/0999_interval.csv >> /home/scripts/sputil/stats.csv

# This will pull out the disk/LUN title info for the first column
grep -i "Data Collected for DiskStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/diskstats.csv
grep -i "Data Collected for LUNStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/lunstats.csv

# This will create a column with the disk/LUN number
cat /home/scripts/sputil/diskstats.csv /home/scripts/sputil/lunstats.csv > /home/scripts/sputil/data.csv

# This combines the disk/LUN column with the data column
paste /home/scripts/sputil/data.csv /home/scripts/sputil/stats.csv > /home/scripts/sputil/combined.csv
cp /home/scripts/sputil/combined.csv /home/scripts/sputil/utilstats.csv

# This removes all the temporary files
rm /home/scripts/sputil/timestamp.csv
rm /home/scripts/sputil/stats.csv
rm /home/scripts/sputil/diskstats.csv
rm /home/scripts/sputil/lunstats.csv
rm /home/scripts/sputil/data.csv
rm /home/scripts/sputil/combined.csv

# This next line strips the file of all but the last two rows, which are SP Utilization.
# The 1 looks at the first character in the row, the D specifies "starts with D", then deletes rows meeting those conditions.
awk -v FS="" -v OFS="" '$1 != "D"' < /home/scripts/sputil/utilstats.csv > /home/scripts/sputil/sputil.csv

# This pulls the values from the last column, which would be the most recent.
awk -F, '{print $(NF-1)}' < /home/scripts/sputil/sputil.csv > /home/scripts/sputil/sp_util.csv

# Pull 1st line (SPA) into separate file
sed -n 1,1p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spAutil.txt

# Pull 2nd line (SPB) into separate file
sed -n 2,2p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spButil.txt

# The spAutil.txt/spButil.txt files now contain only a single numerical value, which would be the
# most recent %utilization from the Control Center/Performance Manager dump file.

# Copy files to web server root directory
cp /home/scripts/sputil/*.txt /cygdrive/c/inetpub/wwwroot

Celerra Control Station:

CelerraArray:/home/nasadmin/sputil/ftpsp.sh

The script below connects to the windows server and grabs the current SP utilization text files via FTP every 30 minutes (via a cron job).
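The body of ftpsp.sh did not survive in this copy of the post, so what follows is only a minimal sketch of what such a pull could look like, using the stock command-line ftp client. The host name, account, password, and the assumption that the two files sit in the FTP login directory are all hypothetical placeholders; only the file names and the local directory come from the text above.

```shell
#!/bin/bash
# Sketch only: host, account, and password handling are hypothetical placeholders.
FTP_HOST="winserver.example.com"      # hypothetical Windows server name
FTP_USER="ftpuser"                    # hypothetical account
FTP_PASS="changeme"                   # hypothetical; prefer a protected ~/.netrc in practice
LOCAL_DIR="/home/nasadmin/sputil"

# Print the command batch fed to the ftp client.
ftp_batch() {
    cat <<EOF
user $FTP_USER $FTP_PASS
ascii
lcd $LOCAL_DIR
get spAutil.txt
get spButil.txt
bye
EOF
}

# Fetch both files; -n disables auto-login so the "user" line above is used.
pull_sp_files() {
    ftp_batch | ftp -n "$FTP_HOST"
}

# pull_sp_files    # uncomment to run against a real server
```

Keeping the command batch in its own function makes it easy to print and inspect what will be sent before pointing the script at a real server.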

This script does the comparison check to see if the SP utilization is over our threshold. If it is, it sends an email alert that includes the %Utilization number in the subject line. To change the threshold, edit the THRESHOLD=<XX> line in the script. The line containing printf "%2.0f" rounds the floating point value to an integer, because bash arithmetic and numeric comparisons only handle integers.
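The alert script itself is also missing from this copy, so here is a hedged sketch of the comparison it describes. The recipient address, the helper names, and the use of the mail command are assumptions; THRESHOLD, the printf rounding, and the file names follow the description above.

```shell
#!/bin/bash
# Sketch only: recipient address and helper names are hypothetical.
THRESHOLD=85                               # alert when utilization exceeds this percent
ALERT_TO="storage-admins@example.com"      # hypothetical recipient
SP_DIR="/home/nasadmin/sputil"

# Round a floating point string to the nearest integer.
to_int() {
    LC_ALL=C printf '%.0f' "$1"
}

# Succeed when the value (first argument) is above the threshold (second argument).
is_over_threshold() {
    [ "$(to_int "$1")" -gt "$2" ]
}

# Compare one SP's file against THRESHOLD and mail an alert when it is exceeded.
# Arguments: SP label, path to the single-value text file.
check_sp() {
    [ -s "$2" ] || return 0                # nothing to check if the file is missing or empty
    value=$(tr -d ' \r\n' < "$2")
    if is_over_threshold "$value" "$THRESHOLD"; then
        echo "$1 utilization is $value%, threshold is $THRESHOLD%." |
            mail -s "$1 Utilization is at $(to_int "$value")%" "$ALERT_TO"
    fi
}

# check_sp SPA "$SP_DIR/spAutil.txt"    # uncomment to run
# check_sp SPB "$SP_DIR/spButil.txt"
```

Rounding happens before the comparison, so a reading of 85.4 against a threshold of 85 does not alert, while 85.6 does.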

The FTP script is currently set to pull the SP utilization files. Run "crontab -e" to edit the scheduler. I’ve got the alert script set to run at the top of the hour and half past the hour, and the updated SP files from the web server are FTP’d in a few minutes prior.
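A schedule along those lines might look like the crontab entries below. The five-minute lead time and the alert script's file name (spalert.sh) are assumptions, since the post only gives the path of ftpsp.sh.

```
# Pull the SP utilization files a few minutes before each check
25,55 * * * * /home/nasadmin/sputil/ftpsp.sh
# Run the threshold check at the top of the hour and at half past
0,30 * * * * /home/nasadmin/sputil/spalert.sh
```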

I have been reading your posts and they are very helpful for everyone who wants to understand storage and SAN technology.

Keep doing good work….

I need some help putting naviseccli -h getcontrol -cbt into a script to fetch SP CPU utilization on a VNX storage array.

My requirement is to fetch the CPU utilization for a particular SP and compare it with a threshold value (85%). If the CPU utilization goes beyond the threshold, it should send an alert email to the administrator about the utilization status.

Thanks Anruag. I posted that years ago, and the original naviseccli script I wrote was for the prior organization I worked for; I no longer have it. If I get some time in the coming weeks I can revisit this post and write and share a script that uses naviseccli rather than Control Center.