A wee while ago I posted about some monitoring scripts I wrote which also provide trending – but I was pretty slack and never got around to posting them. Here they are. 🙂

This is a little light on the details but it should at least provide people with some ideas on how to achieve some fairly comprehensive monitoring for free. There is a lot that can be done to automate the deployment of this – but it’s a project for another day and there is a lot in this post as it is.

Note: I’ve only used these scripts on M3 13.2 and 13.3.

If you do intend to use these scripts and modify or enhance them, then I ask that you share the enhancements so everyone can benefit. I’ll be looking at setting up a GitHub page to make it easier.

We have a combination of Perl and PowerShell scripts, and for convenience I’ve created a Linux VM which has an FTP server, rrdtool and the Perl packages that the scripts use. SUSE provides some neat tools which allow you to build a low-footprint JeOS Linux distro.

monitorM3.pl

This script is the workhorse – it will read from the grid and from the monitor port of the M3 Business Engine and log most of the counters into rrdtool archives. It will also check for excessive changes in the interactive jobs (to flag looping interactive jobs) and send an email if the condition continues over a predefined number of checks. Equally, it will look for excessive CPU usage over a number of checks and send email notifications.

It will check that the autojob count is as expected; if not, it will send an email. Likewise, if a grid application is offline it will send a notification email.

It is smart enough not to spam you on every check – rather it will only send notification emails every x number of checks.
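The "alert only after a number of bad checks, then only every x checks" idea can be sketched roughly as below. This is not the code from monitorM3.pl (which is Perl), just a minimal Python illustration of the logic; the class name `CheckState` and the parameters `breach_after` and `notify_every` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Tracks one alert condition across successive monitoring runs."""
    breach_after: int = 3   # hypothetical: consecutive bad checks before alerting
    notify_every: int = 10  # hypothetical: the "x" between repeat emails
    bad_checks: int = 0

    def record(self, is_bad: bool) -> bool:
        """Record one check result; return True if an email should go out now."""
        if not is_bad:
            self.bad_checks = 0  # condition cleared, start counting again
            return False
        self.bad_checks += 1
        if self.bad_checks < self.breach_after:
            return False  # not sustained for long enough yet
        # First email at the threshold, then one every notify_every checks.
        return (self.bad_checks - self.breach_after) % self.notify_every == 0


state = CheckState(breach_after=3, notify_every=5)
results = [state.record(True) for _ in range(9)]
```

With these example settings an email would go out on the 3rd and 8th consecutive bad check, and a single good check resets the count. In practice the counter would need to be persisted between runs, for instance in a small state file per host and port.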

perfData.ps1

This is a PowerShell script that we run from a Windows server as a user that has the rights to query the performance counters on a Windows SQL Server machine. It will gather the stats and then FTP them to our monitoring server. (The username and password for the FTP server are set at the bottom of the script.)

In my example, D: E: F: refer to the drives that the database is on – we are extracting the perf counters for the drives.

monitorProcWindowsXML.pl

This is a bit of a special script and is used in conjunction with perfData.ps1. It reads the data that perfData.ps1 has sent over FTP and pushes it into rrdtool archives that we can use for graphing.
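To give a feel for the "push it into rrdtool" step, here is a rough Python sketch that builds an `rrdtool update` command line for one sample. It is only an illustration, not the Perl script itself: the archive path and the data source names (`disk_d_reads`, `disk_e_reads`) are hypothetical, and the parsing of the uploaded file is left out because it depends on the format perfData.ps1 writes.

```python
from typing import Dict, List


def rrd_update_args(rrd_path: str, timestamp: int, values: Dict[str, float]) -> List[str]:
    """Build an `rrdtool update` command line for one sample."""
    names = sorted(values)  # fixed order so the template and the values line up
    template = ":".join(names)
    data = ":".join(str(values[name]) for name in names)
    return ["rrdtool", "update", rrd_path, "--template", template, f"{timestamp}:{data}"]


args = rrd_update_args(
    "/tmp/example-db.rrd",  # hypothetical archive path
    1700000000,             # sample time as a Unix timestamp
    {"disk_d_reads": 12.5, "disk_e_reads": 3.0},  # hypothetical data source names
)
# On a host with rrdtool installed, run it with subprocess.run(args, check=True).
```

The archive has to exist already (created with `rrdtool create`) with data sources matching the names in the template.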

Scheduling the Linux Scripts

File Locations

rrd archives

Testing states (used for the counts of errors)

/var/lib/jbcmon/state/<hostname>/<port>

Images and Webpages

/srv/htdocs/server/<hostname>/<port>

/srv/htdocs/server/<db server>

Example Graphs

In some instances I’ve provided .html files. Creating your own to display information that you find useful is very easy. Equally, it’s pretty easy to create your own graphs; RRDTool is fantastic for doing so.
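As a rough idea of what creating your own graph involves, the Python sketch below builds a basic `rrdtool graph` command that draws a single line from one data source. The file paths, the data source name `cpu` and the helper name `rrd_graph_args` are all assumptions for the example, not taken from the scripts in this post.

```python
from typing import List


def rrd_graph_args(png_path: str, rrd_path: str, ds_name: str,
                   title: str, start: str = "-1d") -> List[str]:
    """Build an `rrdtool graph` command that draws one data source as a line."""
    return [
        "rrdtool", "graph", png_path,
        "--start", start,                       # how far back the graph goes
        "--title", title,
        f"DEF:v={rrd_path}:{ds_name}:AVERAGE",  # read the averaged values from the archive
        f"LINE1:v#0000FF:{ds_name}",            # draw them as a thin blue line
    ]


day_graph = rrd_graph_args("/tmp/cpu-day.png", "/tmp/example-db.rrd", "cpu", "CPU (last day)")
month_graph = rrd_graph_args("/tmp/cpu-month.png", "/tmp/example-db.rrd", "cpu",
                             "CPU (last 30 days)", start="-30d")
# On a host with rrdtool installed, run each with subprocess.run(..., check=True).
```

Changing only the `--start` value is enough to get the same graph over a day or over a month from the same archive.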

M3 Subsystems

This is the most useful of the graphs – we are looking at the M3 subsystems – we can see the memory usage, jobs, threads and CPU. The blank sections are where the subsystem shuts down due to inactivity.

Same graphs over a month

Grid Component logs

JVM memory usage & CPU

This shows us the JVM memory (max and used) against the CPU – handy for locating situations where you have a runaway process or a grid component that doesn’t have enough memory allocated.

Performance Counters from the Grid

There are many performance counters and there is little to no documentation on them – as I already have the data I figured I might as well log it, and why not graph it 🙂

Some M3BE Performance Counters –

I’ve had difficulty getting information on the specifics of what some of the counters mean, so these graphs may not make a huge amount of sense.

In Closing

As mentioned at the beginning, the details are pretty light – and as I get time I’ll be looking at creating an install script which prompts you for server details and builds a config file and cron file to make it easier to get up and running.