Back in the dim dark past, before IFL's M3 install became gridified, I had set up Nagios (an open-source monitoring tool) to check the autojobs – this would verify that the autojobs had a status of OK, and it would also check that we had the correct number of autojobs. Back in those dark days we did have the occasional issue where one or more of the autojobs would cease to work.

Once we went to the grid, this was no longer possible due to how the grid structured its data, and I couldn't find a way to scrape the screen given it was generated through JavaScript and Ajax calls. But at that point in IFL's M3 journey autojob issues weren't as common, so rather than spend the time looking for clever ways to do things, checking the autojobs in a web browser simply became part of the morning process. As I had a staff member that I could fob this task off to, I was happy 🙂

In an earlier posting I mentioned that I have had a number of discussions around monitoring and M3, so the aim of this post is to demonstrate, once again, that by using the tools built in to Windows 8 / Windows Server 2012 onwards we can quickly and easily create crude but effective monitoring.

This PowerShell script will check that the number of running autojobs is what we expect (54 in IFL's case) and, if it isn't, it will list the autojobs as we would see them in the Grid Management pages and send an email alert. The aim is that it would be scheduled to run by the Task Scheduler every 5 – 15 minutes.
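For completeness, here is a sketch of how the script could be scheduled with the Task Scheduler cmdlets that ship with Windows 8 / Server 2012 onwards. The script path, task name, and 10-minute interval are examples only – adjust to suit:

```powershell
# Build the action: run the monitoring script with PowerShell.
# C:\Scripts\CheckAutojobs.ps1 is an example path, not part of the post.
$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\CheckAutojobs.ps1'
# Trigger once now, then repeat every 10 minutes (duration capped at ~10 years
# because some OS versions reject an unbounded repetition duration)
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) `
    -RepetitionInterval (New-TimeSpan -Minutes 10) `
    -RepetitionDuration (New-TimeSpan -Days 3650)
# Register the task under the current user
Register-ScheduledTask -TaskName 'M3BE Autojob Check' -Action $action -Trigger $trigger
```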

This script took me around 8 hours to build and test, mainly because I wanted to demonstrate how we can scrape information about the autojobs that we see in the Grid Management pages – which proved to be extremely challenging.

Finding the M3BE Autojob Data

Figure 1 – Autojobs from the Grid Management Pages

The page above, if you've ever taken a look at its source, provides a nice 'JavaScript not enabled' error and we can't see the content. The same thing happens when we try to extract the data from programs that aren't web browsers.

I could have used a COM call to Internet Explorer and had it parse the content, but that seemed like cheating and, given Microsoft is deprecating IE, rather short-sighted. I wanted a pure PowerShell solution that should, in theory, run on a headless install of Windows.

The first thing I did was pull out Wireshark to see if I could locate the javascript calls being made so I could call them directly – no luck.

Then I pulled out Fiddler to see if it would help in my search.

Figure 2 – here we can see the call to get the webpage and we can see the table in the WebView

Impersonating all of the headers being sent still didn't get me the response I was after, and it turns out that the headers weren't the important part – there are some arguments in the body that the grid uses to determine the response.

Figure 3 – the body of the data posted to /grid/ui

With this new piece of information at hand, I discovered I could retrieve the data that is used to display the table.

And then I restarted M3 and my script stopped working.

It turns out that the body here uses the JVM ID so the grid knows which component to query – and this ID changes between JVM restarts.

Figure 4 – JVM ID

This is what we see in the Grid Management pages too.

Figure 5 – JVM ID in the Grid Management pages

Of course, there is no easy way to discover this information. My thought was that I would have to parse one of the other grid pages that provided the link before I could query this page, which seemed like an awful lot of work and processing.

So I started looking around and found some foundation REST services – in particular, a REST service for the subsystems.

Figure 6 – Foundation rest calls

Calling this, I got some interesting data.

Figure 7 – foundation/subsystem REST call

I got the JVM ID and I also got the type of subsystem (A, I, M), so I now knew I could get the IDs and work with any of the subsystems.

The eagle-eyed will also notice that there is a job count there – which means I have an easy way to determine whether I have the number of jobs expected, without having to go through the intensive process of parsing the UI-based tables.
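To illustrate what picking the autojob subsystem out of that response looks like, here is a minimal, self-contained sketch. The XML shape below is an assumption based on the property names the script relies on (`types`, `jvmId`, `jobCount`) – your grid's actual `/foundation/subsystem` response may differ:

```powershell
# Sample data standing in for the /foundation/subsystem response;
# attribute names mirror what the monitoring script reads, values are invented
[xml]$sample = @"
<Subsystems>
  <Subsystem types="A" jvmId="12345" jobCount="54" />
  <Subsystem types="I" jvmId="12346" jobCount="12" />
</Subsystems>
"@

# Pick out the autojob subsystem (type 'A') and read its JVM ID and job count
$auto = $sample.Subsystems.Subsystem | Where-Object { $_.types -eq 'A' }
"JVM $($auto.jvmId) is running $($auto.jobCount) autojobs"
```

Against the live grid you would replace the sample with `Invoke-RestMethod -Uri "$grid/foundation/subsystem"` and work with the returned document the same way.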

I then went ahead and added code to process the UI-based tables as an exercise, to demonstrate that we could in fact do more detailed and smarter monitoring. For example, you could save this data to a file on each poll and check whether you have jobs that are taking excessive amounts of time – but this is beyond the scope of this posting.
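As a rough sketch of that save-on-each-poll idea, each run could append a timestamped snapshot to a CSV, which a later poll (or another script) could compare against to spot long-running jobs. The file path, column names, and the hard-coded job are illustrative only – in the real script the values would come from the parsed table cells:

```powershell
# Illustrative poll log in the temp directory (unique per process)
$logPath = Join-Path ([IO.Path]::GetTempPath()) "autojob-poll-$PID.csv"

# One row per job per poll; in practice Job/Status come from the scraped table
$snapshot = [pscustomobject]@{
    Timestamp = [DateTime]::Now.ToString('s')
    Job       = 'DRS900'
    Status    = 'RUNNING'
}

# Append so the history accumulates across polls
$snapshot | Export-Csv -Path $logPath -Append -NoTypeInformation
```

A follow-up poll could then `Import-Csv $logPath` and flag any job that has stayed in a RUNNING state across several consecutive snapshots.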

For this script, we will only parse the UI tables if we have a different number of autojobs than expected so the RUNNING jobs can be included in the email.

Testing

Figure 8 – MNS051, shutting down an autojob

I'm stopping DRS900 – so now we are shy one job (53 instead of 54).

Which after running my Powershell script yields an email:

Figure 9 – error email

Of course, the body of the email isn't really helpful, but it is there to illustrate that we can extract each cell in a table, which gives us the option to process the data further. And please excuse the trailing '.' on each line – Outlook insists upon removing the line breaks if I don't do this.

And the PowerShell script itself:

The usual caveats apply – this is proof of concept only. It could be enhanced to take arguments from the command line for the location of the grid, so you could use the Task Scheduler to run a single script with different args…
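A sketch of what that command-line enhancement might look like – a `param()` block at the top of the script, with parameter names mirroring the variables the script already uses (the defaults here are just the values from the script below):

```powershell
# Hypothetical parameters replacing the hard-coded values; a param() block
# must be the first statement in the script file
param(
    [string]$grid = 'http://ifbenp.indfish.co.nz:16201',
    [string]$smtpRelayServer = 'mail.potatoit',
    [int]$expectedAutojobCount = 54
)

"Checking $grid for $expectedAutojobCount autojobs"
```

The Task Scheduler could then run the same script against different grids, e.g. `powershell.exe -File CheckAutojobs.ps1 -grid http://otherhost:16201 -expectedAutojobCount 40`.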

# this is our grid
$grid = 'http://ifbenp.indfish.co.nz:16201'
# IP Address of my mail relay server
$smtpRelayServer = "mail.potatoit"
# this is the number of autojobs we expect
$expectedAutojobCount = 54
# now we build our URL to the Foundation Subsystem REST service
# where we can get information about the different M3BE subsystems
$foundationUrl = $grid + '/foundation/subsystem'
# this is the URL that we will use to retrieve the M3BE autojobs
# themselves from the server which we will include in our
# email if there is a problem
$gridUIUrl = $grid + '/grid/ui/'
# this is where we will store our errors
$outputString = ""
$emailSubjectAppend = ""
try
{
    $response = Invoke-RestMethod -Uri $foundationUrl
    # we need this if we are going to dig
    # in to the autojobs themselves
    $autoJobJVMID = ''
    # we are interested in subsystem A - autojobs
    for($i = 0; $i -lt $response.Subsystems.Subsystem.Count; $i++)
    {
        $currentSubSystem = $response.Subsystems.Subsystem[$i]
        if($currentSubSystem.types -eq 'A')
        {
            $autoJobJVMID = $currentSubSystem.jvmId
            if($currentSubSystem.jobCount -ne $expectedAutojobCount)
            {
                # as we didn't get the count we expected, we shall list the autojobs in the email
                # we send out. Ideally we would also process the autojobs themselves and check
                # for long running jobs
                $outputString += "Autojob count was $($currentSubSystem.jobCount), expected: $($expectedAutojobCount)"
                $emailSubjectAppend = $outputString
                # this is the data in the body of the request we send; it requires the autojob JVMId which will be
                # different each time we restart the autojob JVM
                $body = "target=grid.serverview.SubsystemUI%24DefaultPage&keySet=M3BE_15.1_TST&jvmId=$($autoJobJVMID)&seqno=1"
                # we are pretending to be a web-browser looking at the grid/ui
                $gridAutoJobResponse = Invoke-WebRequest -Uri $gridUIUrl -Body $body -Method 'Post'
                # get the HTML
                $html = $gridAutoJobResponse.ParsedHtml
                # The HTML is a real mess for us to process; we want it nice and clean so we could
                # do some intelligent processing - I've included the code to break it down
                # into something you can create rules on purely to illustrate how
                # get a list of the tables - there are a lot 😦
                $tables = $html.body.getElementsByTagName("table")
                # iterate through the tables (note: a separate counter so we
                # don't clobber the subsystem loop's $i)
                for($t = 0; $t -lt $tables.length; $t++)
                {
                    # we are looking for a table with a class name of GridPanelTable (again, there are quite a few)
                    if($tables[$t].className -eq 'GridPanelTable')
                    {
                        # we then want to locate one that has greater than 9 headers
                        if($tables[$t].getElementsByTagName("th").length -gt 9)
                        {
                            if($null -ne $tables[$t].rows)
                            {
                                # loop through each row
                                for($r = 0; $r -lt $tables[$t].rows.length; $r++)
                                {
                                    $rowText = ""
                                    if($null -ne $tables[$t].rows[$r].cells)
                                    {
                                        # loop through each cell
                                        for($c = 0; $c -lt $tables[$t].rows[$r].cells.length; $c++)
                                        {
                                            # I use innerText here so we don't get all the formatting that makes
                                            # the webpage look pretty. We could do some smarts on the inner
                                            # data to check for long running jobs
                                            $rowText += "$($tables[$t].rows[$r].cells[$c].innerText), "
                                        }
                                        $outputString += $rowText
                                        # stop Outlook from removing the newlines :@
                                        $outputString += "."
                                        $outputString += [Environment]::NewLine
                                    }
                                }
                            }
                            # we found and processed the table we were after
                            break;
                        }
                    }
                }
            }
        }
    }
    # did we get the autojob subsystem at all?
    if($autoJobJVMID -eq '')
    {
        $outputString = "Autojob JVM not found, there is a good chance it is completely down :-("
    }
}
catch
{
    $outputString = "Failed to retrieve information from the grid"
}
# only send an email if the length of our string is greater than 0
if($outputString.Length -gt 0)
{
    # create the message object
    $msgMessageObject = New-Object Net.Mail.MailMessage
    # create our SMTP client object with the relay server as an argument
    $smtpClient = New-Object Net.Mail.SmtpClient($smtpRelayServer)
    # I am going to use the date in the subject of my email
    $strDateNow = [System.DateTime]::Now.ToString("yyyyMMdd HH:mm:ss")
    # set the from address to a valid from address for my relay server
    $msgMessageObject.From = "scott.campbell@potatoit"
    # set the recipient address
    $msgMessageObject.To.Add("scott.campbell@potatoit")
    # set the subject
    $msgMessageObject.Subject = $strDateNow + " - M3BE Autojobs " + $emailSubjectAppend
    # set the body of the email
    $msgMessageObject.Body = $outputString
    # actually send the email
    $smtpClient.Send($msgMessageObject)
}

Glad you like it. 🙂
I have recently been working on a post which discusses this and graphing various M3 counters, essentially I’ve got a little Linux appliance with some perl scripts that provide notification and trending graphs…

It really is nice! May adjust it to once an hour, but it’s sure nice to have this!

Very cool idea to have a little set of scripts to monitor/graph things. I’d be interested in setting up something like that. This is just great stuff. Having wished to be able to monitor/alert on this for so long, it’s great to see an option.

One note for anyone else – if sending to a distribution group, be sure to check “mail flow settings” tab, and uncheck “require that all senders are authenticated.” I’d tested to myself, thought “great!!!!” and then got nothing when sending to a group. One of those silly little things with Exchange.

I had a brief look at 13.3 and its monitoring – and to be honest I was pretty disappointed; it wasn't terribly intuitive to set up and looked very limited (granted, my scripts aren't terribly stellar – but…). I will need to sit down and take a good look. I really just wish Infor would add SNMP support to the grid – then it would be easy to integrate into most standard monitoring packages without the need for complex scripts and without fairly heavy-handed communications just to check the health of the system and trends.

1) Find the long JMX URL in the logs of the Grid application you are interested in, for example grid-router-M3Router.log; the log will say something like INFO SYSTEM JMXUtils: jmx url service:jmx:rmi://11.22.33.44/stub/SGVsbG9Xb3JsZA==

2) Launch JConsole at the DOS command prompt (JConsole is the Java Monitoring & Management Console, and it is standard and part of every JRE), paste the JMX URL starting from service:jmx:rmi://, then enter your Grid Username and password (probably need to be grid-admin), and click Connect (I connect from within a Remote Desktop Connection of the Grid application’s host to avoid problems with ports being blocked in my VPN connection). It will show mostly JVM runtime information like heap memory usage, threads, classes, CPU usage, VM summary.

3) The MBeans tab is the most interesting. The Grid provides MBean interfaces to get information and run operations at runtime. Expand the MBeans tree, and play with some of the operations. I mostly got lots of java.lang.ClassNotFoundException. I was able to get the GridName attribute and to run the operation NodeRegistry.listHosts…not very useful.

re: SNMP – no kidding. This is why your work on this script, digging out that (seemingly *impossible* to find) information, and bringing it all together into something of great value and usefulness is so very much appreciated!!! 🙂

Tried JMX strings into Zenoss with similar results in the past – mostly useless information. After finding another product recently that seems to have good potential with monitoring M3, I went on another hunt to see if there was anything else on the ‘net about basic monitoring, and came across this post. It really, really made my day.

Sad to hear the latest isn’t that great on monitoring / alerting. Sounds like they have work to do. But yes, wouldn’t life be so much simpler if there were some basic SNMP info to poll. I’d sure love to just let Zenoss pick up the info and do its thing. Guess that since it’s all Java processes, that complicates things greatly.