
SCOM, SNMP and TRAPS or The Good, the Bad and the Ugly : Part 2

If you have made your way through Part 1, then you have written your management pack, complete with your own custom discovery, and imported it into SCOM. Once you have ensured that it is discovering only the devices you wish to manage in this pack, it is time to begin writing the monitors and rules that will apply to the detected devices. As mentioned in Part 1, a program such as MIB Browser can be very handy for sorting through all of the OIDs and the healthy values that correspond to each individual OID.

Creating an SNMP Get Monitor in SCOM 2007 R2

I find the easiest way to create a new monitor or rule is to start with the System Center Operations Manager Console. I will admit it’s not the best and does not give you many of the options you probably want, but it’s the easiest way to get the XML started; you can then edit it to get exactly what you want.

From within the SCOM console, click on the Authoring tab, expand Management Pack Objects and right click on Monitors. Choose Create a Monitor \ Unit Monitor; once this has been picked you will be asked to select the type of monitor to create.

First we will create a simple expression Get monitor and deal with TRAPS later, so we pick SNMP – Probe Based Detection – Simple Event Detection – Event Monitor……

Be sure to create this in the management pack you created for the discovery of the object.

Now we have to name our monitor, select a target (you are looking for the device type you defined in Part 1, and you may have to click the “View all targets” radio button for it to appear) and add a parent monitor (this defines where in the health view tree your new monitor will appear).

Personally I always use the discovery community string, but you could use something custom if you want. The frequency is how often you want the monitor to poll the device. The object identifier, or OID, is the bit that will be used in the SNMP GET call; I find it works most reliably if you don’t include a leading period.

We need to create an expression that causes an alarm. I will keep the expressions simple so you can get a feel for one that works. Click +Insert at the top and you are presented with 3 fields. The first field, Parameter Name, is the magic field.

/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value

This is the value you are going to compare; it is based on the First SnmpProbe from the step before. I have read that if you have more than one SnmpProbe, the numbering (in this case [1]) is in reverse order, so [1] is at the bottom and [2] would be just above it in the list. Personally I have only one OID per provider right now, so I don’t know. Let me know if you figure it out for sure.

The operator gives you a drop down of choices. I will get into it more below, but think about this one carefully. If you can use a simple equals or does not equal, you can make things much easier. Think of it like this: if a UPS battery charge of anything less than 100% is bad, then use an expression like “/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value – does not equal – 100” instead of “/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value – less than – 100”. It will save you a bunch of extra steps, even if it is not quite as flexible.
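To get a feel for what the parameter name /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value selects, here is a minimal Python sketch that runs the same path against a made-up data item. The sample XML, the OID in it and the function name are assumptions for illustration only; a real SCOM data item carries more fields than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified SNMP data item (assumption: real ones have more fields).
SAMPLE_DATA_ITEM = """
<DataItem>
  <SnmpVarBinds>
    <SnmpVarBind>
      <OID>1.3.6.1.2.1.33.1.2.4.0</OID>
      <Value>100</Value>
    </SnmpVarBind>
  </SnmpVarBinds>
</DataItem>
"""


def first_varbind_value(data_item_xml: str) -> str:
    """Return the text selected by /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value."""
    root = ET.fromstring(data_item_xml)  # root is the <DataItem> element
    # ElementTree paths are relative to the root, and [1] is the first match.
    node = root.find("SnmpVarBinds/SnmpVarBind[1]/Value")
    if node is None or node.text is None:
        raise ValueError("no value found for the first SnmpVarBind")
    return node.text.strip()


value = first_varbind_value(SAMPLE_DATA_ITEM)
# The "does not equal 100" expression from the text, applied to the polled value.
is_unhealthy = value != "100"
print(value, is_unhealthy)
```

Note that the value comes back as text, which is why an equals or does not equal test is the least troublesome kind of expression.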

Second SnmpProbe lets you pick an OID just like the first SnmpProbe. Personally, all the monitors I have so far use the same OID as in the first provider, as I am watching for a single value to be either good or bad. The second expression is built exactly the same way as the first. If you want a monitor that will not recover (you have to manually reset the health state), I use something like “/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value – does not match wildcard – *”; since any GET will return some result, it will never fail to match, and so the monitor will never recover.
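The never-recovering trick can be sketched in Python. This uses Python's own shell-style wildcard matching from the standard library as a stand-in; that SCOM's wildcard operator behaves the same way for "*" is an assumption, and the function name is made up for the example.

```python
from fnmatch import fnmatchcase


def does_not_match_wildcard(value: str, pattern: str) -> bool:
    """Stand-in for a 'does not match wildcard' operator (assumed semantics)."""
    return not fnmatchcase(value, pattern)


# Whatever the GET returns, "*" matches it, so the recovery condition is never true.
samples = ["100", "0", "battery low", ""]
results = [does_not_match_wildcard(v, "*") for v in samples]
print(results)
```

Because the healthy expression can never fire, the only way back to a healthy state is a manual reset.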

Configure Health lets you decide how the device health will change when the monitor gets tripped. I use Second Event Raised as healthy and First Event Raised as warning or critical, depending on what’s going on.

The last option is whether you want to create an alert or not; that is up to you.

Not so simple expressions

So let’s say you don’t want a simple equals or does not equal kind of expression. It’s there in the drop down, so what’s the big deal, you say? Well, the SCOM console makes what I consider a bad assumption when creating rules and monitors: all the data types are strings. So although “100” does not equal “10” produces a true result, “100” is greater than “10” has no numeric meaning when the values are strings. Fixing this is actually not so hard and you have 2 choices. If the next bit is clear to you, go for manual XML editing; if that makes you nervous, then hold on for the second option.
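A quick way to see why string comparison goes wrong is to try it in Python, which compares strings character by character. This is only an illustration of the general problem; exactly how SCOM orders strings internally is not something I can confirm, and the function names are made up for the example.

```python
def less_than_as_string(value: str, threshold: str) -> bool:
    """Compare the way text is compared: character by character."""
    return value < threshold


def less_than_as_integer(value: str, threshold: str) -> bool:
    """Compare the way numbers are compared: convert first, then compare."""
    return int(value) < int(threshold)


# A battery at 9% should trip a "less than 100" check.
polled, threshold = "9", "100"
print(less_than_as_string(polled, threshold))   # "9" sorts after "1", so the check is missed
print(less_than_as_integer(polled, threshold))  # numeric comparison catches it
```

With strings, the first differing character decides the result, so the number of digits is ignored. That is the behaviour the String to Integer change is meant to avoid.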

Option 1 : Advanced

Export your MP to XML and open it in your favorite XML editor.

Way at the bottom you will find an ElementID linked to the text label you assigned to the monitor. Use this ElementID to find your monitor or rule, then change “String” to “Integer” in the 4 places it appears (the Type on each XPathQuery and Value). Save the file and re-import it into SCOM and your monitor should be working.
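If you would rather script the change than hand-edit it, the String to Integer swap can be sketched in Python. The management pack fragment below is a heavily trimmed, hypothetical example (the monitor ID and the element layout are assumptions), so check it against your own exported XML before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed management pack fragment for illustration.
SAMPLE_MP = """
<ManagementPack>
  <Monitoring>
    <Monitors>
      <UnitMonitor ID="UIGeneratedMonitor_example">
        <Configuration>
          <FirstExpression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value</XPathQuery>
              </ValueExpression>
              <Operator>Less</Operator>
              <ValueExpression>
                <Value Type="String">100</Value>
              </ValueExpression>
            </SimpleExpression>
          </FirstExpression>
          <SecondExpression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value</XPathQuery>
              </ValueExpression>
              <Operator>GreaterEqual</Operator>
              <ValueExpression>
                <Value Type="String">100</Value>
              </ValueExpression>
            </SimpleExpression>
          </SecondExpression>
        </Configuration>
      </UnitMonitor>
    </Monitors>
  </Monitoring>
</ManagementPack>
"""


def set_expression_types(mp_xml: str, monitor_id: str, new_type: str = "Integer"):
    """Switch Type="String" to new_type on XPathQuery and Value for one monitor."""
    root = ET.fromstring(mp_xml)
    changed = 0
    for monitor in root.iter("UnitMonitor"):
        if monitor.get("ID") != monitor_id:
            continue  # leave every other monitor untouched
        for tag in ("XPathQuery", "Value"):
            for node in monitor.iter(tag):
                if node.get("Type") == "String":
                    node.set("Type", new_type)
                    changed += 1
    return ET.tostring(root, encoding="unicode"), changed


new_xml, changed = set_expression_types(SAMPLE_MP, "UIGeneratedMonitor_example")
print(changed)
```

On the sample this touches exactly the 4 places described above: one XPathQuery and one Value in each of the two expressions. Keep a copy of the original export, since re-serialising XML this way can drop comments and change formatting.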

Option 2 : Authoring Console

Export your management pack to XML, then open it using the System Center Operations Manager 2007 R2 Authoring Console. You may be asked for dependencies, which are usually found in “C:\Program Files\System Center Operations Manager 2007”, but you can find them easily enough by searching for *.mp

Once you find your monitor or rule of choice, right click it and choose Properties, then the Configuration tab. Under each >>>XPathQuery and >>>Value you will see >>>@Type; you need to change 4 Types to Integer. You can see examples of the last two changed in the image below.

Then once you are finished just save the management pack and re-import it into SCOM and it should work.

Note: I am writing this in a somewhat sleep deprived state. I have not talked about rules at all, but they are simpler than monitors, so I hope it’s clear where the magic is. I will also thank David Allen for some blog posts that helped me, although I can’t find them right now. If things here are not clear or more detail is needed, please comment or contact me and I will see what I can do.

I am sorry, clearly I didn’t see fit to record where I found that. It’s been a long time and I don’t recall. I suppose it may not be the same in 2012 either way.
I do recall that I started with a wildcard trap alarm and looked at the details that arrived…. perhaps it was from that log or alert?
I wish I could help more, but I don’t have the system to test on any more.

What if I want to do some more complicated processing with ValueExpression, something like converting a bit status to a byte status or MByte? Can you tell me how I can do that, or guide me to useful resources on this point?

I have a total of 16 GB of RAM and I want to alert if there is less than 3 GB of the total RAM. I configured /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value less than 3467826 to be critical and alert. The other condition is /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value greater than 3467826 to be healthy.
Once configured, I am getting alerts for higher values like 19458676.

As you suggested, I changed the data type to Integer and I do not get any alerts now.

When I query them manually I get the values in KB, so I configured them accordingly.