The Devil is truly in the "details($*)"

In my introductory blog post I mentioned that I've been working with OMNIbus for a long time. As evidence, I'll point out that I was one of the first batch of Certified Netcool Engineers (there were five of us who passed). That was the "beta" version of the OMNIbus certification program, and we were the first to go through it, back in January 1998. So why do I bring this up?

I bring it up because one of the certification exercises involved diagnosing a poorly performing ObjectServer, and one of its main problems was that it was bogged down with tens of thousands of rows in alerts.details. Fifteen years later, I still see this all the time when I'm first called in to look at an existing customer deployment.

Part of the problem is that our own Netcool Knowledge Library (NCKL) is a terrible offender. Out of the box, it is littered with hundreds of details statements that are not commented out, and the details($*) statements are particularly odious. I did a quick check of the 3.8 release of NCKL: even without uncommenting any of the vendor includes, there were still 430 instances of details, including 88 instances of details($*).
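If you want to run a similar check against your own rules files, a rough line-oriented count is easy to script. The following is a minimal Python sketch, not how the figures above were produced. It assumes that `#` starts a comment in a rules file and that each details(...) call fits on one line; a `#` inside a quoted string will fool it. The `*.rules` glob in the hypothetical `count_tree` helper is also an assumption, so adjust it to however your rules and include files are named.

```python
import re
from pathlib import Path

# A details(...) call; the captured group is its argument list.
DETAILS_RE = re.compile(r"\bdetails\s*\(([^)]*)\)")


def count_details(text: str) -> tuple[int, int]:
    """Return (active details statements, how many of them are details($*))."""
    total = star = 0
    for line in text.splitlines():
        # Drop everything after '#', so commented-out calls are not counted.
        # Naive: a '#' inside a quoted string is treated as a comment too.
        code = line.split("#", 1)[0]
        for args in DETAILS_RE.findall(code):
            total += 1
            if args.strip() == "$*":
                star += 1
    return total, star


def count_tree(root: str, pattern: str = "*.rules") -> tuple[int, int]:
    """Hypothetical helper: sum the counts over every file under root matching pattern."""
    total = star = 0
    for path in Path(root).rglob(pattern):
        t, s = count_details(path.read_text(errors="replace"))
        total += t
        star += s
    return total, star
```

Pointing `count_tree` at the top of an unpacked NCKL directory returns a pair such as `(total, star)`, which gives a quick before-and-after measure when you clean up a deployment.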

This is a recipe for bogging your ObjectServer down quickly. Remember that every detail is an individual row in the alerts.details table. A statement like details($ifIndex,$ifAdminStatus,$ifOperStatus) isn't too bad, because it inserts only three rows per event. But details($*) inserts a row for every element the probe has parsed from the event. So if 10,000 events in alerts.status were processed by that part of the rules file, each carrying 10 to 20 elements, you could easily have 100,000 to 200,000 details rows dragging down your ObjectServer.
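The arithmetic is simple, but it is worth seeing side by side. Here is a tiny Python sketch of it. The figure of 10 to 20 elements per event is an assumption implied by that range, not a measurement.

```python
# Every name/value pair written by details() becomes its own row in
# alerts.details, so the row count grows as events x elements per event.
def detail_rows(events: int, elements_per_event: int) -> int:
    return events * elements_per_event


EVENTS = 10_000

# details($ifIndex,$ifAdminStatus,$ifOperStatus): three rows per event.
print(detail_rows(EVENTS, 3))  # 30000

# details($*): one row per probe element; 10 and 20 elements are assumed.
for elements in (10, 20):
    print(detail_rows(EVENTS, elements))  # 100000, then 200000
```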

The original idea behind the details feature was that it was handy to be able to associate various name-value pairs with an event while developing and debugging probe rules files. One of the first rules of "Best Practice" for those of us who went to customer sites was to make sure we had commented out every details statement in the rules files before we left. That practice, however, rested on the ideal that a rules file could somehow be "done".

In our rough-and-tumble real world of network management, new devices (or new agent versions) are constantly being introduced into the managed network, so rules file development will always lag behind the events (such as SNMP traps) actually being seen. A rules file would therefore usually end with some sort of "catch-all" clause to handle unforeseen events, and that part of the file would generally include an uncommented details statement for the benefit of the rules maintainer. From that seed of practice grew NCKL's current undisciplined behavior.

An Alternative

In recent years, an excellent alternative to details has been added to the product. OMNIbus 7.2 added ExtendedAttr as a standard column in alerts.status. Another addition was the nvp_add function in the probe rules language, which makes it easy to keep a set of name-value pairs in a single string. Although ExtendedAttr was originally intended as a place to hold name-value pairs coming from Tivoli's legacy TEC console, it makes a great field for the information that used to be put into details. The overhead of adding a few more characters to a varchar within the same event is an order of magnitude lower than maintaining an N:1 join relationship between the status and details tables.
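To make the difference concrete, here is a small Python model of what nvp_add does to the string it is given. It is not the real function. The name="value" quoting and the semicolon separator are assumptions about the format, and this model simply appends. Check the probe rules reference for your release for the exact behavior, including what happens when the same name is added twice.

```python
def nvp_add(nvp: str, *pairs: str) -> str:
    """Hypothetical Python model of the rules-file nvp_add() function.

    Appends each name/value pair to a semicolon-separated string, using an
    assumed name="value" layout. The real function may quote, escape, or
    replace existing names differently.
    """
    if len(pairs) % 2:
        raise ValueError("nvp_add expects name, value pairs")
    parts = [nvp] if nvp else []
    for name, value in zip(pairs[::2], pairs[1::2]):
        parts.append(f'{name}="{value}"')
    return ";".join(parts)


# One event: three details() rows become one value in a single column.
detail_rows = [("ifIndex", "3"), ("ifAdminStatus", "1"), ("ifOperStatus", "2")]
extended_attr = nvp_add("", "ifIndex", "3", "ifAdminStatus", "1", "ifOperStatus", "2")
print(len(detail_rows), "rows in alerts.details vs. 1 column value:")
print(extended_attr)  # ifIndex="3";ifAdminStatus="1";ifOperStatus="2"
```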

The best news is that you can get a utility to do the conversion for you right here. The following example shows it being used with one of the standard files in the NCKL distribution. The file is converted (the utility saves the original as a ".bak" file), and a "diff" shows the changes:

I have used this in production at several customer sites and it has relieved ObjectServer overloading.
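For readers curious about what such a conversion involves, here is a hypothetical Python sketch. It is not the utility mentioned above. It rewrites each uncommented details(...) line into an nvp_add assignment on @ExtendedAttr, keeps the original file as a .bak copy, and returns a unified diff. The generated rules syntax is an assumption to verify against the probe rules reference for your release, in particular the nvp_add($*) form and the multi-pair form of nvp_add.

```python
import difflib
import re
import shutil
from pathlib import Path

# An uncommented details(...) call at the start of a line; a leading '#'
# (a rules-file comment) prevents a match.
DETAILS_RE = re.compile(r"^(\s*)details\s*\(([^)]*)\)")


def convert_line(line: str) -> str:
    """Rewrite one details(...) line as an assumed nvp_add() assignment."""
    m = DETAILS_RE.match(line)
    if not m or not m.group(2).strip():
        return line
    indent, args = m.group(1), m.group(2).strip()
    rest = line[m.end():]  # keep any trailing comment and the newline
    if args == "$*":
        # Assumption: nvp_add($*) packs every element, like details($*).
        new = "@ExtendedAttr = nvp_add($*)"
    else:
        # Assumption: nvp_add(string, name, value, name, value, ...).
        pairs = []
        for element in (a.strip() for a in args.split(",")):
            pairs.append(f'"{element.lstrip("$")}", {element}')
        new = "@ExtendedAttr = nvp_add(@ExtendedAttr, " + ", ".join(pairs) + ")"
    return indent + new + rest


def convert_text(text: str) -> str:
    return "".join(convert_line(ln) for ln in text.splitlines(keepends=True))


def convert_file(path: Path) -> str:
    """Convert a rules file in place, keep <name>.bak, return a unified diff."""
    original = path.read_text()
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    converted = convert_text(original)
    path.write_text(converted)
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            converted.splitlines(keepends=True),
            fromfile=path.name + ".bak",
            tofile=path.name,
        )
    )
```

Working line by line keeps indentation and trailing comments intact, so the resulting diff shows only the statements that actually changed.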

Drawbacks

The primary drawback to packing name-value pairs into the ExtendedAttr column is that there is no equivalent to the "Details" tab in the event clients. The column is just a string of name-value pairs separated by semicolons. Here's an example:

[Note that it's possible to search for the values of individual name-value pairs. This event would be one of potentially multiple events returned by:

This is another nice feature: a similar query involving a sub-select against alerts.details is a lot more complicated.]
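Because each event carries its own pairs, both a "Details"-style view and a search by value come down to parsing one string. Here is a Python sketch of both, assuming name="value" items separated by semicolons. The helper names and sample events are hypothetical.

```python
def parse_pairs(extended_attr: str) -> dict[str, str]:
    """Split an ExtendedAttr-style string into a name -> value dict.

    Assumes name="value" items separated by ';'. A ';' inside a value will
    split it incorrectly, which is the drawback discussed next.
    """
    pairs = {}
    for item in extended_attr.split(";"):
        name, sep, value = item.partition("=")
        if sep:  # skip empty or malformed items
            pairs[name.strip()] = value.strip().strip('"')
    return pairs


def find_events(events: list[dict], name: str, value: str) -> list[dict]:
    """Events whose ExtendedAttr holds name=value; no join against a details table."""
    return [e for e in events if parse_pairs(e.get("ExtendedAttr", "")).get(name) == value]


events = [  # hypothetical sample rows from alerts.status
    {"Identifier": "link-3", "ExtendedAttr": 'ifIndex="3";ifOperStatus="2"'},
    {"Identifier": "link-7", "ExtendedAttr": 'ifIndex="7";ifOperStatus="2"'},
]

# A "Details"-tab style listing for one event.
for name, value in parse_pairs(events[0]["ExtendedAttr"]).items():
    print(f"{name:<15} {value}")

print([e["Identifier"] for e in find_events(events, "ifIndex", "3")])  # ['link-3']
```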

Another possible drawback arises when the value of a particular element includes a semicolon. If this is an issue, the value can be preprocessed with regreplace, replacing the semicolons with some other character.
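The effect of that preprocessing is modelled below in Python, with re.sub standing in for regreplace. The comma as the replacement character is an assumption. In a rules file the equivalent would be something like regreplace($element, ";", ","), but confirm the argument order in the probe rules reference before relying on it.

```python
import re


def sanitize(value: str, replacement: str = ",") -> str:
    """Replace ';' so it cannot be mistaken for a pair separator (like regreplace)."""
    return re.sub(";", replacement, value)


def naive_split(extended_attr: str) -> list[str]:
    """Split on ';' the way a simple pair parser would."""
    return extended_attr.split(";")


raw = "link down; admin up"
print(naive_split(f'ifAlias="{raw}"'))            # ['ifAlias="link down', ' admin up"']
print(naive_split(f'ifAlias="{sanitize(raw)}"'))  # ['ifAlias="link down, admin up"']
```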