Last contacted


Some customers want to know not only that the health service hosted on a computer has stopped heartbeating (remember, this is very different from our approach to recognizing a computer as down which, BTW and solely in my own opinion, is a rather unfortunate attempt to detect something the health service watcher was not originally designed for), but also at least some information about the last time that health service contacted its management server. This post describes one possible solution.

This information is already present in Health Explorer (in some form, as we will see later), but it is not easy to locate and carries a “big” TCO. I will describe how to do it in the SP1 version of OpsMgr 2007 anyway:

1. Open “Health Explorer” from the health service watcher that was marked critical. (Health service watcher views are located in the “Operations Manager” folder, “Agent” subfolder, for those of you who have never had to wander there.)

3. The context of the top state change to critical carries the data type that caused the state change, and its “Date and Time” carries the time when the runtime recognized that the heartbeat was missing.

The following is a screenshot from the next version of OpsMgr. We made some improvements in unavailability recognition and changed the internal plumbing for some of the “Health Service Watcher” monitors (that is outside the scope of this post, and I may write another one describing the changes once the release date approaches). It shows that the data type used still contains the same information about when the runtime recognized that the health service was not heartbeating, and that this information is present in “Date and Time” within the context of the state change:

So I have just shown that this is highly inefficient when multiple health services are not heartbeating and one wants a quick view showing when each missed heartbeat was recognized and roughly when the given health service last contacted its server.

Before we do this, I need to explain a little about how availability is stored in our Operational Database. There is a table called “Availability”, and one of its columns is “LastModified”. Its value is the time when the runtime notified the SDK service about the availability change. That is not the time when the runtime was last contacted by the health service, though. The last contacted time can be calculated from the heartbeat interval and the number of heartbeats that must be missed before the SDK is notified that the heartbeat is missing. The values for the interval and the number of misses are stored in global settings. And that gives us the opportunity to create the following SQL script:

Comparing the result with the Health Explorer data, we can see that the recognized time is “equal” and that the approximate last contacted time is calculated (by default it should be around 3 minutes before recognition). Maybe I will create a report in the future that displays this information in a more unified manner, but that is not my intent right now ...
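For readers who want to check the arithmetic by hand, here is a minimal Python sketch of the calculation described above. It is not the SQL script from this post, and it does not talk to the Operational Database. The function name and the default values (a 60 second heartbeat interval and 3 missed heartbeats) are my assumptions for illustration, picked only because they match the "around 3 minutes" mentioned above; use the values from your own global settings.

```python
from datetime import datetime, timedelta


def approximate_last_contacted(recognized_at,
                               heartbeat_interval_seconds=60,
                               missed_heartbeats=3):
    """Estimate when a health service last contacted its management server.

    recognized_at: time the missing heartbeat was recognized (the value the
        post reads from the "LastModified" column).
    heartbeat_interval_seconds, missed_heartbeats: hypothetical defaults;
        the real values live in global settings.
    """
    if heartbeat_interval_seconds <= 0 or missed_heartbeats <= 0:
        raise ValueError("interval and missed heartbeat count must be positive")
    # Step back by the time the runtime waited before reporting the miss.
    waited = timedelta(seconds=heartbeat_interval_seconds * missed_heartbeats)
    return recognized_at - waited


# Example with a made-up recognition time.
recognized = datetime(2008, 5, 1, 14, 30, 0)
print(approximate_last_contacted(recognized))  # 2008-05-01 14:27:00
```

The result is only an approximation: the health service may have gone silent at any point within the last interval before the first missed heartbeat.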