Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

In OpsMgr 2007, when an agent experiences a heartbeat failure, several things happen. Diagnostics, and possibly recoveries, are run. Alerts, and possibly notifications, go out.

But what happens if my Operations team misses one of these alerts? What can I do to "spot check" agents with issues?

Well, any time an agent has a heartbeat failure, we gray out the state icon in each state view. The grayed icon still shows the agent's last known state.

However - you CAN create a State view that will turn Red or Yellow just like any other state view. Simply create a new State View, and scope the class to Health Service Watcher (Agent).

I called mine Heartbeat State View:

This view will show us when any of the agent Health Service Watcher monitors are unhealthy. In my case, OWA and EXCH1 have issues: OWA is DOWN, while on EXCH1 the agent HealthService is stopped.

However - here is the issue. This view shows us when ANY monitor rolls up an unhealthy state.... this includes heartbeat failures AND computer unreachable (server IP stack is down):

What if I want a State View - to ONLY show me computers that are DOWN.... as in... not heartbeating AND not responding to any PING? Most customers consider this their "most critical situation". Well, I haven't found an easy way to do that.... so I wrote a report which handles it. This report will query the OpsDB for the state of the "Computer Not Reachable" monitor, and only display those servers. It is based on the following query:

SELECT bme.DisplayName,
       s.LastModified as LastModifiedUTC,
       dateadd(hh,-5,s.LastModified) as 'LastModifiedCST (GMT-5)'
FROM state AS s, BaseManagedEntity as bme
WHERE s.basemanagedentityid = bme.basemanagedentityid
  AND s.monitorid IN (SELECT MonitorId FROM Monitor WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown')
  AND s.Healthstate = '3'
  AND bme.IsDeleted = '0'
ORDER BY s.Lastmodified DESC
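To see what the filters in that query actually select, here is a minimal sketch that runs the same join against a tiny in-memory SQLite stand-in. Everything about the setup is an assumption made for illustration: the three tables are pared down to only the columns the query touches, the rows are invented (including the "Hypothetical.HeartbeatMonitor" name and the DC2 and OLDSRV servers), and SQLite is not SQL Server, so the DATEADD column is left out. It is not the real OpsDB schema.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the OpsDB tables the query reads.
SCHEMA = """
CREATE TABLE Monitor (MonitorId TEXT, MonitorName TEXT);
CREATE TABLE BaseManagedEntity (BaseManagedEntityId TEXT, DisplayName TEXT, IsDeleted INTEGER);
CREATE TABLE State (BaseManagedEntityId TEXT, MonitorId TEXT, HealthState INTEGER, LastModified TEXT);
"""

# Same logic as the report query, written with an explicit JOIN.
QUERY = """
SELECT bme.DisplayName, s.LastModified AS LastModifiedUTC
FROM State AS s
JOIN BaseManagedEntity AS bme
  ON s.BaseManagedEntityId = bme.BaseManagedEntityId
WHERE s.MonitorId IN (
        SELECT MonitorId FROM Monitor
        WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown')
  AND s.HealthState = 3      -- 3 = critical (red)
  AND bme.IsDeleted = 0
ORDER BY s.LastModified DESC
"""


def build_demo_db():
    """Create an in-memory database filled with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Monitor VALUES (?, ?)", [
        ("m-down", "Microsoft.SystemCenter.HealthService.ComputerDown"),
        ("m-hb", "Hypothetical.HeartbeatMonitor"),  # invented name
    ])
    conn.executemany("INSERT INTO BaseManagedEntity VALUES (?, ?, ?)", [
        ("e1", "OWA", 0),
        ("e2", "EXCH1", 0),
        ("e3", "OLDSRV", 1),  # deleted entity, must be filtered out
        ("e4", "DC2", 0),
    ])
    conn.executemany("INSERT INTO State VALUES (?, ?, ?, ?)", [
        ("e1", "m-down", 3, "2008-06-30 14:05:00"),  # down: no heartbeat, no ping
        ("e1", "m-hb", 3, "2008-06-30 14:04:00"),
        ("e2", "m-down", 1, "2008-06-30 13:00:00"),  # still answers ping
        ("e2", "m-hb", 3, "2008-06-30 13:50:00"),    # heartbeat failure only
        ("e3", "m-down", 3, "2008-06-29 09:00:00"),
        ("e4", "m-down", 3, "2008-06-30 08:30:00"),
    ])
    return conn


def computers_down(conn):
    """Return (DisplayName, LastModifiedUTC) rows for servers that are down."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for name, last_modified in computers_down(build_demo_db()):
        print(name, last_modified)
```

In this sample data EXCH1 has a heartbeat failure but its "Computer Not Reachable" monitor is healthy, so it is left out. Only servers whose ComputerDown monitor is critical, and which are not deleted, come back.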

You can import this report if you have created a data source as shown in my previous post:

Import this report into your custom folder... and run it. You can schedule it to receive it first thing every day... if you like the output:

***** Update 6-30-08 I removed a section of the original query relating to maintenance mode. We found that if a down server had never been in maintenance mode, the server would not show up in the report. The query and report download have been updated to address this.

One note to add - in OpsMgr you will get a distinct alert whenever an agent does not respond to ping, in addition to the heartbeat failure alert. What we don't have is a state view JUST for computers that are down...

You could easily write a custom monitor that runs a ping script - and build your own state view for this in the console... and not need this report. The benefit of the report is being able to schedule it and deliver it via email or SharePoint.
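As a rough idea of what the core of such a ping check might look like, here is a minimal sketch. It is not an OpsMgr monitor (those are normally authored in a management pack, typically around a script the agent can run); it only shows the reachability test itself. The function names are hypothetical, and it assumes the operating system's own ping command is on the path.

```python
import platform
import subprocess


def ping_command(host, count=1):
    """Build the ping command line for the current operating system."""
    # Windows ping uses -n for the echo count; Unix-like systems use -c.
    flag = "-n" if platform.system().lower() == "windows" else "-c"
    return ["ping", flag, str(count), host]


def is_reachable(host, runner=subprocess.run):
    """Return True when the ping command exits with code 0.

    Caveat: on Windows, ping can exit 0 when a router answers with
    "Destination host unreachable", so a real check may also need to
    inspect the output. The runner parameter exists only so the
    function can be exercised without touching the network.
    """
    result = runner(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # "OWA" is the example server name used earlier in this post.
    print("OWA reachable:", is_reachable("OWA"))
```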

Trouble is by creating the monitor you mention you are actually duplicating work that OpsMgr is doing. It sort of highlights the lack of logic in some functionality.

To me, it makes no sense that I have to do a ping script as a monitor when OpsMgr has a much more powerful solution - agent heartbeat with associated ping of servers on which the agent heartbeat has been missed. I just need to get that information into the console .... and the fact that OpsMgr can't is something of a design flaw.

As I mentioned on the newsgroups, I don't think the report is feasible for near real time info in a large environment.

Ahhh .. didn't read that properly before I posted!! Meant the fact that agent health state couldn't be incorporated into the computer state view is something of a flaw ... realise there are the agent health state views as per my posting in the newsgroup ;-)

Anonymous

8 Nov 2008 1:14 AM

Here is a unique way to use web page views in the OpsMgr console. You can create a web page view in the

I have a different problem to the same topic. If a server goes down I do not receive any alerts. When I open Health Explorer with the above settings, I see only white bullets under Availability except Local Health Service Availability. Computer not Reachable, ... are disabled in their sealed MP. What is wrong in our configuration and what do I have to change to get an alert when a server goes down?

Thanks

Hendrik

Ren

23 Jan 2009 2:17 PM

I tried to use this UDL file by following the steps mentioned on this site. When I run the report I get this error: "An error has occurred during report processing.

Cannot create a connection to data source 'ops'.

For more information about this error navigate to the report server on the local server machine, or enable remote errors."

You are correct - It looks like in this RDL file I named my data source "Ops" instead of "OpsDB".

Simply open the RDL file - edit that, and import..... or simply go to your imported report - edit it - change the data source to your live data source that points to the OpsDB.

mccreerJ

14 Feb 2009 12:16 AM

Do you know of any way to set up subscriptions for only "ping failed" notifications? Right now every time a server fails a heartbeat and cannot be pinged we receive two text messages: one for the heartbeat failure and one for the ping failure.