Complex networks require increasingly sophisticated monitoring systems. However, far too often, monitoring is an afterthought and not a holistically engineered part of the system. In fact, it is very common that the overall monitoring system is complicated and mission-critical, yet has varying degrees of documentation, training, fault-tolerance and security.

To improve, organizations must recognize that a monitoring system can itself cause problems, and that there is a unique set of issues that must be taken into account and mitigated.

Perceived Reliability

We must consider how people perceive the accuracy of the automated feedback systems. A properly designed monitoring system must be such that operators can realistically investigate and record the findings of all alerts raised or issues flagged.

In other words, the system must be a closed loop wherein issues are raised, investigated, mitigated (if need be) and the results logged. The problem is that as the number of erroneous alerts increases, the amount of personnel time wasted and the level of frustration increase as well.
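To make the closed loop concrete, here is a minimal sketch in Python. The class, field and function names (Alert, Status, erroneous_alert_rate) are hypothetical and chosen only for illustration; they do not come from any particular monitoring product. The sketch assumes an alert counts as closed only once it has been investigated and its result logged, and it measures the erroneous alert rate over closed alerts only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    RAISED = "raised"
    INVESTIGATED = "investigated"
    MITIGATED = "mitigated"
    LOGGED = "logged"


@dataclass
class Alert:
    """One pass through the loop: raised, investigated, mitigated, logged."""
    source: str
    message: str
    status: Status = Status.RAISED
    was_real: Optional[bool] = None  # unknown until someone investigates
    history: List[str] = field(default_factory=list)

    def investigate(self, was_real: bool, finding: str) -> None:
        self.was_real = was_real
        self.status = Status.INVESTIGATED
        self.history.append(finding)

    def mitigate(self, action: str) -> None:
        # Mitigation only makes sense for an alert confirmed as real.
        if not self.was_real:
            raise ValueError("only confirmed alerts can be mitigated")
        self.status = Status.MITIGATED
        self.history.append(action)

    def close(self) -> None:
        # Refuse to log an alert nobody looked at; that would break the loop.
        if self.status is Status.RAISED:
            raise ValueError("alert was never investigated")
        self.status = Status.LOGGED


def erroneous_alert_rate(alerts: List[Alert]) -> float:
    """Fraction of closed alerts that investigation found to be false."""
    closed = [a for a in alerts if a.status is Status.LOGGED]
    if not closed:
        return 0.0
    return sum(1 for a in closed if a.was_real is False) / len(closed)
```

Tracking this rate over time gives an early warning that operators are being asked to chase too many false alarms. Alerts that are still open are deliberately left out of the calculation, so a growing backlog should be watched separately.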

This "perceived reliability" is a key dynamic for any form of monitoring. If operators have expectations that are out of alignment with what the system can deliver, then they are far more likely to discount reports coming from that system and even falsify reports in order to "not waste time."

Far too many accidents have taken place because operators assumed that messages were false positives when, in fact, the alerts were accurate. From this, we can posit The Law of False Alerts: As the rate of erroneous alerts increases, operator reliance on, or belief in, subsequent warnings decreases.

If a complex system has an area where a monitoring system used to detect a security breach, or any critical parameter for that matter, constantly raises false alarms, wouldn't that be a prime target for a hacker or terrorist? Whether it is an intrusion detection system that constantly reports non-existent incursions, a flaky motion sensor flagging movement that doesn't exist, or an open/closed sensor providing a false report about a valve's state, if it is a known weak link, whether through media reports or even the office rumor mill, then it is at risk of allowing a breach to happen.

What do we do?

First, we must treat monitoring as an intrinsic part of the overall system in question. By adding monitoring to a system with little thought, we risk monitoring the wrong events and/or wrongly interpreting reported data. In other words, there must be a holistic approach that identifies key performance indicators in the system, their acceptable bounds and key causal logic: "If these sensors register X, Y and Z, then event Alpha must be taking place and IT operations must be alerted immediately."
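The "X, Y and Z imply Alpha" rule can be sketched as a small piece of Python. The indicator names, the numeric bounds and the function names below are all hypothetical, chosen only to illustrate the idea of pairing each key performance indicator with acceptable bounds and then expressing the causal logic on top of those bounds. The sketch also assumes that a missing reading does not count as out of bounds; a real system would need an explicit policy for that case.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Bounds:
    """Acceptable range for one key performance indicator."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical indicators and limits, for illustration only.
BOUNDS: Dict[str, Bounds] = {
    "X": Bounds(0.0, 80.0),
    "Y": Bounds(10.0, 50.0),
    "Z": Bounds(0.0, 5.0),
}


def out_of_bounds(readings: Dict[str, float]) -> Set[str]:
    """Names of indicators whose current reading is outside its bounds."""
    return {
        name
        for name, bounds in BOUNDS.items()
        if name in readings and not bounds.contains(readings[name])
    }


def event_alpha(readings: Dict[str, float]) -> bool:
    """Causal rule: Alpha is inferred only when X, Y and Z are ALL out of bounds."""
    return out_of_bounds(readings) == set(BOUNDS)


if __name__ == "__main__":
    sample = {"X": 95.0, "Y": 5.0, "Z": 9.0}
    if event_alpha(sample):
        print("Event Alpha suspected: alert IT operations")
```

Writing the rule down this explicitly forces the bounds and the causal assumptions to be reviewed along with the rest of the system, rather than living in one operator's head.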

The human factor must be taken into account, with careful planning of which events trigger an alarm, the processes used to validate results, the layout of the messages and so on. Always bear in mind that as the level of false positives increases, faith in the monitoring system decreases. The monitoring system must not only be accurate, it must be viewed as accurate and as providing value to the operators, or they will increasingly ignore it over time, perhaps with disastrous results.

Second, build "monitoring in depth." This is a play on "defense in depth" in that multiple sensors are arranged to confirm events.

For example, one potential scenario is that a more sensitive but more error-prone sensor is used to initially indicate a state and a less sensitive but more reliable sensor is used in series to corroborate the earlier "fast alert" probe.
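A minimal sketch of that scenario in Python follows. The function name and the three state labels are hypothetical. Each probe is assumed to be a callable that returns True when its sensor is triggered, and the reliable sensor is consulted only after the fast one fires, which is what "in series" is taken to mean here.

```python
from typing import Callable


def corroborated_state(
    fast_probe: Callable[[], bool],
    confirm_probe: Callable[[], bool],
) -> str:
    """Two-stage check: sensitive sensor first, reliable sensor second."""
    if not fast_probe():
        # The sensitive sensor is quiet, so the second one is never consulted.
        return "normal"
    if confirm_probe():
        # Both sensors agree: raise the alert with higher confidence.
        return "confirmed"
    # Only the error-prone sensor fired: record it, but do not page anyone yet.
    return "unconfirmed"


if __name__ == "__main__":
    print(corroborated_state(lambda: True, lambda: False))
```

Keeping "unconfirmed" as its own state, rather than discarding it, preserves a record of how often the sensitive sensor fires alone, which is exactly the data needed to judge its false alert rate.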