We have a Network Operations Center with a dozen large, widescreen displays showing us various performance graphs, server and network equipment alarms, and status pages. A lot of the pages were obviously not designed for viewing on a static display. Does anyone have a similar setup where they have found a particular tool or package that excels at displaying data? I'm thinking that a bit of custom programming and maybe something that can scroll text, show dials, flashing lights, and whatnot would produce what I'm looking for, but I don't know where to start. If anyone has any dos and don'ts or success with particular products, that would be a big help.

What is the source of most of the data? I assume you are using several different monitoring/alerting tools. I would bet that most of them have add-on applications for visualization.
– WerkkreW, May 13 '09 at 21:23

Our main sources of data are SPECTRUM and Nagios.
– Joseph, May 13 '09 at 21:31

6 Answers

Computers are far better than I at analyzing data. I personally prefer systems like OpsView that digest situations and offer a multifaceted interface. Monitoring stats are filtered for abnormal conditions, and individual alerts are delivered to admins responsible for the system. There's an overall health dashboard that's viewable by helpdesk and management that gives an impression of how bad an outage is and whether anyone who can fix it is working on it yet. They put it on rotation on the big screen as something you can see at a glance, not something you stare at all day. Scrolling text and flashing lights aren't how salaried employees should interface with your monitoring systems.

You define situation monitoring as capturing a set of signals about a state: load, free disk space, network traffic, or even higher-level things like forum posts per hour.

Then you define a heed function that maps each wide-ranging input signal onto a value from 0 to 1, with 0 being "ignore" and 1 being "zomg!". In Nagios terms, the idea is to replace the WARNING state with a WARNING integer.

Finally, you define an aggregator to summarize and prioritize those WARNING signals.
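
A minimal sketch of those three steps, in Python. Everything here is an assumption made for illustration: the linear mapping, the threshold values, and the names `heed`, `Signal`, and `aggregate` are hypothetical and not taken from any existing tool.

```python
from dataclasses import dataclass


def heed(value: float, ignore_at: float, alarm_at: float) -> float:
    """Map a raw signal linearly onto 0..1 (0 = ignore, 1 = act now).

    ignore_at and alarm_at are hypothetical per-signal thresholds. They may
    be given in either order, so "lower is worse" signals work too.
    """
    if ignore_at == alarm_at:
        raise ValueError("thresholds must differ")
    scaled = (value - ignore_at) / (alarm_at - ignore_at)
    return max(0.0, min(1.0, scaled))


@dataclass
class Signal:
    name: str
    level: float  # heed value between 0 and 1


def aggregate(signals: list[Signal], top: int = 5) -> list[Signal]:
    """Drop signals at level 0 and return the most urgent ones first."""
    active = [s for s in signals if s.level > 0.0]
    return sorted(active, key=lambda s: s.level, reverse=True)[:top]


# Made-up readings and thresholds, purely for demonstration.
signals = [
    Signal("load average", heed(10.0, ignore_at=4.0, alarm_at=16.0)),
    Signal("free disk (GB)", heed(5.0, ignore_at=50.0, alarm_at=0.0)),
    Signal("forum posts/hour", heed(30.0, ignore_at=20.0, alarm_at=0.0)),
]
for s in aggregate(signals):
    print(f"{s.level:.2f}  {s.name}")
```

With these made-up numbers the disk signal comes out on top, the load signal second, and the forum signal is dropped because it maps to 0. A linear mapping is only one choice; a stepped or logarithmic curve would slot into `heed` the same way.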

As for specific tools you'd use to write your own monitoring system: Nagios scripts have a decent interface (this is probably where you'd glue in a HEED mapping if you like it). Signals can be stored with rrdtool, which can also generate graphs from them, and there's a Django app called Graphite that renders rrd databases. There's also NagVis:

NagVis is a visualization addon for the well known network management system Nagios.

NagVis can be used to visualize Nagios Data, e.g. to display IT processes like a mail system or a network infrastructure.
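
On the Nagios script interface mentioned above: a check is just a program that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Here is a rough sketch of a disk check along those lines. The function names and the 80/90 percent thresholds are my own assumptions, not part of Nagios.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used: float, warn: float, crit: float) -> tuple[int, str]:
    """Return (exit_code, output_line) for a disk-usage percentage."""
    if not 0.0 <= percent_used <= 100.0:
        return UNKNOWN, f"DISK UNKNOWN - invalid reading {percent_used!r}"
    if percent_used >= crit:
        code = CRITICAL
    elif percent_used >= warn:
        code = WARNING
    else:
        code = OK
    # Text after "|" is performance data: label=value[unit];warn;crit
    line = (f"DISK {LABELS[code]} - {percent_used:.1f}% used"
            f" | used={percent_used:.1f}%;{warn:g};{crit:g}")
    return code, line


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Measure real disk usage for `path` and evaluate it."""
    usage = shutil.disk_usage(path)
    return evaluate(100.0 * usage.used / usage.total, warn, crit)


# Fixed reading so the example is reproducible.
code, line = evaluate(85.0, warn=80.0, crit=90.0)
print(line)  # DISK WARNING - 85.0% used | used=85.0%;80;90
```

A real plugin would call `check_disk()`, print the returned line, and pass the code to `sys.exit()`. The performance data after the `|` is what graphing tools pick up, so this is also a natural place to emit a HEED-style number if you want one.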

We had too many displays and not enough useful info, so we totally cheated. We found an interesting LCARS-based screen saver (looks like the displays from Star Trek) and ran it on one of the idle displays. That was the one the bosses watched most.

Yup. Seems that the biggest need for the big screen is the bosses wanting a showplace. I recall one job where we put up some fake-but-nice-looking do-nothing displays. Our real status system was the background color. Green for nominal ops; yellow if we had probs that would make at least one director-level person take note; red if it would get 3 or more directors angry. Obviously, black for an all systems outage. Ha-hah.
– quux, May 27 '09 at 0:10

I wrote my own Nagios visualisation after finding out that none of the easily found versions can handle hundreds of hosts with tens of thousands of checks. (To release the code I need a few people who want to try it outside of my environment so I can convince the bosses)

Even the few that might not break required manual configuration that our nagios config generator couldn't be perverted to do.

My visualisations are used on OS X and Linux. Oddly, the only OS X browser with a working fullscreen mode is Opera; neither Safari (and that includes WebKit) nor Firefox has one.

A few general tips though:

Big fonts, to the point of automating layouts so they get bigger if there's less to display

Use sorting so the biggest problems are first

Use META refresh, not JavaScript, for reliability

Do your best to minimise the maintenance needed; better to be getting warned about a system not yet in production than finding out a year in that it was never added to the displays

SVG can be wonderful, although they seem to get corrupted over time (we use a simple graphic of a state as an additional visual cue)
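
To make the first three tips concrete, here is a small sketch that builds a static status page: problems sorted worst-first, a font size that grows when there is less to show, and a plain META refresh instead of script-driven reloading. The names, the severity ordering, and the font-size formula are all assumptions for illustration; this is not the code I run.

```python
from html import escape

# Lower number sorts first; unknown states are treated like UNKNOWN.
SEVERITY = {"CRITICAL": 0, "WARNING": 1, "UNKNOWN": 2, "OK": 3}


def font_size_pt(item_count: int, largest: int = 96, smallest: int = 24) -> int:
    """Fewer items on screen means a bigger font, within fixed bounds."""
    if item_count <= 0:
        return largest
    # 480 is an arbitrary budget chosen for this sketch.
    return max(smallest, min(largest, 480 // item_count))


def render(problems: list[tuple[str, str, str]], refresh_seconds: int = 30) -> str:
    """Build a static HTML page from (host, state, message) tuples."""
    ordered = sorted(problems, key=lambda p: (SEVERITY.get(p[1], 2), p[0]))
    size = font_size_pt(len(ordered))
    rows = "\n".join(
        f'<li class="{escape(state.lower())}">{escape(host)}: {escape(msg)}</li>'
        for host, state, msg in ordered
    )
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">\n'
        f"<style>body {{ font-size: {size}pt; }}</style>\n"
        f"</head><body><ul>\n{rows}\n</ul></body></html>\n"
    )


# Made-up hosts and messages.
page = render([
    ("web01", "WARNING", "load 9.5"),
    ("db02", "CRITICAL", "disk 97% used"),
    ("mail01", "WARNING", "queue 400"),
])
print(page)
```

Regenerating this file from a cron job and letting the browser reload it keeps the display side free of any logic that can hang.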

I don't know of any existing packages, but if you're happy coding stuff and your NOC is Windows-based, you might be interested in the PowerShellToys that /n software have announced. There's a post on the PowerShell blog providing more information and links to each PowerShellToy. My first thought on hearing about these was that they would be ideal for creating a dashboard for monitoring servers and whatnot.

While the systems running the displays are Windows-based, the data comes from mostly Linux-based systems. I'm not sure if this would help. The idea of a dashboard is exactly what I'm looking for though.
– Joseph, May 13 '09 at 21:39

We use Mercury (now HP)'s BAC tools for our dashboards. I can take counters, alerts, etc. from SO many sources, crunch their stats and dashboard those stats in a variety of ways. I'll warn you now though, this is a high-end solution - very spendy.