Software Development Frontiers

Monday Dec 01, 2008

This is the first in a series of posts
on installing, configuring and running Nagios on Solaris. If you're
not familiar with Nagios, check out nagios.org to learn about this
great open source network monitoring application written by Ethan
Galstad.

I jumped in by downloading the source,
building it with gcc on OpenSolaris 2008.05 and deploying it on one
OpenSolaris x64 system and one Solaris 10 SPARC system. Some things
worked immediately, and some didn't.

Then I ran across this book: "Building a Monitoring Infrastructure
with Nagios" by David Josephsen.

See it here:
http://www.skeptech.org/?page_id=4

I recommend reading this book
cover-to-cover to understand all the issues involved in monitoring
remote hosts and applications with Nagios. Nagios is an example of a
highly configurable application which requires the user to be aware
of security and performance issues when configuring the network for
monitoring. A poor job done in configuration will result in sluggish
monitoring capability and unhappy users, so knowing the issues up
front is key to using Nagios successfully.

This book gives you a great
look behind the scenes, explaining how each piece of
Nagios works and how to configure the app for each device on your
network. The book is written from a Linux perspective, but the author
points out some of the differences for other *nix and Windows systems.

The book describes Nagios 2 usage. I
downloaded and built Nagios 3.0.5, so I expect a few of the
author's directions will need updating.

Nagios has several components:

1. Nagios proper (sometimes referred to as
the daemon) which runs on your monitoring host. It decides when to
collect information on each monitored host, initiates actions to
collect the data, and writes the collected information to log files.
The user selects which hosts/devices are to be monitored, how
frequently to gather the information, how much information (disk and
CPU usage, application status, etc.) to gather and whom to notify
when a user-defined problem is detected.

The user specifies all of
this in configuration files. As you can imagine, the number and
contents of the config files can become quite extensive and Josephsen
devotes chapters 4 and 5 to describing the form and contents of these
files, as well as some semi-automatic means to aid the user in
creating the files.

2. Nagios GUI is a CGI-based visual
monitoring tool that gives the user a summary of the network and
monitored application status in a convenient format. Events
triggering warnings or escalations are highlighted in yellow or red,
respectively. More about the GUI in future posts.

3. Nagios plugins are small executable
files (scripts or binaries) which run on each of the remote hosts and
collect one piece of information (number of users, percentage of disk
full, etc). More than 200 of these plugins can be downloaded
from nagios.org.

4. The remaining Nagios piece is the
connection mechanism which links the monitored data on the remote
host with the Nagios daemon. The most commonly used mechanism pairs the
check_nrpe plugin, running on the Nagios daemon host, with the NRPE
daemon running on each monitored remote host.

Galstad's
documentation on NRPE and check_nrpe is available at
http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf . This document contains
handy drawings to depict how the two components work together to
transfer remote host information to the daemon.
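
To make the plugin idea from item 3 a little more concrete, here is a
minimal sketch of what a disk-usage plugin could look like in Python. It
is not one of the plugins from nagios.org. The function names and the
80/90 percent thresholds are my own assumptions for illustration. The
only part taken from the standard Nagios plugin conventions is the
contract with the daemon: print one line of status text and return exit
code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import shutil

# Exit codes from the standard Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used, warn, crit):
    """Map a disk-usage percentage to a Nagios status code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path, warn=80.0, crit=90.0):
    """Return (status code, one-line message) for the filesystem holding path.

    The default thresholds are arbitrary example values.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {err}"
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero.
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent_used = 100.0 * usage.used / usage.total
    status = evaluate(percent_used, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {percent_used:.1f}% full"


def main(argv):
    """Print the status line and return the exit code for the daemon."""
    target = argv[1] if len(argv) > 1 else "/"
    code, message = check_disk(target)
    print(message)
    return code
```

Ending the file with sys.exit(main(sys.argv)) (after importing sys) would
turn this into a command-line check. Keeping the threshold logic in its
own small function makes it easy to try out without touching a real
filesystem.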

That's all for this post.

In coming
posts, I will address unique considerations for getting Nagios up
and running on your OpenSolaris and Solaris 10 systems. Send me
email with any Solaris-specific Nagios issues and I'll try to help you
out.