Nagios / SNMP tools

Here is a collection of tools that I wrote or adapted
to my needs to make Nagios monitoring of various services
easier, mostly on Linux servers.

SNMP extend

The SNMP daemon from the NetSNMP project, version 5.0 and above,
provides a way to access the output of user-supplied scripts via the
SNMP protocol. In other words, an SNMP client on one machine can invoke
a script on another machine just by sending an SNMP query. After the remote
script finishes, its standard/error output, return code and some other
values are sent back to the client in an SNMP response. (NOTE: See the SNMP exec section below if you run an SNMP daemon older than NetSNMP 5.0.)

For example, suppose you want to query the current date and time
of a server. Admittedly, there are some standard OIDs in the System MIB, but
you can just as well run /bin/date every time and pass its output
through SNMP back to the client. Here is how to do it:

On the remote server, configure the date extension in /etc/snmp/snmpd.conf. Simply add this single line at the end of the config file and reload snmpd:

extend datecheck /bin/date

From any client that is allowed SNMP access to the server, query datecheck with:
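As a rough sketch (not necessarily the exact command used on this page), such a query can be issued with Net-SNMP's snmpget against the NET-SNMP-EXTEND-MIB table, which is indexed by the quoted extend name. The helper name build_extend_query, the host name, the "public" community and SNMP v2c are all assumptions made for illustration:

```python
import shlex


def build_extend_query(host, name, community="public",
                       column="nsExtendOutputFull"):
    """Return the snmpget argument list for one column of an extend entry.

    Hypothetical helper: host, community and SNMP version are assumptions.
    """
    # NET-SNMP-EXTEND-MIB rows are indexed by the extend name in double quotes.
    oid = 'NET-SNMP-EXTEND-MIB::{}."{}"'.format(column, name)
    return ["snmpget", "-v2c", "-c", community, host, oid]


if __name__ == "__main__":
    argv = build_extend_query("server.example.com", "datecheck")
    # Print a copy-and-paste friendly shell command line.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The returned list can be passed straight to subprocess.run(). Passing column="nsExtendResult" instead asks for the script's return code rather than its output.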

Have you found this script useful? Please support author by PayPal donation.

SNMP exec

SNMP exec provides functionality similar to extend; however,
exec is less flexible and slightly slower to work with. On the other
hand, it is supported by many older SNMP daemon implementations, including
UCD-SNMP and NetSNMP 4.x, which are still found on many servers.

The configuration and operation of exec are very similar to those of
extend described above, so I won't repeat myself: simply replace every
occurrence of extend with exec in the above Nagios config file and
download this script to Nagios' libexec.local directory:
check_snmp_exec.sh

Scripts for SNMP extend / exec

Your Nagios is now ready to query status from remote scripts via
SNMP. The conventions for these scripts are fairly trivial: the first
word on the first line must be one of
OK, WARNING, FAIL or UNKNOWN. This word is
translated to the appropriate return code by
check_snmp_extend.sh or
check_snmp_exec.sh and returned to Nagios.
That's all.
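The translation step can be sketched as follows. This is a minimal illustration, not the code of check_snmp_extend.sh: the function name is hypothetical, the numeric codes are the standard Nagios plugin return codes, and accepting CRITICAL as a synonym for FAIL as well as a trailing colon after the word are assumptions.

```python
# Standard Nagios plugin return codes. Treating FAIL and CRITICAL as the
# same state is an assumption of this sketch.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "FAIL": 2, "CRITICAL": 2, "UNKNOWN": 3}


def status_to_exit_code(output):
    """Map the first word of the first output line to a Nagios return code."""
    lines = output.splitlines()
    words = lines[0].split() if lines else []
    if not words:
        return NAGIOS_CODES["UNKNOWN"]  # empty output: state unknown
    # Tolerate "OK:" style prefixes (assumption, not a documented rule).
    word = words[0].rstrip(":")
    return NAGIOS_CODES.get(word, NAGIOS_CODES["UNKNOWN"])
```

Anything that does not start with a recognised word is reported as UNKNOWN, so a crashed or misconfigured remote script never shows up as OK.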

BTW from now on I will talk about extend only, but everything
applies equally to exec on older daemons.

Now is a good time to introduce some scripts that can be used on the remote
servers with extend or exec...

Monitor for Linux Software RAID

Most low-end servers rely on Linux SW-RAID for their data storage.
Monitoring such an array and getting an alert as soon as a problem appears
is an essential part of maintaining decent availability of your
services.

The core part of this monitor is a nagios-linux-swraid.pl script
that parses RAID status information from /proc/mdstat and
reports a single line similar to the following on its standard output:
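To illustrate the idea (this is not nagios-linux-swraid.pl itself), here is a small sketch that scans /proc/mdstat-style text for the "[n/m] [UU]" health markers and prints one status line. The function name and the exact wording of the status line are assumptions; only the leading status word follows the convention described above.

```python
import re

# "md0 : active raid1 sdb1[1] sda1[0]"
DEVICE_RE = re.compile(r"^(md\d+)\s*:")
# "104320 blocks [2/2] [UU]" -> configured devices, working devices, flags
HEALTH_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def check_mdstat(text):
    """Return a one-line status for the arrays found in mdstat text."""
    degraded, healthy = [], []
    current = None
    for line in text.splitlines():
        device = DEVICE_RE.match(line)
        if device:
            current = device.group(1)
            continue
        health = HEALTH_RE.search(line)
        if current and health:
            total, working = int(health.group(1)), int(health.group(2))
            flags = health.group(3)
            # "_" marks a missing or failed member of the array.
            if working < total or "_" in flags:
                degraded.append("{} [{}]".format(current, flags))
            else:
                healthy.append(current)
            current = None
    if degraded:
        return "FAIL - degraded arrays: " + ", ".join(degraded)
    if healthy:
        return "OK - all arrays healthy: " + ", ".join(healthy)
    return "UNKNOWN - no md arrays found"


if __name__ == "__main__":
    with open("/proc/mdstat") as handle:
        print(check_mdstat(handle.read()))
```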

System Up-To-Date monitor (APT / YUM)

Keeping the operating system up to date with the latest patches is essential
in most environments. The following two scripts check whether new patches
are available for download. The first one works with
APT-based systems
(e.g. Debian, Ubuntu or even OpenSolaris Nexenta, ...)
and the second one is for YUM-based systems (e.g. RedHat, Fedora or CentOS).

You can make the script itself run apt or
yum to download information about the newest updates from the
internet, but that usually takes too long and Nagios times out in the middle. Instead,
on my systems I run apt or yum every few
hours from cron and let the Nagios script do only the
quick tasks. More specifically, on APT systems I run
apt-get update from cron, because that downloads data
from the net, and then run apt-get -q -s upgrade
from the SNMP script, because that only reads the local database and parses the
output. On YUM systems I run yum check-update from
cron and store its output in a file. Then the SNMP script only reads
and parses that file. Keep reading for example usage.

That's all. You will get an "OK" result if there are no new updates and
"WARNING" with a list of packages to update if there are any. On Ubuntu
you'll even get a "CRITICAL" result if there are any security
updates, and "WARNING" when there are only non-security ones.
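The quick part on APT systems can be sketched like this. It is an illustration only, not the script offered on this page: it reads the text produced by apt-get -q -s upgrade and looks at the "Inst" lines. Detecting security updates by a "-security" archive name in the line, as well as the function name and message wording, are assumptions.

```python
def check_apt_upgrade(simulated_output):
    """Classify the output of a simulated 'apt-get -q -s upgrade' run."""
    security, regular = [], []
    for line in simulated_output.splitlines():
        # Packages that would be upgraded are listed as "Inst <name> ...".
        if not line.startswith("Inst "):
            continue
        package = line.split()[1]
        # Assumption: security updates come from an archive whose name ends
        # in "-security" (e.g. "precise-security" on Ubuntu).
        if "-security" in line:
            security.append(package)
        else:
            regular.append(package)
    if security:
        return "CRITICAL - security updates: " + " ".join(security + regular)
    if regular:
        return "WARNING - updates available: " + " ".join(regular)
    return "OK - system is up to date"
```

Because the function only parses text, it can be fed from a subprocess call, from a file written by cron, or from a test fixture.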

System uptime monitor

Reporting system uptime and generating an alert on system reboot is often useful.
Download check_snmp_uptime.pl, which does just that: it reads the system uptime, records the last reading and alerts
when the current uptime reading is lower than the last recorded one. Easy, eh?

Two important things:

net-snmp provides the snmpd daemon's uptime in
DISMAN-EVENT-MIB::sysUpTimeInstance (.1.3.6.1.2.1.1.3.0).
The real system uptime is available as
HOST-RESOURCES-MIB::hrSystemUptime.0 (.1.3.6.1.2.1.25.1.1.0).
Many other devices, like switches, provide only the former OID.
This script can read either of them. Use --sysUpTime or
--hrSystemUptime to select the appropriate OID for each device.

If the --dbfile parameter is not used, the script will only
check and report the uptime and return OK. No alerts will be generated at all.

MySQL replication monitor

MySQL provides a relatively easy way to run online mirrors
of the master database. A mirror is called a slave
in MySQL terminology and the mirroring process is called
replication. With the following script it is easy
to add your MySQL slaves to Nagios monitoring and receive an
alert whenever replication drops for some reason.

The script needs to connect as a MySQL user (say user monitor)
with the REPLICATION CLIENT privilege. Use this GRANT
command to create such a user:

The script returns "CRITICAL" whenever replication breaks for some
reason, e.g. the Slave IO or Slave SQL process isn't
running, or the replication is too far behind the master (2 minutes by
default). "WARNING" is returned when replication is more than 1 minute
but less than 2 minutes behind the master, and "OK" is returned when
everything is all right. You can also get an "UNKNOWN" return code,
usually on misconfiguration or when the script can't connect to the
slave server. That's all ;-)
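The decision rules above can be sketched as a small function. It is an illustration, not the script from this page: it takes one row of SHOW SLAVE STATUS as a dictionary (how the row is fetched is left out), and the function name, parameter names and behaviour at exactly 1 or 2 minutes are assumptions.

```python
def check_replication(status, warn_seconds=60, crit_seconds=120):
    """Classify one row of SHOW SLAVE STATUS given as a dict (or None)."""
    if not status:
        # No row at all: not a slave, or the connection failed.
        return "UNKNOWN - could not read slave status"
    io_running = status.get("Slave_IO_Running")
    sql_running = status.get("Slave_SQL_Running")
    lag = status.get("Seconds_Behind_Master")
    if io_running != "Yes" or sql_running != "Yes":
        return "CRITICAL - replication stopped (IO: {}, SQL: {})".format(
            io_running, sql_running)
    if lag is None:
        # MySQL reports NULL here when the lag cannot be determined.
        return "UNKNOWN - replication lag not available"
    if lag >= crit_seconds:
        return "CRITICAL - {} seconds behind master".format(lag)
    if lag > warn_seconds:
        return "WARNING - {} seconds behind master".format(lag)
    return "OK - {} seconds behind master".format(lag)
```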

PostgreSQL / Slony cluster monitor

Slony is a popular PostgreSQL replication
system. It is a good idea to monitor the status of your Slony cluster and
trigger an alert whenever any node gets out of sync. The following script will
help you do just that.