Note: the following documentation is for Resmon 1. See Modules for Resmon 2 module documentation.

There are a number of modules included with resmon that will cover most things you need to monitor. A list of the modules is below, along with a sample configuration. You can also create your own modules.

Generic configuration options

The following options are applicable to any module:

interval : cache the result for n seconds. Useful for long-running modules. Default: do not cache.

check_timeout : a per-check timeout value. The check goes bad with a timeout if it takes longer than this to run. Overrides the global timeout value.
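
The effect of the interval option can be pictured as a small time-based cache wrapped around a check. The Python sketch below is illustrative only and is not Resmon's implementation (Resmon modules are written in Perl). The names CachedCheck, run_check and clock are hypothetical and exist only for this example.

```python
import time


class CachedCheck:
    """Wrap a check function and reuse its last result for `interval` seconds."""

    def __init__(self, run_check, interval=0, clock=time.time):
        self.run_check = run_check  # hypothetical callable returning (status, message)
        self.interval = interval    # 0 mirrors the documented default: do not cache
        self.clock = clock          # injectable time source, handy for testing
        self._last_time = None
        self._last_result = None

    def result(self):
        now = self.clock()
        fresh = (
            self._last_time is not None
            and self.interval > 0
            and now - self._last_time < self.interval
        )
        if not fresh:
            # Cache is empty, disabled, or expired: run the real check again.
            self._last_result = self.run_check()
            self._last_time = now
        return self._last_result
```

With interval set to 0 every call runs the check again, which matches the documented default of not caching.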

A1000

This module monitors the health of an A1000 StorEdge disk array.

Sample Configuration

A1000 {
fa000_001 : status => Optimal
}

Arguments

Object : the unit you wish to monitor

status : the status that you consider to be OK

ADAPTEC

This module monitors the health of an Adaptec RAID controller. It requires the arcconf command line utility that comes with Adaptec Storage Manager.

Sample Configuration

ADAPTEC {
1 : noop
}

Arguments

Object : the controller you wish to monitor

arcconf : (optional) the path to the arcconf command line utility. Defaults to /usr/StorMan/arcconf

DATE

A simple module that just prints the current unix timestamp. This can be useful when using the status.txt file to ensure that you have up to date information. However, when using the XML checks, this module is no longer necessary as each check includes information on when it was last updated.

Sample Configuration

DATE {
date : noop
}

DHCPLEASES

This module checks the number of active DHCP leases for a network and warns if that number grows close to the maximum number of addresses available in the DHCP pool.

Sample Configuration

Arguments

While all arguments are optional, you should have at least one of limit or minkbfree. Including checks for both percentage used and KB free may have undesirable effects, so you should only include one of these methods.

Object : the mount point or device for which you want to check free space

limit : (optional) the percentage used above which you want to go critical

Sample Configuration

Arguments

minimum : (optional) the minimum age of the file in seconds you consider to be OK

maximum : (optional) the maximum age of the file in seconds you consider to be OK

allowmissing : (optional) what to do if the file is missing. If this is yes, the status is OK when the file is missing; otherwise, the status is bad.

FILECOUNT

This module monitors the number of files in a directory, going bad when the file count goes over a threshold.

Sample Configuration

FILECOUNT {
/path/to/dir : slimit => 10, hlimit => 20
}

Arguments

Object : the path to the directory

slimit : the 'soft' threshold, above which the module will warn

hlimit : the 'hard' threshold, above which the module will go critical
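
The two-tier threshold can be sketched in a few lines of Python. This is an illustration of the logic described above, not the module's actual code. The function name filecount_status and the status strings are assumptions, as is the choice to count every directory entry.

```python
import os


def filecount_status(path, slimit, hlimit):
    """Return (status, count) for the number of entries in `path`."""
    count = len(os.listdir(path))  # assumption: every directory entry counts
    if count > hlimit:
        return "BAD", count        # over the hard threshold: critical
    if count > slimit:
        return "WARNING", count    # over the soft threshold only: warn
    return "OK", count
```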

FILESIZE

This module monitors the size of a specific file, going bad if it is too big or too small.

Sample Configuration

FILESIZE {
/path/to/file : minimum => 1, maximum => 16384
}

Arguments

Object : the path to the file you want to monitor

minimum : the minimum file size, in bytes

maximum : the maximum file size, in bytes
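
As a rough illustration of the range check, the following Python sketch treats a file as bad when its size falls outside the configured bounds. The function name filesize_status and the status strings are hypothetical, and whether the real module treats the bounds as inclusive is an assumption.

```python
import os


def filesize_status(path, minimum=None, maximum=None):
    """Return (status, size) for the file at `path`."""
    size = os.path.getsize(path)  # size in bytes
    if minimum is not None and size < minimum:
        return "BAD", size        # too small
    if maximum is not None and size > maximum:
        return "BAD", size        # too big
    return "OK", size
```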

FREEMEM

This module monitors the amount of free memory on the system. It is platform
specific and currently works with Linux and Solaris. On Solaris, it makes use
of the Sun::Solaris::Kstat module if available in order to obtain the ZFS ARC
size. If the kstat module is not available, then an alternate method is used
where cache values cannot be obtained. In this case, includecache must be set
to 0.

Arguments

includecache : (optional, default 0) include cache in the amount of free memory

FRESHSVN

This module checks a Subversion checkout to make sure it is up to date and pointing to the correct URL. See also the SIMPLESVN module, which doesn't perform as thorough a check, but has fewer requirements and works with older versions of Subversion.

Sample Configuration

Arguments

Object : Path to the working copy

URL : the URL that the working copy should be checked out from

maxlag : (optional, default 330 seconds) the amount of time you allow for the repository to update before the repository should be considered out of date. It's a good idea to set this to the interval at which your update cron job runs + a few seconds.

INODES

This module monitors the number of free inodes on a filesystem.

Sample Configuration

INODES {
/ : limit => 90%
/data : limit => 90%
}

Arguments

Object : The filesystem you wish to monitor

limit : the percentage of inodes used after which you want to alarm
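
The percentage of inodes in use can be derived from the filesystem's total and free inode counts. The Python sketch below shows one way to compute it with os.statvfs on a Unix-like system. It is not the module's implementation, and the function names and status strings are hypothetical.

```python
import os


def inode_usage_percent(mount_point):
    """Return the percentage of inodes in use, or None if unavailable."""
    st = os.statvfs(mount_point)
    if st.f_files == 0:  # some filesystems do not report inode totals
        return None
    used = st.f_files - st.f_ffree
    return 100.0 * used / st.f_files


def inodes_status(mount_point, limit_percent):
    """Go bad once inode usage exceeds `limit_percent` (e.g. 90 for 90%)."""
    pct = inode_usage_percent(mount_point)
    if pct is not None and pct > limit_percent:
        return "BAD"
    return "OK"
```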

LARGEFILES

This module looks for 'large' files in a directory.

Sample Configuration

LARGEFILES {
/path/to/dir : limit => 16384
}

Arguments

Object : the directory you wish to monitor

limit : the maximum file size in bytes

LOGFILE

This module monitors a log file, looking for errors. What the module considers an error is configurable.

Sample Configuration

LOGFILE {
/var/log/mylogfile : max => 4, match => ^ERROR:
}

Arguments

Object : path to the log file

match : regex that defines what an error is

max : (optional, default 8) the maximum number of errors you will allow before going critical
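
The interaction between match and max can be illustrated with a short Python sketch that counts matching lines. It is only an approximation: it assumes the check goes bad when the count is strictly greater than max, and it does not model how the real module tracks its position in the log file between runs. The function name logfile_status is hypothetical.

```python
import re


def logfile_status(lines, match, max_errors=8):
    """Count lines matching `match`; go bad once the count exceeds `max_errors`."""
    pattern = re.compile(match)
    errors = sum(1 for line in lines if pattern.search(line))
    # Assumption: the limit is exclusive, so exactly `max_errors` is still OK.
    status = "BAD" if errors > max_errors else "OK"
    return status, errors
```

Because the pattern in the sample configuration is anchored with ^, only lines that begin with ERROR: are counted.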

MDSTAT

This module monitors the status of Linux Software RAID devices.

Sample Configuration

MDSTAT {
raid : noop
}

Arguments

Object : this is just a label used to identify the check. All MD devices are detected automatically.

NETBACKUPTAPE

This module checks the status of tape drives in NetBackup, and will go critical if any are down, or if there are no drives up.

Sample Configuration

NETBACKUPTAPE {
tapes : noop
}

NETSTAT

This module checks the output of netstat, as its name suggests. It can be used to ensure that a server is listening on a specified port, or that a certain connection is currently open.

Sample Configuration

Arguments

Object : the path to the file to be monitored

host : the hostname of the server

minimum : the minimum file size in bytes

maximum : the maximum file size in bytes

RESMON

This module monitors resmon itself and reports if there is a problem with the config file or if there are any failed modules. This is most useful in conjunction with auto updating, when modules are reloaded without restarting resmon.

Note: at some point, this module may be added by default, but at the moment it needs to be included in the config file.

This check will also report the subversion revision number if resmon is running from a checkout.

Sample Configuration

RESMON {
resmon : noop
}

SCRIPT

This module runs a Perl script and expects some output from the script in the form of "STATUS(message)". This allows Resmon to run helper scripts without needing to write a complete module.

Sample Configuration

SCRIPT {
myscript : script => /path/to/myscript.pl, timeout => 300
}

Arguments

Object : This is just a label used to identify the check

script : the path to the Perl script

timeout : (optional, default 30) how long, in seconds, to cache the result of the command
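
To make the "STATUS(message)" output format concrete, here is a small Python sketch of how such a line might be parsed. It is an illustration and not Resmon's own parser. It assumes that STATUS is a single word such as OK or BAD and that the result sits on a line of its own. The name parse_script_output is hypothetical.

```python
import re

# Assumption: a one-word status, then the message in parentheses,
# alone on its line (surrounding whitespace is tolerated).
OUTPUT_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse_script_output(output):
    """Return (status, message) from the first matching line, else None."""
    for line in output.splitlines():
        m = OUTPUT_RE.match(line)
        if m:
            return m.group(1), m.group(2)
    return None
```

A helper script that prints OK(all good) would be read as the status OK with the message "all good" under these assumptions.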

SIMPLESVN

This module, like the FRESHSVN module, checks the health of a Subversion checkout, making sure it is up to date and that there are no problems. It does not check that the working copy is checked out from a specific repository, nor does it have any grace period. However, it will work with older versions of Subversion and may be preferable to the FRESHSVN module in some circumstances.

Sample Configuration

SIMPLESVN {
/path/to/working/copy : noop
}

Arguments

Object : the path to the working copy

SMFMAINTENANCE

This module checks for any Solaris services in maintenance mode.

Sample Configuration

SMFMAINTENANCE {
services : noop
}

SWAPSIZE

This module monitors the memory used on Solaris by inspecting the usage of the /tmp directory.

Sample Configuration

SWAPSIZE {
swap : limit => 262144
}

Arguments

Object : this is just a label used to identify the check

limit : the minimum amount of free memory below which we go critical

TCPSERVICE

This module connects to a TCP service at regular intervals, going critical if the connection fails.

Sample Configuration

TCPSERVICE {
ssh : host => 127.0.0.1, port => 22, timeout => 2
}

Arguments

Object : this is just a label used to identify the check

host : the host to connect to

port : the port to connect to

timeout : how long, in seconds, to wait for a connection before going critical

prepost : (optional) a string to send on connection. Useful if the service you are checking requires something to be entered before showing a banner.
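
The behaviour described by these arguments can be approximated with the Python sketch below: connect with a timeout, optionally send the prepost string, then try to read a banner. This is a sketch under stated assumptions rather than the module's code. The function name tcp_service_status, the status strings, and the decision to treat a silent but connected service as OK are all assumptions.

```python
import socket


def tcp_service_status(host, port, timeout, prepost=None):
    """Return (status, detail) after attempting a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if prepost is not None:
                sock.sendall(prepost.encode())  # nudge services that wait for input
            try:
                banner = sock.recv(1024).decode(errors="replace").strip()
            except socket.timeout:
                banner = ""  # connected, but nothing was sent in time
            return "OK", banner
    except OSError as exc:
        # Refused, unreachable, or timed out while connecting.
        return "BAD", str(exc)
```

Called with the values from the sample configuration, this would be tcp_service_status("127.0.0.1", 22, 2).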

TWRAID

This module monitors the status of a 3ware RAID controller unit. It requires
that you have the tw_cli command installed, which is available from
http://www.3ware.com/ .

Sample Configuration

TWRAID {
/c0/u1 : tw_cli => /path/to/tw_cli
}

Arguments

Object : the unit you wish to monitor, this should be in the form /cx/ux (/c0/u1 is likely to be correct if only one unit is present)

tw_cli : (optional) the path to the tw_cli command. This defaults to /usr/local/bin/tw_cli if not present.

WALCHECK

This module monitors the PostgreSQL log file replay from a master to a slave.

Sample Configuration

Arguments

ZIMBRA

This module checks Zimbra's service status and goes critical if any services are down.

Sample Configuration

ZIMBRA {
services : noop
}

ZPOOLERRS

This module checks for zpool read/write errors by using zpool status -x. It will also notify if a zpool is degraded, similar to the basic ZPOOL check.

This check can be used either in combination with the ZPOOL check or instead of it. If used in combination, it is probably a good idea to warn or email when the ZPOOLERRS check goes bad, and page when the ZPOOL check goes bad. If you wish to page on read/write errors as well as degraded arrays, then only the ZPOOLERRS check is required.

Sample Configuration

ZPOOLERRS {
zpools : noop
}

ZPOOLERRS {
zpools : warn_on_upgrade => yes
}

Arguments

Object : this is just a label used to identify the check

warn_on_upgrade : when a zpool needs upgrading to a new zfs version, do we warn or stay OK?

ZPOOLFREE

This module monitors the free space in a zfs pool using the zfs list
command (the zpool list command can give misleading results when the
zpool is almost full).

Often, it is more informative to use this module rather than the DISK module
if your filesystems are all part of a zpool. Otherwise, what happens is that
when the disk is full, every filesystem based check goes to 100% full and it
isn't obvious what the cause is.

Sample Configuration

ZPOOLFREE {
pool1 : limit => 90%
pool2 : limit => 90%
}

Arguments

Object : the name of the zpool you wish to monitor

limit : how full to get before going critical

ZPOOL

This module looks for degraded zpools, but does not go critical if there are
any recoverable errors that do not cause the array to be degraded.