3.4. Monitoring Plugins

This section describes the different services that can be
monitored (for example, a MySQL database or an Apache
webserver). It briefly introduces the services each plugin can
monitor and how the plugin is configured. Wherever possible,
sensible defaults are provided, so common deployment scenarios
often need little or no configuration.

The available monitoring plugins depend on which plugins have
been built and installed. If you have received this document
as part of a binary distribution, it is possible that the
distribution does not include all the plugins described here.
It might also contain other plugins provided independently
from the main MonAMI release.

3.4.1. AMGA

AMGA (ARDA Metadata Catalogue Project) is a metadata server
provided by the ARDA/EGEE project as part of their gLite software
releases. It provides additional metadata functionality by
wrapping an underlying database storage. More information about
AMGA is available from the AMGA project page.

The amga monitoring plugin will monitor the server's
database connection usage and the number of incoming connections.
For both, the current value and the configured maximum are
monitored.

Attributes

host string, optional

the host on which the AMGA server is running. The default
value is localhost.

port integer, optional

the port on which the AMGA server listens. The
default value is 8822.

3.4.2. Apache

The Apache HTTP (or web) server is perhaps the best-known
project from the Apache Software Foundation. Since April 1996,
the Netcraft web survey has shown it to be the most popular web
server on the Internet. More details can be found at the Apache
home page.

The apache plugin monitors the current status of an
Apache HTTP server. It can also provide event-based monitoring,
based on various log files.

The Apache server monitoring is achieved by downloading the
server-status page (provided by the mod_status Apache module) and
parsing the output. Usually, this option is present within the
Apache configuration but commented out by default (depending on
the distribution). The location of the Apache configuration is
Apache-version and OS specific, but it is usually found in the
/etc/apache, /etc/apache2 or /etc/httpd directory. To enable the
server-status page, uncomment the section or add lines within the
Apache configuration that look like:

<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from .example.com
</Location>

Here .example.com is an illustration of how to limit
access to this page. You should change this to either your DNS
domain or explicitly to the machine on which you are to run
MonAMI.

There is an ExtendedStatus option that configures Apache to
include some additional information. This is controlled within
the Apache configuration by lines similar to:

<IfModule mod_status.c>
ExtendedStatus On
</IfModule>

Switching on the extended status should not greatly affect the
server's load and provides some additional information. MonAMI
can understand this extra information, so it is recommended to
switch on this ExtendedStatus option.

Event-based monitoring

Event-based monitoring is made available by watching log files.
Any time the Apache server writes to a watched log file, an event
is generated. The plugin supports multiple event channels,
allowing support for multi-homed servers that log events to
different log files.

Event channels are specified by log attributes. The attribute
can be repeated to configure multiple event channels. Each log
attribute has a value of the form:

name:path[type]

where:

name

is an arbitrary name given to this channel. It must not contain
a colon (:) and should not contain a dot (.); otherwise most
names are valid.

path

is the path to the file. Log rotations (where a log file is
archived and a new one created) are supported.

type

is either combined or error.

The following example configures the access
channel to read the log file
/var/log/apache2/access.log, which is in the
Apache standard “combined” format.

[apache]
log = access: /var/log/apache2/access.log [combined]
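
The name:path[type] form described above can be illustrated with a
small parser. This is only a sketch written for this section, not
MonAMI's own parsing code; the function name parse_log_value and the
tolerance of whitespace around each part are assumptions.

```python
import re

# Pattern for "name:path[type]". Whitespace around each part is tolerated,
# as in the configuration example above. Illustrative only: this is not
# taken from the MonAMI source.
LOG_VALUE = re.compile(
    r"^\s*([^:\s]+)\s*:\s*(.+?)\s*\[\s*(combined|error)\s*\]\s*$"
)


def parse_log_value(value):
    """Split a log attribute value into (name, path, type)."""
    match = LOG_VALUE.match(value)
    if match is None:
        raise ValueError("expected name:path[type], got %r" % value)
    return match.group(1), match.group(2), match.group(3)


print(parse_log_value("access: /var/log/apache2/access.log [combined]"))
```

The channel name is taken as everything before the first colon, which
is why a name cannot itself contain a colon.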

Attributes

host string, optional

the hostname of the webserver to monitor. The default value is
localhost.

3.4.3. dCache

dCache (see dCache home page) is a system jointly developed by
Deutsches Elektronen-Synchrotron (DESY) and Fermilab that aims to
provide a mechanism for storing and retrieving huge amounts of
data among a large number of heterogeneous server nodes, which can
be of varying architectures (x86, ia32, ia64). It provides a
single namespace view of all of the files that it manages and
allows access to these files using a variety of protocols,
including SRM, GridFTP, dCap and xroot. By connecting dCache to a
tape storage backend, it becomes a hierarchical storage manager
(HSM).

Authentication

The dCache monitoring plugin works by connecting to the underlying
PostgreSQL database that dCache uses to store the current system
state. To achieve this, MonAMI must have the credentials (a
username and password) to log into the database and perform read
queries.

If you do not already have a read-only account, you will need to
create one. It is strongly recommended not to use an
account with any write privileges, as the password will be stored
in plain text within the MonAMI configuration file (see Section 4.2.2, “Passwords being stored insecurely”).

To configure PostgreSQL, SQL commands need to be sent to the
database server. To achieve this, you will need to use the
psql command, connecting to the dcache database. On many systems
you must log in as the database user “postgres”, which often has
no password when connecting from the same machine on which the
database server is running. A suitable command is:

psql -U postgres -d dcache

The following SQL commands will create an account monami with
password monami-secret that has read-only access to the tables
that MonAMI will read.

If you intend to monitor the database remotely, you may need to
add an extra entry in PostgreSQL's client authentication file,
pg_hba.conf. With some distributions, this file is located in the
directory /var/lib/pgsql/data.

Currently, the information gathered is limited to the rate of SRM
GET, PUT and COPY requests received. This information is gathered
from the copyfilerequests_b,
getfilerequests_b and putfilerequests_b tables. Future
versions of MonAMI may read other tables, which would require
additional GRANT statements.

Attributes

host string, optional

the host on which the PostgreSQL database is running. The
default is localhost.

ipaddr string, optional

the IP address of the host on which the database is running.
This is useful when the host is on multiple IP subnets and a
specific one must be used. The default is to look up the IP
address from the host.

port integer, optional

the TCP port to use when connecting to the database. The
default is port 5432 (the standard PostgreSQL port).

user string, optional

the username to use when connecting to the database. The
default is the username of the system account MonAMI is
running under. When running as a daemon from a standard
RPM-based installation, the default user will be monami.

password string, optional

the password to use when authenticating. The default is to
attempt password-less login to the database.

3.4.4. Disk Pool Manager (DPM)

Disk Pool Manager (DPM) is a service that implements the SRM
protocol (mainly for remote access) and rfio protocol (for
site-local access). It is an easy-to-deploy solution that can
support multiple disk servers but has no support for
tape/mass-storage systems. More information on DPM can be found
at the DPM home page.

Figure 3.1. Data from DPM displayed within Ganglia.

The dpm plugin connects to the MySQL server DPM uses. By
querying this database, information is extracted such as the
status of the filesystems and the used and available space. The
space statistics are available as a summary, and broken down for
each group, and for each filesystem. The daemon activity on the
head node can also be monitored.

Authentication

This plugin requires read-only privileges for the database DPM
uses. The following set of SQL statements creates login
credentials with a username of monamiuser and a password of
monamipass suitable for local access:

GRANT SELECT ON cns_db.* TO 'monamiuser'@'localhost'
IDENTIFIED BY 'monamipass';
GRANT SELECT ON dpm_db.* TO 'monamiuser'@'localhost'
IDENTIFIED BY 'monamipass';

If MonAMI is to monitor the MySQL database remotely, the following
SQL can be used to create login credentials:

GRANT SELECT ON cns_db.* TO 'monamiuser'@'%'
IDENTIFIED BY 'monamipass';
GRANT SELECT ON dpm_db.* TO 'monamiuser'@'%'
IDENTIFIED BY 'monamipass';

If both local and remote access to the database is needed, all
four of the SQL commands above should be used.

Attributes

host string, optional

the host on which the MySQL server is running. Default is
localhost.

user string, required

the username with which to log into the server.

password string, required

the password with which to log into the server.

3.4.5. Filesystem

The filesystem plugin monitors generic (i.e.,
non-filesystem specific) features of a mounted filesystem. It
reports both capacity and “file” statistics. The
“file” statistics correspond to inode usage for
filesystems that use inodes (such as ext2).

Note

With both reported resources (blocks and files), there are
similar-sounding metrics: “free” and “available”. “free” refers
to the total resource potentially available, whereas “available”
refers to the resource available to general (non-root) users.

The difference between the two comes about because it is common
to reserve some capacity for the root user. This allows core
system services to continue when a partition is full: normal
users cannot create files but root (and processes running as
root) can.
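
The same free/available distinction is visible through the POSIX
statvfs call, which Python exposes as os.statvfs. The sketch below
assumes a Unix-like system; whether MonAMI itself obtains its figures
this way is an assumption, and the dictionary keys are illustrative
names rather than MonAMI metric names.

```python
import os


def filesystem_summary(location):
    """Return block and file (inode) figures for the filesystem holding location."""
    st = os.statvfs(location)
    return {
        "blocks_total": st.f_blocks,
        "blocks_free": st.f_bfree,        # includes blocks reserved for root
        "blocks_available": st.f_bavail,  # usable by non-root users
        "files_total": st.f_files,
        "files_free": st.f_ffree,         # includes inodes reserved for root
        "files_available": st.f_favail,   # usable by non-root users
    }


summary = filesystem_summary("/")
# The gap between "free" and "available" is the capacity held back for root.
reserved = summary["blocks_free"] - summary["blocks_available"]
print("blocks reserved for root:", reserved)
```

As with the location attribute, any path on the filesystem of
interest can be passed to the function.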

Attributes

location string, required

the absolute path to any file on the filesystem.

3.4.6. GridFTP

The Globus Alliance distribute a modified version of the WU-FTPD
server that has been patched to allow GSI-based authentication and
multiple streams. This is often referred to as “GridFTP”.

Various grid components use GridFTP as an underlying transfer
mechanism. Often, these have the same log-file format for
recording transfers, so parsing this log-file is a common
requirement.

The gridftp plugin monitors GridFTP log files, providing
an event for each transfer. These events are available under the
transfers channel.

Attributes

filename string, required

the absolute path to the GridFTP log file.

3.4.7. Maui

On their website, Cluster Resources describe Maui as “an
advanced batch scheduler with a large feature set well suited for
high performance computing (HPC) platforms”. Within a
cluster it is used to decide which job (of many that are
available) should be run next. Maui provides sophisticated
scheduling features such as advanced fair-share definitions and
“allocation bank”. More details are available within
the Maui homepage.

Access control

The MonAMI maui plugin will need sufficient access rights
to query the Maui server. If MonAMI is running on the same
machine as the Maui server then, most likely, no additional host
entry will be needed. If MonAMI is running on a remote machine,
then access rights must be granted for that machine: append the
remote host's hostname to the space-separated ADMINHOST list.

The plugin will also need to use a valid username. By default it
will use the name of the user it is running as (monami),
but the plugin can use an alternative username (see the
user attribute). To add an additional
username, append the username to the space-separated
ADMIN3 list.

The following example configuration shows how to configure Maui to
allow monitoring from host monami.example.org as user monami.

Password

Maui authenticates by the client and server sharing a secret: a
password. Currently this password must be an integer.
Unfortunately, the password is decided as part of the
Maui build process. If one is not explicitly specified, a random
number is selected as the password. The password is then embedded
within the Maui client programs and used when they communicate
with the Maui server. Currently, it is not possible to configure
the Maui server to use an alternative password without rebuilding
the Maui clients and server.

To communicate with the Maui server the maui plugin must
know the password. Unfortunately, as the password is only stored
within the executables, it is difficult to discover. The
maui plugin has heuristics that allow it to scan a Maui
client program and, in most cases, discover the password. This
requires a Maui client program to be present on whichever computer
MonAMI is running. If the Maui client is in a non-standard
location, its absolute path can be specified with the
exec attribute.

If the password is known (for example, its value was specified
when compiling Maui) then it can be specified using the
password attribute. Specifying the
password attribute will stop MonAMI from
scanning Maui client programs.

Once the password is known, it can be stored in the MonAMI
configuration using the password attribute.
This removes the need for a Maui client program. However, should
the Maui binaries change (for example, upgrading an installed Maui
package), it is likely that the password will also change. This
would stop the MonAMI plugin from working until the new password
was supplied.

The recommended deployment strategy is to install MonAMI on the
Maui server and allow the maui plugin to scan the Maui
client programs for the required password.

Time synchronisation

When the maui plugin and the Maui server communicate, both
parties want to know that the messages are really from the other
party. The shared-secret is one part of this process, another is
to check the time within the message. This is to prevent a
malicious third-party from sending messages that have already been
sent: a “replay attack”.

To prevent these replay attacks, the clocks on the Maui server and
the machine on which MonAMI is running must agree. If both
machines are well configured, their clocks will agree to within
about 10 milliseconds. Since the network may introduce a slight
delay, some tolerance is needed.

The maui plugin requires an agreement of one second by
default. This should be easy to satisfy with modern networks.
If, for whatever reason, this is not possible, the tolerance can be
made more lax by specifying the
max_time_delta attribute.

Note

Should there be a systematic error between the clocks on the two
servers, effort should be made to synchronise those clocks.
Increasing max_time_delta makes
MonAMI more vulnerable to replay attacks.
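
The timestamp check described in this section can be sketched as
follows. This is a simplified illustration of the idea, not the Maui
wire protocol or MonAMI's code; the function name within_time_delta
and the use of plain Unix timestamps are assumptions.

```python
import time


def within_time_delta(message_time, local_time=None, max_time_delta=1):
    """Return True if a message timestamp is acceptably close to the local clock."""
    if local_time is None:
        local_time = time.time()
    return abs(local_time - message_time) <= max_time_delta


now = 1000000.0
print(within_time_delta(now - 0.2, local_time=now))   # fresh message: True
print(within_time_delta(now - 30, local_time=now))    # replayed 30s later: False
# A laxer tolerance accepts the same replayed message.
print(within_time_delta(now - 30, local_time=now, max_time_delta=60))  # True
```

The last call shows why a larger max_time_delta widens the window in
which a captured message can be replayed.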

Attributes

host string, optional

the hostname of the Maui server. If not specified,
localhost will be used.

port integer, optional

the TCP port to which the plugin will connect. If not
specified, the default value is 40559.

user string, optional

the user name to present to the Maui server when
communicating. The default value is the name of the account
under which MonAMI is running.

max_time_delta integer, optional

the maximum allowed time difference, in seconds, between the
server and client. The default value is one second.

password integer, optional

the shared-secret between this plugin and the Maui server.
The default policy is to attempt to discover the password
automatically. Specifying the password will prevent
attempts at discovering it automatically.

timeout string, optional

the time MonAMI should wait for a reply. The string is in
time-interval format (e.g., “5m 10s” is five minutes and ten
seconds; “310” would be equivalent).
The default behaviour is to wait indefinitely.

exec string, optional

the absolute path to the mclient (or
similar) Maui client program. If the plugin was
unsuccessful scanning the program given by exec
it will also try standard locations.
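
The time-interval format used by the timeout attribute can be
illustrated with a short conversion routine. This is a sketch covering
only the forms shown above (a bare number of seconds, and the s and m
suffixes); any other units MonAMI may accept are not covered, and the
function name parse_interval is an assumption.

```python
import re

# Only the units that appear in the text above; other units are not assumed.
UNIT_SECONDS = {"s": 1, "m": 60}


def parse_interval(text):
    """Convert a string such as "5m 10s" or "310" into a number of seconds."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty time interval")
    total = 0
    for token in tokens:
        match = re.fullmatch(r"(\d+)([sm]?)", token)
        if match is None:
            raise ValueError("unrecognised interval component %r" % token)
        number, unit = match.groups()
        # A component without a suffix is taken to be in seconds.
        total += int(number) * UNIT_SECONDS.get(unit, 1)
    return total


print(parse_interval("5m 10s"))
print(parse_interval("310"))
```

Both calls give the same number of seconds, matching the equivalence
stated in the description of timeout.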

3.4.8. MySQL

This plugin monitors the performance of a MySQL database.
MySQL is a commonly used Free (GPLed)
database. The parent company (MySQL AB) describe it as “the
world's most popular open source database”. For more
information, please see the MySQL home page.

The statistics monitored are taken from the status variables.
They are acquired by executing the MySQL SQL statement SHOW
STATUS. The raw variables are described in the MySQL
manual, section 5.2.5: Status Variables.

Note

The metric names provided by MySQL are in a flat namespace.
These names are not used by MonAMI;
instead, the metrics are mapped into a tree structure, allowing
easier navigation of, and selection from, the available
metrics.

Privileges

To function, this plugin requires an account to access the
database. Please note: this database account requires no database
access privileges, only that the username and password allow
MonAMI to connect to the MySQL database. For security reasons,
you should not employ login credentials used elsewhere (and never
root or a similar power-user account). The following is a suitable
SQL statement for creating a username and password of monami and
monamipass.

CREATE USER 'monami'@'localhost' IDENTIFIED BY 'monamipass';

Sharing login credentials is not recommended. If you decide to
share credentials make sure the MonAMI configuration file is
readable only by the monami user (see Section 3.2.2, “Dropping root
privileges”).

Note

In addition to monitoring a MySQL database, the mysql
plugin can also store information MonAMI has gathered within
MySQL. This is described in Section 3.5.8, “MySQL”.

Attributes

user string, required

the username with which to log into the server.

password string, required

the password with which to log into the server.

host string, optional

the host on which the MySQL server is running. If no host
is specified, the default localhost is used.

3.4.9. null

The null plugin is perhaps the simplest to understand.
As a monitoring plugin, it provides an empty datatree when
requested for data. The main use for null as a
monitoring target is to demonstrate aspects of MonAMI without
the distraction of real-life effects from other monitoring
plugins.

The null plugin will supply an empty datatree. In
conjunction with a reporting plugin (e.g., the snapshot), this can be used to
demonstrate the map attribute for adding
static content. This attribute is described in Section 3.3.3, “The map attribute”.

Delays

Another use for a null target is to investigate the
effect of a service taking a variable length of time to respond
with monitoring data. This is emulated by specifying a delay
file. If the delayfile attribute is set,
then the corresponding file is read. It should contain a single
integer number. This number dictates how long (in seconds) a
null target should wait when requested for data. The
file can be changed at any time and the change will affect the
next time the null target is read from. This is
particularly useful for demonstrating how MonAMI estimates future
delays (see Section 3.3.4, “Estimating future data-gathering delays”) and undertakes adaptive
monitoring (see Section 3.6.4, “Adaptive monitoring”).

With delayfile set to /tmp/monami-delay, the delay can be adjusted
dynamically by changing the number stored in that file. To set the
delay to three seconds, do:

$ echo 3 > /tmp/monami-delay

To remove the delay, simply set the delay to zero:

$ echo 0 > /tmp/monami-delay
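
The behaviour of a delay file can be mimicked in a few lines. The
sketch below is not MonAMI code: read_delay and
respond_with_empty_datatree are hypothetical names, and a temporary
file stands in for /tmp/monami-delay so that the example does not
touch a real configuration.

```python
import os
import tempfile
import time


def read_delay(delayfile):
    """Return the delay in seconds stored in delayfile (a single integer)."""
    with open(delayfile) as handle:
        return int(handle.read().strip())


def respond_with_empty_datatree(delayfile):
    """Wait for the currently configured delay, then return an empty datatree."""
    # The file is re-read on every request, so edits take effect next time.
    time.sleep(read_delay(delayfile))
    return {}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "monami-delay")
    with open(path, "w") as handle:
        handle.write("0\n")       # same effect as: echo 0 > path
    print(respond_with_empty_datatree(path))
    with open(path, "w") as handle:
        handle.write("3\n")       # same effect as: echo 3 > path
    print(read_delay(path))       # the next request would wait this long
```

Reading the file on each request, rather than once at start-up, is
what allows the delay to be changed while monitoring is running.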

Attributes

delayfile string, optional

the filename of the delay file, the contents of which are
parsed as an integer number. This number is the number of
seconds the null target will delay when replying
with an empty datatree.

3.4.10. NUT

Network UPS Tools (NUT) provides a standard method through which
an Uninterruptible Power Supply (UPS) can be
monitored. Part of this framework allows for signalling, so that
machines can undergo a controlled shutdown in the event of a power
failure. Further details of NUT are available from the NUT home
page.

The MonAMI nut plugin connects to the NUT data
aggregator daemon (upsd) and queries the status of all known,
attached UPS devices. The ups.conf file must
be configured for available hardware and the startup scripts must
be configured to start the required UPS-specific monitoring
daemons.

By default, localhost will be allowed access to the upsd
daemon but access for external hosts must be added explicitly in
the upsd.conf file. See the NUT
documentation on how best to achieve this.

Attributes

host string, optional

the host on which the NUT upsd daemon is running. The
default value is localhost.

port integer, optional

the port on which the NUT upsd daemon listens. The
default value is 3493.

3.4.11. Process

The process plugin monitors Unix processes. It can count
the number of processes that match search criteria and can give
detailed information on a specific process.

The information process gives should not be confused with
any process, memory or thread statistics other monitoring plugins
provide. Some services report their current thread, process or
memory usage, which may duplicate some of the information this
plugin reports (see, for example, Section 3.4.2, “Apache” and
Section 3.4.8, “MySQL”). However, process reports
information from the kernel and should work with any application.

The process plugin has two main types of monitors:
counting processes and detailed information about a single
process. A single process target can be configured to do
any number of either type of monitoring and the results are
combined in the resulting datatree.

Counting processes

To count the number of processes, a count
attribute must be specified. In its simplest form, the
count attribute value is simply the name of
the process to count. The following example reports the number of
imapd processes that are currently in existence.

[process]
count = imapd

The format of the count attribute allows
for more sophisticated queries of the form:

reported name : proc name [cond1, cond2, ...]

All of the parts are optional: the part up to and including the
colon (reported name :), the part after the colon but before the
square brackets (proc name) and the part in square brackets
([cond1, cond2, ...]) can each be omitted, but at least one of the
first two parts must be specified. The examples below may help
clarify this.

To be included in the count, a process's name must match the
proc name (if specified). The statistics will be reported as
reported name. If no reported name is specified, then proc name
will be used.

The part in square brackets, if present, specifies some additional
constraints. The comma-separated list of key-value pairs defines
additional predicates; for example, [uid=root, state=R] means
only processes that are running as root and are in the running
state will be counted. The valid conditions are:

uid = uid

to be considered, the process must be running with a user ID
of uid. The value
may be the numerical uid or the username.

gid = gid

the process must be running with a group ID of
gid. The value may
be the numerical gid or the group name.

state = statelist

the process must have one of the states listed in
statelist. Each
acceptable process state is represented by a single capital
letter and they are concatenated together. Valid process
state letters are:

R

process is running (or ready to be run),

S

sleeping, awaiting some external event,

D

in uninterruptible sleep (typically waiting for disk
IO to complete),

T

stopped (due to being traced),

W

paging,

X

dead,

Z

defunct (or "zombie" state).

The following example illustrates count used to count the number
of processes. The different attributes show how the different
criteria are represented.

Count the number of processes running as root. Store the number as
a metric called run_as_root.
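
The way a count value breaks down into its three optional parts can be
sketched with a small parser. This is an illustration written for this
section, not MonAMI's parser; the function name parse_count is an
assumption, and apart from imapd the example values (including the
rendering of the run_as_root case) are hypothetical.

```python
def parse_count(value):
    """Split a count value into (reported_name, proc_name, conditions)."""
    text = value.strip()
    conditions = {}
    if text.endswith("]"):
        # Separate the bracketed, comma-separated key=value conditions.
        text, _, cond_text = text[:-1].partition("[")
        for item in cond_text.split(","):
            key, sep, val = item.partition("=")
            if not sep:
                raise ValueError("condition %r is not key=value" % item.strip())
            conditions[key.strip()] = val.strip()
    reported, sep, proc = text.partition(":")
    if not sep:
        # No colon: the text is the process name, which is also reported.
        reported, proc = "", reported
    reported, proc = reported.strip(), proc.strip()
    if not reported and not proc:
        raise ValueError("a reported name or a process name is required")
    return reported or proc, proc or None, conditions


print(parse_count("imapd"))
print(parse_count("run_as_root : [uid=root]"))
print(parse_count("mail : imapd [uid=root, state=RS]"))
```

When no process name is given, the second element is None, meaning
that processes are selected by the conditions alone.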

Detailed information

The watch attribute specifies a process to
monitor in detail. The process to watch is identified using the
same format as with count statements;
however, the expectation is that only a single process will match
the criteria.

If there is more than one process matching the search criteria
then one is chosen and that process is reported. In principle,
the selected process might change from one time to the next, which
would lead to confusing results. In practice, the process with
the lowest pid is chosen, so it is both likely to be the oldest
process and unlikely to change over time. However, this behaviour
is not guaranteed.

Much information is gathered with a watch
attribute. This information is documented in the
stat and status sections
of the proc(5) manual page. Some of the
more useful entries are copied below:

pid

the process ID of the process being monitored.

ppid

the process ID of the parent process.

state

a single character, with the same semantics as the different
process states listed above.

utime

number of jiffies[1] of time spent with this process scheduled
in user-mode.

stime

number of jiffies[1] of time
spent with this process scheduled in kernel-mode.

threads

number of threads in use by this process.

Note

An accurate value is provided by the 2.6-series kernels.
Under 2.4-series kernel with LinuxThreads, heuristics are
used to derive a value. This value should be correct
under most circumstances, but it may be confused if
multiple instances of the same multi-threaded process are
running concurrently.

vsize

virtual memory size: total memory used by the process.

rss

Resident Set Size: number of pages of physical memory a
process is using (less 3 for administrative bookkeeping).
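
On Linux, the entries listed above appear as positional fields in
/proc/<pid>/stat, as documented in proc(5). The sketch below extracts
them from one such line. It is not MonAMI's implementation: the
function name parse_stat and the sample line are made up for
illustration, and the field positions follow the proc(5) numbering.

```python
def parse_stat(line):
    """Extract selected fields from one line in /proc/<pid>/stat format."""
    # The command name is in parentheses and may itself contain spaces or
    # parentheses, so split around the last closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    rest = tail.split()  # rest[0] is field 3 (state) in proc(5) numbering
    return {
        "pid": int(pid_text),
        "comm": comm,
        "state": rest[0],          # field 3
        "ppid": int(rest[1]),      # field 4
        "utime": int(rest[11]),    # field 14
        "stime": int(rest[12]),    # field 15
        "threads": int(rest[17]),  # field 20
        "vsize": int(rest[20]),    # field 23
        "rss": int(rest[21]),      # field 24
    }


# A made-up sample line, truncated after the rss field.
SAMPLE = ("4242 (my daemon) S 1 4242 4242 0 -1 4194560 310 0 2 0 "
          "57 13 0 0 20 0 3 0 88123 23068672 1024")
print(parse_stat(SAMPLE))
```

On a Linux machine the same function can be applied to the contents
of /proc/self/stat to inspect the running interpreter.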

Attributes

count string, optional

either the name of the process(es) to count, or the
conditions processes must satisfy to be included in the
count. This attribute may be repeated for multiple process
counting.

count attributes have the form:
reported name : proc name [cond1, cond2, ...]

watch string, optional

either the name of the process to obtain detailed
information, or the conditions a process must satisfy to be
watched. This attribute may be repeated to obtain detailed
information about multiple processes.

watch attributes have the form:
reported name : proc name [cond1, cond2, ...]

3.4.12. Stocks

The stocks plugin uses one of the web-services provided
by XMethods to obtain a near real-time quote (delayed by 20
minutes) for one or more stocks on the United States stock market.
Further details of this service are available from the Stocks
service summary page.

In addition to providing financial information, stocks is
a pedagogical example that demonstrates the use of SOAP within
MonAMI.

Caution

The authors of MonAMI expressly disclaim the accuracy, adequacy,
or completeness of any data and shall not be liable for any
errors, omissions or other defects in, delays or interruptions
in such data, or for any actions taken in reliance thereon.

Please do not send too many requests. A request every couple of
minutes should be sufficient.

Attributes

symbols string, required

a comma- (or space-) separated list of ticker symbols to
monitor. For example, GOOG is the symbol for
Google Inc. and RHT is the symbol for RedHat
Inc.

3.4.13. TCP

The tcp monitoring plugin provides information about the
number of TCP sockets in a particular state. Here, a socket is
either a TCP connection to some machine or the ability to
receive a particular connection (i.e., that the local machine is
“listening” for incoming connections).

A tcp monitoring target takes an arbitrary number of count
attributes. The value of a count attribute describes how to report
the number of matching sockets and the criteria for including a
socket within that count. These attributes take values like:
name [cond1, cond2, ...], where name is the name used to report
the number of matching TCP sockets. The conditions (cond1, cond2,
etc.) are comma-separated keyword-value pairs (e.g.,
state=ESTABLISHED). A socket must match all conditions to be
included in the count.

The condition keywords may be any of the following:

local_addr

The local IP address to which the socket is bound. This
may be useful on multi-homed machines for sockets bound to a
single interface.

remote_addr

The remote IP address of the socket, if connected.

local_port

The port on the local machine. This can be the numerical
value or a common name for the port, as defined in
/etc/services.

remote_port

The port on the remote machine, if connected. This can be
the numerical value or a common name for the port.

port

A socket's local or remote port must match. This can be the
numerical value or a common name for the port.

state

The current state of the socket. Each local socket will be
in one of a number of states and changes state during the
lifetime of a connection. All the states listed below are
valid and may occur naturally on a working system; however,
under normal circumstances some states are transitory: one
would not expect a socket to stay in a transitory state for
long. A large and/or increasing number of sockets in one of
these transitory states might indicate a networking problem
somewhere.

The valid states are listed below. For each state, a brief
description is given and the possible subsequent states are
listed.

LISTEN

A program has indicated it will receive connections
from remote sites.

Next: SYN_RECV, SYN_SENT

SYN_SENT

Either a program on the local machine is the client
and is attempting to connect to a remote machine, or the
local machine sends data from a
LISTENing socket (less likely).

Next: ESTABLISHED, SYN_RECV or CLOSED

SYN_RECV

Either a LISTENing socket has received an incoming
request to establish a connection, or both the local
and remote machines are attempting to connect at the
same time (less likely).

Next: ESTABLISHED, FIN_WAIT_1 or CLOSED

ESTABLISHED

Data can be sent to/from local and remote site.

Next: FIN_WAIT_1 or CLOSE_WAIT

FIN_WAIT_1

Start of an active close. The
application on local machine has closed the
connection. Indication of this has been sent to the
remote machine.

Next: FIN_WAIT_2, CLOSING or TIME_WAIT

FIN_WAIT_2

Remote machine has acknowledged that local application
has closed the connection.

Next: TIME_WAIT

CLOSING

Both local and remote applications have closed their
connections “simultaneously”, but remote
machine has not yet acknowledged that the local
application has closed the local connection.

Next: TIME_WAIT

TIME_WAIT

Local connection is closed and we know the remote site
knows this. We know the remote site's connection is
closed, but we don't know if the remote site knows that
we know this. (It is possible that the last ACK
packet was lost and, after a timeout, the remote site
will retransmit the final FIN packet.)

To prevent the potential packet loss (of the local
machine's final ACK) from accidentally closing a
fresh connection, the socket will stay in this state
for twice MSL timeout (depending on implementation, a
minute or so).

Next: CLOSED

CLOSE_WAIT

The start of a passive close.
The application on the remote machine has closed its
end of the connection. The local application has not
yet closed this end of the connection.

Next: LAST_ACK

LAST_ACK

Local application has closed its end of the
connection. This has been sent to the remote machine
but the remote machine has not yet acknowledged this.

Next: CLOSED

CLOSED

The socket is not in use.

Next: LISTEN or SYN_SENT

CONNECTING

A pseudo state that matches the transitory states when starting
a connection: either SYN_SENT or SYN_RECV.

DISCONNECTING

A pseudo state that matches the transitory states when shutting
down a connection: any of FIN_WAIT_1, FIN_WAIT_2, CLOSING,
TIME_WAIT, CLOSE_WAIT or LAST_ACK.

The states ESTABLISHED and LISTEN are long-lived states. It
is natural to find sockets that are in these states for extended
periods.

For applications that use “half-closed” connections,
the FIN_WAIT_2 and CLOSE_WAIT states are less transitory. As
the name suggests, a half-closed connection allows data to flow in
one direction only. It is achieved by the application that no
longer wishes to send data closing its end of the connection (see
FIN_WAIT_1 above), whilst the application wishing to continue
sending data does nothing (and so undergoes a passive close). Once
the half-closed connection is established, the active-close socket
(which can no longer send data) will be in FIN_WAIT_2, whilst
the passive-close socket (which can still send data) will be in
CLOSE_WAIT.

There are two pseudo states for the normal transitory states:
CONNECTING and DISCONNECTING. They are intended to help catch
networking or software problems.

The following example checks whether an application is listening
on three well-known port numbers. This might be used to check
that services are running as expected.
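
As an illustrative sketch only, such a check might be written with
the count attribute described under Attributes below. The tcp
stanza name, the name attribute and the choice of ports 22, 25 and
80 are assumptions made for this example.

[tcp]
name = listening-services
count = ssh [local_port=22, state=LISTEN]
count = smtp [local_port=25, state=LISTEN]
count = http [local_port=80, state=LISTEN]

Each metric would be expected to report a non-zero count while the
corresponding service is listening.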

The following example records the number of connections to a
webserver. The established metric records the
connections where data may flow in either direction. The other
two metrics record connections in the two pseudo states. Normal
traffic should not stay long in these pseudo states; connections
that persist in these states may be symptomatic of some problem.
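
A minimal sketch of such a configuration follows. It assumes the
webserver listens on port 80 and that the target is declared with
a tcp stanza; both are assumptions for illustration.

[tcp]
name = webserver-connections
count = established [local_port=80, state=ESTABLISHED]
count = connecting [local_port=80, state=CONNECTING]
count = disconnecting [local_port=80, state=DISCONNECTING]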

Attributes

count string, optional

the name to report for this metric followed by square
brackets containing a comma-separated list of conditions a
socket must satisfy to be included in the count. This
option may be repeated for multiple TCP connection counts.

The conditions are keyword-value pairs, separated by
=, with the following valid keywords:
local_addr,
remote_addr,
local_port,
remote_port, port,
state.

The state keyword can have one of the
following TCP states: LISTEN, SYN_RECV, SYN_SENT,
ESTABLISHED, CLOSED, FIN_WAIT_1, FIN_WAIT_2,
CLOSE_WAIT, CLOSING, TIME_WAIT, LAST_ACK; or one of
the following two pseudo states: CONNECTING, DISCONNECTING.

3.4.14. Tomcat

Apache Tomcat is one of the projects from the Apache Software
Foundation. It is a Java-based application server (or servlet
container) based on Java Servlet and JavaServer Pages
technologies. Servlets and JSP are defined under Sun's Java
Community Process. More information about Tomcat can be found
at the Apache
Tomcat home page.

Also developed under the Java Community Process are the Java
Management Extensions (JMX). JMX provides a standard method
of instrumenting servlets and JSPs, allowing remote monitoring
and control of Java applications and servlets.

The tomcat plugin uses the JMX-proxy servlet to monitor
(potentially) arbitrary aspects of servlets and JSPs. This
servlet provides structured plain-text output from Tomcat's JMX
MBean interface. Applications that require monitoring should
expose their data through that interface so that MonAMI can
discover it.

To monitor a custom servlet, the required instrumentation within
the servlet/JSP must be written. Currently, there is an
additional light-weight conversion needed within MonAMI, adding
some extra information about the monitored data. Sample code
exists that monitors aspects of the Tomcat server itself.

Any tomcat monitoring target will need a username and
password that match a valid account within the Tomcat server
that has the manager role. This is normally configured in the file
$CATALINA_HOME/conf/tomcat-users.xml.
Including the following line within this file adds a new user
monami, with password monami-secret and the manager role, to
Tomcat.

<user username="monami" password="monami-secret" roles="manager"/>

This line should be added within the
<tomcat-users> context.

Warning

Be aware that Basic authentication sends the username and
password unencrypted over the network. These values are at risk
if packets can be captured. If you are not sure, you should run
MonAMI on the same server as Tomcat.

In addition to connecting to Tomcat, you also need to specify
which classes of information you wish to monitor. The following
are available: ThreadPool and Connector. To monitor some aspect,
you must specify the object type along with the identifier for
that object within the monitoring definition. For example:

[tomcat]
name = local-tomcat
ThreadPool = http-8080
Connector = 8080

ThreadPool monitors a named thread pool (e.g.,
http-8080), monitoring the following
quantities:

minSpareThreads

the minimum number of threads the server will maintain.

currentThreadsBusy

the number of threads that are either actively processing a
request or waiting for input.

currentThreadCount

total number of threads within this ThreadPool.

maxSpareThreads

if the number of spare threads exceeds this value, the
excess are deleted.

maxThreads

an absolute maximum number of threads.

threadPriority

the priority at which the threads run.

The Connector monitors a ConnectorMBean and is identified by which
port it listens on. It monitors the following quantities:

allowTrace

Is the HTTP TRACE method allowed?

clientAuth

Is client authentication required?

compression

Is the connection compressed?

disableUploadTimeout

Is the upload timeout disabled?

emptySessionPath

Is the session cookie path set to /?

enableLookups

Are lookups enabled?

tcpNoDelay

Is the TCP_NODELAY flag set?

useBodyEncodingForURI

Is the request body's encoding also used for URI query
parameters?

secure

Are the connections secure?

acceptCount

number of pending connections this Connector will accept
before rejecting incoming connections.

bufferSize

size of the input buffer.

connectionLinger

how long the connection lingers, waiting for other
connections.

connectionTimeout

the timeout for this connection.

connectionUploadTimeout

the timeout for uploads.

maxHttpHeaderSize

the maximum size of an HTTP header.

maxKeepAliveRequests

the maximum number of requests served over a single
keep-alive connection before the server closes it.

maxPostSize

maximum size of the information POSTed.

maxSpareThreads

see ThreadPool above.

maxThreads

see ThreadPool above.

minSpareThreads

see ThreadPool above.

threadPriority

see ThreadPool above.

port

the port on which this connector listens.

proxyPort

the proxy port associated with this connector.

redirectPort

the port to which this connector will redirect.

protocol

which protocol the connector uses
(e.g., HTTP/1.1)

sslProtocol

the SSL protocol the connector uses (e.g.,
TLS)

scheme

which scheme the URI will use (e.g.,
http, https)

Attributes

The tomcat monitoring target accepts the following
options:

host string, optional

the hostname of the machine to monitor. The default value
is localhost.

port integer, optional

the TCP port on which Tomcat listens. The default value
is 8080.

jmxpath string, optional

the path to the JMX-proxy servlet within the application
server URI namespace. The default path is /manager/jmxproxy/.

username string, optional

the username to use when completing Basic authentication.

password string, optional

the password to use when completing Basic authentication.
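
Putting these options together, a tomcat target might look like
the following sketch. It reuses the monami account and the
ThreadPool and Connector identifiers from the examples above; the
host, port and jmxpath lines simply restate the defaults and could
be omitted.

[tomcat]
name = local-tomcat
host = localhost
port = 8080
jmxpath = /manager/jmxproxy/
username = monami
password = monami-secret
ThreadPool = http-8080
Connector = 8080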

3.4.15. Torque

The Torque
homepage describes Torque as “an open
source resource manager providing control over batch jobs and
distributed compute nodes.” Torque was based on the
original PBS/Open-PBS project, but incorporates many new features.
It is now a widely used batch control system.

Torque is heavily influenced by the IEEE 1003.1 specification,
in particular Section 3 (Batch Environment Services) of the
Shell & Utilities volume. However, it also includes some additional
features, such as support for jobs in the suspended state.

Access control

Torque uses username-and-host based authorisation. Users may
query the status of their own jobs, but may require special
privileges to view the status of all jobs. Because of this, the
MonAMI torque plugin may require authorisation to gather
monitoring information.

To grant torque sufficient privileges to conduct its
monitoring, the Torque server must have either
query_other_jobs set to True
(allowing all users to see other users' job information) or have
the MonAMI user (typically monami) and host added as one of
the operators. Setting either option is
sufficient and both can be achieved using the
qmgr command.

The command qmgr -ac "list server query_other_jobs" will display
the current value of query_other_jobs. To allow all users to see
other users' job status, run the command:
qmgr -ac "set server query_other_jobs = True".

The command qmgr -ac "list server operators" will display the
current list of operators. To add user monami running on host
mon-hq.example.org as another operator, use the command:
qmgr -ac "set server operators += monami@mon-hq.example.org".

Queue groups

It is often useful to group together multiple execution queues
when generating statistics. A group may represent queues with a
similar purpose, or a set of queues that together support a
wider community. MonAMI supports this by allowing the
definition of queue-groups and will report statistics for each of
these groups.

A queue-group is defined by including a
group attribute in the torque
target. Multiple groups can be defined by repeating the
group attributes, one attribute for each
group.

A group attribute's value defines the group in the form
name : queue1, queue2, ..., where name is the name of the
queue-group, queue1 is the first queue to be included, queue2
the second, and so on. The group statistics are generated based
on all jobs that have any of the listed execution queues.

As an example, the following torque stanza defines four
groups: HEP, LHC,
Grid OPS, and Local.
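
A sketch of such a stanza is shown here. The target name and the
queue names are hypothetical and chosen only to illustrate the
name : queue1, queue2, ... syntax described above.

[torque]
name = batch
group = HEP : atlas, cms, lhcb, alice, babar
group = LHC : atlas, cms, lhcb, alice
group = Grid OPS : dteam, ops
group = Local : local-short, local-long

In this sketch several queues appear in more than one group (atlas,
for example, is in both HEP and LHC); jobs in such a queue count
towards the statistics of each group that lists it.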

Attributes

host string, optional

the hostname of the Torque server. If not specified, a
default value will be used, which is specified externally to
MonAMI. This default may be localhost or may be
configured to whatever is the most appropriate Torque
server.

group string, optional

defines a new queue-group against which statistics are
collected. The group value has the form
name : queue1, queue2, .... Each Torque queue may appear in
any number (zero or more) of queue-group definitions.

3.4.16. Varnish

The Varnish home
page describes Varnish as a
“state-of-the-art, high-performance HTTP
accelerator. Varnish is targeted primarily at the FreeBSD 6/7
and Linux 2.6 platforms, and takes full advantage of the virtual
memory system and advanced I/O features offered by these operating
systems.”

Varnish offers a management interface. The MonAMI
varnish plugin connects to this interface and
requests the server's current set of statistics.

Attributes

host string, optional

the host on which Varnish is running. Default is
localhost.

port integer, optional

the TCP port on which the Varnish management interface
is listening. The default value is 6082.
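
For illustration, a varnish target that spells out the default
values explicitly might look like this sketch (the target name is
hypothetical):

[varnish]
name = local-varnish
host = localhost
port = 6082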

[1]
A jiffy is a hard-coded period of time. On most Linux
machines, it is 10ms (1/100s). It can be altered to some
different value, but it remains constant whilst the kernel
is running. In practice, the number of jiffies since the
machine booted is held as a counter, which is incremented
when the timer interrupt occurs.