A System Monitoring Dashboard

This simple set of shell scripts keeps you informed about disks that are filling up, CPU-hog processes and problems with the Web and mail servers.

For about a year, my company had been struggling to roll out a monitoring
solution. False positives and
inaccurate after-hours pages were affecting morale and
wasting system administrators' time. After speaking to
some colleagues about what we really needed to monitor,
the list came down to a few things:

- Web servers—by way of HTTP, not only physical servers.

- Disk space.

- SMTP servers' availability—by way of SMTP, not only physical servers.

- A history of these events to diagnose and pinpoint problems.

This article explains the process I developed and how I set up disk, Web and SMTP
monitoring both quickly and simply. Keeping the monitoring process
simple meant that all the tools used should be available on a recent Linux
distribution and should not use advanced protocols, such as SNMP or
database technology. As a result, all of my scripts use the Bash shell, basic
HTML, some modest Perl and the wget utility. All of these monitoring
scripts share the same general skeleton and installation steps, and they are available
from the Linux Journal FTP site (see the on-line Resources).

Installing the scripts involves several steps. Start by copying the
script to a Web server and making it world-executable with
chmod. Then, create a directory under the root of
your Web server where the script can write its logs and history. I used webmon for
monitor_web.sh. The other scripts are similar: I used smtpmon for
monitor_smtp.sh and stats for monitor_stats.pl. monitor_disk.sh is
different from the others because it is the only one installed locally
on each server you want to monitor.
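On a typical system, the installation amounts to something like this (the Web root of /var/www/html is an assumption of mine; adjust the paths for your distribution):

    # copy the script to the Web server and make it world-executable
    cp monitor_web.sh /var/www/html/
    chmod a+x /var/www/html/monitor_web.sh
    # create the directory where the script writes its logs and history
    mkdir /var/www/html/webmon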

Next, schedule the scripts in cron. You can run each script with any
user capable of running wget, df -k and top. The user also needs to have the
ability to write to the script's home. I suggest creating a local user
called monitor and scheduling these through that user's crontab.
Finally, install wget if it is not already present on your Linux distribution.
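For example, a crontab entry for the monitor user that runs the Web check every 15 minutes might look like this (the path is an assumption):

    # edit with: crontab -e, as the monitor user
    */15 * * * * /var/www/html/monitor_web.sh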

My first challenge was to monitor the Web servers by way of HTTP, so I
chose wget as the engine and scripted around it. The resulting script
is monitor_web.sh. For those unfamiliar with wget, its author describes
it as “a free software package for retrieving files using HTTP, HTTPS
and FTP, the most widely used Internet protocols” (see Resources).
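You can reproduce the kind of check the script performs at the command line; for example (the URL is a placeholder):

    # test a URL without saving the page: 5-second timeout, 2 tries
    wget --spider --timeout=5 --tries=2 http://www.example.com/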

After installation, monitor_web.sh requires only two choices from the
user, both labeled clearly in the script: the e-mail recipient and the
URLs to monitor. The URLs must conform to HTTP standards and return a
valid HTTP 200 OK status to work. They can be HTTP or HTTPS, as wget and
monitor_web.sh support both. Once the script is installed and has run for
the first time, you can browse to http://localhost/webmon/webmon.html and
view the URLs, the last result and the history in a Web browser, as they
all are links.

Now, let's break down the script; see monitor_web.sh, available on the
LJ FTP site. First, I set all the
variables for system utilities and the wget program. These may change on
your system. Next, we make sure we are on the network. This ensures that if
the server monitoring the URLs goes off-line, a massive number of alerts
are not queued up by Sendmail until the server is back on-line.
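A minimal version of that network check, assuming a well-known local gateway address to ping, might be:

    # sketch: if one ping to the gateway fails, assume we are
    # off the network and exit without alerting
    ping -c 1 192.168.1.1 > /dev/null 2>&1 || exit 0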

As I loop through all the URLs, I have wget connect with a
timeout of five seconds, trying twice to reduce false positives. If
the Web site is down, the script generates an e-mail message for the
recipient and updates the Web page. Mail also is sent when the site is
back up. The script sends only one message per outage, so we don't
overwhelm the recipient. The full listing is on the LJ FTP site; a
minimal sketch of the logic, using a per-URL flag file to remember that
an alert already went out (the variable names are placeholders of mine,
not the original code), looks like this:
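    # FLAG remembers that a "down" alert was already sent for $URL
    FLAG="$LOGDIR/$(printf '%s' "$URL" | tr -c 'A-Za-z0-9' '_').down"
    if wget --spider --timeout=5 --tries=2 "$URL" > /dev/null 2>&1; then
        # site is up; if we alerted earlier, send the all-clear once
        if [ -f "$FLAG" ]; then
            echo "$URL is back up" | mail -s "UP: $URL" "$RECIPIENT"
            rm -f "$FLAG"
        fi
    else
        # site is down; alert only if we have not alerted already
        if [ ! -f "$FLAG" ]; then
            echo "$URL is down" | mail -s "DOWN: $URL" "$RECIPIENT"
            touch "$FLAG"
        fi
    fi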

I have included the HTML for green and red text
in the script, in case you choose not to use graphics.
Again, the full script is available from the Linux Journal FTP site.

Figure 1. monitor_web.sh in action. Run the script from
cron to regenerate this page as often as needed.

With the Web servers taken care of, it was time to tackle disk
monitoring. True to our keep-it-simple philosophy, I chose to
create a script that would run from cron and alert my team based on the
output of df -k. The result was monitor_disk.sh. The first real block
of code in the script sets up the filesystems list. The exact lines are
in the script; a sketch of the approach looks like this:
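    # sketch: list mount points from df -k, skipping the header
    # line and ignoring /proc and the CD-ROM
    FILESYSTEMS=$(df -k | tail -n +2 | grep -Ev 'proc|cdrom' | awk '{print $6}')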

I ignore proc and am careful not to report on the CD-ROM, should my
teammates put a disk in the drive. The script then compares the value of
Use% to two values, THRESHOLD_MAX and THRESHOLD_WARN. If Use% exceeds
either one, the script generates an e-mail to the appropriate recipient,
RECIPIENT_MAX or RECIPIENT_WARN. Notice that I made sure the Use% value for
each filesystem is interpreted as an integer by stripping the trailing %
sign. The exact line is in the script; it amounts to something like this:
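    # sketch: take the Use% column for this filesystem and strip
    # the % sign so the shell can compare it numerically
    USED=$(df -k "$FS" | awk 'NR==2 {print $5}' | tr -d '%')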

A mailing list was set up with my team members' e-mail addresses and the
e-mail address of the on-call person to receive the critical e-mails and
pages. You may need to do the same with your mail
server software, or you simply can use your group alias or pager
address for both.

Because our filesystems tend to be large, about 72GB–140GB, I have set
critical alerts to 95%, so we still have some time to address issues
when alerted. You can set your own threshold with the THRESHOLD_MAX
and THRESHOLD_WARN variables. Also, our database servers run some
disk-intensive jobs and can generate large amounts of archive log files,
so I figured every 15 minutes is a good frequency at which to monitor. For
servers with less active filesystems, once an hour is enough.

Our third script, monitor_smtp.sh, monitors our SMTP servers'
ability to send mail. It is similar to the first two scripts
and simply was a matter of finding a way to connect directly to a user-defined
SMTP server so I could loop through a server list and send a
piece of mail. This is where smtp.pl comes in. It is a Perl script
(Listing 1) that uses the Net::SMTP module to send mail
to an SMTP address. Most recent distributions have this module
installed already (see the Do I Have That Perl Module Installed sidebar). monitor_smtp.sh updates the defined
Web page based on the success of
the transmission carried out by smtp.pl. No attempt is made to alert our
group, as this is a trouble-shooting tool and ironically cannot rely on
SMTP to send mail if a server is down. Future versions of monitor_smtp.sh
may include a round-robin feature and be able to send an alert through
a known working SMTP server.
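The full scripts are on the LJ FTP site; the core of monitor_smtp.sh amounts to something like this sketch (the variable names and smtp.pl's command-line arguments are assumptions of mine):

    # sketch: try each SMTP server in the list and record the result
    for SERVER in $SMTP_SERVERS; do
        if ./smtp.pl "$SERVER" "$TEST_RECIPIENT" > /dev/null 2>&1; then
            STATUS="OK"
        else
            STATUS="FAILED"
        fi
        # append one row per server to the status page
        echo "<tr><td>$SERVER</td><td>$STATUS</td><td>$(date)</td></tr>" >> "$PAGE"
    done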

We also have a script for monitoring a Web page's content. Besides knowing
that the Web service is up and running, we were interested in knowing that
the page hadn't changed, so we did this (a sketch follows the list):

- Precalculate and store the MD5 checksum of the page we want to check.
- Every few minutes (a crontab line), fetch the page, calculate the MD5 checksum of what we get and compare it with the precalculated one. If they differ, there is a problem.
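A minimal sketch of that check, assuming the baseline checksum was saved to a file beforehand (the paths and URL are placeholders):

    # sketch: fetch the page, hash it and compare against the baseline
    BASELINE=$(cat /var/www/html/webmon/page.md5)
    CURRENT=$(wget -q -O - http://www.example.com/ | md5sum | awk '{print $1}')
    if [ "$CURRENT" != "$BASELINE" ]; then
        echo "Page content changed" | mail -s "CHANGED: page" "$RECIPIENT"
    fi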
