Poor Sysadmin's Guide to Remote Linux Administration

The Problem

Like many free software geeks, I run a one-person Web hosting shop,
a combination business, hobby, and community service. I've become
accustomed to doing complex tasks not only easily, but also as cheaply
as possible. Since most of the time my modest Web hosting is more
hobby than business, I can't really afford to buy expensive monitoring
software and hardware, or to set up and manage complex solutions.

I also, like many free software geeks, have a perverse, somewhat
mysterious need for uptimes to be as long as possible. Even if it
doesn't cost me money, I am bothered by unnecessary service
interruption. There is a certain virtue that comes in doing a job
excellently, even or especially if one is not doing it as one's
primary vocation.

I suppose for many free software users, uptime mania
is something of an occupational hazard. There is a kind of Zen-like
sysadmin virtue which comes from implementing a clever, efficient, and
inexpensive hack, especially if that hack increases uptime and
service quality.

But sometimes things go wrong, whether from being too tired to type
the proper command at the proper time, or from the rare application or
system bug, or from causes entirely outside of your control. And it's not always possible or practical to sit down at the console of a
remotely colocated server in order to fix the problem.

For example, my main server is colocated 75 miles from where I live
in Dallas, Texas, and I can't easily drive 150 miles roundtrip to fix
a problem. In fact, I've never visited the NOC where my servers are
colocated, so I couldn't even find it without navigation help.

And I can't always rely on the techs who work at my colocation host;
sometimes they aren't available, or are too busy, or don't know
exactly what needs to be done. And some colocation facilities charge
additional fees for ad hoc system work like this.

One part of the answer to this common dilemma is to install a
daemon-monitoring daemon (hereafter, DMD) and to invest in a wireless
sysadmin device. But the real trick is doing that within the confines
of a limited budget. If you've got one or a few servers running Linux
remotely colocated, especially if they're halfway across the country,
where you got a great deal on bandwidth, then this two-part article
series is for you.

In this first part, I describe some of the available DMDs, and I
explain how to install and configure monit, the DMD I'm
using. In the next part of the series, I explain how to use a
Palm-enabled cell phone to do remote, wireless sysadmin work from
anywhere you can make or receive cell calls. I'll also show you how to write a
very simple DMD-message routing mailbot, using a few lines of Python,
to make sure messages from your DMD get to you when and where you need
them. (Note: the system I've used to implement and test these
solutions is a Red Hat Linux box, but as far as I know, all of these
tools would work just as well on any Linux or BSD system.)

The Solution: A Daemon-Monitoring Daemon

The single most indispensable tool of remote Linux or BSD server
administration is undoubtedly SSH; actually, SSH is less a tool than
it is the one tool that makes remote public server administration
practical in the first place. As long as I can get an SSH login to my
remote machine, I can usually fix most problems fairly
quickly.

Recently, though, when some security problems cropped up in
SSH itself, I had to spend a few hours one Friday afternoon upgrading
it. That was not a big deal until I accidentally killed the wrong
process -- the SSH daemon -- effectively locking myself out of my remote
box.

Suffice it to say, I convinced a sysadmin to drive to the NOC and
restart the SSH daemon on my box, after which I quickly changed the
root password. It was only then that I realized if I'd had a DMD
running, I would simply have had to wait a few minutes until it
restarted sshd for me.

Soon after that realization I started googling for "daemon
monitoring daemon"; I readily found several solutions, and I finally
chose to implement monit because it fit my situation the
best.

daemontools' supervise

The first tool I evaluated, supervise, is part of Dan
Bernstein's daemontools package. Bernstein has earned an impressive
reputation for writing high-quality tools, including qmail;
daemontools is no exception.

Using supervise to monitor Apache, for example, is as
simple as running:

[root@chomsky]# supervise /service/apache

supervise changes to the /service/apache directory and
invokes /service/apache/run, which it will re-invoke if
/service/apache/run dies.
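
To make that restart behavior concrete, here is a minimal Python sketch
of the same idea. It is not how supervise itself is implemented; it only
imitates the loop described above. The function name, the max_starts
limit, and the pause parameter are all hypothetical, added so that the
sketch can be stopped and tested.

```python
import os
import subprocess
import time


def supervise(service_dir, max_starts=None, pause=1.0):
    """Run the 'run' program in service_dir, starting it again when it exits.

    max_starts and pause are hypothetical knobs for this sketch only.
    With max_starts=None the loop never ends. Returns the exit codes seen.
    """
    run_path = os.path.abspath(os.path.join(service_dir, "run"))
    exit_codes = []
    while max_starts is None or len(exit_codes) < max_starts:
        # cwd= mirrors changing into the service directory before running.
        proc = subprocess.run([run_path], cwd=service_dir)
        exit_codes.append(proc.returncode)
        # Pause so a program that dies immediately does not spin the CPU.
        time.sleep(pause)
    return exit_codes
```

Calling supervise("/service/apache") with this sketch would block
forever, restarting the run program each time it exits; passing a small
max_starts is only useful for experimenting.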

daemontools also includes svstat, which reads the status information
that supervise keeps, in a binary format, about the services it is
monitoring. That's a nice feature
since, as we'll see, DMDs can fill up log files quickly. Finally, you
can use svscan in order to more easily direct
supervise to monitor a collection of services.

I had two mostly nontechnical problems with
daemontools. First, compared to some of the other DMDs I
found, it isn't very customizable. It does what it does well, but
that's about all that it does. For example, I couldn't find an easy way
to get supervise to send me email if it had to restart Apache; it's
possible, but more trouble than I wanted to take on. Apache is normally
very stable, and I want to know if it's being restarted often by a DMD.

Second, and this is the
more serious problem, daemontools has very specific ideas
about how services should be managed, ideas which don't jibe well with
Red Hat's approach. I'm not entirely sure Red Hat's approach is
better, but I'm stuck with it for now. If I were building a new Linux
server from scratch, I would likely use Bernstein's
daemontools, especially for supervise and
multilog. As things stand, however, I had to look elsewhere
for a solution easier to integrate with my existing system.

mon

Jim Trocki's mon, a DMD
written in Perl, is very feature-rich and takes a slightly different
approach than the other DMDs I review here. It rigorously separates
service monitoring into programs which test conditions according to a
schedule, called monitors, and programs which invoke actions,
called alerts, depending on the outcome of a monitor.
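
The split between monitors and alerts can be sketched in a few lines of
Python. This is only an illustration of the separation, not mon's actual
interface: the names tcp_monitor, print_alert, and run_checks are
hypothetical, and the scheduling that mon provides is left out, so the
checks run just once.

```python
import socket


def tcp_monitor(host, port, timeout=5.0):
    """Monitor: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def print_alert(service, detail):
    """Alert: report a failure. A real alert might send email instead."""
    print(f"ALERT {service}: {detail}")


def run_checks(watches, alert=print_alert):
    """Run each monitor once and invoke the alert for every failure.

    watches maps a service name to a zero-argument monitor callable.
    Returns the list of service names whose monitor failed.
    """
    failed = []
    for service, monitor in watches.items():
        if not monitor():
            alert(service, "monitor reported failure")
            failed.append(service)
    return failed
```

A watch list might look like {"http": lambda: tcp_monitor("localhost",
80)}. Because the monitor only reports a result and the alert only acts
on it, either side can be swapped out without touching the other.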

One of the nice things about mon is that, although it is written in
Perl, you can write monitors and alerts in any programming language
you prefer, plop the script or binary into the right place, and mon
will do the rest. That's nice, especially if you prefer Python to
Perl, or Java to Python, or GNU Smalltalk to anything else. It also
makes it easier for an active user community to contribute alerts and
monitors to mon, which is a very useful, free software, Unix kind of
thing.

A very long list of monitors and alerts is available for use with mon;
so long, in fact, that it's very unlikely you'd have to write any
monitors or alerts at all.

Another advantage is mon's very well-done Web interface, a
live demo of which you can play with at http://mon.lycos.com/. If only the
Web interfaces of more free software tools were half as well done as
mon's. Web interfaces, though, are less risky for use over
intranets than over the public Web.

However, mon is too customizable, too extensible for my
use. I have rather modest expectations of a DMD, and while
mon could certainly fill the bill, its real sweet spot is
service monitoring on a large scale: dozens or hundreds of services
across dozens or hundreds of machines, including servers, routers,
network-accessible peripherals, and so on. I would not hesitate to use
mon in a large LAN or WAN context, especially given its
parallelization, asynchronous event, interservice dependency, and SNMP trap features.