As we mentioned earlier, the more monitoring and maintenance tasks are
automated, the more stable the server will be. In general, there are
three things that we want to ensure:

Apache is up and properly serving requests. Remember that it can be
running but unable to serve requests (for example, if there is a
stale lock and all processes are waiting to acquire it).

All the resources that mod_perl relies on are available and working.
This might include database engines, SMTP services, NIS or LDAP
services, etc.

The system is healthy. Make sure that the system is not short of
resources: free RAM is not running low, the system is not swapping
heavily, and there is enough free disk space.
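The three checks above lend themselves to automation. The following is a minimal sketch in Python, assuming nothing about the actual tools in use; the function names, the timeouts, and the 10% free-space threshold are illustrative choices of our own. Note that `apache_serves` requests a real page instead of looking for running httpd processes, because a server stuck on a stale lock is still "running" but will not answer.

```python
import http.client
import shutil
import socket
import urllib.request


def apache_serves(url: str, timeout: float = 5.0) -> bool:
    """True only if the server answers a real request with HTTP 200.

    Requesting a page (rather than checking that httpd processes exist)
    also catches a server that is running but wedged, e.g. all children
    blocked on a stale lock: the request then times out.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, an OSError subclass, for those).
        return False


def service_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds.

    This only proves that something (e.g. a database or SMTP server) is
    listening on the port, not that it answers queries correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def disk_ok(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """True if at least min_free_fraction of the filesystem holding path is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction
```

A cron job might call, for example, `apache_serves("http://localhost/")` and `service_reachable("localhost", 25)`; the URL and port are placeholders for your own setup. RAM and swap checks are left out because the way to read those figures is operating-system-specific.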

None of these categories has a higher priority than the others. A
system administrator is responsible for the proper functioning of the
whole system. Even if the administrator is responsible for just part
of the system, she must still ensure that her part does not cause
problems for the system as a whole. If any of the above categories is
not monitored, the system is not safe.

A specific setup may well have additional concerns that are not
covered here, but they will most likely fall into one of the above
categories.

Before we delve into details, we should mention that all automated
tools can be divided into two categories: tools that only detect
problems and notify the owner, and tools that also try to solve the
problems they detect, notifying the owner about both the problems and
the outcome of the attempted fix.

Such automated tools are generally called watchdogs.
They can alert the owner when there is a problem, just as a watchdog
will bark when something is wrong. They will also try to solve
problems themselves when the owner is not around, just as watchdogs
will bite thieves when their owners are asleep.

Although some tools can take corrective action without human
intervention when something goes wrong (e.g., during the night or on
weekends), some problems can be resolved only by a human. In such
cases, the tool should not attempt any corrective action at all and
should only alert the owner. For example, if a hardware failure
occurs, it is almost certain that a human will have to intervene.
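The difference between the two kinds of tools, and the rule that some failures should only be reported, can be captured in a small watchdog loop. This is a hypothetical Python sketch, not an existing tool: each `Check` (our own name) pairs a test with an optional repair action. A check without a repair action stands for a problem that needs a human, such as a hardware failure, so the watchdog only notifies. In a real deployment the Apache repair might run something like `apachectl restart` and `notify` might send e-mail; both are assumptions left to the reader.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    is_ok: Callable[[], bool]
    # None means "do not try to fix this automatically; a human must act".
    repair: Optional[Callable[[], None]] = None


def run_watchdog(checks: List[Check], notify: Callable[[str], None]) -> None:
    """Run every check once; repair what may be repaired, report everything."""
    for check in checks:
        if check.is_ok():
            continue
        if check.repair is None:
            # "Bark" only: no corrective action for problems needing a human.
            notify(f"{check.name}: FAILED; needs human intervention")
            continue
        notify(f"{check.name}: FAILED; attempting automatic repair")
        try:
            check.repair()
        except Exception as exc:  # report the failure, don't crash the watchdog
            notify(f"{check.name}: repair raised {exc!r}")
            continue
        # Re-run the check so the owner learns the result of the attempt.
        result = "succeeded" if check.is_ok() else "did not help"
        notify(f"{check.name}: repair {result}")
```

Re-running the check after a repair is what lets the owner be told about both the problem and the result of the attempt, as described above.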