monit – trust is good, control is better

monit is a very cool system for keeping your Linux servers working – highly recommended. With a few lines of configuration, you can have it check almost any aspect of your system and services and, when problems occur, have it alert you or take remedial action (like restarting services or cleaning up log and temporary files).

For example, we had a problem with clustered web apps (using Terracotta) that we run in VMware VMs. The cluster nodes were being suspended regularly for backups, and this caused them to be evicted from the cluster. A simple solution was to use monit to monitor the apps (via the same health check port we were already using for the HAProxy load balancer's availability check) and to restart the services if the health check fails, as happens when a VM is resumed after the backup.

Here’s the line from the monitrc file we use to monitor the health check port (9000 in this case):
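A minimal sketch of what such a check can look like in monitrc. Only port 9000 and the restart action come from the setup described here; the service name, pid file path, init script paths and timeout are assumptions for illustration, so adjust them to your own installation:

```
# Hypothetical service name and pid file path
check process jetty with pidfile /var/run/jetty.pid
  # Assumed init script; use whatever starts and stops your app
  start program = "/etc/init.d/jetty start"
  stop program  = "/etc/init.d/jetty stop"
  # Query the health check port over HTTP and restart the service
  # if the connection fails or an HTTP error status comes back
  if failed host 127.0.0.1 port 9000 protocol http
     with timeout 10 seconds
  then restart
```

With `protocol http`, monit treats an HTTP error status as a failed test, so the same endpoint that tells the load balancer a node is unhealthy also triggers the restart.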

The web app in this case is a Java Wicket app running under Jetty, which also serves a health check on port 9000. An HTTP query to this port causes the app to check its connection to its database and its cluster node status. If either check fails, it returns an HTTP error status. This health check is used by HAProxy (which takes failed nodes out of the load-balanced pool) and by monit (which restarts the services). This combination provides us with 100% uptime for these apps.