Mail Server Performance Monitoring with Mailgraph

Many seasoned administrators know how quickly a mail server can outgrow its
original setup. As it scales, it becomes much harder to maintain. Tailing a
log file no longer tells you what your MTA is doing when it processes
hundreds or thousands of requests every minute. At that point, automated
monitoring tools, whether text-based or graphical, become useful. Mailgraph
is one such tool.

The Mailgraph web site
explains: "Mailgraph is a very simple mail statistics RRDtool front-end
for Postfix that produces daily, weekly,
monthly, and yearly graphs of received/sent and bounced/rejected mail."

In a nutshell, installing Mailgraph will allow us to see how our mail server
performs through neatly laid-out graphical and numerical representations of
mail traffic flowing through a particular mail server. If you've ever used a
similar tool that can display graphs, such as MRTG, you know that graphs often
speak volumes of invaluable information when trying to diagnose a problem
quickly. Graphs can portray information about the past, present, and sometimes
even the future.

You may have noticed that we've also mentioned RRDtool.
RRDtool is a piece of software created by Tobi Oetiker that has the ability
to store and display data that changes over time in a Round Robin Database
(RRD) that stays fixed in size over time. If you ever feel that Mailgraph
isn't enough in terms of monitoring, you can always start monitoring other
server variables as well, such as your server's load average, by using
RRDtool as a back end.
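
To make the "fixed in size" idea concrete, here is a minimal Python sketch of round-robin storage. It is not RRDtool's actual file format or API; the class name RoundRobinArchive and its methods are made up for illustration. The only point is that once every slot is used, each new sample overwrites the oldest one, so the store never grows.

```python
from collections import deque


class RoundRobinArchive:
    """Hypothetical fixed-size store: when full, new samples evict the oldest."""

    def __init__(self, slots):
        # deque(maxlen=...) silently drops the oldest item on overflow.
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def values(self):
        return [value for _, value in self.samples]


# Keep only the five most recent one-minute samples.
rra = RoundRobinArchive(slots=5)
for minute in range(8):
    rra.update(minute * 60, float(minute))

print(len(rra.samples), rra.values())
```

After eight updates the archive still holds five samples, the last five that were written.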

Installing Mailgraph

We'll use Postfix as an MTA here. This
article assumes that you already have this running in one way or another, via a
source installation or via your local package manager (portage, apt, yum, etc.).
If you don't already have it installed, download the source from the Postfix
Web site.

It's worth taking note that the example setup already has amavisd-new coupled with SpamAssassin and ClamAV to tag spam and viruses. Mailgraph
will still function without a virus- and spam-checking facility, but it will not
graph spam or virus data if there's nothing to flag them.

Mailgraph requires a couple of other packages:

RRDtool.

The Perl modules Time::HiRes and File::Tail.
Download these from CPAN with:

Open a browser that can view images and visit the cgi-bin
directory that now holds mailgraph.cgi. You should see your
Mailgraph images. If you do, congratulations! If not, double-check your work
carefully to see whether you made a typo during the installation; that
happens quite often.

If you want mailgraph-init to start up on boot, you'll have to
add it into something like your rc.local file or copy and edit an
init script from your own distribution. Usually, starting
mailgraph-init from rc.local is easiest. Add a line
like this:

/etc/init.d/mailgraph-init start

If all goes well, you should start to see graphs that resemble Figure 1.

Figure 1. An example graph from Mailgraph.

Don't be alarmed if you see none of those blue or green lines. Good graphs
take time, so leave Mailgraph running for a few hours and try again later. If
your MTA has processed any mail, Mailgraph should by then have graphed a few
lines and reported some numbers at the bottom.

If you still can't seem to make Mailgraph work, or if you think you've found
a bug, you can subscribe to its mailing list and ask questions by sending an
email with the subject subscribe to mailgraph-request at
list.ee.ethz.ch. Be sure to provide specific information about your setup.

Installed, Working, Ready to Go: What am I Looking for?

Now that you've installed Mailgraph, it should become a valuable resource in
doing an analysis of your MTA. While it may not be evident immediately,
Mailgraph can help you test the effectiveness of your filtering setup.

For example, the graph in Figure 2 may indicate a few things.

Figure 2. A weekly graph of errors.

The reject rate for an MTA of this size is pretty high. There are several
possible causes: perhaps our extensive set of header_checks, or
connecting hosts that are listed in the RBLs we use. If this rate continues
to increase, it might be good to check the mail log for something fishy,
particularly anomalies in mail delivery.
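
When the reject line climbs, a quick tally of the mail log can show which of those causes dominates. The following Python sketch is one rough way to do it, not part of Mailgraph. The sample lines imitate common Postfix reject messages; the hosts, addresses, and the list name rbl.example.org are invented, and the wording on your server depends on your Postfix version and configuration, so treat the matching rules as assumptions to adjust.

```python
from collections import Counter

# Invented sample lines shaped like typical Postfix log entries.
SAMPLE_LOG = """\
Jan 10 10:00:01 mail postfix/smtpd[2101]: NOQUEUE: reject: RCPT from unknown[192.0.2.10]: 554 5.7.1 Service unavailable; Client host [192.0.2.10] blocked using rbl.example.org; from=<a@example.net> to=<b@example.com> proto=ESMTP helo=<example.net>
Jan 10 10:00:05 mail postfix/cleanup[2105]: 4F2A11C0: reject: header Subject: cheap pills from unknown[192.0.2.11]; from=<c@example.net> to=<b@example.com> proto=ESMTP helo=<example.net>: Message content rejected
Jan 10 10:00:09 mail postfix/smtp[2110]: 5A3B22D1: to=<d@example.org>, relay=mx.example.org[198.51.100.7]:25, delay=1.2, status=sent (250 OK)
Jan 10 10:00:12 mail postfix/smtpd[2101]: NOQUEUE: reject: RCPT from unknown[192.0.2.10]: 554 5.7.1 Service unavailable; Client host [192.0.2.10] blocked using rbl.example.org; from=<a@example.net> to=<e@example.com> proto=ESMTP helo=<example.net>
"""


def classify_reject(line):
    """Return a rough reject category, or None if the line is not a reject."""
    if ": reject: " not in line:
        return None
    if "blocked using" in line:
        return "rbl"
    if "postfix/cleanup" in line and "reject: header" in line:
        return "header_checks"
    return "other"


def tally_rejects(lines):
    """Count reject lines per category."""
    counts = Counter()
    for line in lines:
        kind = classify_reject(line)
        if kind is not None:
            counts[kind] += 1
    return counts


reject_counts = tally_rejects(SAMPLE_LOG.splitlines())
print(dict(reject_counts))
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG.splitlines(). The log path varies by distribution.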

The bounces don't set off any alarms, but if we become suspicious, it might
be worth finding their causes.

The number of viruses received is normal for this MTA and there are no
signs of a virus storm, but it might be a good idea to update our
ClamAV databases using freshclam just to be sure we're up-to-date.

There is very little spam this week. This isn't a cause for alarm, but
should turn on a light bulb. The spam count could be low due to the high reject
rate — we might be rejecting legitimate mail via our
header_checks. If we start hearing high rates of spam reports from
users or questions about missing mail, it's worth investigating and possibly
reconfiguring SpamAssassin to tag at a lower threshold, adding new rules, or
removing portions of our header_checks to allow more mail to
pass.

Mailgraph can also help you identify and see attacks as they're happening
and help make a post-assessment of any damage. For instance, if you're running
a dedicated mail server and start seeing load averages jump, looking at
Mailgraph can easily show what kind of mail your MTA has received, be it spam,
rejects, bounces, or viruses. You might see indications of an attack, which should prompt you to look at your mail log for suspicious hosts and then
possibly take action, perhaps firewalling off a host or using an RBL
specifically tailored to the attacking host(s). That way, your users don't have
to tell you that email is slow. You'll see it ahead of time and take action
early.
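
As a starting point for that log check, the short Python sketch below counts rejects per connecting IP address so the noisiest hosts float to the top. It is an illustration under assumptions: the regular expression expects Postfix smtpd lines of the form "reject: RCPT from name[address]", and the sample lines and addresses are invented.

```python
import re
from collections import Counter

# Matches "reject: RCPT from unknown[192.0.2.10]" and captures the address.
REJECT_CLIENT = re.compile(r"reject: \w+ from [^\[\s]*\[([^\]]+)\]")


def top_rejected_clients(lines, limit=3):
    """Return (address, reject count) pairs, most frequent first."""
    hits = Counter()
    for line in lines:
        match = REJECT_CLIENT.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common(limit)


# Invented sample lines; in practice, iterate over your mail log instead.
sample_lines = [
    "postfix/smtpd[900]: NOQUEUE: reject: RCPT from unknown[192.0.2.10]: 554 5.7.1 denied",
    "postfix/smtpd[901]: NOQUEUE: reject: RCPT from unknown[192.0.2.10]: 554 5.7.1 denied",
    "postfix/smtpd[902]: NOQUEUE: reject: RCPT from mail.example.net[192.0.2.55]: 450 4.7.1 denied",
    "postfix/smtpd[903]: NOQUEUE: reject: RCPT from unknown[192.0.2.10]: 554 5.7.1 denied",
    "postfix/smtp[904]: 5A3B22D1: to=<d@example.org>, status=sent (250 OK)",
]

suspects = top_rejected_clients(sample_lines)
print(suspects)
```

A host that accounts for most of the rejects is a candidate for a closer look before you decide whether to firewall it.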

More Ideas

To monitor many hosts all at once, you can also create a single index page
that contains all the images from your various Mailgraph installations so that they're always a click away. To make the single index page, simply view the
source of mailgraph.cgi and "steal" the image links. It can end up
looking as simple as Figure 3.
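
If you have more than a couple of servers, you may prefer to generate that page rather than paste links by hand. Here is a minimal Python sketch. The host names are hypothetical, and the query string "0-n" is only a placeholder for whatever image link you find in the source of your own mailgraph.cgi page; check it against your installation before relying on it.

```python
from html import escape


def build_status_page(servers, graph_query="0-n"):
    """Build one HTML page with a linked graph image per server.

    servers: list of (display name, URL of that server's mailgraph.cgi).
    graph_query: placeholder; copy the real value from your page source.
    """
    blocks = []
    for name, cgi_url in servers:
        img_src = escape(f"{cgi_url}?{graph_query}", quote=True)
        link = escape(cgi_url, quote=True)
        label = escape(name, quote=True)
        blocks.append(
            f"<h2>{label}</h2>\n"
            f'<a href="{link}"><img src="{img_src}" alt="Mail graph for {label}"></a>'
        )
    body = "\n".join(blocks)
    return (
        "<html>\n<head><title>Mail server status</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )


# Hypothetical servers.
page = build_status_page([
    ("mx1", "http://mx1.example.com/cgi-bin/mailgraph.cgi"),
    ("mx2", "http://mx2.example.com/cgi-bin/mailgraph.cgi"),
])
print(page)
```

Write the result to a file under your web server's document root, and each image links back to the full Mailgraph page for that host.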

Figure 3. A unified server status page.

Showing Mailgraph to others, like the hard-nosed people in management, can
help sway decisions as well. If your mail server starts to suffer under a heavy
load of mail and management is wondering where their emails are, show them a
few graphs with a few toppings of sar output. The graphs and
numbers should be more than enough for anybody to understand that your systems
are overloaded and need attention. If not, show how mail traffic looked before
and how it looks now.

Please remember that adding Mailgraph to your list of weapons is no
substitute for occasional and true vigilance. Mailgraph is only there to
suggest that you check things out.