imapd Storms

[posted 2000/04/26]

We are in the midst of an email crisis stemming from the merger of our
32,000+ user alumni mail operation with our main 4,000+ current user mail
operation, and from our providing new mail services such as Web-based email
access.

I'll spare you most of the details, except these: We are running IMP, the
"Imap webMail Program", put out by the Horde Project (www.horde.org). A
bug in our version of IMP (2.0) has the imapd, under occasional and still
mysterious circumstances, spawning instances of itself every second or
so. The good news is that this is a known bug that will be fixed in the
next version of IMP (still in beta). The bad news is that, for a handful
of users, we are seeing occasional "imapd storms" with per-user imapd counts
reaching into the dozens, hundreds, and sometimes even thousands. The
highest recorded count so far is 3816! =:0

Not only do these imapd storms risk losing the user's mail file, they
also imperil the entire system, as you can imagine.

What to do? Ideally, we/they fix the underlying software, or come up with
some configurational tweaks to allay the problem. But discovering and
implementing these takes time. In the meantime, PIKT to the rescue!

The following is a new PIKT script we have put into operation on our mail
server to deal with these problems:

This is a work-in-progress. I'd like to render some common elements as
macros, and perhaps refer to a *.obj file matching proc names to instance
counts. Still, this bandaid is working well enough in the short run.
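
As a rough illustration of the general idea only (in Python, not PIKT, and
not the script described above), the sketch below counts imapd processes per
user and reports any user over a limit. The function names, the limit of 10,
and the use of "ps -eo user=,comm=" are all assumptions made for the example;
what to do about a storm once detected is deliberately left out.

```python
import subprocess
from collections import Counter


def count_by_user(ps_lines, proc_name="imapd"):
    """Count processes named proc_name per user.

    Each input line is expected to look like "<user> <command>",
    as printed by `ps -eo user=,comm=`.
    """
    counts = Counter()
    for line in ps_lines:
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == proc_name:
            counts[fields[0]] += 1
    return counts


def find_storms(counts, limit=10):
    """Return {user: count} for users with more than `limit` processes.

    The default limit of 10 is an arbitrary, assumed threshold.
    """
    return {user: n for user, n in counts.items() if n > limit}


def snapshot():
    """Take a process listing (assumes a ps that supports -e and -o)."""
    result = subprocess.run(
        ["ps", "-eo", "user=,comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    storms = find_storms(count_by_user(snapshot()))
    for user, n in sorted(storms.items()):
        print(f"imapd storm: {user} has {n} imapd processes")
```

Keeping the counting separate from the process listing means the threshold
logic can be checked against canned input before it is pointed at a live
mail server.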

We still don't have an understanding of this problem, much less a fix, but
at least we are not losing any more user email, and our mail server is
coping. (Most users are unaware of these difficulties.)

Ultimately this is a configurational issue, in a sense, in that properly
written software and/or properly tweaked configuration files (perhaps extending
to base system files, not just the configuration files for IMP, imapd, etc.),
would fix the problem. On the other hand, even with the best of software,
and "correct" configurations, zaniness might erupt from time to time, and
for the life of you, you just can't figure it all out. This is why, it
seems to me, you need a good system monitoring tool, with auto-corrective
capabilities, beyond tools just to help you with your configuration management.
Of course, PIKT does both!

We had a working, stable mail setup until the alumni merge several weeks
ago. Words to the wise: If it ain't broke, don't fix it!

But if it is broke, mend it in the end, and consider applying PIKT bandaids
(tourniquets?) while the battle still rages.