Troubleshooting with the Fundamentals

My oldest son loves playing basketball. I like his current coach because he stresses the importance of fundamentals: Rebounding, ball handling, and good defense are much more important than flashy plays or showboating. I ran into a novel problem last week, and I thought it would make a good column because it illustrates how fundamentals are also important in messaging troubleshooting.

I've written before about my conversion from Palm OS to Windows Mobile ("DirectPush in the Real World," April 23, 2006). My Palm Treo 700w is the most valuable object I own because it allows me to store and process much more information than my poor brain can handle alone. Windows Mobile lets you have multiple messaging accounts on a single device; you can have one Exchange ActiveSync (EAS) account, plus multiple IMAP, POP, or Hotmail accounts. The account possibilities presented me with a conundrum: Should I use my home or work Exchange server as the primary EAS account? The home server won out thanks to Windows Mobile's ability to handle meeting requests sent to IMAP accounts, and to my need to keep a shared calendar with my family.

My work email comes to an IMAP account on the handheld device. This setup worked well until a couple of months ago, when I noticed a baffling problem: Sometimes when I sent a message, I received a nondelivery report (NDR) on the device, but that NDR didn't appear in OWA or on my desktop mail clients; it was being generated by the handheld device itself. The NDR was fairly unhelpful; it said that a recipient was invalid and that the message had been moved to the Drafts folder. Sometimes resending the failed message worked; other times it didn't.

At first, I figured this problem was a transient side effect of upgrading from the beta of Exchange Server 2007 to the release version, and that it would go away once we'd finished our upgrade. Messages didn't bounce all of the time, and the workaround was simple enough. Then Christmas came, and my discretionary troubleshooting time vanished. After we upgraded our Client Access server earlier this month, the problem still occurred occasionally, and I finally decided to get to the bottom of things.

My first step in solving the problem was to figure out exactly where the NDR was coming from. When I used the Message Tracking Center in Exchange 2007, I didn't see any of the failed messages arriving at all. A quick check with the Get-MessageTrackingLog command in Exchange Management Shell revealed that other messages were flowing normally but that none of my NDR-producing messages sent via SMTP ever arrived. I then checked our Exchange Server 2003 SMTP bridgehead (which we've kept around so that we have a mixed environment for testing); the protocol logs that Microsoft IIS maintains showed SMTP delivery attempts from my handheld device, some successful, some not. That was my first clue; it proved that neither the Client Access server nor the Mailbox server was responsible for the NDR.
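Scanning a protocol log by eye works for a handful of entries, but a short script makes the pattern easier to spot. The following is a minimal Python sketch, assuming the log is in W3C extended format with a #Fields directive and that the sc-status column holds the SMTP reply code. The sample rows, addresses, and field selection are invented for illustration and are not taken from my server.

```python
from collections import Counter

# Hypothetical excerpt in W3C extended log format; every row here is invented.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time c-ip cs-method sc-status
2007-01-15 14:02:11 192.0.2.10 MAIL 250
2007-01-15 14:02:11 192.0.2.10 RCPT 250
2007-01-15 14:02:12 192.0.2.10 RCPT 451
2007-01-15 14:05:40 198.51.100.7 MAIL 250
2007-01-15 14:05:40 198.51.100.7 RCPT 250
"""


def parse_w3c_log(text):
    """Yield one dict per log row, keyed by the names in the #Fields directive."""
    fields = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def failures_by_client(rows):
    """Count rows per client IP whose status is an SMTP 4xx or 5xx reply."""
    counts = Counter()
    for row in rows:
        if row.get("sc-status", "")[:1] in ("4", "5"):
            counts[row.get("c-ip", "unknown")] += 1
    return counts


failed = failures_by_client(parse_w3c_log(SAMPLE_LOG))
print(dict(failed))
```

Grouping failures by client address is what makes a single misbehaving sender, or a single filtered address range, stand out from normal traffic.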

The second clue came when I used Telnet to connect to the bridgehead and send a message. I quickly found that messages to a single recipient always worked but messages with two or more recipients would always fail on the first try. The error was very instructive: The GRYNX Greylist filter that we'd installed back in November was rejecting messages sent to two or more recipients from my home network. I could see that all the failed attempts in the protocol log came from the Verizon Wireless IP address range, and suddenly the problem was obvious: The greylisting add-on had decided that my phone was a spammer. When I tried to send or reply to more than one addressee at a time, the filter blocked my message. A conventional sending SMTP server would treat that rejection as temporary and retransmit the message later. However, Pocket Outlook didn't understand the temporary status code returned by the SMTP server, so it generated a device-side NDR.
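For readers who haven't typed an SMTP conversation by hand, here is a minimal Python sketch of the two ideas in play: the envelope commands you would enter in a Telnet session, and how a well-behaved sender interprets the server's reply. The function names, addresses, and sample reply text are hypothetical; only the reply-code classes (2xx and 3xx success, 4xx temporary, 5xx permanent) come from the SMTP standard.

```python
def smtp_commands(sender, recipients, helo_name="test.example"):
    """Return the envelope commands to type, in order, before DATA."""
    lines = [f"HELO {helo_name}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    return lines


def classify_reply(reply_line):
    """Classify an SMTP reply by its three-digit code."""
    code = int(reply_line[:3])
    if 200 <= code < 400:
        return "ok"
    if 400 <= code < 500:
        return "retry later"  # temporary failure; greylisting relies on this class
    if 500 <= code < 600:
        return "permanent failure"
    raise ValueError(f"not an SMTP reply code: {code}")


# Hypothetical two-recipient test, like the one that exposed the problem.
for command in smtp_commands("me@example.com", ["a@example.com", "b@example.com"]):
    print(command)

# Invented reply text; a sender should queue and retry, not bounce.
print(classify_reply("451 Please try again later"))
```

A client that handles every non-2xx reply as "permanent failure" will produce a bounce where a retry was expected, which is consistent with what the device was doing.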

The fix was simple; I merely had to add my sending account to the greylist software's whitelist to tell it that it should always accept mail from me. Poof! Problem solved.
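To show why a whitelist entry makes the problem disappear, here is a toy Python sketch of generic greylisting. It is not GRYNX Greylist's actual logic, which I haven't seen; the class name, reply text, and the choice to key on client IP, sender, and recipient are assumptions for illustration. Real filters also enforce a minimum delay before accepting a retry, which this sketch leaves out.

```python
class Greylist:
    """Toy greylist: defer the first attempt for each (ip, sender, recipient)."""

    def __init__(self, whitelist=()):
        self.whitelist = {addr.lower() for addr in whitelist}
        self.seen = set()

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply for one delivery attempt."""
        if sender.lower() in self.whitelist:
            return "250 OK"  # whitelisted senders skip greylisting entirely
        triplet = (client_ip, sender.lower(), recipient.lower())
        if triplet in self.seen:
            return "250 OK"  # a retry of a known triplet is accepted
        self.seen.add(triplet)
        return "451 Try again later"  # first sighting: temporary rejection


# Hypothetical addresses from the documentation ranges.
plain = Greylist()
print(plain.check("192.0.2.10", "me@example.com", "a@example.com"))
print(plain.check("192.0.2.10", "me@example.com", "a@example.com"))

trusted = Greylist(whitelist=["me@example.com"])
print(trusted.check("192.0.2.10", "me@example.com", "a@example.com"))
```

A sender that never retries, as my device did not, only ever sees the first reply, so whitelisting is the practical fix on the server side.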

What did I learn from this? First, if I'd remembered earlier that the software had been installed, I would have checked it. However, because I didn't install it myself, and because it wasn't causing any other problems, I had forgotten all about it. Lesson: Keep better track of configuration changes on the server. Second, I found that I like searching the message tracking logs about a million times more with the command line than with the message tracking GUI. This is obviously a personal preference, but the point is that you might find Exchange Management Shell a comfortable fit, thanks to easy-to-understand cmdlets such as Get-MessageTrackingLog. Third, never underestimate the power of knowing how things work at a basic level. Impersonating the SMTP sender by using Telnet quickly unmasked the problem; thankfully, because the people who taught me emphasized fundamentals, I knew how to apply such a fundamental tool to troubleshoot the problem.

Chris J., the author of GRYNX Greylist, has already written to tell me that this problem has been fixed in the current version, which is excellent support in my book. GRYNX Greylist has done a great job blocking spam, and we're going to continue to use it, now with a slightly healthier regard for its abilities!