Field notes and occasional musings by Peter on Stuff that happens, from a free software perspective, mainly OpenBSD, FreeBSD.

Saturday, April 23, 2016

Does Your Email Provider Know What A "Joejob" Is?

Anecdotal evidence seems to indicate that Google and possibly other mail service providers are either quite ignorant of history when it comes to email and spam, or are applying unsavoury tactics to capture market dominance.

The first inklings that Google had reservations about delivering mail coming from my bsdly.net domain came earlier this year, when I was contacted by friends who have left their email service in the hands of Google, and it turned out that my replies to their messages did not reach their recipients, even when my logs showed that the Google mail servers had accepted the messages for delivery.

Contacting Google about matters like these means you first need to navigate some web forums. In this particular case (I won't give a direct reference, but a search on the likely keywords will likely turn up the relevant exchange), the denizens of that web forum appeared to be more interested in demonstrating their BOFHishness than actually providing input on debugging and resolving an apparent misconfiguration that was making valid mail disappear without a trace after it had entered Google's systems.

The forum is primarily intended as a support channel for people who host their mail at Google (this becomes very clear when you try out some of the web accessible tools to check domains not hosted by Google), so the only practial result was that I finally set up DKIM signing for outgoing mail from the domain, in addition to the SPF records that were already in place. I'm in fact less than fond of either of these SMTP addons, but there were anyway other channels for contact with my friends, and I let the matter rest there for a while.

If you've read earlier instalments in this column, you will know that I've operated bsdly.net with an email service since 2004 and a handful of other domains from some years before the bsdly.net domain was set up, sharing to varying extents the same infrastructure. One feature of the bsdly.net and associated domains setup is that in 2006, we started publishing a list of known bad addresses in our domains, that we used as spamtrap addresses as well as publising the blacklist that the greytrapping generates.

Over the years the list of spamtrap addresses -- harvested almost exclusively from records in our logs and greylists of apparent bounces of messages sent with forged From: addresses in our domains - has grown to a total of 29757 spamtraps, a full 7387 in the bsdly.net domain alone. At the time I'm writing this 31162 hosts have attempted to deliver mail to one of those spamtrap addresses during the last 24 hours. The exact numbers will likely change by the time you read this -- blacklisted addresses expire 24 hours after last contact, and new spamtrap addresses generally turn up a few more each week. With some simple scriptery, we pick them out of logs and greylists as they appear, and sometimes entire days pass without new candidates appearing. For a more general overview of how I run the blacklist, see this post from 2013.

In addition to the spamtrap addresses, the bsdly.net domain has some valid addresses including my own, and I've set up a few addresses for specific purposes (actually aliases), mainly set up so I can filter them into relevant mailboxes at the receiving end. Despite all our efforts to stop spam, occasionally spam is delivered to those aliases too (see eg the ethics of running the traplist page for some amusing examples).

Then this morning a piece of possibly well intended but actually quite clearly unwanted commercial email turned up, addressed to one of those aliases. For no actually good reason, I decided to send an answer to the message, telling them that whoever sold them the address list they were using were ripping them off.

That message bounced, and it turns out that the domain was hosted at Google.

Reading that bounce message is quite interesting, because if you read the page they link to, it looks very much like whoever runs Google Mail doesn't know what a joejob is.

The page, which again is intended mainly for Google's own customers, specifies that you should set up SPF and DKIM for domains. But looking at the headers, the message they reject passes both those criteria:

Then for reasons known only to themselves, or most likely due to the weight they assign to some unknown data source, they reject the message anyway.

We do not know what that data source is. But with more than seven thousand bogus addresses that have generated bounces we've registered it's likely that the number of invalid bsdly.netFrom: addresses Google's systems has seen is far larger than the number of valid ones. The actual number of bogus addresses is likely higher, though: in the early days the collection process had enough manual steps that we're bound to have missed some. Valid bsdly.net addresses that do not eventually resolve to a mailbox I read are rare if not entirely non-existent. But the 'bulk mail' classification is bizarre if you even consider checking Received: headers.

The reason Google's systems most likely has seen more bogus bsdly.netFrom: addresses than valid ones is that by historical accident faking sender email addresses in SMTP dialogues is trivial.

Anecdotal evidence indicates that if a domain exists it will sooner or later be used in the from: field of some spam campaign where the messages originate somewhere else completely, and for that very reason the SPF and DKIM mechanisms were specified. I find both mechanisms slightly painful and inelegant, but used in their proper context, they do have their uses.

For the domains I've administered, we started seeing log entries, and in the cases where the addresses were actually deliverable, actual bounce messages for messages that definitely did not originate at our site and never went through our infrastructure a long time before bsdly.net was created. We didn't even think about recording those addresses until a practical use for them suddenly appeared with the greytrapping feature in OpenBSD 3.3 in 2003.

A little while after upgrading the relevant systems to OpenBSD 3.3, we had a functional greytrapping system going, at some point before the 2007 blog post I started publishing the generated blacklist. The rest is, well, what got us to where we are today.

From the data we see here, mail sent with faked sender addresses happens continuously and most likely to all domains, sooner or later. Joejobs that actually hit deliverable addresses happen too. Raw data from a campaign in late 2014 that used my main address as the purported sender is preserved here, collected with a mind to writing an article about the incident and comparing to a similar batch from 2008. That article could still be written at some point, and in the meantime the messages and specifically their headers are worth looking into if you're a bit like me. (That is, if you get some enjoyment out of such things as discovering the mindbogglingly bizarre and plain wrong mail configurations some people have apparently chosen to live with. And unless your base64 sightreading skills are far better than mine, some of the messages may require a bit of massaging with MIME tools to extract text you can actually read.)

Anyone who runs a mail service and bothers even occasionally to read mail server logs will know that joejobs and spam campaigns with fake and undeliverable return addresses happen all the time. If Google's mail admins are not aware of that fact, well, I'll stop right there and refuse to believe that they can be that incompentent.

The question then becomes, why are they doing this? Are they giving other independent operators the same treatment? If this is part of some kind of intimidation campaign (think "sign up for our service and we'll get your mail delivered, but if you don't, delivering to domains that we host becomes your problem"). I would think a campaign of intimidation would be a less than useful strategy when there are alread antitrust probes underway, these things can change direction as discoveries dictate.

Normally I would put oddities like the ones I saw in this case down to a silly misconfiguration, some combination of incompetence and arrogance and, quite possibly, some out of control automation thrown in. But here we are seeing clearly wrong behavior from a company that prides itself in hiring only the smartest people they can find. That doesn't totally rule out incompetence or plain bad luck, but it makes for a very strange episode. (And lest we forget, here is some data on a previous episode involving a large US corporation, spam and general silliness. (Now with the external link fixed via the wayback machine.))

One other interesting question is whether other operators, big and small behave in any similar ways. If you have seen phenomena like this involving Google or other operators, I would like to hear from you by email (easily found in this article) or in comments.

Update 2016-05-03: With Google silent on all aspects of this story, it is not possible pinpoint whether the public whining or something else that made a difference, but today a message aimed at one of those troublesome domains made it through to its intended recipient.

Much like the mysteriously disappeared messages, my logs show a "Message accepted for delivery" response from the Google servers, but unlike the earlier messages, this actually appeared in the intended inbox. In the meantime one of one of the useful bits of feedback I got on the article was that my IPv6 reverse DNS was in fact lacking. Today I fixed that, or at least made a plausible attempt to (you're very welcome to check with tools available to you). And for possibly unrelated reasons, my test message made it through all the way to the intended recipient's mailbox.

[Opinions offered here are my own and may or may not reflect the views of my employer.]

Update 2016-11-21: Since this piece was originally written, we've seen several episodes of Gmail.com or other Google domains disappearing or bouncing mail from bsdly.net, and at some point I got around to setting up proper DMARC records as well.

In a separate development, I also set up the script that generates the blacklist for export to send a copy of its output to my gmail.com addresses. This has lead to a few fascinating bounces, all of them archived here in order to preserve a record of incidents and developments

It should be no surprise that even the SPF-DKIM-DMARC trinity is not actually enough to avoid bounces and disappearances, driving Google to come up with a separate DNS TXT record of their own, the google-site-verification record, which contains a key they will generate for your domain if you manage to find the correct parts of their website.

Update 2017-02-12- 2017-02-13: At semi-random intervals since this piece was originally written, Gmail has had a number of short periods of bouncing my hourly spamtrap reports sent to myself at bsdly.net with a Cc: to my gmail account.

At most times these episodes last for a couple of hours only, possibly when a human intervenes to correct the situation. However, this weekend we have seen more of this nonsense than usual: Gmail started bouncing my hourly reports on Saturday evening CET, with the first bounce for the 2017-02-11 21:10 report, keeping up the bounces through the 2017-02-12 05:10 report.

Then after a brief respite the bounces resumed with the 2017-02-12 08:10 report and have persisted, bouncing every hourly message (except the 2017-02-13 03:10, 04:10 and 13.10 messages) and with non-delivery, with the last bounce received now for the 2017-02-13 21:10 report. I archive all bounces as they arrive mod whatever time I have on my hands in the googlefail archive.
I will be giving a PF tutorial at BSDCan 2016, and I welcome your questions now that I'm revising the material for that session. See this blog post for some ideas (note that I'm only giving the PF tutorial this time around).

7 comments:

I, too, abhor DKIM and SPF; they are roadblocks applied to the SMTP process of dubious value. They really do seem to be a driver for all kinds of "sign up for our service because we'll handle that all for you," which extra-rankles me.

In short, DKIM and SPF *cause* deliverability problems by design, and every time people tell me "Oh just switch to GApps" I want to thwack 'em with a Cluex4.

I don't see a DKIM record in your txt for 'bsdly.net' which would be by if google bounced it. If they're seeing a high bad sender rate they'll reject all valid messages until both SPF and DKIM are setup properly. Digging your DNS zone I don't see any DKIM at all. Your SPF is setup but you appear to lack all DKIM setup for 'bsdly.net'.

We had this curious issue as well. However we were forwarding email from an office365 domain to a google account. Any email from yahoo to the 365 domian then forwarded to google would vanish. Google would accept the message then delete it. This is also was the default action of postini. It will accept the message determine if its spam, it just deletes the message.

The only way that we were able to solve the issue was to remove the forwarding.

Office365 does not support DKIM as exchange does not support it.

I also worked at a company that used google apps i hated it. Very little control over anything. They bought postini to control spam and integrated it into their systems. However if postini marks a message as spam it will just delete it. The last time (a few years ago) I looked in to it i could not find an option to not delete the messages.

In 2012 I spent several hours helping a mom and pop web consulting company with the exact same issue. It never was resolved. We also went through the whole SPF+DKIM+DMARC story. We checked reverse DNS. Everything was in order. There was no effective Google support that we could get through to.

Lately I've also noticed that sometimes an email will get through, but it comes in hours or days late. Really useless if you are in a discussion thread, the discussion moves on.

Note: Comments are moderated. On-topic messages will be liberated from the holding queue at semi-random (hopefully short) intervals.

I invite comment on all aspects of the material I publish and I read all submitted comments. I occasionally respond in comments, but please do not assume that your comment will compel me to produce a public or immediate response.

Please note that comments consisting of only a single word or only a URL with no indication why that link is useful in the context will be immediately recycled so those poor electrons get another shot at a meaningful existence.

If your suggestions are useful enough to make me write on a specific topic, I will do my best to give credit where credit is due.

About Me

Puffyist, daemon charmer, penguin wrangler. Wrote The Book of PF (3rd ed out now, see http://www.nostarch.com/pf3), rants on sanity in IT (lack of) at http://bsdly.blogspot.com/. Please read http://www.bsdly.net/~peter/rentageek.html before contacting.