"It Just Works" is more than just a slogan, it's a way of life!


In parts 1 and 2 of this series, I explored the current best practices for authenticating outbound email and validating inbound email, and described my own system configurations for each.

When not busy with my day job, I also serve on the board of our neighborhood youth basketball league as registrar and co-webmaster. As part of these roles, I maintain the email infrastructure, and send most of the announcement emails.

I had made a point of “warming up” the IP address for this server. I’ve been using this same IP for several months, and it has a returnpath.com Sender Score of 99, so I thought I’d be in the clear.

We have about 1500 parent email addresses on our announcement mailing list, with over 160 of them going to a single domain – austin.rr.com. No surprise there – Roadrunner is a very popular ISP. The problem is this: Roadrunner’s mail servers don’t appreciate it when my mail server sends an email, via this mailing list, to their inbound MX server – even after mailman splits the message up so there are only 10 recipients per message. The first few messages get through; the rest get put on hold (SMTP 4.x.x, try again later), allowing only a trickle of messages per hour from my one IP address.

During the Austin Snowpocalypse last week, we needed to get an announcement out to our parents: because schools were closed, and because we rent all our court space from the area schools, our practices and games had to be cancelled for the night. I sent that note around noon. It took until 6pm before all the @austin.rr.com emails were allowed through – just in time to give notice of a cancelled 6pm event. Note – my message came from a valid SPF mail server, had a valid DKIM signature attached, and my DMARC policy is “none”, so it wasn’t blocked at that level. It was blocked because my mail server’s IP address isn’t allowed to send more than a few messages to Roadrunner subscribers each hour.

Roadrunner does provide a way for you to request relaxed rate limits for your IP. I followed their process, which uses Return Path’s Feedback Loop Management service, but my request was denied, no explanation given. Perhaps they know it’s a cloud service IP, which in theory could be handed to another customer at any moment. I’ll file a request with my ISP to see if they’ll sign up with Return Path to be responsible for their netblocks on behalf of their customers. I’m not sure how well that’ll go over – it could make a lot of work for the ISP mail technicians.

One other alternative is to use an outbound mail service such as Amazon Simple Email Service, Mailchimp, or SendGrid to get my mails out to our players’ families in a timely manner. Mailchimp appears to have the disadvantage of needing to migrate all my mailing lists to it, instead of using my existing GNU Mailman setup. SendGrid has better pure SMTP integration, and with some sendmail smarttable hacking, I could probably make that work. All three involve some increased cost to us in the months when I send a lot of announcements.

Do you send bulk mail from a cloud service? How do you ensure your mails get through? Leave your comments below.

In part 1 of this series, I relayed a bit of my story about my use of SPF, DKIM, and DMARC to try to reduce the spam being sent as if from my personal domain, while increasing the odds that legitimate mail from my domain gets through.

In this part, I describe how these are actually implemented in my case.

First, let me describe my email setup. I have one cloud-hosted server, smtp.domsch.com, through which all authentic *@domsch.com email is sent. Senders may be either local to this server (such as postmaster@ which sends the DMARC reports to other mail servers), or may be family members who use a hosted email service (as it happens, all use GMail) as their Mail User Agent. Users make an authenticated connection to smtp.domsch.com, which then DKIM-signs the messages and sends them on toward their destination MX server. These users may also be subscribed to various mailing lists which would break (fail to get their legitimate message through to the expected receivers) if SPF policy were anything except softfail.

Outbound, user-authenticated mail from *@domsch.com should be treated differently than inbound mail. Outbound mail requires only a DKIM milter to sign each message. Messages are signed with a DKIM key, published in my DNS:

default._domainkey.domsch.com. 7200 IN TXT "v=DKIM1\; k=rsa\; s=email\; p=(some nice long hex string)"

I publish a DMARC DNS record so I can get reports back from DMARC-compliant servers:

_dmarc.domsch.com. 7200 IN TXT "v=DMARC1\; p=none\; rua=mailto:dmarc-aggregate@domsch.com\; ruf=mailto:dmarc-forensics@domsch.com\; adkim=r\; aspf=r\; rf=afrf"
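Both the DKIM and DMARC records are plain semicolon-separated tag=value lists, so they are easy to inspect programmatically. Here is a minimal sketch (my own helper for illustration, not part of any milter) that pulls out the tags:

```python
def parse_tag_value(record: str) -> dict:
    """Parse a DKIM/DMARC-style TXT record ("tag=value; tag=value")
    into a dict, ignoring empty segments and surrounding whitespace."""
    tags = {}
    for segment in record.split(";"):
        segment = segment.strip()
        if not segment or "=" not in segment:
            continue
        tag, _, value = segment.partition("=")
        tags[tag.strip()] = value.strip()
    return tags

dmarc = parse_tag_value(
    "v=DMARC1; p=none; rua=mailto:dmarc-aggregate@domsch.com; adkim=r; aspf=r"
)
print(dmarc["p"])      # none
print(dmarc["adkim"])  # r
```

The same helper works on the DKIM record above, since both specifications share the tag=value syntax.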

Inbound mail to *@domsch.com should pass each message through an SPF milter, which adds a Received-SPF header; a DKIM milter, which checks the validity of a DKIM-signed message and adds an Authentication-Results header; and the DMARC milter, which decides what to do based on the results in those two headers and generates the data for reports back to sending domains.

smtp.domsch.com runs CentOS 6.x, sendmail, and a variety of milters. On outbound mail, it runs opendkim. On inbound mail, it runs smf-spf, opendkim, and opendmarc before sending the message on to its final destination. My sendmail.mc file is configured to run the different milters depending on direction, outbound or inbound:
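A hedged sketch of the relevant sendmail.mc milter lines follows; the ports for opendkim (8891) and opendmarc (8893) are their documented defaults, while 8890 for smf-spf is my own choice, and opendkim itself decides whether to sign (authenticated outbound) or verify (inbound) each message based on its own configuration:

```
dnl SPF check, DKIM sign/verify, then DMARC policy evaluation
INPUT_MAIL_FILTER(`smf-spf', `S=inet:8890@localhost')dnl
INPUT_MAIL_FILTER(`opendkim', `S=inet:8891@localhost')dnl
INPUT_MAIL_FILTER(`opendmarc', `S=inet:8893@localhost')dnl
```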

Why do the milters listen on a local TCP socket, instead of a UNIX domain socket? Simply, they don’t yet have SELinux policies in place that let them use a domain socket. Once these packages are properly reviewed and included in Fedora/EPEL, we will adjust the listening port to be a domain socket.

Of these milters, opendkim and opendmarc seem to be properly maintained still. smf-spf, for its whole ~1000 lines of code, has been largely untouched since 2005, and its maintainer seems to have
completely fallen off the Internet. All my attempts to find a valid address for him have failed. There are a variety of other SPF filters, the most popular of which is python-postfix-policyd-spf – which as the name implies is postfix-specific, and as I noted, I’m not running postfix. Call me lazy, but sendmail works well enough for me at present.

These milters are currently under review (smf-spf, libspf2, opendmarc) in Fedora and will eventually land in the EPEL repositories as well. opendkim is already in EPEL.

If you are using SPF, DKIM, and DMARC, what does your configuration look like? Please leave a comment below.

We all dislike email spam. It clogs up our inboxes, and causes good
engineers to spend way too much time creatively blocking or filtering
it out, while spammers creatively work to get around the blocks. In
my personal life, the spammers are winning. (My employer, Dell, makes
several security and spam-fighting products. I’m not using them for
my personal domains, so this series is not related to Dell products in
any way.)

I recently came across DMARC, the Domain-based Message Authentication,
Reporting & Conformance specification. One feature of DMARC is that it
allows mail receivers, after processing a given piece of mail, to inform
an address at the sending domain of that mail’s disposition: passed,
quarantined, or rejected. This is the first such feedback method I’ve
come across, and it seems to be gaining traction. Furthermore,
services such as dmarcian.com have popped up to act as DMARC report
receivers, which then display your aggregate results in a clear
manner.

A DMARC-compliant sending domain provides several useful bits of
information. 1) The domain publishes a valid Sender Policy Framework
(SPF) record. 2) The domain signs mail using DomainKeys Identified
Mail (DKIM). These are best practices now, in place at millions of
domains. In addition, the domain publishes its DMARC policy
recommendation: what an inbound mail server should do if a message
purporting to be from the domain fails both SPF and DKIM checks. The
policies today include “none” (do nothing special), “quarantine”
(treat the message as suspect, perhaps applying additional filtering
or sending it to a spam folder), and “reject” (reject the message
immediately, sending a bounce back to the sender).

A DMARC-compliant inbound mail server validates each incoming message
in two ways: it checks compliance with the sending domain’s Sender
Policy Framework (SPF) record, and it verifies the DKIM signature.
The server then follows the policy suggested by the sending domain
(none, quarantine, or reject) and, furthermore, reports the results
of its actions back to the purported sending domain daily.

I’ve been publishing SPF records for my personal and community
organization domains for several years, in hopes this would cut down
on spammers pretending to be from my domains. I recently added DKIM
signing, the next step in the process. With these two in place,
publishing a DMARC policy is very straightforward. So I did this,
publishing a “none” policy – just send me reports. And within a few
days, I started getting reports back, which I sent to dmarcian.com for
analysis.

What did I find?

On a usual day, my personal domain, used by myself and family members,
sends maybe a hundred total emails, as reported by DMARC-reporting
inbound servers. My community org domains may send 1000-2000 emails a
day, particularly if we have announcements to everyone on our lists.
That seems about right.

In addition, spammers, mostly in Russia and other parts of Asia, are
sending upwards of 20,000-40,000+ spam messages a day pretending to
come from my personal domain, again as reported by DMARC-reporting
inbound servers. Hotmail’s servers are kindly sending me a report for
each failed message they process thinking it was from me – a steady
stream of ~3600/day. No other DMARC servers have sent me such forensic
data yet.

Spam source by country for the last week

For several days, I experimented with a DMARC policy of “quarantine”,
with various small percentages from 10 to 50 percent. And sure
enough, dmarcian reports that the threat/spam mails were in fact
quarantined. It was really cool to wake up in the morning, check the
overnight results, and see the threat/spam graphs show half of the
messages being quarantined. It’s working!
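For reference, percentage-based quarantining is expressed with the DMARC pct tag. A record like the following (illustrative values on example.com, not my production record) asks receivers to apply the quarantine policy to only 25% of failing messages:

```
_dmarc.example.com. 7200 IN TXT "v=DMARC1; p=quarantine; pct=25; rua=mailto:dmarc-aggregate@example.com"
```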

However, dmarcian also reported that some of my legitimate emails,
originating from my servers and being DKIM-signed, were also getting
quarantined. What? That wasn’t what I hoped for.

It turns out that authentic messages were in fact being forwarded –
some by mailing lists, some by individuals setting up forwarding from
one inbound mail address to another. Neither of which I can do
anything about.

This isn’t a new problem – it’s the Achilles heel of SPF, which DMARC then inherits. Forwarding email through a mailing list typically makes subtle yet valid changes while keeping the From: line the same.

The Subject: line may get a [listname] prepended to it. The body may
get a “click here to unsubscribe” footer added to it. These
invalidate the original DKIM signature. The list may strip out the
original DKIM signature. And of course, it remails the message,
outbound using its own server name and IP, which causes it to then
fail SPF tests.

Sure, there are suggested solutions, like getting everyone to use
Sender Rewriting Scheme (SRS) when remailing, and fixing Mailman and
every other mailing list manager. Wake me when all the world’s email
servers have added that; I will have been dead a very, very long time.

So, I switched back to policy “none”, and get the reports, aggravated
that there’s nothing I can directly do to protect the good name of my
domains. It’s hard both knowing the size of the problem, and knowing
we have no technological method of solving it today. Food for
thought.

In part 2 of this series, I will describe my system setup for using
the above techniques.

Do you use SPF? Do you use DKIM? Do you publish a DMARC policy? If so, what has your experience been? Leave comments below.

While working on Dell’s acquisition of Enstratius, one of the highlights for me was seeing the work George Reese and team have done on the open source (Apache-licensed) cloud abstraction layer, Dasein Cloud. I’m pleased Enstratius joined Dell, and that the work on building Dasein, and making it available for other uses, has only accelerated.

Please see George’s blog post on his views of Dasein’s progress in just the last few months, and if you’re at OSCON, stop by the Dell booth or the Dasein session and talk to George.

Has anyone written free/open source software to use the publicdata.com query API? I’ve got several dozen coaches we need to do background checks on, and we use publicdata.com. It turns out our folks have been doing this manually for years. It seems like something that could easily be automated, but I haven’t found any software, open or otherwise, to do so.

MirrorManager’s primary aim is to make sure end users get directed to the “best” mirror for them. “Best” is defined in terms of network scopes, based on the concept that a mirror that is network-wise “close” to you is going to provide you a better download experience than a mirror that is “far” from you.

In a pure DNS-based round robin mirror system, you would expect all requests to be sent to a “global” mirror, with no preference for where you are on the network. In a country-based DNS round robin system, perhaps where the user has specified what country they are in, or perhaps it was automatically determined, you’d expect most hits in countries where you know you have mirrors.

MirrorManager’s scopes include clients and mirrors on the same network blocks, then on the same Autonomous System Numbers, then jointly on Internet2 or its related regional high-speed research and education networks in your country, before falling back to GeoIP to find mirrors in the same country, and then the same continent. Only in the rarest of cases does the GeoIP lookup fail – we have no idea where you are, and you get sent to some random mirror somewhere.
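That scope cascade is a simple first-match fallback, which can be sketched like this (hypothetical predicate and names; the real MirrorManager lookups consult routing tables and GeoIP databases):

```python
SCOPES = ("netblock", "asn", "internet2", "geoip_country",
          "geoip_continent", "global")

def choose_scope(client, mirrors_by_scope):
    """Walk the scopes from most to least network-local and return the
    first scope that has any candidate mirrors for this client."""
    for scope in SCOPES:
        if mirrors_by_scope.get(scope):
            return scope, mirrors_by_scope[scope]
    raise LookupError("no mirrors at all")

# No netblock/ASN/Internet2 match for this client, so GeoIP country wins.
scope, mirrors = choose_scope(
    "198.51.100.7",
    {"geoip_country": ["mirror.example.edu"], "global": ["ftp.example.org"]},
)
print(scope)  # geoip_country
```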

But, how well does this work in practice? MM 1.4 added logging, so we can create statistics on how often we get a hit for each scope. Raw statistics:

Scope                  Percentage   On-Network Percentage
netblocks                  16.10%                  16.10%
Autonomous System           5.61%                  21.71%
Internet2                   8.95%                  30.66%
geoip country              57.50%                  88.16%
geoip continent            10.34%                  98.51%
Global (any mirror)         1.38%                  99.88%

In the case of MirrorManager, we take it three steps further than pure DNS round robin or GeoIP lookups. By using Internet2 routing tables, ASN routing tables, and letting mirror admins specify their Peer ASNs and their own netblocks, we are able to, in nearly 22% of all requests, keep the client traffic completely local to the organization or upstream ISP, and when adding in Internet2 lookups, a whopping 30% of client traffic never hits the commodity Internet at all. In 88% of all cases, you’re sent to a mirror within your own country – never having to deal with congested inter-country links.

After nearly 3 years in on-again/off-again development, MirrorManager 1.4 is now live in the Fedora Infrastructure, happily serving mirrorlists to yum, and directing Fedora users to their favorite ISOs – just in time for the Fedora 19 freeze.

Kudos go out to Kevin Fenzi, Seth Vidal, Stephen Smoogen, Toshio Kuratomi, Pierre-Yves Chibon, Patrick Uiterwijk, Adrian Reber, and Johan Cwiklinski for their assistance in making this happen. Special thanks to Seth for moving the mirrorlist-serving processes to their own servers where they can’t harm other FI applications, and to Smooge, Kevin and Patrick, who gave up a lot of their Father’s Day weekend (both days and nights) to help find and fix latent bugs uncovered in production.

What does this bring the average Fedora user? Not a lot… More stability – fewer failures with yum retrieving the mirror lists (not that there were many, but it was nonzero) – and a list of public mirrors where the versions are sorted in numerical order.

What does this bring to a Fedora mirror administrator? A few new tricks:

Mirror admins have been able to specify their own Autonomous System Number for several years. Clients on the same AS get directed to that mirror. MM 1.4 adds the ability for mirror admins to request additional “peer ASNs” – particularly helpful for mirrors located at a peering point (say, Hawaii), where listing lots of netblocks instead is unwieldy. As this has the potential to be slightly dangerous (no, you can’t request ALL ASNs be sent your way), ask a Fedora sysadmin if you want to use this new feature – we can help you.

Multiple mirrors claiming the same netblock, or overlapping netblocks, were returned to clients in random order. Now they will be returned in ascending netblock size order. This lets an organization that has a private mirror, and their upstream ISP, both have a mirror, and most requests will be sent to the private mirror first, falling back to the ISP’s mirror. This should save some bandwidth for the organization.
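The new ordering is simply “smallest netblock first”, which Python’s ipaddress module makes easy to express (a sketch for illustration, not the actual MirrorManager code):

```python
import ipaddress

def order_by_netblock_size(claims):
    """Given (mirror, netblock) pairs that all match the client, return
    mirrors sorted smallest netblock first, so the most specific claim
    (e.g. an organization's private mirror) is tried before its ISP's."""
    return [mirror for mirror, net in
            sorted(claims,
                   key=lambda mn: ipaddress.ip_network(mn[1]).num_addresses)]

claims = [("isp-mirror", "10.0.0.0/8"), ("org-mirror", "10.1.2.0/24")]
print(order_by_netblock_size(claims))  # ['org-mirror', 'isp-mirror']
```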

If you provide rsync URLs, you’ll see reduced load from the MM crawler, as it will now use rsync to retrieve your content listing rather than a ton of HTTP or FTP requests.

Reduced memory usage in the mirrorlist servers. Especially given how bad Python is at memory management on x86_64 (e.g. reading in a 12MB pickle file blows memory usage out from 4MB to 120MB), this is critical. This directly impacts the number of simultaneous users that can be served, the response latency, and the CPU overhead too – it’s a win-win-win-win.

An improved admin interface – getting rid of hand-coded pages that looked like they could have been served by BBS software on my Commodore 64 – for something modern, more usable, and less error prone.

Code specifically intended for use by Debian/Ubuntu and CentOS communities, should they decide to use MM in the future.

A new method to upgrade database schemas – saner than SQLObject’s method. This should make me less scared to make schema changes in the future to support new features. (yes, we’re still using SQLObject – if it’s not completely broken, don’t fix it…)

Map generation moved to a separate subpackage, to avoid the dependency on 165MB of python-basemap and python-basemap-data packages on all servers.

MM 1.4 is a good step forward, and hopefully I’ve laid the groundwork to make it easier to improve in the future. I’m excited that more of the Fedora Infrastructure team has learned (the hard way) the internals of MM, so I’ll have additional help going forward too.

I have the pleasure of moderating the Fedora Project Board Town Hall today, 1900 UTC, having served on the board for five years previously. Held on IRC, these Town Halls give project members a chance to ask questions directly of the five Board candidates, so that you can make a more informed decision when casting your vote. I hope you can join us.

Two weeks ago I once again had the opportunity to attend the Fedora User and Developer Conference, this time in Lawrence, KS. My primary purpose in going was to work with the Fedora Infrastructure team, develop a plan for MirrorManager maintenance going forward, and learn about some of the faster-paced projects Fedora is driving.

MirrorManager began as a labor of love immediately after the Fedora 6 launch, when our collection of mirrors was both significantly smaller and less well wrangled. That launch led to unacceptable download times for the release, and to impacts on the Fedora and Red Hat networks and our few functional mirrors, that we swore never to suffer or inflict again. The Fedora 18 launch, 6 years later, was downloaded just as heavily, but with nearly 300 public mirrors and hundreds of private mirrors, the release was nary a blip on the bandwidth charts – “many mirrors make for light traffic”. To that end, MirrorManager continues to do its job well.

However, over the past 2 years, with changes in my job and outside responsibilities, I haven’t had as much time to devote to MirrorManager maintenance as I would have liked. The MirrorManager 1.4 (future) branch has languished, with an occasional late-night prod, but no significant effort. This has prevented MirrorManager from being more widely adopted by other non-Fedora distributions. The list of hotfixes sitting in Fedora Infrastructure’s tree was getting untenable. And I hadn’t really taken advantage of numerous offers of help from potential new maintainers.

FUDCon gave me the opportunity to sit down with the Infrastructure team, including Kevin, Seth, Toshio, Pierre, Stephen, Ricky, Ian and now Ralph, to think through our goals for this year, specifically with MM. Here’s what we came up with.

I need to get MM 1.4 “finished” and into production. This falls squarely on my shoulders, so I spent time both at FUDCon, and since, moving in that direction. The backlog of hotfixes needed to get into the 1.4 branch. The schema upgrade from 1.3 to 1.4 needed testing on a production database (Postgres) not just my local database (mysql) – that revealed additional work to be done. Thanks to Toshio for getting me going on the staging environment again. Now it’s just down to bug fixes.

I need not to be the single point of knowledge about how the system works. To that end, I talked through the MM architecture, which components did what, and how they interacted. Hopefully the whole FI team has a better understanding of how it all fits together.

I need to be more accepting of offers of assistance. Stephen, Toshio, and Pierre have all offered, and I’m saying “yes”. Stephen and I sat down, figured out a capability he wanted to see (better logging for mirrorlist requests to more easily root cause failure reports), he wrote the patch, and I accepted it. +1 to the AUTHORS list.

Ralph has been hard at work on fedmsg, the Fedora Infrastructure Message Bus. This is starting to be really cool, and I hope to see it used to replace a lot of the cronjob-based backend work, and the cronjob-based rsyncs that all our mirrors do. One step closer to a “push mirror” system. Wouldn’t it be cool if Tier 2 mirrors listened on the message bus for their Tier 1 mirror to report “I have new content in this directory tree, now is a good time to come get it!”, and started their syncs, rather than the “we sync 2-6 times a day whenever we feel like it” that mirrors use today? I think so.

Now, to get off (or really, on) the couch and make it happen!

A few other cool things I saw at FUDCon I wanted to share (snagged mostly from my twitter stream):

As posted to the s3tools-general mailing list, s3tools maintainer Michal Ludvig is looking for new maintainers to step up to continue the care and feeding of the s3tools / s3cmd application. s3cmd is widely used, on both Linux and Windows, to publish and maintain content in the Amazon Web Services S3 storage system and CloudFront content distribution network.

I use s3cmd for two primary purposes:

as Fedora Mirror Wrangler, I use it within Fedora Infrastructure to maintain mirrors within S3 in each region for the benefit of EC2 users running Fedora or using the EPEL repository on top of RHEL or a derivative. Fedora has mirrors in us-east-1, us-west-1 and -2, and eu-west-1 right now, and may add the other regions over time.

for my own personal web site, I offload storage of static historical pictures and movies so that they are served from economical storage and not consuming space on my primary web server.

I congratulate Michal for recognizing when he no longer has the time to commit to regular maintenance of such an important project, and to begin looking for contributors who can carry out that responsibility more effectively. While I’ve submitted a few patches in support of the Fedora Infrastructure mirror needs, I know that I don’t have the time to take on that added responsibility right now either.

If you use s3cmd, or have contributed to s3cmd, and feel you could make the time commitment to be the next maintainer, you’ll find an active contributor base and dedicated user base to help you move the project forward.