devbox:~$ iptables -A OUTPUT -j DROP

I’ve recently worked on a customized emailing suite for a client that involves bulk email (shutter) and thought I’d do a write up on a few things that I thought were slick.

Originally we decided to use AWS SES but were quickly kicked off of the service because my client doesn’t have clean enough email lists (a story for another day on that).

The requirement from the email suite was pretty much the same as what you’d expect from SendGrid except the solution was a bit more catered toward my client. Dare I say I came up with an AWESOME way of dealing with creating templates? Trade secret.

Anyways – when the dust settled after the initial trials of the system and we were without a way to deliver bulk emails and track the SMTP/email bounces. After scouring for pre-canned solutions there wasn’t a whole lot to pick from. There were some convoluted modifications for postfix and other knick knacks that didn’t lend well to tracking the bounces effectively (or implementable in a sane amount of time).

Getting at the bounces…

At this point in the game, knowing what bounced can come from only one place: the maillog from postfix. Postfix is kind enough to record the Delivery Status Notification (‘DSN’) in an easily parsable way.

Pairing a bounce with database entries…

The application architecture called for very atomic analytics. So every email that’s sent is recorded in a database with a generated ID. This ID links click events and open reports on a per-email basis (run of the mill email tracking). To make the log entries sent from the application distinguishable, I changed the message-id header to the generated SHA1 ID – this lets me pick out which message ID’s are from the application, and which ones are from other sources in the log.

There’s one big problem though:

Postfix uses it’s own queue ID’s to track emails – this is that first 10 hex digits of the log entry. So we have to perform a lookup as such:

This is a problem because we can’t do two at the same time. The time when a DSN comes in is variable – this would lead to a LOT of grepping and file processing – one to get the queue ID. Another to find the DSN – if it’s been recorded yet.

We use Logstash where I work for log delivery from our web applications. In my experience with Logstash I learned that it is a tool with so much potential for what I was looking for. Progressive tailing of logs and so many built in inputs, outputs and filters for this kind of work it was a no brainer.

So I set up Logstash and three stupid-simple scripts to implement the plan.

Hopefully this is self explanatory to what the scripts take for input – and where they’re putting that input (see holding tank table above)

1

2

LogDSN.php%{QID}{%dsn}

LogOutbound%{QID}{%message-id}

Setting up logstash, logical flow:

Logstash has a handy notion of ‘tags’ – where you can have the system’s elements (input, output, filter) enact on data fragments tagged when they match a criteria.

Full config file (it’s up to you to look at the logstash docs to determine what directives like grep, kv and grok are doing)

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

input{

file{

format=>"plain"

path=>"/var/log/maillog"

type=>"maillog"

}

}

filter{

kv{

type=>"maillog"

trim=>"<>"

}

grep{

type=>"maillog"

match=>["status","bounced"]

add_tag=>["bounce"]

drop=>false

}

grep{

type=>"maillog"

match=>["message-id","[0-9a-f]{40}\@dom"]

add_tag=>["send"]

drop=>false

}

grok{

type=>"maillog"

pattern=>"%{SYSLOGBASE} (?<QID>[0-9A-F]{10}): %{GREEDYDATA:message}"

}

}

output{

exec{

tags=>"bounce"

command=>"php -f /path/to/LogDSN.php %{QID} %{dsn} &"

}

exec{

tags=>"send"

command=>"php -f /path/to/LogOutbound.php %{QID} %{message-id} &"

}

}

Then the third component – the result recorder runs on a cron schedule and simply queries records where the message ID and DSN are not null. The mere presence of the message ID indicates the email was sent from the application. the DSN means there’s something to enact on by the result recorder script.

* The way I implemented this would change depending on the scale – but for this particular environment, executing scripts instead of putting things into an AMQP channel or the likes was acceptable.

I installed log-stash on centos 6 but how to get maillog form postfix forwarder machine kindly send me the steps or file to configure or get maillog on logstash server. logstash succesffuly get syslog and messages logs but var/log/maillog not fetch, i installed postfix 2.10 on forwarder machine.