"For a moment, nothing happened. Then, after a second or so, nothing continued to happen." — HHGTTG

Month: September 2011

I keep my email in Maildir folders. It works well on the whole for everyday email, but it doesn’t work so well for large email archives (mainly because Unix filesystems don’t tend to cope well with directories containing a very large number of files). My system of archiving had been to simply copy messages older than a given number of days to a separate Maildir folder that I use for my archives.

The problem was mainly backups. The backup tool I use (Tarsnap, which is brilliant by the way!) was taking ages to crawl over the archive folders. In addition, the folders were taking up a lot of space on disk, and it isn’t easy to compress many small files without bundling them into a tar file or similar.

So I decided the best plan was to archive the messages to Mbox files. They’d compress well (in the end I just used a compressed ZFS filesystem), be backup friendly (because they’d rarely, if ever, change), and be quick to read from disk (it’s easier to read a large file than many little ones).

It can’t be hard, right? Isn’t an Mbox file approximately this?

cat Maildir/cur/* > mboxfile

Well, it turned out to be more effort than that. First you need to create the "From " separator line, which requires the sender and delivery date. These can be found by parsing the headers, but it’s surprising how many broken emails there were in my archives.
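To give a feel for that step, here is a minimal sketch in Python (not the Perl that Maildirarc actually uses) of building the "From " separator from a message's headers. The function name from_line, the choice of Return-Path before From, the MAILER-DAEMON placeholder and the fallback_ts parameter are all my own assumptions for illustration; in practice the fallback would probably be the file's modification time.

```python
import email
import time
from email.utils import mktime_tz, parseaddr, parsedate_tz


def from_line(raw: bytes, fallback_ts: float = 0.0) -> str:
    """Build an mbox "From " separator line for one raw message.

    fallback_ts is a hypothetical fallback (e.g. the file's mtime),
    used when the Date header is missing or unparseable.
    """
    msg = email.message_from_bytes(raw)

    # Sender: try Return-Path first, then From (an assumed preference).
    sender = ""
    for header in ("Return-Path", "From"):
        value = msg.get(header)
        if value:
            sender = parseaddr(str(value))[1]
            if sender:
                break
    if not sender:
        # Broken message with no usable address: use a placeholder.
        sender = "MAILER-DAEMON"

    # Date: parsedate_tz returns None when the header cannot be parsed.
    ts = fallback_ts
    value = msg.get("Date")
    if value:
        parsed = parsedate_tz(str(value))
        if parsed is not None:
            try:
                ts = mktime_tz(parsed)
            except (ValueError, OverflowError):
                ts = fallback_ts

    # The separator uses an asctime-style date; UTC is used here.
    return "From %s %s" % (sender, time.asctime(time.gmtime(ts)))


if __name__ == "__main__":
    good = (b"Return-Path: <alice@example.com>\n"
            b"Date: Thu, 01 Sep 2011 12:00:00 +0000\n"
            b"Subject: hello\n\nbody\n")
    broken = b"Subject: no sender, no date\n\nbody\n"
    print(from_line(good))
    print(from_line(broken))
```

The broken message still gets a separator line, just with placeholder values, which is roughly the kind of decision the broken emails in an archive force on you.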

Next you need to decide which Mbox format to use. I thought there was only one! A reader has to be able to tell where one message ends and the next begins, so you can either escape "From " lines in the body, or add a Content-Length header, or do both.
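The two options can be sketched roughly as follows, again in Python rather than Perl. The escaping shown is the reversible style usually called mboxrd, where any line that starts with zero or more ">" characters followed by "From " gets one extra ">". Both function names are hypothetical, and the sketch assumes LF line endings and a message that does not already carry a Content-Length header.

```python
import re

# Matches "From ", ">From ", ">>From ", ... at the start of any line.
_FROM_RE = re.compile(rb"^(>*From )", re.MULTILINE)


def escape_from_lines(body: bytes) -> bytes:
    """Prefix one '>' to every body line that could look like a separator."""
    return _FROM_RE.sub(rb">\1", body)


def add_content_length(raw: bytes) -> bytes:
    """Append a Content-Length header giving the body size in bytes.

    Assumes LF line endings and no existing Content-Length header.
    """
    headers, sep, body = raw.partition(b"\n\n")
    if not sep:
        # No blank line means no body to measure; leave the message alone.
        return raw
    return headers + b"\nContent-Length: %d" % len(body) + sep + body


if __name__ == "__main__":
    print(escape_from_lines(b"Hi\nFrom here on it gets harder\n"))
    print(add_content_length(b"Subject: x\n\nhello\n"))
```

Escaping applies to the body only, so a real converter would split off the headers first, escape the body, and only then compute the length if it writes both.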

After far more effort than I originally intended, I came up with Maildirarc. It’s an extended version of my original shell script, which just copied messages from one Maildir folder to another. I wrote it in Perl and decided to have a play with Git and GitHub for version control. You can see the results here: