This is an article introducing a new email reading system called
notmuch, written by Carl Worth with comments from me (and a few minor
patches).

Abandon Fail Boat

Almost two months ago, when I updated my debian system to the latest and
greatest bits, I happened to get a new version of evolution, 2.28. As
has become the tradition with new versions of evolution, a few more
things broke.

I've suffered through evolution 'upgrades' several times and
had slowly reduced my usage of evolution features to try and keep it
working. This time, I got stuck. The accumulated bugs in this mailer
made it impossible for me to get my work done any more.

And, yes, it's a sad commentary on the Linux desktop that the most
important feature for many people using Linux has no credible GUI
application (yes, I've tried a lot of email applications; I have too
much mail for them to cope).

Exploring Sup

Carl had given up on Evolution a few weeks before and was using
sup. From his description, and from
a brief bit of experimentation, I decided to give it a try. Sup has
four main features:

It is entirely search based. All messages are indexed by a 'real'
indexing system, Xapian, which provides
reasonable full text search for email.

You can mark (automatically, or manually) messages with labels;
the 'inbox' view just shows the results of a search for messages
with the 'inbox' label.

It never modifies the actual mail store. All state is stored
inside the database in the form of labels.

Most operations act on threads, not messages. Viewing a thread
shows you the unread messages in the whole thread in a single
page, making following the conversation easy.

This feature set is exactly what I've been trying to get Evolution to
use for several years; I used the virtual folders to automatically
sort mail into several 'categories'. Unfortunately, the evolution
vfolder support was terrible to start with (way too slow to be
actually useful) and has gotten far worse over time (no more nested
vfolders?).
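
To make the label-and-thread model in the list above concrete, here is a toy sketch in Python. It is not sup's code (sup is written in ruby); the Message class and the view and archive_thread helpers are names invented purely for illustration. It only shows the idea that the 'inbox' is a search for a label and that operations apply to whole threads.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for one indexed mail message."""
    msg_id: str
    thread_id: str
    labels: set = field(default_factory=set)


def view(messages, label):
    """Return thread ids, in first-seen order, holding a message with `label`."""
    threads = []
    for msg in messages:
        if label in msg.labels and msg.thread_id not in threads:
            threads.append(msg.thread_id)
    return threads


def archive_thread(messages, thread_id):
    """Drop the 'inbox' label from every message in the given thread."""
    for msg in messages:
        if msg.thread_id == thread_id:
            msg.labels.discard("inbox")


mail = [
    Message("m1", "t1", {"inbox", "unread"}),
    Message("m2", "t1", {"inbox"}),
    Message("m3", "t2", {"inbox", "unread"}),
]

print(view(mail, "inbox"))   # ['t1', 't2']
archive_thread(mail, "t1")   # acts on the whole thread, not one message
print(view(mail, "inbox"))   # ['t2']
```

Nothing in the sketch touches the mail files themselves; all state lives in the labels, which is the property that makes the mail store safe to share with other tools.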

Sup works quite well for a small amount of email. With my message
store (dating back to 1984), it took "a while" to do the initial scan
to construct the database. After that, searches are zippy fast.

Sup has a couple of fairly serious mis-features though:

It's written in ruby. Yet another language disaster in my book;
syntax horror-show similar to perl, and a lack of static
typechecking means that obvious bugs in the program wouldn't be
caught until you happened to execute that particular line of
code. Ruby is also no speed demon; I spend a lot of my day reading
email, and waiting for ruby is not on my list of desired activities.

It has a magic curses UI. This is actually pretty good for reading
email, but it's not scriptable at all, and scripting is useful for mass
patch-application. It also completely fails when composing new mail
as it forks off emacs and waits for it to complete, meaning that
you cannot see any mail while composing a message.

It holds a bunch of label changes in memory inside the application,
and Xapian buffers most of the database changes too, so a sup crash
often means re-viewing a lot of mail.

Carl and I started fixing sup in various ways: making the mime-viewer
run asynchronously (so you could see attachments while viewing the
rest of the message), sorting the inbox oldest first, and various other
changes. Nothing serious, but it did show us how sup was built and
just how simple it was inside.

It turns out that sup is just a bit of UI goo over a powerful
full-text database; the complicated code is not the UI but the
database. Of course, the sup UI is great for viewing mail, but that's
fortunately easy to clone.

A Minimal Mail Reader

Having seen just how easy it was to build a really nice mail reading
system, Carl and I sat down and sketched out what the foundations of
our 'ideal' system would look like:

Xapian based. I haven't seen anything close to Xapian in terms of
features or performance. It has only one serious bug—it's written
in C++. Fortunately, we can wrap the C++ mess with a simple C
wrapper and ignore that aspect.

Command line driven. Any UI would be constructed on top of the
command line interface. And, by UI, we mean emacs major mode. If
someone wants to write a GUI, we won't stop them though.

Carl started by playing with Xapian, using the existing sup database;
one possibility would have been to retain compatibility with the sup
format and just provide a new interface. Unfortunately, there were a
lot of 'ruby-isms' in the sup database, and reconstructing that would
have been pretty difficult from a non-ruby application.

Introducing Notmuch: Not much of an email program

Notmuch really isn't much of an email program; it doesn't talk to mail
servers to receive or send mail, it doesn't even really know what
Maildir should look like. All it does is construct a database for all
of your mail messages and allow you to search and show email messages.

Notmuch has two pieces—a C program that uses Xapian to search and tag
mail messages, and an emacs major mode which provides a fairly simple
user interface. Like git, the notmuch C program places a bunch of
commands within a single executable:

dump [filename]
Create a plain-text dump of the tags for each message.

restore filename
Restore the tags from the given dump file (see 'dump').

help [command]
This message, or more detailed help for the named command.

(The above text was taken directly from notmuch itself and was written
by Carl).

As you can see, all of the commands which talk about messages take an
arbitrary search pattern. The search command outputs thread
identifiers in search-term form, so you can easily script things by
pulling that out of the search output and passing it to additional
notmuch commands. Learning how to do searching in notmuch is the key
to using it successfully.
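
As a rough sketch of that kind of scripting, the Python below pulls the thread terms out of search output and builds one 'notmuch show' command line per thread. The sample text is made up, and its layout (the thread term as the first whitespace-separated field, followed by a date, authors and subject) is an assumption; the only property taken from the description above is that each result carries a thread identifier in search-term form. The helper names thread_terms and followup_commands are hypothetical.

```python
def thread_terms(search_output):
    """Collect the leading thread:... term from each line of search output."""
    terms = []
    for line in search_output.splitlines():
        fields = line.split(None, 1)
        if fields and fields[0].startswith("thread:"):
            terms.append(fields[0])
    return terms


def followup_commands(search_output, subcommand="show"):
    """Build one argv list per thread, ready to hand to subprocess.run()."""
    return [["notmuch", subcommand, term] for term in thread_terms(search_output)]


# Made-up sample; the real column layout may differ (assumption).
SAMPLE = """\
thread:0000000000000a1f  2009-11-17 [2/2] Carl Worth; notmuch release (inbox unread)
thread:0000000000000a20  2009-11-18 [1/1] Keith Packard; emacs mode patch (inbox)
"""

for argv in followup_commands(SAMPLE):
    print(" ".join(argv))
```

With notmuch installed, the sample text would be replaced by the captured output of a real 'notmuch search' run, and each argv list could be passed to subprocess.run() unchanged.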

Xapian Search Terms

Matching words anyplace in the message is fairly simple; just list the
set of words you want to match. Notmuch also adds some special syntax
to direct the match at specific header fields:

tag:tag
match messages with the specified tag

thread:thread-id
match messages associated with the specified thread

id:id
match the message with the given id. Message ids are those set by
the message sender in the Message-Id: header field.

from:word
match messages with word in the from address field.

to:word
match messages with word in either the To: or Cc: headers.

attachment:word
match messages with word in an attachment filename.

subject:word
match messages with word in the subject field.

Aside from these additions, notmuch uses standard Xapian search syntax,
including support for AND, OR etc. Xapian's query parser is not the
most robust piece of code though, so sometimes you need to mess with
the query to get it to do what you want.
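
For illustration, here is a small Python sketch that assembles query strings from the prefixes listed above. The helpers term, all_of and any_of are hypothetical names, not part of notmuch; the brackets used for grouping rely on Xapian's query parser accepting bracketed expressions, which is an assumption beyond what this article states.

```python
# Prefixes taken from the list above.
PREFIXES = {"tag", "thread", "id", "from", "to", "attachment", "subject"}


def term(prefix, word):
    """Return a prefix:word search term, rejecting unknown prefixes."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    return f"{prefix}:{word}"


def all_of(*terms):
    """Require every term to match."""
    return " AND ".join(terms)


def any_of(*terms):
    """Accept any one of the terms; bracketed so it nests inside all_of()."""
    return "(" + " OR ".join(terms) + ")"


query = all_of(
    term("tag", "inbox"),
    any_of(term("from", "carl"), term("to", "carl")),
    "xapian",  # a bare word matches anywhere in the message
)
print(query)  # tag:inbox AND (from:carl OR to:carl) AND xapian
```

The resulting string would be passed as the search pattern to a command such as 'notmuch search'; given the parser quirks mentioned above, expect to adjust the generated query by hand now and then.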

Notmuch emacs mode

There are a lot of email clients available for emacs; notmuch adds
only the email reading part and uses the existing 'message' module for
composing and sending mail. Even still, notmuch.el is almost 1000
lines long. It offers two different modes -- the search display, where
a list of email threads is presented, and the thread display, where a
single thread is displayed.

The search display presents the output of 'notmuch search' in a
window, eliding the thread id. When a thread is selected, a thread
display buffer is constructed with the thread contents as formatted by
'notmuch show'.

'notmuch show' structures the thread to make the display more useful
in emacs; it splits messages into headers and bodies and marks the
thread depth of each message. The header of each message will be
shrunk to a single line (in reverse video). Previously read portions
of the thread will be hidden by default, along with signature lines,
quotations and attachments. Each of these can be viewed by use of a
suitable command. Carl stole much of this from Sup and adapted it for
use inside emacs, along with some of the key bindings.

How well does it work right now?

Frankly, notmuch is pretty rough today; I'm using it to read email,
but I'm finding lots of stuff to fix. Fortunately, most of the fixes
are pretty simple at this point. The good news is that it's plenty
fast, fast enough that I can count how many threads I've exchanged
with my good friend Bart in the past 25 years (2686) in only a few
seconds.

The biggest performance issue is some lazy code within Xapian. When
you want to change the set of tags related to a document in the
database (a single mail message), Xapian replaces the entire
document. Try removing the 'inbox' tag from half a million messages
and Xapian will carefully rewrite 5GB of data. That takes a while. The
Xapian developers have suggested that this shouldn't be hard to fix
though, at which point re-tagging messages should get a lot faster.
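
A back-of-the-envelope check on those figures, as a short Python sketch. It assumes "5GB" means decimal gigabytes and that the rewriting is spread evenly over the messages; bytes_per_document is a hypothetical helper name.

```python
def bytes_per_document(total_bytes, documents):
    """Average bytes rewritten per document, assuming an even spread."""
    return total_bytes / documents


MESSAGES = 500_000       # "half a million messages"
REWRITTEN = 5 * 10**9    # "5GB", read as decimal gigabytes (assumption)

print(bytes_per_document(REWRITTEN, MESSAGES))  # 10000.0
```

That works out to roughly 10KB rewritten per message in order to change a tag that is only a handful of bytes long, which is why updating the tag in place should be so much cheaper.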

For those interested in playing along, the notmuch sources are
available from the notmuch web site along with
a pointer to the mailing list.