Mailing list archivers

I'm getting ready to do a project that involves setting up Mailman
lists and archives and a simplified web interface and whatnot. Seems
like basic stuff, though we've investigated it enough so far not to
expect it to be basic. But I've looked through Mailman source in
the past, and it's not that hard to grok.

But what's really starting to make me wonder is the archiving -- I
can't find any good archivers. Everyone seems to agree that
Pipermail is obsolete. MHonArc is still alive, though it doesn't
seem to be lively, and Hypermail is... I don't know what Hypermail
is. Looks just like Pipermail to me. They all produce rather boring
static HTML pages, without a whole lot of added value. And what's up
with MailBoxer? I'm a little nervous about an all-in-one package
(and Zope, and DTML).

I must be missing something. Don't people want nice mail archives? I
know this isn't rocket science, so why hasn't someone done something
cool? I'm starting to feel like I should just code something myself
-- the email module does the work I'd be scared to do (parsing
real-world email messages), and I could use dbmail (if I fully
trusted a big database archive, which I'm unsure of). For search I'm
still uncertain, but I think PyLucene is probably easier to build and
use than the first time I tried it, and everyone seems to like its
results.
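
As a rough sketch of how little code the parsing side needs, here is what pulling the archive-relevant fields out of one raw message might look like. It assumes a current Python 3 standard library (the `email.policy` API is newer than this post), and the sample message, the `summarize` helper and the field names are made up for illustration.

```python
import email
from email import policy
from email.utils import parsedate_to_datetime

# A made-up sample message, standing in for one entry of a list archive.
RAW = b"""\
From: Alice <alice@example.org>
To: list@example.org
Subject: Archiver ideas
Message-ID: <1@example.org>
Date: Mon, 04 Apr 2005 10:00:00 +0000
Content-Type: text/plain; charset="utf-8"

Has anyone tried Xapian?
"""


def summarize(raw_bytes):
    """Return the fields an archiver would most likely want to store."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer the plain-text part; multipart messages are handled by get_body.
    body = msg.get_body(preferencelist=("plain",))
    return {
        "id": str(msg["Message-ID"]),
        "subject": str(msg["Subject"]),
        "from": str(msg["From"]),
        "date": parsedate_to_datetime(str(msg["Date"])),
        "text": body.get_content() if body is not None else "",
    }


if __name__ == "__main__":
    print(summarize(RAW))
```

Real-world mail will have missing or malformed headers, so a real archiver would need to guard each of these lookups; the sketch only covers the happy path.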

Maybe I should look at this as an "opportunity". But dear lazyweb, I
am very willing to ride on other people's coattails, just tell me
whose...

Created 04 Apr '05

Comments:

I wrote a mailing list archive thing for the css-discuss list in PHP a few years ago, but the source was never released. It took the best part of 12 hours - they really aren't very complicated pieces of software. In Python it would be even easier thanks not only to the email module but also to AMK's threading module: http://www.amk.ca/python/code/jwz
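
To give a feel for the threading part, here is a deliberately simplified sketch. It is not the JWZ algorithm and not AMK's module: it links messages by the In-Reply-To header only, treats anything with an unknown parent as a new thread, and only guards against a message replying to itself, not longer reply loops. The function names and the tuple layout are assumptions for the example.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into reply trees.

    messages: list of (message_id, in_reply_to, subject) tuples.
    Returns (roots, children), where children maps a message id to the
    ids of its direct replies, in input order.
    """
    known = {mid for mid, _, _ in messages}
    children = defaultdict(list)
    roots = []
    for mid, parent, _subject in messages:
        if parent and parent in known and parent != mid:
            children[parent].append(mid)
        else:
            # No parent, or the parent is not in the archive: start a thread.
            roots.append(mid)
    return roots, children


def render(messages, indent="  "):
    """Return the subjects as an indented outline, one line per message."""
    subjects = {mid: subject for mid, _, subject in messages}
    roots, children = build_threads(messages)
    lines = []

    def walk(mid, depth):
        lines.append(indent * depth + subjects[mid])
        for child in children[mid]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


if __name__ == "__main__":
    sample = [
        ("<1>", None, "Archivers"),
        ("<2>", "<1>", "Re: Archivers"),
        ("<3>", "<2>", "Re: Archivers (Xapian)"),
        ("<4>", "<missing>", "Orphan"),
    ]
    print("\n".join(render(sample)))
```

The full JWZ approach also uses the References header and subject matching to repair threads with missing messages, which is where most of the real work is.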

I think there might not be an obvious consensus about what features
are desirable for this purpose. For me, I wanted threaded presentation,
but no searching -- I'm satisfied to let Google take care of that.

Instead of PyLucene I would suggest Xapian (http://www.xapian.org). The indexer library is written in C++, is damn fast, scales well and has nice Python bindings.
I wrote a small desktop search app with it in a single evening.

I just tried Xapian out -- it builds easily, and has fairly straightforward Python bindings and okay documentation. Very interesting. PyLucene still drives me nuts to build, so it's just not worth pursuing at the moment, and I'm not even sure it offers anything over Xapian. I'd looked at Swish-E quite a bit before, but Xapian looks a little more modern, with better-defined layers.

Aside from MailBoxer you could also look into PlonePostOffice. It may not be what you want (being a Zope/Plone product) but at least it does not use DTML ;-) The largest difference from MailBoxer is that it stores the email as content in the ZODB. This enables the normal Zope catalog and Plone workflow to do interesting things with it.

You wrote: "I'm starting to feel like I should just code something myself". It's exactly my feeling some time ago! And I also managed to stay lazy.

But the big differences in our approaches are technical details. I'd like to introduce MailML, a format for specifying the structure of mail messages. The possible logical blocks are titles, attachments, cites, signatures, urls, addresses and other. The MailML documents can be stored in an XML database and queried through the web, with something like Syncato (http://www.syncato.org/). It should be an ultimately powerful system.
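
To illustrate the kind of structure meant here, below is a small sketch that splits a plain-text body into a few of the block kinds listed above (cites, signatures, other) and collects the urls. It is not the MailML format or its tooling: the `split_blocks` name, the heuristics (a leading ">" marks a cite, a "-- " line starts the signature) and the output shape are all assumptions for the example.

```python
import re

# Crude URL matcher; it will also swallow trailing punctuation.
URL_RE = re.compile(r"https?://\S+")


def split_blocks(body):
    """Split a plain-text mail body into (kind, text) blocks plus a URL list.

    kind is one of "cite", "signature" or "other".
    """
    blocks = []
    in_signature = False
    for line in body.splitlines():
        if line == "-- ":
            # Conventional signature separator: everything after it is signature.
            in_signature = True
            continue
        if in_signature:
            kind = "signature"
        elif line.lstrip().startswith(">"):
            kind = "cite"
        else:
            kind = "other"
        # Merge consecutive lines of the same kind into one block.
        if blocks and blocks[-1][0] == kind:
            blocks[-1][1].append(line)
        else:
            blocks.append((kind, [line]))
    urls = URL_RE.findall(body)
    return [(kind, "\n".join(lines)) for kind, lines in blocks], urls


if __name__ == "__main__":
    sample = (
        "> I can't find any good archivers.\n"
        "\n"
        "Try http://www.xapian.org\n"
        "-- \n"
        "Bob\n"
    )
    for kind, text in split_blocks(sample)[0]:
        print(kind, repr(text))
```

Turning blocks like these into XML elements would be a separate step, and titles, attachments and addresses would need the message headers and MIME parts rather than the body text.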

I'm sure it's a business opportunity. But I don't have time to try it. Unfortunately.

IIRC, someone (was it AMK?) wanted to do a whole new archive web interface as well.
I don't think it got real, though.

I am interested in a good Mailman archiver, as I am still looking for one for
codespeak's mailing lists. However, I probably want something where I can
stuff in source code, documentation, IRC logs and mailing list archives
and have a nice search interface for all that (displaying search results
in content-specific ways, but just so that it reads nicely).

Thanks Meng -- I hadn't seen that, but looking at it I'm quite impressed. The C++ underpinnings scare me a little -- C++ web applications seem weird -- but I expect the XSL layer will give me the flexibility I need. And the actual feature set looks great, so I think we'll end up using that. (I still like Xapian/Omega, so I may look into using that for website searches.)