At PyCon, I was talking with Glyph and others about the email parser in
Python 2.3. Anthony Baxter, Thomas Wouters and I were having a little
email-sig sprint, and we all agreed about the major problems with the
current email parser.
- It can raise exceptions when parsing some messages, and these
exceptions can be difficult to handle.
- You must slurp the entire message into memory before you can start
parsing it.
Over in the email-sig we've been discussing and working on a new parser,
called the FeedParser, which eliminates both of these problems. This
parser also has the advantage of being much more RFC compliant, IMO
<2046 wink>. In fact, we now have a new FeedParser.py in Python 2.4cvs
(slated to be email 3.0) which I think does a very good job of parsing
all manner of valid and invalid emails.
The old email.Parser.Parser interface continues to exist for backward
compatibility. The docs have not been updated yet, but the unit tests
have. Note that the FeedParser, if it encounters broken MIME, will add
'defects' to a message object and continue on as best it can. You can
check the message's .defects attribute; if it exists, it will be a list
of instances providing more information about what type(s) of defects
were encountered.
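As a sketch of checking for defects, assuming a current Python 3 (where the modules are spelled email.feedparser and email.errors), here is a made-up message that declares itself multipart/mixed but never supplies a boundary parameter. Parsing it does not raise; the problem is recorded on the message instead. The exact defect classes reported may vary between versions, so the code just prints their names.

```python
from email import errors
from email.feedparser import FeedParser

# Hypothetical broken input: claims to be multipart but declares no boundary.
BROKEN = (
    "From: alice@example.com\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "This body has no MIME boundaries at all.\n"
)

parser = FeedParser()
parser.feed(BROKEN)
msg = parser.close()  # no exception; the parser recovers as best it can

# .defects may be absent on older versions, so fall back to an empty list.
defects = getattr(msg, "defects", [])
for defect in defects:
    # Each entry is an instance of a defect class from email.errors.
    print(type(defect).__name__)

if any(isinstance(d, errors.NoBoundaryInMultipartDefect) for d in defects):
    print("multipart message is missing its boundary parameter")
```

Because the boundary is unknown, the parser keeps the whole body as a single string payload rather than guessing where parts begin.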
To use it, you instantiate an email.FeedParser.FeedParser and
repeatedly call its .feed() method, which takes a single argument: a
string of arbitrary length. The data need not be a complete line;
the FeedParser will split it into lines (recognizing any of the three
common line endings) and gulp input a line at a time. Internally, the
parsing routines are generators that yield when they need more data
(feed() itself just returns). When you've fed it all the data there's
ever going to be, you call .close() on the parser; the rest of the
buffered data is consumed and you get back the root message object.
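A minimal sketch of that feed()/close() cycle follows. It assumes a current Python 3, where the module is spelled email.feedparser rather than email.FeedParser; the sample message, the default chunk size, and the helper name parse_in_chunks are made up for illustration. The chunks are deliberately small so that lines get cut in half between calls to feed().

```python
from email.feedparser import FeedParser  # email.FeedParser in Python 2.4

# Hypothetical sample message; any RFC 2822 text would do.
RAW = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: Hello\r\n"
    "\r\n"
    "Body line one.\r\n"
    "Body line two.\r\n"
)


def parse_in_chunks(data, chunk_size=7):
    """Hypothetical helper: push data into a FeedParser in small pieces."""
    parser = FeedParser()
    for start in range(0, len(data), chunk_size):
        # Chunks need not end on a line boundary; the parser buffers partial lines.
        parser.feed(data[start:start + chunk_size])
    # close() flushes any buffered input and returns the root message object.
    return parser.close()


msg = parse_in_chunks(RAW)
print(msg["Subject"])  # → Hello
print(msg.get_payload(), end="")
```

Feeding the same text one character at a time, or all at once, should yield an equivalent message. The point of the push interface is that the caller, not the parser, decides when more input is available, for example as data arrives from a network connection.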
Because I think we're largely done with the FeedParser[1], and because
some of the Twisted guys were interested in this stuff, I'm sending this
message so you can grab the new parser and see if it's going to fit the
bill. For now, you'll have to get it out of Python's CVS, but at some
point when we've addressed the other issues in the email package, we'll
make a distutils release.
Note that email 3.0 will be compatible with Python 2.3 but nothing
earlier. Please direct any follow-up discussion to email-sig at python.org.
Enjoy,
-Barry
[1] Although see these messages for open issues:
http://mail.python.org/pipermail/email-sig/2004-May/000114.html
http://mail.python.org/pipermail/email-sig/2004-May/000118.html