
Python Email Libraries, part 2: IMAP

The first article in this series discussed how to access a POP3 server with a Python script. While that protocol is useful for learning the basics of how email works, IMAP is the protocol most used today. This article covers this more complicated protocol.

Introduction

In the previous article from this series, I described how to access a POP3 server with a Python script. While POP is a useful protocol for connecting to legacy servers and gaining a basic understanding of how email works, IMAP is the workhorse protocol for email today. IMAP is designed for users to keep their email on the server, rather than downloading it to their own computer. It also lets users create a hierarchy of folders on the server and move messages between them. All of this makes IMAP a much more complicated protocol than POP, which in turn translates to a more complicated library in Python.

Much of Python’s “imaplib” module is very closely tied to the IMAP protocol definition itself, originally RFC 2060 (since superseded by RFC 3501). When you are working with this module, both the Python docs and that RFC are necessary references: arguments to methods very often need to be formatted as the RFC specifies, and responses are returned in the format the RFC defines.

Connecting to the Server

Again, the first task you’ll need to accomplish when working with an IMAP server is creating the IMAP object to communicate with the server:

from imaplib import *

…

server = IMAP4("test.com")

The objects needed to work with IMAP servers live in the “imaplib” module, so it must be imported first. The module also contains a few other useful pieces. First, the IMAP4 class defines two exception classes, “IMAP4.error” and “IMAP4.abort.” The “error” exception is raised when a command fails, and the reason for the failure is passed along with the exception. The “abort” exception, a subclass of “error,” signals a problem with the connection or the server; closing the IMAP4 object and creating a new one will usually allow you to recover. In addition, “imaplib” contains several helper functions that can come in handy, such as “Time2Internaldate” and “Internaldate2tuple” for converting dates to and from the format used by IMAP, and “ParseFlags” for parsing the flags in a server response.
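As a small offline sketch of those pieces (nothing here contacts a server), the following maps the two exception types to a recovery hint and formats a timestamp with Time2Internaldate. The classify function is a hypothetical helper written only for illustration, and the date is an arbitrary example.

```python
import imaplib
from datetime import datetime, timezone


def classify(exc):
    """Hypothetical helper: map an imaplib exception to a recovery hint."""
    # abort is a subclass of error, so it has to be tested first.
    if isinstance(exc, imaplib.IMAP4.abort):
        return "connection lost: open a new IMAP4 object and retry"
    if isinstance(exc, imaplib.IMAP4.error):
        return "command failed: " + str(exc)
    return "not an imaplib exception"


# Time2Internaldate() accepts epoch seconds, a time tuple, or a
# timezone-aware datetime and returns IMAP's quoted date-time string
# (the format used, for example, by the APPEND command).
stamp = imaplib.Time2Internaldate(datetime(2024, 1, 5, 14, 30, tzinfo=timezone.utc))

if __name__ == "__main__":
    print(classify(imaplib.IMAP4.abort("socket closed")))
    print(stamp)  # "05-Jan-2024 14:30:00 +0000"
```

Because “abort” inherits from “error,” a single except clause for IMAP4.error catches both; test for abort first when the two need different handling.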

Authentication

Once you have successfully connected to the server, you need to log in to do anything useful. The function to do this is called “login,” and it takes two arguments, the username and password. Once you have logged in, you have several choices as to what you want to do. Since IMAP organizes mail into various “mailboxes” (which are basically folders), you can move to another mailbox, create or delete a mailbox, and fetch various parts of messages.

server.login("test", "testpass")

mboxes = server.list()[1]

# The name contains a space, so send it as an IMAP quoted string.
server.create('"Old Mail"')

r = server.select('"Old Mail"')

The above code logs in to the mail server and then does some work with mailboxes. The second line gets a list of mailboxes from the server. Like every imaplib command method, “list” returns a two-element tuple: a response type such as “OK,” followed by a Python list holding the data. As with the POP3 library, the first element is the server’s “response,” which lets you know if the command completed successfully; since we are interested primarily in the data from this function, we keep only the second element.
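A minimal sketch of the same login-and-list steps, with the response type checked explicitly. check and open_session are hypothetical helper names, and the sketch assumes the server accepts IMAP over TLS on port 993 (the IMAP4_SSL default) rather than the plain-text port used by IMAP4 above. imaplib raises IMAP4.error when the server answers BAD, and login() raises on a failed login, but a NO answer to most other commands comes back as ordinary data, so the first element is worth checking rather than discarding unseen.

```python
import imaplib


def check(result):
    """Unpack an imaplib (typ, data) pair, raising unless typ is 'OK'."""
    typ, data = result
    if typ != "OK":
        raise imaplib.IMAP4.error(f"server said {typ}: {data!r}")
    return data


def open_session(host, user, password):
    """Hypothetical helper: connect over TLS, log in, and list mailboxes."""
    server = imaplib.IMAP4_SSL(host)      # port 993 by default
    try:
        check(server.login(user, password))
        mailboxes = check(server.list())  # raw LIST lines, as bytes
    except imaplib.IMAP4.error:
        server.shutdown()                 # close the socket and give up
        raise
    return server, mailboxes
```

Usage would look like server, boxes = open_session("imap.example.com", "test", "testpass"), where the host and credentials are placeholders.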

That data is a list of strings (bytes objects, in Python 3), one per mailbox, and each string has three parts. The first, in parentheses, holds the mailbox attributes. These give you some basic information about the mailbox, such as whether it has child mailboxes, whether it can be selected at all, or whether the server has marked it as “interesting.” The second part is the hierarchy delimiter, the character (often “/” or “.”) that separates levels in a mailbox name. This is useful when setting up a nested folder system on the email server, as you must supply the full path to a new mailbox, with each level joined by that delimiter, in order to create it within another mailbox.

Finally, the last part is the mailbox name itself. If you want to find out how many messages there are in a mailbox, one way is to select that mailbox (the STATUS command is another). The server will return the result of the operation as well as the number of messages in the mailbox. It is important to know that you must select a mailbox before you can do any actual searching or fetching of messages. If you call “select” with no arguments it selects “INBOX,” but it still has to be called before anything else can be done with individual messages.
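The mailbox steps above can be sketched as follows, assuming Python 3’s imaplib and an already authenticated IMAP4 object. parse_list_line, quote_mailbox, and message_count are hypothetical helper names, and the LIST pattern is deliberately simplified: it ignores names sent as literals, escaped characters, and the modified UTF-7 encoding IMAP uses for non-ASCII names.

```python
import re

# Simplified pattern for one LIST line, e.g.  (\HasNoChildren) "/" "Old Mail"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line):
    """Split one raw LIST line (bytes) into (attributes, delimiter, name)."""
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised LIST line: {line!r}")
    flags = match.group("flags").split()
    delim = match.group("delim")
    delim = None if delim == b"NIL" else delim[1:-1].decode("ascii")
    name = match.group("name").decode("ascii")
    if len(name) >= 2 and name[0] == name[-1] == '"':
        name = name[1:-1]          # strip the quotes around a quoted name
    return flags, delim, name


def quote_mailbox(name):
    """Return name as an IMAP quoted string, escaping backslashes and quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def message_count(server, mailbox="INBOX"):
    """Open a mailbox read-only (EXAMINE) and return its message count."""
    # Python 3's imaplib passes the name through unchanged, so quote it here.
    typ, data = server.select(quote_mailbox(mailbox), readonly=True)
    if typ != "OK":
        raise ValueError(f"cannot select {mailbox!r}: {data!r}")
    return int(data[0])            # select() returns the EXISTS count, e.g. [b'17']
```

To create a child of “Old Mail,” you would join the levels with the delimiter reported by LIST and quote the result, for instance server.create(quote_mailbox("Old Mail" + delim + "2023")).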

Searching for Messages

Once you have connected and entered the correct mailbox, you will want to search for specific messages. You can do this by downloading the pertinent information for all messages in the mailbox, and then processing all of that data locally. However, that is typically not necessary. IMAP defines a very useful search function that allows the server to do this for you. This is important for several obvious reasons. First of all, it greatly reduces the amount of network traffic sent, thus speeding up your program. It also reduces the amount of logic you have to code directly into your system.

There are many available attributes for you to search on. For instance, you can search for only new messages, messages of a certain size, messages that are from a particular person or email address, or messages that contain a specific string in the body. In general, all of the search criteria are strung together with ANDs, but it is possible to use the prefix operator OR to make an “or” condition. Some examples look like:

r, data = server.search(None, '(FROM "fred")')

r, data = server.search(None, '(SMALLER 20000)')

r, data = server.search(None, '(NEW)')

r, data = server.search(None, '(OR NEW SMALLER 20000)')

These examples should be relatively obvious. The first searches for all messages that have “fred” in the “FROM” header. The second searches for all messages smaller than 20,000 bytes, roughly 20KB (IMAP measures message size in octets, that is, 8-bit bytes, so no conversion is needed). The third searches for new messages: those flagged as recent (roughly, arrived since the mailbox was last opened) that have not been viewed yet; use “UNSEEN” to match every unread message regardless of age. Finally, the last searches for all messages that are either new or smaller than 20,000 bytes.

Notice that the “OR” operator is a prefix, and must be put before the two search keys that are being operated on; this is also the case for the “NOT” operator. Another thing to note in this example is the strategy of putting a tuple on the left side of the equals sign, thus allowing you to break the return value immediately into the response part and the data part. This is somewhat different from the response processing shown above, but allows you to easily grab both the response and data.
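Because OR takes exactly two search keys, combining three or more alternatives means nesting it. The sketch below builds such strings; imap_quote, any_of, and negate are hypothetical helper names, and the result is meant to be passed as the criteria argument of server.search(None, ...).

```python
def imap_quote(text):
    """Quote a string argument such as a FROM or SUBJECT value."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def any_of(*keys):
    """Join search keys with IMAP's two-argument prefix OR: OR a OR b c."""
    if not keys:
        raise ValueError("need at least one search key")
    expr = keys[-1]
    for key in reversed(keys[:-1]):
        expr = f"OR {key} {expr}"
    return expr


def negate(key):
    """NOT is also a prefix operator; it applies to the single key after it."""
    return f"NOT {key}"


# New or small messages that are not from fred; the two outer keys are ANDed.
criteria = "({} {})".format(any_of("NEW", "SMALLER 20000"),
                            negate("FROM " + imap_quote("fred")))
```

Here criteria comes out as (OR NEW SMALLER 20000 NOT FROM "fred"), so r, data = server.search(None, criteria) would send it unchanged.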

The data portion of the returned information is a list containing a single string of space-separated message sequence numbers that satisfy the search criteria, such as “2 5 9.” In all of the above statements, the first argument is the “charset” argument. It tells the server which character encoding the search criteria themselves are written in, which matters when you search for text in languages whose alphabets fall outside plain ASCII. Here, we pass “None” for simplicity, so no charset is sent and the criteria are assumed to be plain ASCII.
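Turning that data into Python integers takes a single comprehension. The sketch below (parse_search is a hypothetical name) also handles an empty result, which imaplib typically returns as [b''].

```python
def parse_search(data):
    """Convert imaplib search data, e.g. [b'2 5 9'], into a list of ints.

    The server sends the matching sequence numbers as one space-separated
    bytes string; an empty match typically arrives as [b''].
    """
    if not data or not data[0]:
        return []
    return [int(n) for n in data[0].split()]
```

A typical call would be r, data = server.search(None, "(UNSEEN)") followed by parse_search(data).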

Fetching Message Information

Getting message information from the server is the most common task with IMAP, and it is also one of the more complicated ones. To get various parts of a message or information about a message, you use the “fetch” method. However, what makes this somewhat confusing is that the programmer must supply the actual IMAP protocol arguments to the “fetch” method, similar to “search.” That is, the programmer must create a string of arguments that get sent verbatim to the server as the actual arguments for that command.

This means that we have left the realm of pure Python and are now dealing with the actual network protocol as well, including the structure of the messages sent over the network. Luckily, these arguments can usually be seen as a relatively simple list of items that the server can return. Some of these items take nested sub-arguments of their own, which adds to the complexity.

The “fetch” command takes two arguments, one defining the group of messages to be retrieved and another describing what parts of the messages should be retrieved. The first argument should be a set of message sequence numbers, not UIDs. It is possible to use UIDs, but that requires another method, “uid,” which tells the server to process the given command using UIDs rather than message sequence numbers (for example, server.uid('FETCH', '42', '(FLAGS)')).
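As a sketch of the UID variant, the two functions below run SEARCH and FETCH through IMAP4.uid(), so both the numbers returned and the message set sent are UIDs. unseen_uids and fetch_flags_by_uid are hypothetical names, and server is assumed to be an imaplib.IMAP4 object with a mailbox already selected.

```python
def unseen_uids(server):
    """Return the UIDs (not sequence numbers) of unseen messages."""
    # uid("SEARCH", ...) sends "UID SEARCH UNSEEN"; the reply lists UIDs.
    typ, data = server.uid("SEARCH", "UNSEEN")
    if typ != "OK":
        raise ValueError(f"UID SEARCH failed: {data!r}")
    return [int(u) for u in data[0].split()]


def fetch_flags_by_uid(server, uids):
    """Fetch FLAGS for the given UIDs; the message set is made of UIDs too."""
    message_set = ",".join(str(u) for u in uids)
    typ, data = server.uid("FETCH", message_set, "(FLAGS)")
    if typ != "OK":
        raise ValueError(f"UID FETCH failed: {data!r}")
    return data
```

UIDs stay valid across connections (as long as the mailbox’s UIDVALIDITY does not change), which is why an automated tool usually remembers them instead of sequence numbers.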

The first argument can be either an inclusive range, giving just the first and last sequence numbers separated by a “:”, or a comma-separated string listing each number; the two forms can also be mixed, as in “1:3,7”. This means that you can take the data from a “search” query, simply replace the spaces with commas, and use the result as an argument to “fetch.” While this may not be the most size-efficient practice, it is often much simpler than compressing the list down to ranges of sequence numbers.
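Both approaches fit in a few lines. In this sketch, simple_sequence_set performs the space-to-comma replacement just described, and to_sequence_set does the range compression for comparison; both names are hypothetical.

```python
def simple_sequence_set(search_data):
    """The shortcut: imaplib search data [b'1 2 5'] -> '1,2,5'."""
    return search_data[0].decode("ascii").replace(" ", ",")


def to_sequence_set(numbers):
    """Compress message numbers into ranges: [1, 2, 3, 7, 9, 10] -> '1:3,7,9:10'."""
    nums = sorted({int(n) for n in numbers})   # int() also accepts b'3'
    if not nums:
        raise ValueError("an IMAP message set cannot be empty")
    parts = []
    start = prev = nums[0]
    for n in nums[1:] + [None]:                # None flushes the final run
        if n is not None and n == prev + 1:
            prev = n
            continue
        parts.append(str(start) if start == prev else f"{start}:{prev}")
        if n is not None:
            start = prev = n
    return ",".join(parts)
```

The compressed form mainly pays off for large mailboxes, where a long comma-separated list can make the command line very long.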

The second argument is a string that lists what information you want from the server. There are many different choices here, and the full listing is found in section 6.4.5 of RFC 2060. However, some important ones are:

UID – Gets the UID, which is the best way to identify messages across different connection contexts.

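BODY[<part>] – This is the main workhorse. By replacing the “<part>” section (for example with HEADER, TEXT, or a MIME part number such as 2), you can access the headers, the main text body of a message, and attachments. Fetching BODY[<part>] marks the message as seen; use BODY.PEEK[<part>] to leave its flags untouched.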

INTERNALDATE – Useful for getting the date the message was received.

FLAGS – This gives you the server’s own metadata about the message, like whether it’s been seen yet, or whether it is marked for deletion.

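ENVELOPE – This is a quick way to get the main RFC 822 header fields of a message (date, subject, from, sender, reply-to, to, cc, bcc, in-reply-to, and message-id) as a parenthesized list structured by the server. Note that imaplib hands this structure back as a raw string; it does not convert it into Python objects.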
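As an offline sketch of decoding two of these items, the module-level helpers ParseFlags and Internaldate2tuple can pick FLAGS and INTERNALDATE out of a raw fetch response line. The sample line below is invented for illustration; since both helpers search the line with a regular expression, the order in which the server lists the items should not matter.

```python
import imaplib
import time

# Sample FETCH response line (no literals), shaped like what imaplib returns
# for server.fetch('6', '(FLAGS INTERNALDATE)'); this is made-up data.
line = b'6 (FLAGS (\\Seen \\Answered) INTERNALDATE "05-Jan-2024 14:30:00 +0000")'

flags = imaplib.ParseFlags(line)              # tuple of bytes, e.g. b'\\Seen'
received = imaplib.Internaldate2tuple(line)   # time.struct_time in local time, or None
received_epoch = time.mktime(received)        # back to seconds since the epoch

is_read = b"\\Seen" in flags
```

Internaldate2tuple converts to the machine’s local time zone, so converting back to epoch seconds is the simplest way to compare dates from different servers.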

Some examples are useful in these cases:

r, data = server.fetch('6', '(UID BODY[TEXT])')

r, data = server.fetch('6', '(UID ENVELOPE)')

r, data = server.fetch('2:5', '(BODY[HEADER.FIELDS (SUBJECT FROM)])')

These lines range from simple to fairly complex. The first line gets the UID and the body text of message number 6. The second line grabs the envelope and UID for message 6, and the third line gets just the Subject and From header lines for messages 2 through 5. The returned data is structured differently depending upon how you requested it; it is far easier to print the returned information and study it directly than to describe every case in words. In general, “data” is a list in which each piece of literal content, such as a body or a block of header lines, arrives as a tuple of the response prefix and the content, followed by a separate string holding the rest of that message’s response (often just the closing parenthesis).
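To make that structure concrete, here is a sketch that walks imaplib’s fetch data, pairs each literal with the message’s UID when one was requested, and hands header lines to the standard email package. split_fetch and header_fields are hypothetical names, and the sample data in the usage note is hand-written in the shape imaplib produces, not captured from a real server.

```python
import re
from email.parser import BytesHeaderParser

UID_RE = re.compile(rb"\bUID (\d+)")


def split_fetch(data):
    """Return (uid, payload) pairs from imaplib fetch data.

    Each literal arrives as a (prefix, payload) tuple, followed by a bytes
    item holding whatever came after it (usually just b')').
    """
    results = []
    for i, item in enumerate(data):
        if not isinstance(item, tuple):
            continue
        prefix, payload = item
        match = UID_RE.search(prefix)
        # Some servers send UID after the literal, in the trailing bytes item.
        if match is None and i + 1 < len(data) and isinstance(data[i + 1], bytes):
            match = UID_RE.search(data[i + 1])
        results.append((int(match.group(1)) if match else None, payload))
    return results


def header_fields(payload):
    """Parse a BODY[HEADER.FIELDS (...)] payload into a dict of header values."""
    msg = BytesHeaderParser().parsebytes(payload)
    return {name.lower(): value for name, value in msg.items()}
```

For data shaped like [(b'6 (UID 42 BODY[TEXT] {11}', b'hello world'), b')'], split_fetch returns [(42, b'hello world')]; for the third example above, the UID slot would be None because UID was not requested.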

Conclusion

Overall, the most powerful abilities of IMAP lie in the capability to create virtual file systems on the remote server and the ability to filter messages before actually downloading their data. The searching capability is priceless when writing an automated utility for email, allowing you to quickly filter down the data. IMAP is a complex but very useful protocol. Before working with the Python libraries, make sure you are well versed in the protocol definition itself, as this will smooth the road when learning how to interact with the server in Python.