Better bulk filtering for Gmail

I use Gmail extensively for my personal email, and recently my
workplace has been migrated over to Gmail as well. I find that for my
work email I rely much more extensively on filters and labels to
organize things (like zillions of internal and upstream mailing
lists), and that has posed some challenges. While Gmail is in general
fairly snappy, attempting to apply an action to thousands of messages
(for example, trying to mark 16000 messages as “read”, or applying a
new filter to all your existing messages) results in a very poor
experience: it is not possible to interact with Gmail (in the same
tab) while the action is running, and actions frequently time out.

Fortunately, we can take advantage of Gmail’s IMAP interface to
overcome most of these obstacles. The naive approach won’t work: if
you attempt to perform a single IMAP action against thousands of
messages, you will encounter the same timeouts you see in the browser. The
big advantage to IMAP is that it makes it easy, with a little bit of
code, to split a large operation up into smaller chunks. Gmail
provides a few [IMAP extensions][] that provide us with a mechanism
for accessing Gmail-specific features, such as the rich search syntax
and support for arbitrary labels.
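A minimal sketch of the chunking idea, using Python's standard imaplib module and Gmail's X-GM-RAW search extension. This is not gmf's actual implementation: the function names (imap_quote, gmail_search, store_in_chunks) and the default chunk size of 500 are assumptions chosen for illustration.

```python
import imaplib
from typing import Iterator, List, Sequence


def imap_quote(text: str) -> str:
    """Quote a string as an IMAP quoted-string (imaplib does not do this for us)."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'


def gmail_search(conn: imaplib.IMAP4, query: str) -> List[bytes]:
    """Return the UIDs in the selected folder that match a Gmail search query."""
    # X-GM-RAW accepts the same syntax as the Gmail web search box.
    typ, data = conn.uid('SEARCH', 'X-GM-RAW', imap_quote(query))
    if typ != 'OK':
        raise RuntimeError(f'search failed: {data!r}')
    return data[0].split() if data and data[0] else []


def chunks(items: Sequence[bytes], size: int) -> Iterator[Sequence[bytes]]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def store_in_chunks(conn: imaplib.IMAP4, uids: Sequence[bytes],
                    item: str, value: str, size: int = 500) -> int:
    """Apply one UID STORE operation to `uids`, `size` messages at a time.

    Keeping each request small is what avoids the server-side timeouts.
    Returns the number of messages processed.
    """
    done = 0
    for chunk in chunks(uids, size):
        msgset = b','.join(chunk).decode('ascii')  # e.g. "101,102,103"
        typ, data = conn.uid('STORE', msgset, item, value)
        if typ != 'OK':
            raise RuntimeError(f'store failed on {msgset}: {data!r}')
        done += len(chunk)
    return done
```

Against a real account you would connect with imaplib.IMAP4_SSL('imap.gmail.com'), log in, select the "[Gmail]/All Mail" folder (quoted, because the name contains a space), and then call something like store_in_chunks(conn, gmail_search(conn, 'label:freecycle'), '+FLAGS', r'(\Seen)'). Smaller chunks mean more round trips, but less risk that any single command times out.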

I’ve written a small tool to take care of this; you can find it in my
[gmailfilters][] repository on GitHub. The project provides two
subcommands: gmf bulk-filter, which I will discuss here, and gmf
manage-filters, which translates between a simpler YAML syntax and
the XML syntax used by Gmail’s filter import/export. I may write
about that in a future post.

Installing gmf

The gmailfilters project is a standard Python package. You can
install it directly from GitHub like this:

pip install git+https://github.com/larsks/gmailfilters

This will install the gmf command, which provides the following
subcommands:

bulk-filter – a command-line tool for applying bulk actions to
Gmail messages.

manage-filters – translate between a YAML filter syntax and the
XML syntax used by Gmail for filter import/export.

Configuring gmf

Once installed, you will need to create a configuration file. By
default, gmf will look for a file named gmailfilters.yml located
in your current directory or in your $XDG_CONFIG_HOME directory,
typically $HOME/.config. The configuration file looks something
like this:

You can have multiple accounts in the file; unless you specify
otherwise, gmf uses the one named default.
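The lookup order described above (current directory first, then $XDG_CONFIG_HOME, falling back to $HOME/.config) can be sketched as follows. The function names are hypothetical, and the sketch only locates the file; it assumes nothing about the file's contents, which you would then parse with a YAML library.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

CONFIG_NAME = 'gmailfilters.yml'


def config_dirs(cwd: Path, env: Mapping[str, str]) -> List[Path]:
    """Directories to search, in order: the current directory, then XDG_CONFIG_HOME."""
    # Per the XDG Base Directory spec, fall back to $HOME/.config when
    # XDG_CONFIG_HOME is unset or empty.
    xdg = env.get('XDG_CONFIG_HOME') or os.path.join(
        env.get('HOME', str(Path.home())), '.config')
    return [cwd, Path(xdg)]


def find_config(cwd: Optional[Path] = None,
                env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first gmailfilters.yml found, or None if there is none."""
    cwd = Path.cwd() if cwd is None else cwd
    env = os.environ if env is None else env
    for directory in config_dirs(cwd, env):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None
```

Passing cwd and env explicitly keeps the function easy to test; called with no arguments it uses the real working directory and environment.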

Using the bulk-filter command

I have a Gmail filter that applies the label topic/containers to any
mail matching the search {docker container kubernetes lxc runc}. I
want to apply this filter to all my existing messages, and I want to
mark all matching messages as read so that I can identify new matches.
I can use the bulk-filter tool to accomplish this task using the
--label and --flag options:
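At the IMAP level, adding a label and marking a message as read each map onto a UID STORE command: Gmail's +X-GM-LABELS item adds labels, and the standard +FLAGS item sets flags such as \Seen (which is what "read" means in IMAP). The sketch below only builds the (item, value) arguments for those commands. The function name is hypothetical, and it assumes flags are given in IMAP form (for example \Seen) rather than however gmf spells them on its command line.

```python
from typing import Iterable, List, Tuple


def imap_quote(text: str) -> str:
    """Quote a string as an IMAP quoted-string (label names may contain spaces)."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'


def store_items(labels: Iterable[str], flags: Iterable[str]) -> List[Tuple[str, str]]:
    """Build (item, value) pairs for UID STORE commands that add labels and flags."""
    labels, flags = list(labels), list(flags)
    items = []
    if labels:
        # Gmail extension: add every label in one parenthesized list.
        value = '(' + ' '.join(imap_quote(label) for label in labels) + ')'
        items.append(('+X-GM-LABELS', value))
    if flags:
        # System flags such as \Seen are atoms and must not be quoted.
        items.append(('+FLAGS', '(' + ' '.join(flags) + ')'))
    return items


print(store_items(['topic/containers'], ['\\Seen']))
```

Each pair would then be sent as conn.uid('STORE', msgset, item, value) for every chunk of matching UIDs.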

It turns out I had close to 15000 freecycle messages gathering dust in
my account. The messages were already labeled with the freecycle
label. We can weed those out like this:

gmf bulk-filter --query 'label:freecycle' --trash

The --trash option acts like Gmail’s “move to trash” option. You
can also use --delete, which will use an IMAP delete operation, but
the behavior of an IMAP delete is controlled by your Gmail IMAP
settings (depending on those settings, Gmail may archive the message,
move it to the trash, or delete it permanently).
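The difference between the two options can be sketched at the IMAP level. This is an assumption about how such a tool might work, not a description of gmf's code: "trash" is modelled as adding Gmail's \Trash system label through X-GM-LABELS, while "delete" sets the standard \Deleted flag and expunges, leaving the final outcome to Gmail's IMAP settings. The function names are hypothetical.

```python
import imaplib


def check(result):
    """Raise if an imaplib (typ, data) result is not OK; otherwise return data."""
    typ, data = result
    if typ != 'OK':
        raise RuntimeError(f'IMAP command failed: {data!r}')
    return data


def trash_messages(conn: imaplib.IMAP4, msgset: str) -> None:
    """Move messages to Gmail's trash (assumption: via the \\Trash label)."""
    check(conn.uid('STORE', msgset, '+X-GM-LABELS', '(\\Trash)'))


def delete_messages(conn: imaplib.IMAP4, msgset: str) -> None:
    """Standard IMAP delete: flag the messages \\Deleted, then expunge.

    Note that EXPUNGE removes every message flagged \\Deleted in the selected
    folder, not only these. What Gmail then does (archive, trash, or delete
    forever) depends on the account's IMAP settings, not on this code.
    """
    check(conn.uid('STORE', msgset, '+FLAGS', '(\\Deleted)'))
    check(conn.expunge())
```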

The bulk-filter tool can also be used to remove labels by preceding
them with a -. For example, if I want to find all messages labeled
fedora-devel-list and relabel them as list, list/fedora, and
list/fedora/devel (removing the original label), I can run:
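Once the label arguments have been parsed, separating additions from removals is straightforward. A sketch, assuming (hypothetically) that each label argument arrives as a plain string and that a leading - marks a removal; removals map onto Gmail's -X-GM-LABELS store item, additions onto +X-GM-LABELS. The function names are illustrative only.

```python
from typing import Iterable, List, Tuple


def imap_quote(text: str) -> str:
    """Quote a string as an IMAP quoted-string."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'


def split_labels(args: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate label arguments into (labels to add, labels to remove)."""
    add, remove = [], []
    for arg in args:
        if arg.startswith('-'):
            remove.append(arg[1:])  # "-foo" means "remove label foo"
        else:
            add.append(arg)
    return add, remove


def label_store_items(args: Iterable[str]) -> List[Tuple[str, str]]:
    """Build UID STORE (item, value) pairs: additions first, then removals."""
    add, remove = split_labels(args)
    items = []
    for sign, labels in (('+', add), ('-', remove)):
        if labels:
            value = '(' + ' '.join(imap_quote(label) for label in labels) + ')'
            items.append((sign + 'X-GM-LABELS', value))
    return items
```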

This exposes a quirk of Python’s argparse module: if the value of an
option starts with -, argparse treats it as another option and complains
that the first option is missing its argument, unless you attach the
value to the option with = (for example, --label=-fedora-devel-list).
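The quirk is easy to reproduce with a toy parser that has a single repeatable --label option. This parser is a stand-in for illustration, not gmf's actual argument definitions.

```python
import argparse
import contextlib
import io
from typing import List, Optional


def parse_labels(argv: List[str]) -> Optional[List[str]]:
    """Parse argv with a toy parser; return the labels, or None if argparse rejects it."""
    parser = argparse.ArgumentParser(prog='demo')
    parser.add_argument('--label', action='append', default=[])
    # On errors argparse prints usage to stderr and raises SystemExit;
    # capture both so the demo can report the result instead of exiting.
    with contextlib.redirect_stderr(io.StringIO()):
        try:
            return parser.parse_args(argv).label
        except SystemExit:
            return None


# A value beginning with "-" looks like another option, so --label gets no argument.
print(parse_labels(['--label', '-fedora-devel-list']))
# Attaching the value with "=" removes the ambiguity.
print(parse_labels(['--label=-fedora-devel-list']))
```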

By default, the bulk-filter tool operates on the [Gmail]/All Mail
folder, which contains all of your messages except spam and trash. You
can instead limit it to specific folders by providing an optional list
of folders (which may contain wildcards). For example, if I want to perform the
above labelling operation only on internal company mailing lists, I
could limit it like this:
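One way to expand such folder patterns is to let the server do it with the IMAP LIST command, which understands the * and % wildcards. The sketch below assumes that approach (gmf may match folder names differently) and parses only the simple single-line form of LIST responses returned by imaplib; names sent as IMAP literals are skipped, and non-ASCII names (which arrive in modified UTF-7) are not decoded. The function names and folder names are hypothetical.

```python
import imaplib
import re
from typing import Iterable, List

# A LIST response line looks like: (\HasNoChildren) "/" "lists/internal/eng"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)$')


def imap_quote(text: str) -> str:
    """Quote a string as an IMAP quoted-string."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'


def folder_names(lines: Iterable) -> List[str]:
    """Extract folder names from imaplib's LIST response data."""
    names = []
    for line in lines:
        if not isinstance(line, bytes):
            continue  # None (no matches) or a literal tuple: not handled here
        match = LIST_LINE.match(line)
        if not match:
            continue
        name = match.group('name').decode('ascii', errors='replace')
        if name.startswith('"') and name.endswith('"'):
            # Strip the quotes and undo backslash escapes.
            name = re.sub(r'\\(.)', r'\1', name[1:-1])
        names.append(name)
    return names


def expand_folders(conn: imaplib.IMAP4, patterns: Iterable[str]) -> List[str]:
    """Return folders matching any pattern, without duplicates, in server order."""
    selected: List[str] = []
    for pattern in patterns:
        typ, data = conn.list('""', imap_quote(pattern))
        if typ != 'OK':
            raise RuntimeError(f'LIST failed for {pattern!r}: {data!r}')
        for name in folder_names(data):
            if name not in selected:
                selected.append(name)
    return selected
```

Server-side matching has the advantage that brackets in names such as [Gmail]/All Mail are not treated as wildcard syntax, as they would be with shell-style matching.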