TaQuilla

TaQuilla enables “soft tags” in Thunderbird: tags that are set by Bayesian analysis of the message content. Once you enable soft tagging for a particular tag and train a few messages, the tag is applied automatically to future messages.

This description applies to revision 0.3.0. Notes on different releases can be found on the TaQuilla revision page.

TaQuilla Setup Screen

To set up soft tagging, open the options page for TaQuilla by going to Tools/Add-ons/TaQuilla/Options. You’ll see a setup page that looks something like this:

These fields are described below.

Each of your existing tags is listed, with the following options per tag.

Enabled: You can enable soft tagging for a particular tag by checking its box in the “Enabled” column.

Analyze all folders: The “Analyze all folders” column sets whether, by default, soft tags are analyzed automatically for that tag in all folders. In most cases you will want to analyze soft tags only in specific folders; if so, leave this option at its default (unchecked). If you want a particular soft tag to be analyzed by default in all folders, check it here.

Percent: After a tag is enabled, all new messages are automatically scored by the Bayesian filter with a score from 0 to 100, based on its estimate of how closely the message matches messages that you have previously tagged. (Only messages that you tag manually after soft tagging is enabled for a particular tag are used in training. Any previously applied manual tags are not lost, but they are not used for training either.) The “Percent” column sets the cutoff the filter uses to decide whether to apply the tag. I’ve found that the default of 50% works fine in most cases.
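The cutoff decision can be sketched as below. This is a minimal illustration, not TaQuilla's own code: the function name is hypothetical, and treating a score exactly equal to the cutoff as a match is an assumption.

```python
DEFAULT_CUTOFF = 50  # default "Percent" value in the setup screen


def should_apply_tag(score: float, cutoff: float = DEFAULT_CUTOFF) -> bool:
    """Hypothetical cutoff check for a 0-100 Bayes score.

    Assumes a score equal to the cutoff applies the tag.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be in the range 0-100")
    return score >= cutoff


for score in (12, 50, 87):
    print(score, should_apply_tag(score))
```

Raising a tag's Percent value makes the filter more conservative: fewer messages cross the cutoff, at the cost of missing some that should be tagged.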

Tag set key, Tag clear key: The checkbox “Use set/clear instead of toggle for tagging keys” is a global override of the default keyboard-shortcut behavior for tagging. By default in Thunderbird and SeaMonkey, the keys 1-9 toggle the first nine defined tags on the selected messages. When training messages for soft tagging, that is not always the most convenient behavior, since it can be useful to train a message without changing its tag. If you enable this checkbox, the application's behavior changes and there are separate keys for setting and clearing tags.

If you set “Use set/clear instead of toggle for tagging keys”, the number keys 1-9 no longer toggle tags on messages. Instead, you must assign separate keys for tagging and untagging. To do that, place your cursor in either the “Tag set key” or the “Tag clear key” column for a tag, then press the key you want to use to set or clear that tag, including any modifiers such as Shift, Ctrl, or Alt. Depending on your platform and keyboard, different key combinations may be available, so you’ll need to test which ones work for setting and clearing. (On Windows, I recommend using 1 through 9 for the set keys and Alt+Shift+1 through 9 for the clear keys; for the clear keys you will see the shifted character for each number key instead of the digit.)
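As a rough model of the difference between the two keyboard modes, the sketch below contrasts a toggle key with separate set and clear keys. The `Message` class, the handler names, and key strings such as `"Alt+Shift+1"` are hypothetical stand-ins; Thunderbird's real key handling is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message: just the set of tag names applied to it."""
    tags: set = field(default_factory=set)


def toggle_key(msg: Message, tag: str) -> None:
    """Default mode: a number key flips the tag on or off."""
    if tag in msg.tags:
        msg.tags.discard(tag)
    else:
        msg.tags.add(tag)


def set_key(msg: Message, tag: str) -> None:
    """Set/clear mode: the set key always leaves the tag applied."""
    msg.tags.add(tag)


def clear_key(msg: Message, tag: str) -> None:
    """Set/clear mode: the clear key always leaves the tag removed."""
    msg.tags.discard(tag)


# Hypothetical bindings following the Windows suggestion above.
KEYMAP = {
    "1": (set_key, "Important"),
    "Alt+Shift+1": (clear_key, "Important"),
}


def press(key: str, msg: Message) -> None:
    """Dispatch a key press to its bound handler and tag."""
    action, tag = KEYMAP[key]
    action(msg, tag)


m = Message({"Important"})
press("1", m)
press("1", m)
print(sorted(m.tags))  # ['Important']
```

Pressing the set key twice leaves the tag applied, whereas pressing a toggle key twice returns the message to its original state. That predictability is what makes separate set and clear keys convenient for training.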

Overview of operation

To get started with a new tag, you need to enable it for soft tagging both globally and for particular folders. After enabling the soft tag globally in the setup screen as described above, select a folder whose messages should have soft tags applied, right-click it, and select “Properties”. TaQuilla extends the normal folder properties dialog so that it can be used to enable the calculation of particular soft tags.

As an example, I enabled soft tags for the tags “Important” and “Personal”. Then the folder properties dialog looks like this:

The “Enabled” column shows whether soft tagging is enabled in the folder for a particular tag.

The “Inherit” column shows whether the folder uses the default value or has its own setting. If “Inherit” is checked, the “Enabled” setting is the same as that of the folder’s parent; if the parent is a root folder, the value comes from the “Analyze all folders” setting for the tag in the TaQuilla setup screen. To set “Enabled” for a particular folder, first clear “Inherit” for that folder, then set “Enabled” to the desired value.

In the example shown, “Important” inherits from its parent and is not enabled; “Personal” does not inherit and is enabled.
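The inheritance rule can be expressed as a walk up the folder tree. This sketch assumes a simplified `Folder` type in which a tag missing from `overrides` means “Inherit” is checked, and in which a top-level folder's `parent` is `None` to stand for the account's root folder; none of these names come from TaQuilla itself. The hypothetical `soft_tag_active` helper also reflects the requirement that the tag be enabled globally in the setup screen.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    """Hypothetical folder model; parent=None means the parent is the root folder."""
    name: str
    parent: Optional[Folder] = None
    # tag -> "Enabled" value; a tag absent from this dict has "Inherit" checked.
    overrides: dict[str, bool] = field(default_factory=dict)


def folder_enabled(folder: Folder, tag: str, analyze_all_folders: dict[str, bool]) -> bool:
    """Resolve the per-folder "Enabled" value, following "Inherit" up the tree."""
    node: Optional[Folder] = folder
    while node is not None:
        if tag in node.overrides:  # "Inherit" cleared: this folder decides
            return node.overrides[tag]
        node = node.parent  # "Inherit" checked: ask the parent
    # Reached the root: fall back to the global "Analyze all folders" setting.
    return analyze_all_folders.get(tag, False)


def soft_tag_active(folder: Folder, tag: str, enabled_tags: set[str],
                    analyze_all_folders: dict[str, bool]) -> bool:
    """A soft tag is calculated only if enabled globally and for the folder."""
    return tag in enabled_tags and folder_enabled(folder, tag, analyze_all_folders)


# The example from the text: "Personal" enabled directly, "Important" inherited.
inbox = Folder("Inbox", overrides={"Personal": True})
enabled = {"Important", "Personal"}
analyze_all = {"Important": False, "Personal": False}
print(soft_tag_active(inbox, "Important", enabled, analyze_all))  # False
print(soft_tag_active(inbox, "Personal", enabled, analyze_all))   # True
```

A subfolder of `inbox` with no overrides of its own would inherit “Personal” as enabled, which is why clearing “Inherit” near the top of a tree can be a convenient way to cover a whole branch.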

Next, you need to do some training. For the Bayes filter to work, you must train it with examples of messages that match the tag and messages that don’t. Training occurs whenever you manually change the tag on a message, either by adding the tag or by removing it.

If you have enabled “Use set/clear instead of toggle for tagging keys” in the TaQuilla setup screen, training occurs whenever you use the separate set or clear keyboard shortcuts, even if the tag does not change. This is the recommended way to train, but it requires changing the default behavior of the keyboard shortcuts, which may not be desirable in your environment.
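The training rules above, together with the earlier note that only tags set after soft tagging is enabled are used, can be summarized in a small decision function. This is a hypothetical sketch of the described behavior, not TaQuilla's implementation; the function and parameter names are invented for illustration.

```python
from typing import Optional


def training_label(soft_tagging_enabled: bool, had_tag: bool, has_tag: bool,
                   used_set_clear_key: bool) -> Optional[bool]:
    """Decide whether a manual tagging action trains the filter.

    Returns True to train the message as a match for the tag, False to train
    it as a non-match, or None if no training happens.
    """
    if not soft_tagging_enabled:
        return None  # manual tags are kept but never used for training
    if used_set_clear_key or had_tag != has_tag:
        return has_tag  # the tag state after the action is the training label
    return None  # no change and no set/clear key: nothing to learn
```

For example, pressing the set key on a message that the filter already tagged correctly returns `True`, reinforcing the filter without changing the tag.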

Initially, after enabling the tag, find a few messages that should have the tag but don’t, and tag them manually. Also find some messages that should NOT have the tag, and train those. Then let the Bayes filter analyze a few existing messages: select some messages, right-click them, and select TaQuilla/Calculate soft tags. After TaQuilla recalculates the soft tags, find messages that have the wrong tag applied, and train them correctly. Repeat this a few times until the filter classifies most of your messages correctly. After that, you should only need to train when the filter makes a mistake.

You can tell the filter to analyze existing messages in two ways. First, the Tools menu has a new item, “Taquilla: Calculate Soft Tags for Folder”, which analyzes all of the messages in the currently selected folder. Second, you can analyze only the selected messages by choosing “TaQuilla -> Calculate Soft Tags” from the mail context menu. That menu also has a “TaQuilla -> Details” option, which shows the tokens used by the Bayes filter, the percent match of each token to the tag, and a running total of the message’s calculated percent match as each token is added.
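To make the “Details” view concrete, the sketch below combines per-token probabilities into a running message score using the classic naive-Bayes combining formula P = prod(p) / (prod(p) + prod(1 - p)). This is an assumption for illustration: TaQuilla may use a different combining method, and the tokens and probabilities in the example are made up.

```python
import math
from typing import Iterable, Iterator, Tuple


def running_scores(token_probs: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float, float]]:
    """Yield (token, token %, running message %) after each token is combined."""
    log_match = 0.0     # sum of log(p) over tokens seen so far
    log_nonmatch = 0.0  # sum of log(1 - p) over tokens seen so far
    for token, p in token_probs:
        p = min(max(p, 0.01), 0.99)  # clamp so both logarithms stay finite
        log_match += math.log(p)
        log_nonmatch += math.log(1.0 - p)
        # prod(p) / (prod(p) + prod(1 - p)) equals sigmoid(log_match - log_nonmatch),
        # evaluated in a form that cannot overflow math.exp.
        d = log_match - log_nonmatch
        if d >= 0:
            total = 1.0 / (1.0 + math.exp(-d))
        else:
            e = math.exp(d)
            total = e / (1.0 + e)
        yield token, 100.0 * p, 100.0 * total


for token, token_pct, total_pct in running_scores([("invoice", 0.9), ("lunch", 0.2)]):
    print(f"{token:10} {token_pct:5.1f}% {total_pct:5.1f}%")
```

With these made-up numbers the running total is 90.0% after “invoice” and drops to 69.2% after “lunch”, which is the kind of token-by-token trail the Details view is meant to expose.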

It can be useful to see, message by message, how the soft tag algorithm is working. For any folder that has soft tags enabled, you should be able to enable the display of columns showing who set each tag on a message and the percent score the Bayes filter calculated for each tag. Each column’s name is a single character followed by “%” for the percent match, or by “?” for the column showing who set the tag. The single character is the first letter of the tag’s name. For example, if your tag is named “Interesting”, you will see a column “I%” with the percent match to “Interesting”, and a column “I?” showing who set the tag on that message. The status shows the Greek letter sigma for messages that have been analyzed for the tag by the Bayes filter, and a check mark for messages that you tagged manually. That lets you see which messages you tagged directly and which were tagged automatically by the filter.
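The column-naming rule and the status glyphs can be sketched as follows. The function names are hypothetical, and the exact characters TaQuilla draws for “sigma” and the check mark are assumptions (here U+03C3 and U+2713).

```python
from __future__ import annotations

SIGMA = "\u03c3"  # Greek small letter sigma (assumed glyph): analyzed by the Bayes filter
CHECK = "\u2713"  # check mark (assumed glyph): tag set manually by the user


def column_names(tag_name: str) -> tuple[str, str]:
    """Return the (percent, who-set-it) column names for a tag."""
    initial = tag_name[0]  # first letter of the tag's name
    return initial + "%", initial + "?"


def status_symbol(set_manually: bool) -> str:
    """Glyph shown in the "?" column for one message."""
    return CHECK if set_manually else SIGMA


print(column_names("Interesting"))  # ('I%', 'I?')
```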

That’s it! After a little training, you can sit back and let the computer automatically tag future messages!