gloda should listen for folder addition so that we can map a folder before it has messages in it. When we map folders and they already have a message in it, we mark it filthy and fail to schedule an indexing sweep, so the situation is not resolved until restart or overflow triggers a sweep. This works out poorly since we frequently only hear about folders just after they have had their first message added to them. Using shouldIndexFolder should invoke _mapFolder only after the appropriate filtering. Thanks to Han Lin for helping determine the existence of this bug.
The rather sucky fallout is that gloda will not properly index newly added accounts until after Thunderbird is restarted or the indexer goes into overload mode. I can't confidently state the probability of overload mode happening, but it seems pretty likely to me for an account with more than a few messages unless the computer is much more powerful than its network connection.
I had a longer write-up before before with more analysis, but standard SSL form lossage on restart got me. (I thought I had the pref changed, but maybe not, or maybe bartab and multiple restarts got me.)

Bug 547877 was where we introduced the initial logic that supported having the folders be clean if empty (rather than always filthy), and where bienvenu indicated that doing so was a reasonable strategy.
It added a unit test that validates the safety of this general approach.
I am not planning to provide a unit test for this because:
1) The aforementioned unit test ensures the correctness of the clean-logic already.
2) shouldIndexFolder is known safe from its usage in all our other event-handlers and production.
3) This is really unlikely to regress given that it's hung against the supported/preferred event-notification system and that any breakage in shouldIndexFolder would be detected by other gloda tests.
4) I think the work required to add the test would be disproportionate to the amount of benefit given the above safety and I am swamped with B2G e-mail work right now and I think the effort would be better spent on the B2G case.
Having said that, I think in an ideal world we would have a test that duplicates the pre-existing account situation because it's obviously not cool that this problem existed and it would be a lame (but highly unlikely) regression should this ever regress.

This breaks tests related to junk-folder processing and possibly trash-folder processing. At first glance, the problem is the folderAdded notification is synchronously dispatched before messageInjection.js gets around to calling setFlag to make. I don't see a way to create a folder with its flags already set, so this seems like a problem that would affect actual implementations.
Here is our code that creates folders and set flags:
http://mxr.mozilla.org/comm-central/source/mailnews/test/resources/messageInjection.js#286
Setting flags calls through to NotifyIntPropertyChanged which calls OnFlagChanged which calls OnItemIntPropertyChanged:
http://mxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgDBFolder.cpp#4902
It seems like the correct solution is for GlodaMsgIndexer to sensitize its listener for onItemIntPropertyChanged notifications and whenever a folder has a flag changed that is known to affect the gloda-indexing-status of a folder, to call GlodaMsgIndexer.resetFolderIndexingPriority on the folder. We expect these flags to change rather rarely and primarily just at startup or otherwise in reaction to user actions, so clobbering user settings is not deemed to be a major risk.
I've made this change and pushed to try:
https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=3cb566d35581

This brings up a related issue, that may or may not affect what happens here.
In extension-added new account types, it is quite difficult to control startup. Currently (bug 592710) any time that accounts are initialized but the extension is not loaded, the account manager deletes that offending extension-added account. Extensions have to go through elaborate, unreliable workarounds to get around this.
The fix proposed in bug 592710 partially alleviates this. Unfortunately there is no way to control the timing of when the account manager loads accounts even with this fix. Ideally all extensions would be loaded before accounts are initialized, but the simple act of one extension (such as Conversations) loading the gloda modules before another extension forces accounts initialization before the extension-added account can be recognized.
So what new account type extensions are going toward is for their own startup to include the calls:
accountmanager.UnloadAccounts();
accountmanager.LoadAccounts();
as part of their initialization. From gloda's perspective, it can already be initialized, then someone unloads and reloads all accounts. I doubt if that is a well handled scenario.
From this bug's perspective, keep in mind that there can be accounts that have existed (and presumably been indexed), but are not seen at initialization. They can be added later, but are not really "newly created" just "newly loaded".

The UnloadAccounts/LoadAccounts hack sounds pretty bad. It seems like having the account manager instantiate things via the category manager might work? Specifically, I am assuming the system parses all of the manifests which establish component and category mappings before it starts executing any of them, which should allow for avoiding these type of problems.

"It seems like having the account manager instantiate things via the category manager might work?"
Too many places in startup (including extension startup) currently assume that accounts can be initialized at any time. I fear that any attempt to delay initialization until everything is loaded, then asking all account consumers to listed to some accounts-loaded notification, is just too complex to expect.
What is probably more feasible is to add something like an account loaded/unloaded flag per account, and then allow individual accounts to be loaded separately (with proper notifications to people like gloda) by extensions.

Regarding Conversations loading gloda modules too early, I think the latest version loads them slightly later. Since Conversations is now restartless, its startup code kicks in after the "load" event fired for the mail:3pane, so it may be late enough for you to perform your business before I actually trigger the whole process by doing Cu.import on gloda modules.
Re SSL form lossage, I heard lazarus is pretty good at this :).
Reading your comments, what seems to me like another solution is to actually perform the scheduled indexing sweep after we first figure out about the folder, via its first message. Could you please elaborate as to why you didn't pick that strategy? (I guess it's because the performance hit would be too high and we would be scheduling indexing sweeps excessively?)
I'm not unsure I deeply understand the discussion about account types, but:
> Too many places in startup (including extension startup) currently assume that accounts can be initialized at any time
I think what Andrew suggested is that the core code obtain the list of all account providers, and ask them in turn to initialize themselves. Doing this via the category manager would make sure this can be performed via XPCOM components, and before the extension startup. That way, extensions would access a list of accounts that won't change anymore.

Comment on attachment 653924[details][diff][review]
v1 have gloda listen to folderAdded and folder flag changes
Review of attachment 653924[details][diff][review]:
-----------------------------------------------------------------
I haven't actually tested this, but in try server I trust!
::: mailnews/db/gloda/modules/index_msg.js
@@ +2566,5 @@
> +
> + /**
> + * Detect newly added folders before they get messages so we map them before
> + * they get any messages added to them. If we only hear about them after
> + * they get their 1st message, then we will mark them filthy, but if we mark
... if we *map*?
@@ +2829,5 @@
> + if (aProperty !== this._kFolderFlagAtom)
> + return;
> + if (!GlodaMsgIndexer.shouldIndexFolder(aFolderItem))
> + return;
> + GlodaMsgIndexer.resetFolderIndexingPriority(aFolderItem);
I tend to think that
if (aProperty === this._kFolderFlagAtom && GlodaMsgIndexer.shouldIndexFolder(aFolderItem))
would be slightly clearer.

(In reply to Jonathan Protzenko [:protz] from comment #8)
> Reading your comments, what seems to me like another solution is to actually
> perform the scheduled indexing sweep after we first figure out about the
> folder, via its first message. Could you please elaborate as to why you
> didn't pick that strategy? (I guess it's because the performance hit would
> be too high and we would be scheduling indexing sweeps excessively?)
Good eye, that is indeed another viable option that disappeared with my original comment.
The reasons, some of which I am only thinking about now, are (:
1) This way feels more correct. If it's a completely new folder and it has no messages, then it really is a clean folder. We just never learned about it in a timely fashion before.
2) This way is potentially much more timely. An indexing sweep could get hung up on other folders that had been marked dirty because of event-driven indexing, causing it to fall significantly behind (and fight the folder that is being populated in terms of I/O and other resources.)
3) Triggering a sweep seemed more complex; I wanted to avoid thinking about the complexity, dealing with potential unit test breakage, and potentially needing to write a new unit test :)
You are right (clearer) and demorgan's law says the change is correct. Just an example of me going overkill on the mozilla trick of avoiding indentation by kicking control flow into oblivion.

(In reply to Jonathan Protzenko [:protz] from comment #9)
> ... if we *map*?
GlodaDatastore._mapFolder is the underlying function that does the work. You make a good point about that being confusing if you're not aware of that fact.

(In reply to Andrew Sutherland (:asuth) from comment #11)
> (In reply to Jonathan Protzenko [:protz] from comment #9)
> > ... if we *map*?
>
> GlodaDatastore._mapFolder is the underlying function that does the work.
> You make a good point about that being confusing if you're not aware of that
> fact.
er, and now I realize you were correcting my use of the wrong word. :) Thanks for the typo-fix!

protz, does my rationale in comment 10 seem reasonable? I want to make sure you're on board with my solution before landing it. I am also willing to file a follow-on bug (that I probably won't fix anytime soon) if you think we should add that indexing sweep logic.