I'm not the cleverest person around these parts by far, so I'm probably missing something pretty obvious.

But here's my basic question, and I DON'T mean to be cynical or sarcastic with this. Why, in very simple terms, does PocoMail's Bayesian filter take so INCREDIBLY long to train, and seem so slow at really learning, when I was able to get the Bayesian filters in 1) POPFile, 2) K9, and 3) Thunderbird trained far better than Poco's in far, far less time?

I'm sorry. I truly don't mean to sound like I'm trolling or flaming, or anything else. I'm just extremely frustrated about it all.

I generally use K9. But from time to time I convince myself having PocoMail's builtin Bayesian filtering work would be more elegant. And indeed, it runs noticeably faster than running Poco with K9.

Yet it took me fewer than 200 or so messages to train K9 to something over 90 percent accuracy.

I started using Thunderbird just today and within 200 or 300 messages, it already seems to be getting over 75 percent.

I have used I don't know how many hundred messages in Poco, with a current good word count of about 13,750 and bad word count of about 18,900 -- and it's limping along at around 40-50 percent.

Presumably the "% accuracy" figures that you are quoting are those reported by the respective applications. Can you instead monitor the number of false negatives and false positives that you are getting? It's a pain, I know, but that is a better measure of how effective the respective systems are. That is, track: messages downloaded, messages wrongly identified as spam, messages wrongly identified as good, and total number of spam messages.
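For anyone who wants to keep that tally by hand, here is a minimal Python sketch of how the four counts turn into comparable figures. The class and field names (`FilterTally` and so on) and the sample numbers are made up for illustration; none of this comes from PocoMail, K9, POPFile, or Thunderbird, and each of those may well compute its own "% accuracy" differently.

```python
from dataclasses import dataclass


@dataclass
class FilterTally:
    """Hand-kept counts for one mail client over some period (hypothetical names)."""
    downloaded: int        # all messages downloaded
    total_spam: int        # messages that really were spam
    false_positives: int   # good messages wrongly identified as spam
    false_negatives: int   # spam messages wrongly identified as good

    @property
    def total_good(self) -> int:
        # Everything downloaded that was not spam.
        return self.downloaded - self.total_spam

    def false_positive_rate(self) -> float:
        # Share of good mail that was wrongly flagged as spam.
        return self.false_positives / self.total_good if self.total_good else 0.0

    def false_negative_rate(self) -> float:
        # Share of spam that slipped through as good mail.
        return self.false_negatives / self.total_spam if self.total_spam else 0.0

    def accuracy(self) -> float:
        # Share of all messages that were classified correctly.
        if not self.downloaded:
            return 0.0
        wrong = self.false_positives + self.false_negatives
        return (self.downloaded - wrong) / self.downloaded


# Made-up numbers, purely to show the arithmetic.
week = FilterTally(downloaded=200, total_spam=120,
                   false_positives=2, false_negatives=30)
print(f"false positive rate: {week.false_positive_rate():.1%}")
print(f"false negative rate: {week.false_negative_rate():.1%}")
print(f"overall accuracy:    {week.accuracy():.1%}")
```

Keeping the two error rates separate matters because a good message lost to the junk folder usually costs more than a spam message left in the inbox, and a single accuracy percentage hides which kind of mistake a filter is making.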

robin wrote: Presumably the "% accuracy" figures that you are quoting are those reported by the respective applications. Can you instead monitor the number of false negatives and false positives that you are getting? It's a pain, I know, but that is a better measure of how effective the respective systems are. That is, track: messages downloaded, messages wrongly identified as spam, messages wrongly identified as good, and total number of spam messages.

Yeah, well. I suppose I just got carried away. My whole point was this: Straight out of the box, K9 and even POPFile were trained and doing a terrific job within a couple of weeks. Poco's Bayesian filter will go day after day repeatedly plopping almost identical spam into my mailbox without seeming to learn a thing.

I really don't think I should have to get into a bunch of testing and refining to make it work, should I? As an "end user," shouldn't a feature be reasonably workable or work reasonably well off the shelf?

As somebody who IS now getting very good results from Poco, and who spent A LOT of time on the Bayesian issue, AND played extensively with POPFile, I do have to concur with speerga's comment. POPFile, which comes with no pre-defined corpus, does learn incredibly fast and does not appear to be terribly user-sensitive. Poco's BF, by contrast, seems to give a much wider range of results from one user to the next and is much slower on the uptake.

I have a few guesses about the internal workings, POPFile's inclusion of what are called pseudo-tokens, etc., but these are all just speculation.