I'm using 3.4, and have the junk mail filters pretty much standard (non-standard Bayesian checked). I go through the process of declassifying mail that's marked incorrectly, then marking it with its correct classification (i.e., junk mail or good mail), but it doesn't seem to be learning very well.

I get topic reply notifications from one of my sites, and no matter how many times I unmark them as junk and mark them as good, the next one to come into the box gets marked as junk, and they're identical to the last one. This has probably happened 50 times now.

I've searched the forum and found threads with instructions like "declassify 15 times, then classify 15 times...." but this makes absolutely no sense to me.

Is there a simple way to help the filter system learn better? Right now, it's useless, as I'm still manually marking junk mail and unmarking good mail more often than not, but I don't want to install a third-party app, because they seem to wreak havoc on PM, creating blank emails, etc.

First check where the junk mail filters are with respect to your regular incoming filters. They should be at or near the bottom of the list. This way other filters have a chance to act on the message and move it before the junk mail filters take hold. This should help eliminate false positives (a good message classified as junk).
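To illustrate why the order matters, here is a rough Python sketch. It is not PocoMail's actual filter engine; the "first matching filter wins" behaviour, the filter names, the folder names and the message fields are all assumptions made up for the example.

```python
# Hypothetical model of an ordered filter list: the first filter that
# matches decides where the message goes (an assumption, not PM's code).
def run_filters(message, filters):
    for name, matches, folder in filters:
        if matches(message):
            return folder
    return "Inbox"

def from_my_forum(msg):
    # Regular incoming filter: files notifications from a known sender.
    return msg["from"].endswith("@myforum.example")

def looks_like_junk(msg):
    # Stand-in for a junk filter that wrongly flags notification mail.
    return "reply" in msg["subject"].lower()

notification = {
    "from": "noreply@myforum.example",
    "subject": "Topic reply notification",
}

junk_first = [("junk", looks_like_junk, "Junk"), ("forum", from_my_forum, "Forum")]
junk_last = [("forum", from_my_forum, "Forum"), ("junk", looks_like_junk, "Junk")]

print(run_filters(notification, junk_first))  # junk filter sees it first
print(run_filters(notification, junk_last))   # regular filter sees it first
```

With the junk filter at the bottom, the regular filter files the notification before the junk filter ever gets a look at it, which is the false-positive protection described above.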

Since you are also using the non-standard Bayesian filters, what do you have the junk score and good score set to? These values may be totally negating the non-standard filters.

My experience has been that the standard non-Bayesian filters, combined with some filters I have created, and with some banned words, do a reasonably good job - 86% rating, with almost no good emails misclassified as junk.

Now that I have figured out how to get Bayesian filters to process, I have discovered that my training approach has been worthless. I can see that Bayesian filtering is categorizing almost all of my email as good, despite having been trained on thousands of good and bad words. (I am talking about how Bayesian filtering categorizes email, as distinct from how it is scored.) Particularly entertaining is that today I received three identical spams, which differed only in the From field. Bayesian filters concluded that two were good and one was bad?

So, clearly, I need to do better training. But I am increasingly cautious about this, because so many spams add in some text that could easily appear in any good message. So if I train on these messages, I suspect that I am undermining the value of all of my prior training; and it seems reasonable to guess that such messages can skip right past Bayesian filters that have learned a lot of good words.
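The worry about undermining earlier training can be put into numbers with a small sketch. This assumes a simple count-based estimate of how "spammy" a word is, which is a common textbook approach and not necessarily what PM does internally; the function name and every count below are invented for illustration.

```python
def token_spam_probability(good_count, bad_count, good_total, bad_total):
    """Estimate how strongly one word points to junk (0 = good, 1 = junk).

    good_count / bad_count: messages of each kind that contained the word.
    good_total / bad_total: messages of each kind used for training.
    This formula is an assumed textbook estimate, not PM's actual one.
    """
    good_rate = good_count / good_total
    bad_rate = bad_count / bad_total
    return bad_rate / (good_rate + bad_rate)

# An ordinary word seen in 50 of 100 good mails and 1 of 100 junk mails.
before = token_spam_probability(50, 1, 100, 100)

# Then 20 more spams padded with that same ordinary word are trained as junk.
after = token_spam_probability(50, 21, 100, 120)

print(f"before padded spam: {before:.3f}")
print(f"after padded spam:  {after:.3f}")
```

In this made-up case the word still leans towards "good" afterwards, but much less strongly than before, so it is less helpful for recognising genuine mail. How much this matters in practice depends on details of PM's implementation that the thread does not describe.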

So I have the same question you do - what is the best way to train - plus an add-on - how much does Bayesian filtering add to the standard non-Bayesian filtering plus a handful of other filters and banned words?

I am using a combination of BF, JMF and custom filters and scripts. I am starting to see more messages make it through this but haven't been keeping track. I will start to keep some stats on this and see what I discover.

I decided to try Thunderbird, to see how its built-in "adaptive filtering" compares to PM's. Running Thunderbird first, and leaving email on the server, and using none of my own filters, Thunderbird has failed to catch one spam. And I have trained it only on its mistakes, over a two-day period.

On the same emails, PM's Bayesian filtering has failed to classify several spams, despite or because of the training I have done. Particularly discouraging have been the numerous "Dear Homeowner" refinance offers that PM has missed - all look the same except for sender, and I have "trained" PM that these are spams.

I cannot decide whether to just turn off bayesian filtering, or to try and retrain it. Like you, I haven't seen a clear and concise description of how to train well.

What you're finding out about Poco's Bayesian filter has been true ever since it's been around. It's a really "retarded" sort of learner.

I have tried time and again to follow any instructions or suggestions on these forums for training the thing and simply cannot get it to learn and work effectively.

For the sake of speed and ease in downloading my email, I generally have given up on Spam filtering and run Pocomail with what little it will catch.

From time to time, I reinstall K9 and simply shut off Pocomail's Junk filtering functions altogether. No matter the somewhat slower speed in downloading email -- with K9, I get almost perfect Spam filtering.

Alas, it doesn't seem to matter how often we struggle to understand and/or train Poco: the Junk mail filtering, at least the Bayesian filter part, is simply WILDLY deficient compared to the built-in Spam filtering of Thunderbird or Mozilla Mail and such third-party software as K9.

And, yes, a truly step-by-step, here's-how tutorial for Junk mail filtering using Pocomail WOULD be a good idea. I don't mean one of those percentages, headers, sort of techno-babble things; I mean something by someone knowledgeable doing a "Junk Mail in Poco for Dummies" sort of tutorial.

So I guess between the replies here and what I've read elsewhere on this forum, I'll just leave PM's filter system alone, as it has so far proven itself to be useless.

I came over from Thunderbird, and before that was using Spam Inspector with OE. Both were very successful from the start, Thunderbird only dropping the ball once in a blue moon, and Spam Inspector doing very well too. PM, on the other hand, as someone else has noted, will miss THE EXACT SAME EMAIL with only a different sender. I can't imagine being able to teach a system that does that anything useful.

Looking at the good and bad word files, I wonder about what PM is learning. Why, for example, is it useful for PM to have learned words like "X-ORIGINALARRIVALTIME-07=1" or "DATE-1000=1" or "RECEIVED-1conyv3sp3nzfpa0=1" or "X-SYMANTEC-TIMEOUTPROTECTION-0=1"?

Last edited by seabc on Mon Apr 25, 2005 5:51 am, edited 1 time in total.

In general, the PM Bayesian filters, like all Bayesian filters, will examine almost every token (words, fonts, strings, etc.) that is present in the email and categorize the tokens appropriately. When using what it has learned, PM only looks at the (I believe) 30 most meaningful tokens that are truly indicative of whether a message is junk or not, so having PM learn the meaningless tokens (arrivaltime, etc.) probably does no harm, and simplifies the programming, I'm sure.
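Here is a minimal Python sketch of that "most meaningful tokens" idea. It is a generic version of the technique and not PM's actual code: the per-token probabilities are invented, the default of 0.4 for never-seen tokens is an assumption, and the limit of 30 simply follows the (unconfirmed) figure mentioned above.

```python
from math import prod

def most_meaningful(tokens, token_probs, limit=30, unknown=0.4):
    """Return the probabilities of the tokens that are furthest from neutral.

    token_probs maps a token to its learned junk probability (0..1).
    Tokens never seen in training get the assumed default `unknown`.
    """
    probs = [token_probs.get(t, unknown) for t in set(tokens)]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return probs[:limit]

def junk_score(tokens, token_probs, limit=30):
    """Combine the selected token probabilities, naive-Bayes style."""
    probs = most_meaningful(tokens, token_probs, limit)
    junk = prod(probs)
    good = prod(1.0 - p for p in probs)
    return junk / (junk + good)

# Hypothetical learned values: two telling words, two near-neutral headers.
learned = {
    "refinance": 0.99,
    "homeowner": 0.97,
    "DATE-1000=1": 0.50,
    "X-ORIGINALARRIVALTIME-07=1": 0.52,
}
message = ["homeowner", "refinance", "DATE-1000=1", "X-ORIGINALARRIVALTIME-07=1"]

# A small limit for the demo so the selection is visible.
print(sorted(most_meaningful(message, learned, limit=2)))
print(junk_score(message, learned, limit=2) > 0.99)
```

Tokens whose learned value sits near 0.5, like the header strings, are the first to be dropped by the selection step, which is why learning them would be mostly harmless in a filter built this way.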

The junk mail filtering in Myinfotogo is a joke. When I used Thunderbird, I rarely received junk email in my inbox.

With Myinfotogo the junk mail fills up my inbox like crazy. Even the training feature doesn't help. The only reason I am keeping this is because it is the only combined email/PIM that I can find for my U3.

Pocosystems, you need to fix the junk mail filtering. It is total **** right now.

When will you be coming out with email filtering on this program that actually does what it is supposed to do?

It actually does what it's supposed to do. For some it will work without much tweaking, for others it won't work great.

If you fixed that problem, this application would be a real winner. Right now it is barely usable.

I agree that the Bayesian filter needs improvement, but to say it's barely usable is simply untrue. At least it works for me and always did. I don't use a spam blocker anymore and until now I get some false positives from time to time, but all real junk is transferred to Junk.

When I used Thunderbird it caught a lot of spam right from the start; Myinfotogo can't even do it when it is trained to do it.

I know its spam filter is quite good, but that's all. For the rest I can't judge the app, since I've never even tried it.
I tried a lot of others before, although their junk mail filtering isn't always good. The one that filtered best was Bloomba with SAProxy. Unfortunately, it was taken over by Yahoo because of its excellent search engine.

Hope the Junk Mail filtering will be addressed during an upcoming beta, so it can be improved.