Hi there, I'm new to PocoMail, but the first thing I'm trying to do is get my spam filter up to speed. So far, not much luck, despite importing over 100 spam messages (and thousands of good ones) to train the Bayes filters with. The most frustrating thing to me, about PocoMail's filters, is how, after classifying a message as Junk, it will still tell me that it would not be considered junk "under any circumstances." Doesn't seem to be 'learning' very quickly if it won't consider user-designated junk mail as junk mail!

Anyway, I have installed this script (thanks very much for writing it!) in the hopes it would help. So far, it seems to have had no effect. I've read all the messages here and *think* I have it set up correctly, but perhaps the Bayes filter is not working now; how can I tell? When the script works, I get a header like this:

Does that mean it's not working, or timing out? When I see it downloading, some messages take 5-8 seconds, while others download in 1 second. Not sure what's happening here.

More disturbing, the one above (that worked) let the spam through despite being on two spam lists! Is there a way I can set it up so that appearance on ANY list is enough to classify the message as spam/junk and take appropriate action?

Bako wrote: The most frustrating thing to me, about PocoMail's filters, is how, after classifying a message as Junk, it will still tell me that it would not be considered junk "under any circumstances." Doesn't seem to be 'learning' very quickly if it won't consider user-designated junk mail as junk mail!

Yeah, it's weird, but I think it's working as designed. There's some more info about this elsewhere in the forum, but I think the basic idea is that a word only gets counted as spam if it has been seen at least 3 times. So when you learn a message as spam, it gives the individual words a score, but those words won't count toward the spam score until they have appeared at least 3 times. You can use the Junk Mail Filtering->Junk button to learn it several times and it should finally be considered spam, but it's probably best to leave it as it is.
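As a rough illustration of the "seen at least 3 times" idea described above, here is a minimal Python sketch. It is not PocoMail's actual implementation: the names `MIN_OCCURRENCES`, `learn_as_junk`, and `scoring_words` are made up for this example, and it assumes each occurrence of a word in a learned message is counted once.

```python
from collections import Counter

# Threshold taken from the post above; how PocoMail really counts is an assumption.
MIN_OCCURRENCES = 3


def learn_as_junk(junk_counts, message):
    """Record every word of a message the user marked as junk."""
    junk_counts.update(message.lower().split())


def scoring_words(junk_counts, message):
    """Return the words in the message that have been seen often enough to count."""
    words = set(message.lower().split())
    return sorted(w for w in words if junk_counts[w] >= MIN_OCCURRENCES)


junk_counts = Counter()
spam = "cheap pills online"

learn_as_junk(junk_counts, spam)
print(scoring_words(junk_counts, spam))  # [] (each word seen only once so far)

learn_as_junk(junk_counts, spam)
learn_as_junk(junk_counts, spam)
print(scoring_words(junk_counts, spam))  # ['cheap', 'online', 'pills']
```

Under these assumptions, a message learned once contributes nothing to its own score, which would match the "not considered junk" report right after classifying it.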

Bako wrote: Anyway, I have installed this script (thanks very much for writing it!)

The script is running, but it couldn't extract the IP address from the headers. It needs the IP address to check against the DNSBLs to see if the sender is spammy, so it didn't time out; it just never got to that part. If you can post the headers from this email (or PM them to me if you like), I'll see if the script can be updated.

Bako wrote: When I see it downloading, some messages take 5-8 seconds, while others download in 1 second. Not sure what's happening here.

The 'Received from IP address (unknown)' messages will be instant because there is no checking to do. For other messages it depends on how fast the DNSBLs respond so it could be a second or it could be longer (pot luck really!) and it may be slower if you have other internet activity going on in the background. You can decrease the timeout if you want it to finish sooner (but that means you may not get all results back from the DNSBLs).
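To make the timing behaviour more concrete, here is a hedged Python sketch of a DNSBL check with an overall timeout. It is not the script from this thread. The reversed-octet query name is the standard DNSBL convention, but `check_dnsbls`, its `timeout` and `resolver` parameters, and the choice to query all lists in parallel are assumptions made for illustration.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the DNSBL zone (standard DNSBL convention)."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def _resolves(name, resolver):
    """True if the name resolves (listed), False if the lookup reports no such host."""
    try:
        resolver(name)
        return True
    except socket.gaierror:
        return False


def check_dnsbls(ip, zones, timeout=5.0, resolver=socket.gethostbyname):
    """Return {zone: True, False, or None}; None means no answer before the timeout."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(zones)))
    futures = {z: pool.submit(_resolves, dnsbl_query_name(ip, z), resolver) for z in zones}
    deadline = time.monotonic() + timeout
    results = {}
    for zone, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[zone] = future.result(timeout=remaining)
        except FutureTimeout:
            results[zone] = None  # this list was too slow; its result is lost
    pool.shutdown(wait=False)  # do not block on lookups that are still running
    return results


print(dnsbl_query_name("192.0.2.99", "bl.spamcop.net"))  # 99.2.0.192.bl.spamcop.net
```

With a shorter `timeout` the check returns sooner, at the cost of `None` entries for lists that had not answered yet, which is the trade-off described above.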

Bako wrote: More disturbing, the one above (that worked) let the spam through despite being on two spam lists!

The script adds to the spam score, but it's not the script itself that actually puts the email into your junk mailbox. Go to Junk Mail Filtering->General Settings and see what number the 'Custom Sensitivity' is at (it should appear when you click on the slider). If the final score for the email after running this script and any other tests you have (e.g. the Bayes filter) is higher than this value, then it will be moved to the junk mail folder. For instance, I have it set at 10, so the score of 15 in your example above would have classified that email as junk. Hopefully this is the only setting you need to change.

Bako wrote: Is there a way I can set it up so that appearance on ANY list is enough to classify the message as spam/junk and take appropriate action?

Sure, but be aware you may get more false positives, since the DNSBLs aren't always perfect. To do what you want, make sure the score in the script for each DNSBL is bigger than the 'Custom Sensitivity' I explained above. Then being on any one list will be enough for the final score to classify the message as spam.
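A small Python sketch of that check, for illustration only. The function names are invented, `other.dnsbl.example` and its score of 8 are hypothetical, and the strict "bigger than" comparison follows the wording above rather than any confirmed PocoMail behaviour.

```python
def lists_that_decide_alone(dnsbl_scores, sensitivity):
    """DNSBLs whose score is bigger than the Custom Sensitivity on its own."""
    return sorted(zone for zone, score in dnsbl_scores.items() if score > sensitivity)


def raise_all_above(dnsbl_scores, sensitivity):
    """Return a copy in which every DNSBL score is bigger than the sensitivity."""
    return {zone: max(score, sensitivity + 1) for zone, score in dnsbl_scores.items()}


scores = {
    "bl.spamcop.net": 15,        # score mentioned in this thread
    "other.dnsbl.example": 8,    # hypothetical list and score
}

print(lists_that_decide_alone(scores, 10))   # ['bl.spamcop.net']
print(raise_all_above(scores, 10))           # {'bl.spamcop.net': 15, 'other.dnsbl.example': 11}
```

Any negative scores from other filters would still be subtracted from the total, so "any list is enough" only holds if nothing else pulls the score back down.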

So for me, bl.spamcop.net (with its score of 15) always classifies a message as spam, but the other DNSBLs require at least two entries or help from the Bayes filter to classify an email as spam. Although I have my custom sensitivity set at 10, being on one of the other lists doesn't automatically put the email into my junk box, because I have a filter that gives a small negative score to any emails where the sender is in my address book. This is working quite well here, with above 99% accuracy. Also, in the code snippet above, I've cut out some of the DNSBLs that didn't trigger very often and added sbl-xbl.spamhaus.org, which is a combination of several DNSBLs (saves checking them individually), so you may like to change the script as above and see how you like it.

First of all, let me thank you for taking the time to help me out with this issue. This seems like a very friendly, supportive community, but you're going beyond the call of duty here and I really appreciate it.

Second, I did have my custom level set to 'medium' sensitivity, which is 12 (it's on 10 now) but I think other lists coming up negative, and perhaps other items the Bayesian filter looks at reduced the score from 15 to below 12 (you can see the other lists give it a -3 by themselves). I've changed to your list of servers, which is shorter and includes no -1's for not being on a list -- I will report back on how it works.

Regarding the missing IP addresses, however, this seems to be one of the real culprits. I looked over some recent spam, and I'd say about 1/2 of them didn't have an IP address included, or at least they hadn't gotten the full server treatment from the script. Since you asked, here are two examples, with full headers, that got through all the filters (I've changed my email address only):

Having a university email address means getting lots of non-standard spam geared toward the 'college market,' but the second one was classic spam. If you have any insights about these, I'd be interested to hear them. Also, if you have any advice on setting the Bayesian filter controls, I have them on the defaults.

Thanks again,

Bako

Edit by Eric: removed email addresses - please do not post this because of spambots sweeping the internet

Ah, thanks for letting me know. Yes, looks like the Bayes filter isn't working. Didn't other people have that problem using this script? I have this script first, then the junk mail filter -- it looks like the filter is working, from all the 'x-poco-score' lines, but no %BAYES% since before I started with your script. The 'run standard and Bayesian filters' box is checked (and it worked before), and I'm using the default Bayesian settings. Any suggestions?

Yeah, that's on. It was working before (and I haven't changed anything except adding this filter) -- perhaps I'll turn off your filter and see if it works... Hmm, nope. Turning yours off didn't help either. Confusing. I'll keep experimenting.

That's odd. The script shouldn't interfere with Bayes running, especially if it's turned off! You need 1000 good and 1000 bad words before Bayes will kick in, but it sounds like you have that. I'm not sure what the problem is, but hopefully you'll find it easily.

The script is fixed for the IP problem, but I won't update it here until tomorrow so I can test it more.

I saw earlier in this thread that Tribble was having the same problem getting both filters to run, but I added mine as a script from the beginning, so that's not the answer.

Restarting the application seems to have helped. I turned off your script and then restarted PocoMail -- the Bayes filter showed up in the header again. Then I turned yours on, and it appears that both worked:

I'm surprised it got a -121 since it was a direct copy of a known spam message, but it was sent from my own gmail account, so that probably did it. Anyway, there was no IP address for your filter to use, it appears, but other than that it worked, right?

It seems that the absence of an IP address would be the downfall of this filter -- I looked through some recent mail of mine, both spam and 'ham,' and found many of them had no IP, even those coming from friends with very normal ISPs. Any thoughts?

You have your Bayes set up to give a big score, and it will drown out the result from this script, so I'd recommend going to your Junk Mail Filtering screen, then the Bayesian tab, and changing the two sliders for the 'Junk Score' and 'Good Score'. As it is set up at the moment, Bayes ran on that message, decided it had a 0% chance of being spam and gave it a -100 score. If the DNSBL test then comes along and classes it as spam, it may only add 10 to that score, so it will never reach the spam threshold and it stands no chance of being classified as spam (unless you change the DNSBL scores to be much higher). I have my 'Junk Score' set to 15 and my 'Good Score' set to -5. Numbers in that sort of region should work better with the default scores in this script.
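The arithmetic behind that advice can be sketched in a few lines of Python. This assumes the scores from each test are simply added together and that a message is junk when the total is higher than the Custom Sensitivity (10 here); `final_score` and `is_junk` are illustrative names, not part of PocoMail or the script.

```python
SENSITIVITY = 10  # Custom Sensitivity value used as an example in this thread


def final_score(bayes_score, dnsbl_hits):
    """Add the Bayes score to the score of every DNSBL that listed the sender."""
    return bayes_score + sum(dnsbl_hits)


def is_junk(score, sensitivity=SENSITIVITY):
    """Assumed rule: junk when the total is higher than the sensitivity."""
    return score > sensitivity


# 'Good Score' of -100: one DNSBL hit worth 10 cannot recover.
print(final_score(-100, [10]), is_junk(final_score(-100, [10])))      # -90 False

# 'Good Score' of -5: two DNSBL hits worth 10 each are enough.
print(final_score(-5, [10, 10]), is_junk(final_score(-5, [10, 10])))  # 15 True

# 'Junk Score' of 15: Bayes alone is enough, with no DNSBL hits.
print(final_score(15, []), is_junk(final_score(15, [])))              # 15 True
```

Keeping the Bayes scores in the same range as the DNSBL scores lets either test outvote the other, instead of one always dominating.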

I'd also disable the 'Run standard non-Bayesian filters' option on the 'General Settings' tab, because that -20 is also a high score. There's lots of room for experimenting with what works best for you with these scores, but that's how I've set it up.

All emails you receive should have an IP address in the headers; in fact, if they don't, I'd be worried something isn't right. But not all IP addresses will be picked up by this script. When I wrote it, I was quite strict about what it would and wouldn't accept as an IP address. If, after the next update to the script, the IP addresses still aren't showing up, then please let me know and I'll take a closer look at it.

Updated to v1.11 (see first post). The default DNSBLs and scores have been updated (feel free to use whatever ones you like of course!). The IP address extraction routine is better. It may still need some work though so please post any problems.

Things seem to be coming together for me. I think having the Bayesian filter on '0' for non-spam may have contributed to it not showing up in the header even when it was working. That's why I tried the 'strict' settings, but now I'm using your -5 for non-spam and it all seems to be working. Here's a spam that just came through:

Both filters caught it, and either would have been enough. Thanks again for this great filter, and keep up the good work! I'll write again if I have any more big problems, but I really appreciate your help.

Just implemented your RBL filter today along with my many other filtering processes. Wow! What a great script. Thanks so much for all your efforts. (Also, in testing it I found out that MY dynamically assigned IP address from my DSL provider (Qwest) is blacklisted . . . the whole block of IP addresses surrounding me too!) Bummer.

If your provider being listed is a problem, then you could work around it by removing the DNSBL that tags it (unless they all tag it) or modifying the filter to not run on email sent from you.

Thanks for posting the headers. That's another bug with the IP extraction routine. I think I was too strict about what format the Received header must be in (it doesn't like the 'untrusted sender' bit on the end of that one). I'll fix it and post here when it's updated.
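For anyone curious what a more lenient extraction might look like, here is a minimal Python sketch. It is not the script's actual routine: it simply searches anywhere in a Received header for a bracketed IPv4 address, so trailing text does not matter. The sample header, the `extract_sender_ip` name, and the decision to skip loopback addresses are all assumptions for illustration.

```python
import ipaddress
import re

# A bracketed dotted quad anywhere in the header, e.g. "[203.0.113.7]".
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def extract_sender_ip(received_header):
    """Return the first valid, non-loopback IPv4 address in brackets, or None."""
    for candidate in BRACKETED_IPV4.findall(received_header):
        try:
            ip = ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # looks like an address but is not one, e.g. [999.1.1.1]
        if ip.is_loopback:
            continue  # a local hop says nothing about the real sender
        return str(ip)
    return None


# Hypothetical header with extra text on the end.
header = ("from mail.example.org (mail.example.org [203.0.113.7]) "
          "by mx.example.edu with ESMTP; (untrusted sender)")
print(extract_sender_ip(header))  # 203.0.113.7
```

Searching for the address rather than matching the whole header line keeps the routine from breaking each time a server appends something unexpected.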