I looked at my junk ini, which at this point has 7993 words. Of those, 6248 occur two or fewer times. This means that 78% of the words in that file are not being used to determine whether a piece of mail is junk. If POPFile does not insist on words occurring some minimum number of times, I bet this is the reason training takes so long and the dictionary needs to get so big.
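
For anyone who wants to run the same check on their own word list, here is a rough sketch. It assumes the file is plain `word=count` lines, which is only a guess at the layout; `parse_counts` and `rare_word_share` are made-up helper names, not anything that ships with PocoMail.

```python
def parse_counts(text):
    """Parse 'word=count' lines into a dict (assumed layout, not confirmed)."""
    counts = {}
    for line in text.splitlines():
        # Split on the last '=' so a word containing '=' still parses.
        word, sep, value = line.rpartition("=")
        if sep and value.strip().isdigit():
            counts[word.strip()] = int(value)
    return counts


def rare_word_share(counts, max_count=2):
    """Return (rare, total, share) for words seen max_count times or fewer."""
    total = len(counts)
    rare = sum(1 for c in counts.values() if c <= max_count)
    share = rare / total if total else 0.0
    return rare, total, share


sample = "viagra=12\nfree=1\nclick=2\n[junk]\noffer=3"
rare, total, share = rare_word_share(parse_counts(sample))
print(f"{rare} of {total} words ({share:.0%}) occur two or fewer times")
```

With the numbers above, 6248 / 7993 works out to roughly 78%.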

Is there any variable in the poco.ini file I can change to tell it to start using words if they only occur 1 time?

I think that w/ good mail bias set to "3", words with a count of "2" will be included because 2 x 3 = 6, which is >= 5. Words with a count of 1 would be included if you had your good mail bias set to 5, since 1 x 5 = 5, which is >= 5. I did have a correspondence w/ Slaven (which I can't find) and think this is how it works re: whether or not words are included. Maybe somebody at PSI could confirm. I'm not aware of any other setting that would affect whether a word has to occur once or more than once to be considered.
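
To spell out the arithmetic of that guess, here is a tiny sketch. It only illustrates the rule as described in this post (count times bias must reach 5); it has not been confirmed as PocoMail's actual behavior, and the names `is_word_used` and `min_count_needed` are invented for the example.

```python
MIN_WEIGHTED_COUNT = 5  # assumed threshold, taken from the guess in this thread


def is_word_used(count, bias, threshold=MIN_WEIGHTED_COUNT):
    """True if count * bias reaches the threshold (hypothesized rule)."""
    return count * bias >= threshold


def min_count_needed(bias, threshold=MIN_WEIGHTED_COUNT):
    """Smallest count that passes the hypothesized rule for a given bias."""
    return -(-threshold // bias)  # ceiling division on integers


for bias in (1, 3, 5):
    print(f"bias {bias}: a word needs a count of at least {min_count_needed(bias)}")
```

If the rule really works this way, a bias of 3 needs a count of 2, and only a bias of 5 or more would let words seen once be counted.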

Finally, one other point of difference b/w Poco and POPFile . . . Poco evaluates only the "top 30" probability words in an email, based on their appearance in the corpus; POPFile includes all words. Hard to say how much of a difference this would make.
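
To get a feel for what "top 30" versus "all words" can mean, here is a rough sketch. It borrows the word selection and combining formula from Paul Graham's "A Plan for Spam" (which uses the 15 most extreme words); whether Poco or POPFile compute their scores this way is an assumption, and the per-word probabilities below are made up.

```python
def top_n_interesting(probs, n=30):
    """Keep the n per-word spam probabilities farthest from a neutral 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


def combine(probs):
    """Combine per-word probabilities with Graham's naive Bayes formula."""
    prod = 1.0
    inv = 1.0
    for p in probs:
        prod *= p
        inv *= 1.0 - p
    return prod / (prod + inv)


# Made-up per-word spam probabilities for one message.
word_probs = [0.99, 0.5, 0.2, 0.6]

print("top 2 only:", combine(top_n_interesting(word_probs, n=2)))
print("all words: ", combine(word_probs))
```

Words sitting near 0.5 barely move the result either way, so with this particular formula the cutoff matters mostly when a message has lots of mildly spammy or mildly hammy words.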

I would think that good bias affects how it treats good words, not bad words. If it works the way you say it might be something to try. I wish we could get somebody in the know to participate in this thread.

The problem that I have is that I receive just enough spam to be annoying but not enough to effectively train PocoMail. I've been using PocoMail's BF for more than two months now. I have about 3,000 good words and 18,000 bad words.

On a daily basis, I receive about ten spams. Of those, about three automatically go into the Junk folder, but I have to manually classify and move the other seven. So for me, at this point, it would be easier and simpler to just manually delete the spam in my Inbox than to use PocoMail's Bayesian Filter. I hope that future versions of PocoMail's BF will be more useful.

Yeah, that's kind of like teaching a kid to read by sending them to school one day a week . . . you'll eventually get there, but it might be just in time for the prom. Clearly you've been too parsimonious with your email address!