SpamAssassin scores and 12-letter domains

As an owner of a 12-letter domain, and someone who is unable to post to any of the Apache mailing lists due to messages being rejected as spam (I'll be surprised if this one is any different), I have to ask: what is the rationale for the infamous 12-letter-domain ding?

How many 12-letter domains exist? A few million? I can't think of a less useful metric, nor one that is more likely to yield false positives.

On 8/5/12 1:48 PM, Benny Pedersen wrote:
> On 2012-08-05 19:13, Ben Johnson wrote:
>
>> There is hardly any published information on this subject, so perhaps
>> one of the experts here will weigh in. Apparently, I'm not the only one
>> who feels this "feature" needs to die:
>
> X-ASF-Spam-Status: No, hits=4.8 required=10.0
> tests=FROM_12LTRDOM,SPF_HELO_PASS,SPF_PASS,URI_HEX

default is 5.0, not 10.0
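For readers following the arithmetic in the quoted header, here is a minimal Python sketch that pulls the numbers out of a status value shaped like the one above and computes how far the message sits from each threshold. The function name `parse_spam_status` and the returned dictionary layout are illustrative assumptions, not part of SpamAssassin or of ASF's setup.

```python
import re

def parse_spam_status(value):
    """Parse a status value such as 'No, hits=4.8 required=10.0 tests=A,B'.

    Hypothetical helper for illustration; it only handles the layout
    shown in the header quoted in this thread.
    """
    # Folded headers continue on indented lines, so collapse whitespace first.
    flat = " ".join(value.split())
    verdict = flat.split(",", 1)[0].strip()
    hits = float(re.search(r"hits=(-?\d+(?:\.\d+)?)", flat).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", flat).group(1))
    tests_match = re.search(r"tests=(\S+)", flat)
    tests = tests_match.group(1).split(",") if tests_match else []
    return {"verdict": verdict, "hits": hits, "required": required, "tests": tests}

# The header value quoted in this thread, folded over two lines.
header = ("No, hits=4.8 required=10.0\n"
          " tests=FROM_12LTRDOM,SPF_HELO_PASS,SPF_PASS,URI_HEX")
status = parse_spam_status(header)

# Distance to the threshold used on this list, and to the 5.0 default.
margin_here = round(status["required"] - status["hits"], 1)
margin_default = round(5.0 - status["hits"], 1)
print(status["tests"], margin_here, margin_default)
```

With the values from the quoted header, the message is 5.2 points below this list's threshold but only 0.2 points below a 5.0 threshold.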


On 8/6/2012 8:01 AM, Benny Pedersen wrote:
> On 2012-08-05 20:30, Michael Scheidell wrote:
>
>>> X-ASF-Spam-Status: No, hits=4.8 required=10.0
>>> tests=FROM_12LTRDOM,SPF_HELO_PASS,SPF_PASS,URI_HEX
>> default is 5.0, not 10.0
>
> why did ASF change it? did they only change required? :=)
>
>>> as you see there is long way to 10
>> .2 points to go to 5.0
>
> irrelevant on ASF
>
>> score FROM_12LTRDOM 0.099 3.499 0.099 3.499
>> is a HUGE difference, any score over 2.75 points should be suspect.
>
> spamassassin is opensource, scores are not hardcoded
>
> i think what is more needed is just more committers with ham and spam to
> the public corpus that scores are generated from; don't fight rules, change
> scores if one is not a committer
>
> this rule does not hit ham here
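The four numbers on the quoted `score` line correspond to SpamAssassin's four score sets: in order, Bayes and network tests both disabled, network tests only, Bayes only, and both enabled. The following Python sketch shows how one score line resolves to a different value depending on the setup. The helper names are hypothetical and this is not SpamAssassin's own code, only an illustration of the lookup.

```python
# Score set index keyed by (bayes_enabled, network_tests_enabled).
SCORE_SET_INDEX = {
    (False, False): 0,  # no Bayes, no network tests
    (False, True): 1,   # no Bayes, network tests enabled
    (True, False): 2,   # Bayes enabled, no network tests
    (True, True): 3,    # Bayes and network tests enabled
}

def parse_score_line(line):
    """Split a 'score NAME v0 [v1 v2 v3]' line into (name, four values)."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError("not a score line: %r" % line)
    name = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value applies to all four score sets.
        values = values * 4
    elif len(values) != 4:
        raise ValueError("expected 1 or 4 scores, got %d" % len(values))
    return name, values

def effective_score(values, bayes, network):
    """Pick the value that applies for the given Bayes/network setup."""
    return values[SCORE_SET_INDEX[(bayes, network)]]

name, values = parse_score_line("score FROM_12LTRDOM 0.099 3.499 0.099 3.499")
with_network = effective_score(values, bayes=True, network=True)
without_network = effective_score(values, bayes=True, network=False)
print(name, with_network, without_network)
```

On the quoted line, the rule contributes 3.499 wherever network tests are enabled and 0.099 where they are not, which is the "HUGE difference" referred to above.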

Thanks for the replies thus far.

Benny, it bears mention that not all of ASF's servers/mailing lists are configured the same way.

The Apache HTTP Server mailing list requires 5.0. My best guess is that they cranked up the threshold for the SpamAssassin mailing list because, by nature, the discussion contains a lot of "spammy" content and false positives were becoming a problem.

The fact that SpamAssassin is open-source is what's irrelevant; I have no control over how ASF configures its servers, and therefore no ability to disable the ridiculous 12-letter-domain check. ASF would have to change its configuration if my messages are to be accepted.

Better still would be to remove this "feature" from SpamAssassin altogether, as it is completely useless. That way, the problem would disappear as soon as ASF updates to a version of SpamAssassin in which the 12-letter-domain check is removed.

The fact that nobody has articulated the rationale behind the 12-letter-domain check speaks for itself. If a rule is deemed to be useless, why is it not removed? It is wasting CPU cycles and adversely affecting genuine ASF mailing list subscribers (by rejecting their messages without basis). Further, it's not as though ASF's servers are the only ones using FROM_12LTRDOM; this ridiculous issue is affecting my ability to communicate across the Internet at large.

Given that ASF has no other public support channel, and no way to contact anybody to request that the filters be adjusted, what choice do I have beyond pushing to have the software modified?

> Better still would be to remove this "feature" from SpamAssassin
> altogether, as it is completely useless. That way, the problem would
> disappear as soon as ASF updates to a version of SpamAssassin in which
> the 12-letter-domain check is removed.

When we get enough masscheck corpora to generate an update, the scores will go down to advisory. If sufficient masscheck corpora were available for regular score updates this issue would have been resolved a month ago.

> The fact is that nobody has articulated the rationale behind the
> 12-letter-domain check speaks for itself.

Someone observed a noticeable amount of spam coming from twelve-letter domain names, and the rule for such performs well against the current masscheck corpora.

> If a rule is deemed to be useless, why is it not removed?

The masscheck corpora size is hovering just below the threshold for generating updates. There are various reasons for this.

It's possible that fixing this rule is important enough to spur releasing a manual rules update. However, masscheck rule scores based on starved corpora may actually be _worse_ overall than the effects from this one rule. If we _do_ generate a manual rules update to fix just this rule, it should really be a manual update to fix just this rule, for example by manually altering 70_scores.cf in the update tarball to adjust just the scores for this rule.
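As a rough illustration of what "adjust just the scores for this rule" could look like, here is a Python sketch that rewrites the `score` line for a single rule in the text of a scores file and leaves every other line untouched. The sample file content, the 0.001 "advisory" value, and the function name `override_rule_score` are assumptions made for the example; the real update process and file contents may differ.

```python
def override_rule_score(cf_text, rule, new_scores):
    """Return cf_text with the score line for `rule` replaced.

    Hypothetical helper: only lines of the form 'score RULE ...' for the
    named rule are rewritten; all other lines pass through unchanged.
    """
    replacement = "score {} {}".format(
        rule, " ".join("{:.3f}".format(s) for s in new_scores))
    out = []
    for line in cf_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "score" and parts[1] == rule:
            out.append(replacement)
        else:
            out.append(line)
    return "\n".join(out) + "\n"

# Illustrative file content; the first line is the score quoted in this thread.
sample = (
    "score FROM_12LTRDOM 0.099 3.499 0.099 3.499\n"
    "score SPF_PASS -0.001\n"
)

# Assumed advisory-level value for all four score sets.
updated = override_rule_score(sample, "FROM_12LTRDOM", [0.001] * 4)
print(updated)
```

The point of keeping the change this narrow is the one made above: scores for every other rule stay as they were, so a starved corpus does not get the chance to make them worse.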

> Thanks for the replies thus far.
>
> Benny, it bears mention that not all of ASF's servers/mailing lists are
> configured the same way.

fair

> The Apache HTTP Server mailing list requires 5.0. My best guess is that
> they cranked-up the threshold for the SpamAssassin mailing list because,
> by nature, the discussion contains a lot of "spammy" content and
> false-positives were becoming a problem.

So senders can now get more help with spam on the mailing list; this is intended. If some samples are still rejected, one can pastebin them or register an account to show problems with the default rules. Remember that scores are regenerated, so in practice there are no static scores in default SpamAssassin. This is why I still say that the rule is OK, but the corpus is missing ham that hits that rule.

> The fact that SpamAssassin is open-source is what's irrelevant; I have
> no control over how ASF configures its servers, and therefore no ability
> to disable the ridiculous 12-letter-domain check. ASF would have to
> change its configuration if my messages are to be accepted.

Do you really need to send so much spam to the SpamAssassin lists?

> Better still would be to remove this "feature" from SpamAssassin
> altogether, as it is completely useless. That way, the problem would
> disappear as soon as ASF updates to a version of SpamAssassin in which
> the 12-letter-domain check is removed.

Argh, I don't agree; removing one rule does not make a fix.

> The fact is that nobody has articulated the rationale behind the
> 12-letter-domain check speaks for itself.

Nope, I don't think it does; it could very well be that they don't have the problem.

> If a rule is deemed to be useless, why is it not removed?

It depends on who you send to and where. The SpamAssassin maintainers don't control the whole world of SpamAssassin servers that use their software, in the same way that MTA maintainers don't dictate what config one has in main.cf in Postfix; they only suggest defaults in postconf -d.

On 8/6/2012 1:32 PM, Axb wrote:
> On 08/06/2012 05:25 PM, Ben Johnson wrote:
>
>> Given that ASF has no other public support channel, and no way to
>> contact anybody to request that the filters be adjusted, what choice do
>> I have beyond pushing to have the software modified?
>
> bear in mind: SpamAssassin is a framework and VERY flexible.
> It's aiming to be the global solution for spam filtering.
>
> The SpamAssassin project delivers a set of rules and scores.
>
> These may not fit all types of traffic, globally - with minimal skills
> you can modify the ruleset to work for your setup.

Thanks, Axb.

Yes, I understand that SpamAssassin is very flexible. The problem I'm describing, however, is not with my SpamAssassin configuration (in which case I would simply adjust it); it is with Apache Software Foundation's configuration (over which I have no control).

I raised this issue because ASF's SpamAssassin configuration -- specifically, the 12-letter-domain check -- causes my messages to its various mailing lists to be rejected more often than not. This list is very forgiving in that the required score is 10.0, but other ASF lists require only 5.0.

All of that said, it sounds like this issue will be discussed among the developers, so maybe something will be done and not all 12-letter-domain owners will be blacklisted throughout the Internet.

On 06/08/2012 19:00, Ben Johnson wrote:
> All of that said, it sounds like this issue will be discussed among
> the developers, so maybe something will be done and not all
> 12-letter-domain owners will be blacklisted throughout the Internet.
> Best regards, -Ben

At risk of being slightly unnervingly pedantic...

SpamAssassin does not blacklist, and SpamAssassin does not block. All SpamAssassin does (on its own) is classify. Certain implementations, in this case the ASF's, will block emails that have been classified to a certain confidence level.

The default scores are not manually set but (in my understanding) are evaluated against databases of spam and non-spam emails and statistically tuned to optimum values. The amount of non-spam email contained in this evaluation has long been considered an issue (getting 'typical' private emails for this purpose can pose certain difficulties).

The devs can discuss the bigger issues, but for your immediate problem I would approach, and continue to approach, ASF about how their spam blocking acts on SpamAssassin's classification. Blocking at 5 points is unnecessarily harsh; many implementations have the ability to block at (say) 12 points, tag at (say) 5 points (which could be implemented to leave messages in a moderation queue), and release everything else.
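The tiered handling suggested above can be sketched in a few lines of Python. The thresholds of 5 and 12 are the example figures from this message; the function name and the action labels are hypothetical and do not describe any particular mail system's configuration.

```python
def disposition(score, tag_at=5.0, block_at=12.0):
    """Map a SpamAssassin score to an action under a three-tier policy.

    Illustrative only: thresholds come from the example figures in this
    thread, and the action names are made up for the sketch.
    """
    if score >= block_at:
        return "block"      # confident enough to reject outright
    if score >= tag_at:
        return "moderate"   # tag and hold in a moderation queue
    return "release"        # deliver normally

# 4.8 is the score from the header quoted earlier in the thread;
# the other two values are arbitrary examples.
for s in (4.8, 7.5, 13.0):
    print(s, disposition(s))
```

Under such a policy, a message that scores a little over 5 is held for a human to look at rather than rejected outright.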

On 8/7/2012 9:48 AM, Giles Coochey wrote:
> On 06/08/2012 19:00, Ben Johnson wrote:
>> All of that said, it sounds like this issue will be discussed among
>> the developers, so maybe something will be done and not all
>> 12-letter-domain owners will be blacklisted throughout the Internet.
>> Best regards, -Ben
> At risk of being slight unnervingly pedantic...
>
> Spamassassin does not blacklist, spamassassin does not block. All
> spamassassin does (on its own) is classify. Certain implementations,
> in this case, the ASF's, will block emails that have been classified
> to a certain confidence level.
>
> The default scores are not manually set, but (in my understanding) are
> evaluated against databases of spam and non-spam emails and
> statistically evaluated to optimum scores. There is long considered an
> issue of the amount of non-spam emails contained within this
> evaluation (getting 'typical' private emails for this purpose can pose
> certain difficulties).
>
> The dev's can discuss the bigger issues, but for your immediate
> problem I would approach and continue to approach ASF on the actions
> of their spam block according to spamassassin's classification,
> blocking at 5 points is un-necessarily harsh, and many implementations
> have the ability to block at (say) 12 points, tag at say 5 points
> (which could be implemented to leave messages in a moderation queue)
> and release everything else.