Talk:Spam blacklist/Archives/2008-09

Please do not post any new comments on this page. This is a discussion archive first created in September 2008, although the comments contained were likely posted before and after this date. See current discussion or the archives index.

I have not yet blocked the IP because I fear they will use multiple ones. I have already added the site to the bl. Maybe it should be removed again later, thanks, --birdy geimfyglið(:> )=|∇ 13:53, 28 August 2008 (UTC)

Additional seoloji.org spam

I looked at the spam domains above a bit closer using the WhosOnMyServer tool. Several domains were hosted by commercial web-hosting services and their servers contain hundreds of unrelated domains. I did, however, identify one German server, 89.149.226.124, with a cluster of the spam domains above plus a few other Turkish domains. About half of those domains turned out to be related to the domains I reported above and several had also been spammed:

See [3] and [4], a sockfarm of users all adding links to work by Dan Schneider, mostly from his website cosmoetica.com. 120 links cleaned from enWP and a number from other wikis (I'm on it but it's slow as I have to cross-check who added them). Where the links are added by anons, it is a stable subnet. Definitely a candidate for blacklisting on enWP and probably a candidate for meta blacklisting due to cross-wiki issues, albeit fairly limited by comparison with the extensive enWP abuse. JzG 21:54, 1 September 2008 (UTC)

Is this really something we want to be linking to regardless of who added it? Furthermore, it seems from the links you've provided that several domains are involved. Has blacklisting been discussed on enwiki yet? — Mike.lifeguard | @en.wb 18:22, 2 September 2008 (UTC)

Yes, it's now blacklisted on enWP (and one of Schneider's socks promptly requested whitelisting). The list of socks is up to about 40 now [5], but it is primarily an enWP issue. Still, there is some cross-wiki activity and the guy is very determined to use Wikipedia to make his "The Most Widely Read Interview Series In Internet History!" - to which I cynically respond: {{fact}}. JzG 18:05, 4 September 2008 (UTC)

Redirect domain

Is/was it used on any wikipedia? -- seth 06:46, 15 September 2008 (UTC)

Added. I don't think there's a need for it to be used before blacklisting. It should never be used. I've added it to the bottom of the list. Do we still use # URL shorteners? Or any other section? --Erwin(85) 18:49, 15 September 2008 (UTC)

Well, if there's no need for links to be used before blacklisting, we could directly blacklist about 1 million websites (ad stuff, porn stuff, ...), which would probably become a performance problem. -- seth 20:09, 15 September 2008 (UTC)

All url shorteners / redirectors are blacklisted by default, otherwise they can be used to bypass the blacklist. JzG 16:21, 22 September 2008 (UTC)

(Likely) misused is memurl.com (see COIBot report), useurl.us was used only once, the other three are not used as far as I can see (bot-downtime and before start I can't see .. yet). --Dirk BeetstraTC (en: U, T) 10:56, 23 September 2008 (UTC)

Links which are not used (for spamming) should not be blacklisted imho, because otherwise we could easily increase the number of blacklisted domains by thousands. This would lead to nothing but a slowdown. -- seth 15:36, 23 September 2008 (UTC)

Seth, redirects and url shorteners are always added, uncontroversially, because otherwise they can be used to bypass the blacklist. If the blacklist gets too big it won't be because of a few tens of redirect sites. Many of the ones on the list I linked are already linked anyway. And actually I think all of these had at least one link somewhere which was bypassing a blacklisted domain or linking a domain which should have been linked direct. JzG 22:42, 23 September 2008 (UTC)

We (c/sh)ould set up a list of these, as it would be nice to know when they get abused (as with the blacklist, my bots would also run into problems keeping an eye on all of them .. I am slowly running into that limit with them). I do believe that these should NEVER be used (as opposed to porn/other commercial stuff, where, if notable (now or in the future), there may be proper use of it). --Dirk BeetstraTC (en: U, T) 15:55, 23 September 2008 (UTC)

Is dynamic address from Italy, tomorrow 151.30.24.255 replaced all the linkspam I removed - see here. MoiraMoira 08:08, 20 September 2008 (UTC)

I see many IP ranges (Matucana (49), 80.200.248.206 (40), 80.200.248.203 (21), 80.200.248.201 (21), 80.200.248.207 (20), 80.200.248.200 (12), 151.30.30.144 (11), 80.200.248.204 (9), 151.30.24.255 (8), 84.253.141.209 (6)) and one account. This does not look good. All to the page of Silvio Berlusconi, and seeing that there are many reports per language, nowhere wanted.

There may be useful content here as well, but as the majority of the links is 'pushed' to Silvio Berlusconi, I'd keep it here until it gets questioned .. --Dirk BeetstraTC (en: U, T) 08:50, 20 September 2008 (UTC)

This is a commentary and opinion website, not based on any sort of research, but is basically a resource for people looking up urban legends. Although the site owners have a long history of supposedly investigating the claims of hearsay, jokes, and legends, it is largely driven by what amounts to yet another unverified source. The reason for the blacklist is that at first glance the site appears like an authority on any number of late-breaking legends, where in reality it is just a veiled opinion of the author(s) on whatever the topic might be. In short, the site appears to be an encyclopedia of urban legends, but it is in fact a mixture of comedy, opinion, hearsay, and legend itself. This puts it in the same category as a number of self-published blogs. Uruiamme 21:17, 20 September 2008 (UTC)

It's hardly an unreliable source. The site owners use other sources for their work, that's normally listed at the end of an article. Has this been spammed anywhere? I don't believe the owners make a profit from the site, and it only runs adverts to keep it going. Stuff should only be put on this list if it's actually spammed. Majorlytalk 21:21, 20 September 2008 (UTC)

Declined, no evidence of spamming, stated reason is out of scope for this blacklist absent evidence of abuse. That and the inconvenient fact that Snopes is probably the most widely trusted urban legend reference. JzG 11:37, 21 September 2008 (UTC)

I hardly thought that this would be given such a cursory look. I know that the snopes.com people are reputable, but my point was that they are neither peer reviewed nor unbiased. The people who run many blogs are reputable, so that seems hardly much of a positive. But that is not my main contention. The main issue is that there is at least one area of self-published content available on the forums there, which surely you aren't implying has the same reputation as the portion full of site-owner content? In other words, it is rife with the typical forum/blogging things, and it does have its own sub domain. I assumed someone might independently discover that. Uruiamme 05:15, 22 September 2008 (UTC)

Please read the header above. This list is for controlling abusive linking of websites, not to enforce one side's view in a dispute over the reliability of a certain source. Feel free to bring this up on the talk pages for the articles where you believe the link is being incorrectly used. JzG 12:44, 22 September 2008 (UTC)

Merlin Wikia

On the English Wikipedia site an IP user, 82.42.175.146, has been adding spam links to www.merlin.wikia.com. I propose that this link gets blacklisted, as the IP user has posted it to several user talk pages, which goes against Wikipedia policies. DarkMage 18:25, 21 September 2008 (UTC)

If this is a problem only on en.wikipedia it should be dealt with on the local blacklist en:MediaWiki talk:Spam-blacklist. This blacklist is for spam across many wikis.

Any Wikia wiki can be linked to with interwikis, so a blacklisting of merlin.wikia.com will be very easy to outflank. --Jorunn 09:11, 22 September 2008 (UTC)

I actually don't see any link additions to merlin.wikia.com by 82.42.175.146 (so maybe it is already used as an interwiki?). --Dirk BeetstraTC (en: U, T) 10:36, 23 September 2008 (UTC)

This is a malicious link added to en.Wikipedia.[7] It doesn't seem to harbor a virus but it's semi-pornographic images(?- hard to tell I didn't look for long!) and the code resizes your browser window and makes it bounce around the screen. Only added once on en.wikipedia as far as I can tell but seems to have no legitimate use and ought to be cross-wiki blacklisted as other malicious sites are. -- SiobhanHansa 21:53, 23 September 2008 (UTC)

Proposed removals

This section is for archiving proposals that a website be unlisted.

Lluisllach.pl

lluisllach.pl is a fine site, referring to a Geocities page. No spam, no porn. There are many pages about Lluis Llach, and the link was accepted by the Polish one. Blocking really does not seem necessary. —The preceding unsigned comment was added by 212.39.28.26 (talk • contribs) 12:17, 16 Aug 2008 (UTC)

The site (as you have spelt it) does not appear to be blacklisted here. Thanks --Herbytalk thyme 12:23, 16 August 2008 (UTC)

the sbl is case-insensitive, the entry is

\blluisllach\.pl\b

for a given url you can use [8] (beta state) to find the corresponding entries. -- seth 13:53, 16 August 2008 (UTC)
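As a rough illustration of the case-insensitivity point, here is a minimal Python sketch. It assumes each entry is applied as a case-insensitive regular expression searched anywhere in the URL; the `is_blacklisted` helper and the sample URLs are hypothetical, and Python's `re` module only stands in for the regex engine the extension actually uses.

```python
import re

# The entry quoted above, compiled case-insensitively (assumption: this
# mirrors how the blacklist treats its entries).
ENTRY = re.compile(r"\blluisllach\.pl\b", re.IGNORECASE)


def is_blacklisted(url):
    """Hypothetical helper: True if the entry matches anywhere in the URL."""
    return ENTRY.search(url) is not None


# Hypothetical sample URLs; only the capitalisation differs between the first two.
for url in ("http://www.lluisllach.pl/",
            "http://www.Lluisllach.pl/",
            "http://www.example.org/"):
    print(url, is_blacklisted(url))
```

Under that assumption, a differently capitalised spelling of the domain is covered by the same entry.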

Thanks seth - that way it is here because of this report. It was reverted, links placed again so listed. Looks valid to me. For anyone who doesn't look at it the appeal is by the Ip that was responsible for the link placement. Cheers --Herbytalk thyme 13:55, 16 August 2008 (UTC)

I think we should decline - geocities pages are of no use to the encyclopaedia. Rather the reverse, in fact. 80.176.82.42 23:07, 16 August 2008 (UTC)

i was about to use http://ezinearticles.com/?MMORPG-Crafting-Skills&id=1383381 as reference for an article, but its blacklisted - is there any special reason? --62.99.197.106 21:22, 28 August 2008 (UTC)

The reason is here, though I couldn't find the conclusion of that discussion quickly (and the log entry doesn't specify an oldid :\ not sure how that happened). — Mike.lifeguard | @en.wb 02:24, 29 August 2008 (UTC)

OK, the full discussion is archived. Given the self-published nature of that domain, and the issues with POV-pushing over a long period of time, I am happy to have this remain on the global blacklist rather than enwiki's local list. You may choose to request whitelisting for a specific use at w:MediaWiki talk:Spam-whitelist. Declined based on the original report. — Mike.lifeguard | talk 23:15, 6 September 2008 (UTC)

For the record, this was cross-wiki spammed. For example (this is just a small sample):

x.y.z.info

Concerning regexp [0-9]+\.[-\w\d]+\.info/?[-\w\d]+[0-9]+[-\w\d]*\].
A few days ago I removed this entry, but was told afterwards that every removal needs a de-list discussion. So here I go (again; see "double/wrong entries").
Short: This entry never worked and does not seem to be needed, so imho it's best to remove the entry.
Long: At the beginning of 2006 there was this request, which was added immediately. It was modified some time later. But no version of the entry ever matched anything, because the spam blacklist extension does not work on link descriptions, only on the link itself. So there will never be a match on whitespace or square brackets.
Now there are 2 possibilities: 1. fix the regexp or 2. remove it permanently.
The original request said that the urls were something like (integer number).(letter).(name).info (which could perhaps be translated into \d+\.[a-z]\.\w+\.info). But if one looks at the present sbl, one can't see even one entry like this. So probably there's no need to block those domains any longer. The only possibility is that entries like "cinn\.info" and "ephraim\.info" are of this format but were inserted without third-level domains. However, a short look into the history of the sbl discussion does not verify that.
Altogether I suggest to leave the entry removed. -- seth 09:59, 7 September 2008 (UTC)
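The argument above can be checked with a small Python sketch. It assumes, as stated in the comment, that only the bare URL is tested and never the surrounding wiki markup. The sample URLs are hypothetical, and Python's `re` is used purely for illustration.

```python
import re

# The entry under discussion: it ends in a literal "]", which occurs in wiki
# link markup but not in the URL itself.
OLD_ENTRY = re.compile(
    r"[0-9]+\.[-\w\d]+\.info/?[-\w\d]+[0-9]+[-\w\d]*\]", re.IGNORECASE
)

# Possible translation of the original request, as suggested in the comment.
CANDIDATE = re.compile(r"\d+\.[a-z]\.\w+\.info", re.IGNORECASE)

bare_url = "http://123.example.info/page42"   # hypothetical URL
with_bracket = bare_url + "]"                 # as it would look inside markup

print(bool(OLD_ENTRY.search(bare_url)))       # no "]" in a bare URL
print(bool(OLD_ENTRY.search(with_bracket)))   # matches only with the bracket
print(bool(CANDIDATE.search("http://123.a.example.info/")))
```

If only the bare URL is ever tested, the first case is the only one that arises in practice, which is why the entry cannot fire.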

Hello Erwin, I would like to know the reason of Caracal info european site being listed on spamlist/blacklist and the removal of the link of the italian article. I am the author of all Wikipedia articles in 16 languages related to the first pistol made in United Arab Emirates known as Caracal pistol and I regularly post the latest news on Caracal-pistol.info website to keep readers informed of the latest developments since day one. Sincerely Edmond HUET Quickload 09:55, 6 September 2008 (UTC)

Hi, as far as I can see there's only a small amount of information available on the web site. Most links point to your Domains for sale section. That and adding it to multiple wiki's caused me to blacklist it. Feel free to request removal from the blacklist at Talk:Spam blacklist. --Erwin(85) 11:38, 6 September 2008 (UTC)

caracal-pistol.info however is. Given your conflict of interest in this case, the cross-wiki additions and our norm of declining de-listing requests from site owners, this request is Declined. — Mike.lifeguard | talk 16:30, 7 September 2008 (UTC)

OK, given the related domains, and additions by Quickload, I think this may have turned into a request for listing. I'm normally not a fan of listing related domains, but Quickload seems to have a COI here, and is adding sites cross-wiki. Looking for input here. Related domains listed below. — Mike.lifeguard | talk 16:35, 7 September 2008 (UTC)

I see Quickload is a high-volume contributor; I suggest just sticking with the domain that's already blacklisted as long as Quickload does not add any more links to his own websites (it's a conflict of interest). Quickload, this applies as well to using anonymous IPs and alternate accounts. --A. B.(talk) 14:59, 10 September 2008 (UTC)

This domain was blacklisted per the XWiki report. Linking excessively across many wikis is inappropriate regardless of whether you are profiting from doing so or not.

Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. If such an editor asks to use your links, I'm sure the request will be carefully considered and your domain may well be removed.

Until such time, this request is Declined.— Mike.lifeguard | talk 16:50, 10 September 2008 (UTC)

Troubleshooting and problems

This section is for archiving Troubleshooting and problems.

double/wrong entries

when i deleted some entries from the german sbl, which are already listed in the meta sbl, i saw that there are many double entries in the meta sbl, e.g., search for

Some of the dupes will be left for clarity's sake. When regexes are part of the same request they can be safely consolidated (I do this whenever I find them), but when they are not, it would be confusing to do so, in many cases. Perhaps merging regexes in a way that is sure to be clear in the future is something worth discussing, but I can think of no good way of doing so. — Mike.lifeguard | @en.wb 02:06, 13 July 2008 (UTC)

in de-SBL we try to cope with that only in our log-file [9]. there one can find all necessary information about every white-, de-white-, black- and de-blacklisting. the sbl itself is just a regexp-speed-optimized list for the extension without any claim of being chronologically arranged.

i guess that the size of the blacklist will keep increasing, so a speed optimization will perhaps be necessary in future. btw. has anyone ever made any benchmarks of this extension? i merely know that buffering was implemented at some point.

oh, and if one wants to correct further regexps: just search by regexps (e.g. by vim) for /\\[^.b\/+?]/ manually and delete needless backslashes, e.g. \- \~ \= \:. apart from that the brackets in single-char-classes like [\w] are needless too. "\s" will never match. -- seth 11:36, 13 July 2008 (UTC)
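The vim search above can be approximated in Python as follows. This is only a sketch: the `suspicious_escapes` helper and the sample entries are hypothetical. Like the original search, it also flags legitimate escapes such as `\d`, so the hits still need a manual look.

```python
import re
from typing import List

# Python spelling of the vim search quoted above: a backslash followed by any
# character other than ".", "b", "/", "+" or "?".
SUSPICIOUS_ESCAPE = re.compile(r"\\[^.b/+?]")


def suspicious_escapes(entry: str) -> List[str]:
    """Hypothetical helper: list the escapes in an entry worth reviewing."""
    return SUSPICIOUS_ESCAPE.findall(entry)


# Hypothetical sample entries, not taken from the real list.
entries = [
    r"\bfoo\-bar\.com\b",    # "\-" is a needless backslash outside a class
    r"\bexample\.org\b",     # nothing to review
    r"\bsite\d+\.info\b",    # "\d" is flagged too, although it is legitimate
]
for entry in entries:
    print(entry, suspicious_escapes(entry))
```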

fine-tuning: [1234] is much faster in processing than (1|2|3|4); and (?:foo|bar|baz) is faster than (foo|bar|baz). -- seth 18:21, 13 July 2008 (UTC)

I benchmarked it, (a|b|c) and [abc] had different performance. Same with the latter case — VasilievV2 21:02, 14 July 2008 (UTC)
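For anyone wanting to repeat such a comparison, a minimal timing sketch in Python might look like this. The test string and repetition count are arbitrary assumptions. Timings from Python's `re` engine need not carry over to the engine the extension uses, since engines optimise these constructs differently, so no numbers are quoted here.

```python
import re
import timeit

# Three spellings of the same match: a character class, a capturing
# alternation and a non-capturing alternation.
PATTERNS = {
    "[1234]": re.compile(r"[1234]"),
    "(1|2|3|4)": re.compile(r"(1|2|3|4)"),
    "(?:1|2|3|4)": re.compile(r"(?:1|2|3|4)"),
}

# Arbitrary test input: the only match sits at the very end, so the whole
# string has to be scanned.
text = "x" * 10_000 + "4"

for label, pattern in PATTERNS.items():
    seconds = timeit.timeit(lambda: pattern.search(text), number=200)
    print(f"{label:12} {seconds:.4f} s for 200 searches")
```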

So should we be making those changes? (ie was it of net benefit to performance?) — Mike.lifeguard | @en.wb 21:56, 15 July 2008 (UTC)

these differences result from the regexp-implementation. but what i meant with benchmarking is the following: how much does the length of the blacklist cost (measured in time)? i don't know how fast the wp-servers are. however, i benchmarked it now on my present but old computer (about 300-500MHz):

if i have one simple url like http://www.example.org/ and let the ~6400 entries of the present meta-blacklist match against this url, it takes about 0,15 seconds till all regexps are done. and i measured really only the pure matching:
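The measurement code itself is not preserved in this archive. A rough sketch of that kind of measurement in Python, not the original script, could look like the following. The 6400 entries are synthetic placeholders rather than the real list, and the timing will depend on machine and regex engine, so it says nothing about the 0,15 seconds quoted above.

```python
import re
import time

# Synthetic stand-ins for the ~6400 entries (assumption: the real list was
# not used here); each is compiled once, outside the timed section.
entries = [rf"\bspam-domain-{i}\.example\b" for i in range(6400)]
compiled = [re.compile(entry, re.IGNORECASE) for entry in entries]

url = "http://www.example.org/"

# Time only the pure matching of every entry against the one URL.
start = time.perf_counter()
hits = [pattern.pattern for pattern in compiled if pattern.search(url)]
elapsed = time.perf_counter() - start

print(f"{len(compiled)} entries, {len(hits)} hits, {elapsed:.4f} s")
```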

before i start modifying the list, i want to know whether i should log my changes somewhere. oh, and btw. i suppose that the entry [0-9]+\.[-\w\d]+\.info\/?[-\w\d]+[0-9]+[-\w\d]*\] is somehow senseless, for it will probably never match. i found the original discussion [10] (the regexp was changed afterwards), but the regexp will not grep the links mentioned there. shall i just delete such an entry or shall i make a new request and try to correct it? -- seth 09:18, 13 August 2008 (UTC)

It would be nice if you could update the log as well, so we can still find the corresponding log message. Though maybe we should wait and see if anything new comes out of #The Logs. I guess it's best to correct wrong entries or in any case log all those removals. It probably wouldn't hurt if some were removed, but I have no idea how many entries we're talking about. --Erwin(85) 09:31, 13 August 2008 (UTC)

ok, so i'll wait until the other thread is finished. but i don't think, that a manipulating of the logs is a good idea, because this will make tracing of entry changes difficult.

i guess, there are less than 10, perhaps even less than 5 useless entries. -- seth 10:29, 13 August 2008 (UTC)

i cleaned up the sbl two days ago. until now i did not delete any entries (except for grouping purposes). and i could not correct the entry "\bnstpi\.com\.my/ client" (with a senseless space) because its diff wasn't very meaningful. perhaps somebody knows something about this entry and could tell it.

however, one question is: shall i really modify the wrong entries in the logs, too? it is like changing history, so it could cause irritations. -- seth 08:48, 26 August 2008 (UTC)

http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&diff=1147562&oldid=1146869 which added the question marks, also blocked legitimate sites. For example chabad(east|usa|world)\.(am|com|org) and chabad\.am became chabad(?:east|usa|world)?\.(?:am|com|org) which blocked legitimate sites such as chabad.com and chabad.org. A solution may be to remove the question marks for this entry and restore it to 2 entries like it was before. --PinchasC 14:20, 1 September 2008 (UTC)

Done - Regex is now chabad(?:east|usa|world)\.(?:am|com|org), and should block what it's supposed to now. — Mike.lifeguard | @en.wb 15:39, 1 September 2008 (UTC)
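The effect of the stray question mark can be reproduced with a short Python sketch. The two patterns are the ones quoted above; the host names other than chabad.com and chabad.org are hypothetical examples derived from the alternation.

```python
import re

# Merged entry with the optional group, and the corrected entry.
BROKEN = re.compile(r"chabad(?:east|usa|world)?\.(?:am|com|org)", re.IGNORECASE)
FIXED = re.compile(r"chabad(?:east|usa|world)\.(?:am|com|org)", re.IGNORECASE)

hosts = ["chabad.org", "chabad.com", "chabadusa.com", "chabadworld.org"]

for host in hosts:
    # With "?" the suffix group may be empty, so plain "chabad." matches too.
    print(f"{host:18} broken={bool(BROKEN.search(host))} "
          f"fixed={bool(FIXED.search(host))}")
```

The corrected single regex on its own does not match chabad.am, which the original list carried as a separate entry. Whether that second entry was restored is not stated in this thread.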

"\bnstpi\.com\.my/ client": after looking at the request on the TP and the links mentioned there, i suppose that the trailing " client" could just be ignored, so i deleted it. otherwise the regexp would be totally useless. -- seth 10:13, 7 September 2008 (UTC)

what does "let's not use ?: - it makes COIBot unhappy[...]"[11] mean precisely? -- seth 23:55, 27 August 2008 (UTC)

Beetstra can tell you exactly, as he is the bot's owner. I believe it choked on that as it isn't handled properly in Perl. Also some of the very long regexes caused issues (but didn't change those). I am having second thoughts about consolidating regexes which are not part of the same request. Regexes added together can be mushed together easily, but those in separate requests should likely stay separate, I think. Not sure what to do next about this though. — Mike.lifeguard | @en.wb 23:59, 27 August 2008 (UTC)

COIBot: well, perl could cope with non-capturing patterns /(?:foo)/ long before php even existed, so i guess it isn't really a perl-problem. i'll ask Beetstra on his talk page about that.

grouping: as far as i can see, the sbl-page can be used for blocking only. all relevant blocking information is listed in the log (and the links mentioned there). so i don't see how even a random sort on the sbl entries combined with randomly grouped regexps could harm. -- seth 01:49, 28 August 2008 (UTC)

double entries

I wrote a small script to grep most of the double (or multi) entries. The result is presented on User:Lustiger_seth/sbl_double_entries. As you can see, there are many (>250) redundant entries. I guess, we could delete more than 200 entries. -- seth 22:59, 19 August 2008 (UTC)
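The script itself is not shown here. As a much simpler sketch of the idea, the Python below reports only exact duplicates after stripping comments and whitespace. It does not detect entries that are merely superseded by a broader regex, which the real script apparently handled as well. The helper names and the sample lines are hypothetical.

```python
from collections import Counter


def normalise(line):
    """Drop a trailing "# comment" and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def exact_duplicates(lines):
    """Return the entries that occur more than once, sorted."""
    counts = Counter(entry for entry in map(normalise, lines) if entry)
    return sorted(entry for entry, n in counts.items() if n > 1)


# Hypothetical excerpt in the style of the list, not real data.
sample = [
    r"\bexample-one\.com\b",
    r"\bexample-two\.net\b  # added per request",
    r"\bexample-one\.com\b",
    "",
    r"\bexample-two\.net\b",
]
print(exact_duplicates(sample))
```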

moved a discussion to previous thread. -- seth 08:26, 2 September 2008 (UTC)

So, as we now log removals too, I will delete double entries, if nobody raises objections. -- seth 19:17, 3 September 2008 (UTC)

done. some additional comments on deleted entries, which were not exactly double:

\.rr\.nu # deleted, although it is not fully superseded by \brr\.nu\b, but almost. i guess that the domain .nu was meant, so the postfix "\b" is ok.
caiquecrazy\.us\.tt # almost fully superseded by \bu[ks]\.tt\b
\.6url\.com # almost fully superseded by \b6url\.com\b
\.flingk\.com # almost fully superseded by \bflingk\.com\b
\.metamark\.net # almost fully superseded by \bmetamark\.net\b
\.paulding\.net # almost fully superseded by \bpaulding\.net\b
\.shorl\.com # almost fully superseded by \bshorl\.com\b
\.shortlinks\.co\.uk # almost fully superseded by \bshortlinks\.co\.uk\b
\.simurl\.com # almost fully superseded by \bsimurl\.com\b
\.smcurl\.com # almost fully superseded by \bsmcurl\.com\b
\.tighturl\.com # almost fully superseded by \btighturl\.com\b
\.yatuc\.com # almost fully superseded by \byatuc\.com\b
\.yep\.it # almost fully superseded by \byep\.it\b
\.ontheweb\.nu # almost fully superseded by \bontheweb\.nu\b
\.isgre\.at # almost fully superseded by \bisgre\.at\b
drugs\.isgre\.at # same as above
\.byinter\.net # almost fully superseded by \bbyinter\.net\b
drugs\.byinter\.net # same as above
nigeria\.tz4\.com # almost fully superseded by \btz4\.com\b
\binternet-history\.tz4\.com # same as above
\.edom\.co\.uk # almost fully superseded by \bedom\.co\.uk\b
\.fw\.nu # almost fully superseded by \bfw\.nu\b
\.redirect\.hm # almost fully superseded by \bredirect\.hm\b
drugs\.passingg\.as # almost fully superseded by \bpassingg\.as\b
\.shop\.tc # almost fully superseded by \b(?:au|es|hk|hu|ie|it|kr|mx|pl|se|th|ua|us|shop)\.tc\b
\.explode\.to # almost fully superseded by \bexplode\.to\b
\.zwap\.to # almost fully superseded by \bzwap\.to\b
squidoo\.com/inexpensive-wine # almost fully superseded by \bsquidoo\.com\b
squidoo\.com/localphoneservice # same as above
\bsearchtravel\.biz/countrylist/italy.php # almost the same as \bsearchtravel\.biz/countrylist/italy\.php\b
drugs\.lowestprices\.at # almost fully superseded by \blowestprices\.at\b
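To make "almost fully superseded" concrete, here is a small Python sketch comparing the first pair in the list. The URLs are hypothetical. The third one shows the kind of string that the deleted entry matched and the remaining entry does not.

```python
import re

OLD = re.compile(r"\.rr\.nu", re.IGNORECASE)     # deleted entry
NEW = re.compile(r"\brr\.nu\b", re.IGNORECASE)   # remaining entry

urls = [
    "http://spam.rr.nu/",             # subdomain of rr.nu
    "http://rr.nu/",                  # no leading dot before "rr"
    "http://foo.rr.nugget.example/",  # "nu" continues into a longer label
]

for url in urls:
    print(f"{url:32} old={bool(OLD.search(url))} new={bool(NEW.search(url))}")
```

The remaining entry additionally covers the bare domain, and only misses hosts where ".rr.nu" is followed by further word characters, which fits the guess above that the .nu domain was meant.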

I see three requests, all three answered, and none of them suitable for whitelisting (as it needs whitelisting on the local projects, if anywhere). Are you sure you link to the right page? --Dirk BeetstraTC (en: U, T) 15:18, 8 September 2008 (UTC)

Thanks Comets - Added for now. In passing I see no harm in listing such sites as much to send a message to the user that their behaviour may not be appropriate. Not sure how lasting the listing should be or about logging immediately - thoughts welcome. --Herbytalk thyme 12:12, 12 August 2008 (UTC)

Reviewing this it may well be a good faith de user who has just decided to expand their interests (based on SUL info). In which case I suggest serious consideration for de-listing if we are asked. --Herbytalk thyme 12:17, 12 August 2008 (UTC)

Adding the link to your userpage on many wikis where you are not a community member is generally frowned upon. I suggest you instead leave a link to your userpage on your home wiki if you need to create a userpage. If you are an established community member, you would be afforded more leeway with respect to user page content. I'm prepared to de-list this on the condition that the link is not added cross-wiki again. — Mike.lifeguard | @en.wb 14:20, 31 August 2008 (UTC)

Okay, I'm very sorry about that, it was never my intention to start link building for my website - I only wanted to show my person and what I do. It won't happen again. If I'm not really contributing something I don't create a userpage, and if I do, I set a link to the German Wikipedia (where I'm contributing the most). Thank you for the answer and for enlightening me about this issue. It would be nice if you would de-list the URI, because I don't want my URI to be on a blacklist and I'm definitely not going to post the address again. Kindest regards --Fleshgrinder 09:48, 2 September 2008 (UTC)

Is most definitely a spammer who creates "SCM declassified" (www.51icjm.com link) on his userpage and talkpage so it can't be rollbacked :( ..this account did a similar thing (same pattern)--Cometstyles 03:24, 9 September 2008 (UTC)

Despite being blocked on Commons they are still spamming their user page. Protected now but I guess other projects will be affected. Cheers --Herbytalk thyme 11:11, 9 September 2008 (UTC)

The account is not global ;( I would suggest to add the link to the bl, best regards, --birdygeimfyglið(:> )=| 18:19, 9 September 2008 (UTC)

It is added, but they are spamming it in plaintext. This is why we want to give stewards the ability to forcibly merge accounts. By not unifying, spammers and vandals may continue unless we block them on each wiki individually. I will purge the userpages, but without some mechanism to enforce this, I do not see how we may force them to stop. — Mike.lifeguard | talk 18:24, 9 September 2008 (UTC)

Sorry, they are spamming a new domain, which I'm adding. — Mike.lifeguard | talk 18:30, 9 September 2008 (UTC)

Point was made here on Meta that this user is promoting self across multiple Wikimedia projects. I am tending to agree. Sincerely, -- De728631 10:28, 11 September 2008 (UTC)

This is a known user who is legitimately active on many wikis, hence the userpages. On most I don't see external links at all, nor do I see current cross-wiki self-promotional behaviour. The few links I have found are for attributional purposes, which is legitimate. — Mike.lifeguard | talk 10:43, 11 September 2008 (UTC)

Discussion § 1

I placed the same vita on my user page that I use on all the sites where I contribute work and discuss ideas with other interested parties. This does not constitute SPAM (= "unsolicited mass-mailing or posting") in any technical or COI sense of the word. I would appreciate the two variants of my real name that I use on the Internet and Web not being listed on any kind of badlists. Thank you, Jon Awbrey 19:12, 29 August 2008 (UTC)

While it may not be spam, it would seem to be abuse of WMF wikis & as such unwanted. While community members are given leeway with their userpages, such excessive linking is generally frowned upon. Furthermore, I very much doubt you understand all the languages you have posted this to, nor are you active in those wikis. I invite you to fix the problem before it is done for you. The history at enwiki will be of interest to others reviewing this. — Mike.lifeguard | @en.wb 19:36, 29 August 2008 (UTC)

I would appreciate it if you could point to the relevant WMF Terms of Service, or even a generally accepted standard of etiquette that would justify your calling this user page vita an "Abuse". I am referring to the one now posted here at Meta, which is a copy of the one deleted by Annabel from my Nederlands User Page. By "generally accepted standard of etiquette" I mean one that you could honestly assure me is followed across the board on all WMF User Pages. In addition, I have never seen any notice of Wikipedias being "Encyclopedias that anyone who is fluent in the local language can edit" — but please let me know if I have missed such a restriction somewhere. Jon Awbrey 20:22, 29 August 2008 (UTC)

You misunderstand me crucially. I do not say you need to be fluent in the languages where you contribute. To claim that would be hypocritical; I edit all WMF wikis. The issue is that:

You are not an established member of the community on any wiki where you have a userpage (so far as I can tell).

Your userpage has an excessive amount of links (indeed, links form the only content, and they appear to be placed for self-promotional purposes). This would perhaps be an issue regardless of the above.

[Undent]: Correct me if I am wrong, but I do not think it is customary for newcomers to any of the many-tongued Wikipædiæ to be subjected to the ordeals of this type of entrance exam with regard to the legitimacy of their participation. However, by FYIing my real name, educational background, and ongoing intellectual interests, I have certainly done more than the average Anon IP on that score.

Many people post pics on their user pages as a way of providing a friendly introduction to themselves, their current interests, and their personal histories. My old web vita harks back to a day when I was unsure about the propriety of copying pics, so I used links instead, over the years being forced to replace many of them with WayBak links. You can hardly dream that I am collecting revenue off archival links like that, can you?

If and when you personally discover an interest in some of the Active Suggestions Concerning Intellectual Interchange that I enumerated in my web vita — which was my sole purpose in posting it to my NL User Page — then we may find more interesting things to talk about. In the mean time, I can hardly become an "established member of the community on any wiki", much less learn a few bits of the local colour and language, if some Admin deletes my self-introductory user page and blocks my account after the first few edits, now can I? Jon Awbrey 23:45, 29 August 2008 (UTC)

Jon, this same sort of Wikilawyering nonsense is what got you banned from enWP and booted from the mailing list. Obviously your rampant sockpuppetry and disruption ensures you remain banned on enWP. I would be the first to help you if you wanted your massive list of socks associated with some other name, to reduce the impact on you, but I don't see why we should help you to pretend that you are here to do anything other than the usual: self-promotion and idiosyncratic original research. JzG 20:50, 4 September 2008 (UTC)

This is shameless self-promotion, and I would suggest that someone who has the necessary rights removes the pages from all projects on which he is not an active participant. JzG 11:44, 7 September 2008 (UTC)

So, the following links are the ones being used for vanity spamming here:

Discussion § 2

I would like to add a comment. As long as this page edited by this user multiple times on the English Wikipedia still exists, we look absolutely foolish trying to suppress a passive list of vitae links from a USER page, for heaven's sake. No surprise. Given the opportunity to choose two paths, Wikimedians will select the most backward, stupid-looking one. -- Thekohser 18:04, 8 September 2008 (UTC)

To my knowledge, Elonka has not edited her article in a long time; this was a big issue in her several RfAs and she's been severely criticized for this before. If I'm wrong and there remains an ongoing issue with coi edits, let me know. Thanks, --A. B.(talk) 18:58, 8 September 2008 (UTC)

In what way was that not trolling, Greg? JzG 17:00, 17 September 2008 (UTC)

Given the variety of links, and that several may well have legitimate uses, I'm going to remove the links. Pushing links is inappropriate regardless of the namespace. — Mike.lifeguard | talk 19:09, 8 September 2008 (UTC)

Well, I was pointedly reverted on English Wikiversity. I did attempt an explanation in irc, but that was equally-pointedly rebuffed. Relevant on-wiki discussion is on English Wikibooks. Perhaps someone else would take that on. — Mike.lifeguard | talk 00:05, 9 September 2008 (UTC)

Comment: I see JonAwbrey has reverted quite a number of linkremovals on userpages. --Dirk BeetstraTC (en: U, T) 13:57, 9 September 2008 (UTC)

More userpages on enwikiquote, eswiki, fiwiki, kowiki, ruwiki on top of the reverts. — Mike.lifeguard | talk 14:38, 9 September 2008 (UTC)

See also a discussion between Moulton and Jon Awbrey on Wikiversity, which seems to contain a threat. — Mike.lifeguard | talk 14:24, 9 September 2008 (UTC)

[Undent] Mr. Lifeguard, given your acknowledgement that the material in question is not "SPAM", I think that further discussion on this so-called "spam blacklist" page is no longer relevant. So I would like to request, once again, that you remove the listing of my usual Internet names from this page. Thanks in advance, JonAwbrey 17:03, 9 September 2008 (UTC)

It may or may not be spam. To me it certainly is abuse of the facilities that are enjoyed by users provided by the Foundation. Your contributions to many projects are zero other than your overlinked user page. --Herbytalk thyme 18:02, 9 September 2008 (UTC)

Your statements are incorrect. Since you appear genuinely interested, I can give you a list of contributions to several projects that may not show up in your cursory scans. For instance, you are probably missing the contributions that come by way of interwiki translations of articles that I wrote for the English Wikipedia. These contributions are, in my humble opinion quite substantial. Indeed, it was in following the search engine traces of these translations that I was brought to many of those non-anglophone Wikipedias. JonAwbrey 18:26, 9 September 2008 (UTC)

As for the rest, surely you must have some sense of how silly it would sound to say that a person cannot be allowed to contribute unless he or she is already an established contributor? Surely? JonAwbrey 18:26, 9 September 2008 (UTC)

Fortunately you are entitled to your opinion & I to mine. Wikis are about collaborative working with consensus among folk - your view would seem at odds with some others and not to be particularly collaborative in their approach. Personally I'm inclined to consider blacklisting the links as I see the excessive linkage to be outside the scope of most projects.

There is nothing silly about suggesting that someone whose only contribution to a project is a personal page which is out of scope is not effectively contributing. I delete many such pages most days. --Herbytalk thyme 18:56, 9 September 2008 (UTC)

Discussion § 3

The page you provide, with all the links, is IMHO mainly there as a linkfarm. It does not tell about you or what expertise you have; it only lists external links to your other identities. As such, it is more promotional, especially since all these pages will show up in e.g. Google searches (here). If you translate things to English, then there is no need for a userpage on another language; that userpage is only useful if you actually contribute there. As you create the same userpage with all such links everywhere, a single link to one single 'main' userpage would suffice. This serves no purpose, and I also regard it as a misuse of facilities provided by the Foundation (except where local projects encourage such linking, which, if I see it correctly, is only true on Wikiversity). --Dirk BeetstraTC (en: U, T) 09:37, 10 September 2008 (UTC)

Thinking this through I think the idea that this user should be the sole determinant of both their user page content, and what is on this page, is plain wrong. If they insist on having these links on their user pages then I think it completely correct that this section remains here for the community to consider the position, & this may lead to blacklisting of the links. In passing I also note that they are now blocked on en wp having exhausted the community's patience. I suggest that we may need to consider this view on other wikis too. Thanks --Herbytalk thyme 11:31, 10 September 2008 (UTC)

[Undent] Let's back up, slow down a bit, and let me see if I can figure out what the handful of people who are commenting on this page are really concerned about. If you want to move the discussion to my meta talk page that might be nice, as I keep getting warning messages from my browser about "non-responsive scripts" on this page that are really bogging down my ability to read and edit it. JonAwbrey 11:44, 10 September 2008 (UTC)

I think the problem here is that the user made a linkpage in his user namespace not only where he is active, but spread it across many wikis. I doubt anyone would have said anything if he had it on the 2 or 3 wikis where he is active. But adding it to many wikis and doing only that there, sorry, is spam: not the links themselves, but the mass adding.

I hope that clarifies the problem. I regret that the user got blocked on some wikis for that already, I believe he should just replace the userpage on the other wikis with a link to his main userpage, that would be sufficient if anyone really wants to look at his userpage and request unblock on the wikis where he got blocked.

I will have to be doing some other work for a while. I am breaking this into sections for the sake of my browser and so I don't fall too far behind. I'm still not clear why anyone would refer to my standard self-introduction to a language-based or project-based wiki as a "self-promotion" in the COI sense of the word, much less a "linkfarm". I was led to most of those web sites because my name was already mentioned there in connection with some English Wikipedia article or other page that was referenced or translated there. As far as being a "link farm", I just don't get that at all. I refer people to sites and papers that I am currently working on, as do many other people in all of the wikis that I have seen. I was given to understand that WMF uses "no follow" tags, so no bots follow those links. As I have explained a couple of times before, I have used that same vita for many years as a standard self-introduction. Many people illustrate their user pages with MegaByte animations, graphics, and pics — I have always preferred to use simple links to pictures instead, partly for byteage and partly for copyright reasons. That's what the Web is for, remember? Many of these pages and pics are so old now that they can only be found in the WebArchive. I am certainly not getting any promotional considerations for any of them. JonAwbrey 19:24, 10 September 2008 (UTC)

Please see the note on my talk page. Both my browsers keep jamming up on this page, so this will have to be my last posting here. JonAwbrey 00:04, 11 September 2008 (UTC)

Given that there is a strong consensus here, I see no need for further discussion in any case. The links will be blacklisted globally if you revert again, so please do not do so. — Mike.lifeguard | talk 10:46, 11 September 2008 (UTC)

I agree with that. Jon, please, You seem to be a reasonable person: don't re-add these links on the pages where You are not active; make a link to Your home wiki, or add babel information etc.

The Web is about many things, btw, and there are different sites with different purposes. MySpace, Geocities etc. You can use for making a personal webpage; the userpage on WMF projects, not.

I am still hoping this can be solved without needing to blacklist anything and I hope You understand what I wrote yesterday. It is not the links themselves that are the problem, it is the mass adding of them to multiple sites.

So, one more cleanout and if he reverts on projects where he is not demonstrably active, then they get blacklisted. That seems entirely reasonable to me. More than reasonable given Awbrey's offsite solicitation over this. JzG 17:04, 17 September 2008 (UTC)

Discussion

This section is for archiving Discussions.

The Logs

log system

I would like to consolidate our logs into one system which uses subpages and transclusions to make things easy. Each month would get a subpage, which is then transcluded onto Spam blacklist/Log so they can easily be searched. This would mean merging Nakon's "log entries" into the main log, and including the pre-2008 log. This wouldn't require much change in how we log things.

However, I wonder what people think about also logging removals and/or changes to the regexes. Currently, we don't keep track of those in any systematic way, but I think we should. For example, I consolidated a few regexes a while back, and simply made the old log entries match the new regexes, which is rather Orwellian. Similarly, we simply remove log entries when we remove domains - nothing is added to the log, so we cannot track this easily. This idea (changing the way we log things) is likely going to require some discussion; I don't think there should be any problem moving to transcluded subpages immediately.

I'm all for using one system for the logs. I'm not sure about your second idea though. Is the log intended purely to explain the current entries or also former entries and perhaps even edits? Logging removals would be a good idea to see if a domain was once listed, but logging changes seems too bureaucratic. Matching the log entries with the new regexes might be Orwellian, but it's also pragmatic. What are the advantages of logging changes? Could you perhaps give an example of how you suggest to log changes? --Erwin(85) 18:16, 6 August 2008 (UTC)

I should say I mean "Orwellian" without the connotative value. The denotative value is simply that the current method is "changing history" - not in and of itself a bad thing. Indeed, I've had no issues with this, hence the speculative nature of that part of my suggestion. — Mike.lifeguard | @en.wb 19:48, 6 August 2008 (UTC)

in de:WP:SBL we do log all new entries, removals and changes on black- and whitelists. logging changes can be useful e.g. for retracing old discussions. -- seth 01:35, 7 August 2008 (UTC)

i think, that the transclusions are a good idea to keep the traffic low. is anybody against that?

Please use subpages; I changed your examples above. There is no global whitelist, no. But in the future? Perhaps something to request. I imagine leaving old logs will be fine. Are we sure we want to log changes to regexes? I'm not sure whether that's really necessary. It also raises the already-high bar to contributing in this area. Our procedures are opaque enough as it is - this is one more hoop we are making potential recruits to the anti-spam team jump through. — Mike.lifeguard | @en.wb 22:12, 23 August 2008 (UTC)

whitelist: i guess, a global whitelist would not be necessary, because blacklist entries usually can be modified by plain regexp-syntax to match all example.org except example.org/good. such a blacklist entry would be

example\.org(?!/good)

however, there may be cases where an explicit whitelist entry would be more human-readable.
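As a minimal sketch of the negative-lookahead idea above (Python, not part of the original discussion): the fragment `example\.org(?!/good)` is taken from seth's comment, while the helper name `is_blocked` is purely illustrative and the blacklist extension's own matching is not reproduced here.

```python
import re

# Blacklist fragment quoted in the discussion: match example.org
# unless it is immediately followed by "/good".
ENTRY = re.compile(r"example\.org(?!/good)")

def is_blocked(url: str) -> bool:
    """Hypothetical helper: True if the fragment matches anywhere in the URL."""
    return ENTRY.search(url) is not None

print(is_blocked("http://example.org/spam"))
print(is_blocked("http://example.org/good"))
```

Note that the lookahead is a prefix test, so a path such as `/goodbye` would be exempted too; a tighter entry would need a boundary after `good`.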

leaving old logs: if removals shall be logged, how shall they be logged? just by comment?

log changes: the main reasons why i am asking are #double.2Fwrong_entries and #double entries. if it was ok to remove bugs, syntax-optimizations and double entries without logging it, it would be less work for me. ;-) -- seth 22:50, 23 August 2008 (UTC)

I guess so, but if you see something suspicious please check if it can really be removed. Using the new syntax is OK with me. Logging removals like on dewiki looks good. --Erwin(85) 10:19, 24 August 2008 (UTC)

Thanks for taking care of the logs; I think that will work much better.

I'm not sure whether I'm happy with having regexes consolidated as you've done. Within each set of additions, one should try to be concise with your regexes, but I don't think merging all the blogspot ones together is necessarily a good idea. This will make future removals more difficult. In case you forget, not all are as proficient with regex as you, myself included! — Mike.lifeguard | @en.wb 02:14, 29 August 2008 (UTC)

first of all: i did not merge blogspot entries. the very long blogspot line had existed before my "big" edit. ;-)

merging all blogspot-links in one line would probably be not a good idea, because of performance reasons (the extension builds the regexps in 4k-blocks) and because of COIBot, which now allows a maximum line length of 1k chars.

(not to be misunderstood: grouping regexps increases performance, but lines >1k will lead to problems)
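The trade-off described above (grouping helps, but over-long groups cause trouble) can be sketched as follows. This is only an illustration in Python under stated assumptions: the function name `group_patterns` and the greedy batching strategy are hypothetical, and the code does not claim to mirror how the extension or COIBot actually split their input. Only the 4k and 1k figures come from the discussion.

```python
import re

def group_patterns(patterns, max_len=4096):
    """Greedily join regex fragments with '|' into batches of at most max_len chars.

    A single fragment longer than max_len still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for p in patterns:
        added = len(p) + (1 if current else 0)  # +1 for the '|' separator
        if current and size + added > max_len:
            batches.append("|".join(current))
            current, size = [], 0
            added = len(p)
        current.append(p)
        size += added
    if current:
        batches.append("|".join(current))
    return batches

def compile_batches(batches):
    """Wrap each batch in a non-capturing group and compile it."""
    return [re.compile("(?:" + b + ")") for b in batches]

fragments = [r"spam\d\.example\.com", r"casino-[a-z]+\.example\.net", r"pills\.example\.org"]
for batch in group_patterns(fragments, max_len=50):
    print(len(batch), batch)
```

Calling `group_patterns(fragments, max_len=1000)` would apply the 1k line limit mentioned for COIBot instead.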

i grouped only a few regexps and only if they were "near" together in the SBL and had no different headings. as regexp grouping is used already, i didn't think that would be difficult to read. the largest grouping i did at the beginning and in lines 3300-3500, see [13]. was that too much?

concerning the logging: afaics we all want to log removals, too, don't we? but if i didn't get you wrong, you don't want to change the log-syntax. so i don't understand how you want SBL removals to be logged? :-) -- seth 09:43, 29 August 2008 (UTC)

My mistake on the blogspot one then. I've said nothing about not changing the log format - feel free to do so in order to log both additions and removals - the template you would want to change is {{sbl-log}} and the "snippet" at the top of this page. — Mike.lifeguard | @en.wb 18:01, 29 August 2008 (UTC)

But this is a bit more work for the admins and gives just a small additional information (the exact date of addition/removal), so I don't know whether this is really better. Although Erwin said, the dewiki-syntax looked good and Mike.lifeguard told me to feel free, I'm not sure, if any other admin will beat me, if I change the syntax to dewiki-style. :-) -- seth 11:21, 31 August 2008 (UTC)

However, i've been bold. Now we have same syntax as dewiki. -- seth 15:10, 2 September 2008 (UTC)

I adapted COIBot in the XWiki reports, it now (should) say(s) (have to wait for the next report from nowdiff):

with (hopefully) the first # at position 40 (may have miscalculated that). Replace SBL-diff by the diff and save. It is going to be more work, but well, it is also clearer from now what happens. --Dirk BeetstraTC (en: U, T) 15:33, 2 September 2008 (UTC)

Can we please keep the admin's name (and the span which was there previously)? Furthermore, when someone is using the log snippet at the top of this page, it will follow the old format. — Mike.lifeguard | @en.wb 16:59, 2 September 2008 (UTC)

I've added a snippet for logging on the actual blacklist. Take the snippet after you make an edit.

So, to log an addition, grab the snippet from this page and the snippet from the blacklist page.

For additions, use {{sbl-log|1161258#{{subst:anchorencode:Example}}}} {{sbl-diff|1161261}}

This should make it faster to log things, I think. — Mike.lifeguard | @en.wb 17:27, 2 September 2008 (UTC)

OK, changed it back .. we are not sure about this implementation yet (for me, it does give extra work, and IMHO does not add, a simple '+' or '-' in the logs without the actual difflink should suffice .. )? --Dirk BeetstraTC (en: U, T) 17:34, 2 September 2008 (UTC) (forgot to sign)

Actually the admin's name is redundant, because it is included in the diff. If all (even redundant) information is provided (like now), it makes a lot of work for the admins.

The additional information provided by the difflink is quite small (i.e. exact modification date). The simple '+'/'-' (or 'b+'/'b-') would be enough. (That's why I was asking a few lines above.) The difflink would be fully superfluous, if all admins used the edit summary line of the sbl to inform about the added/removed entry explicitly, but that is unrealistic, I know.

So which syntax shall be used? Afaics its main features must be: 1. provide important information, 2. easy to input for admins, and 3. not too hard to read for machines. I guess all above suggestions will do, so it doesn't really make a big difference, which one will be chosen.

I guess, if nobody answers, we will just continue like now. -- seth 08:22, 3 September 2008 (UTC)

The admin's name isn't redundant - that is information we will want without having to look at the diff. By that logic, we would also not have whether it was an addition or removal, since that is information contained in the diff 0.o — Mike.lifeguard | @en.wb 14:20, 3 September 2008 (UTC)

The '+'/'-' is redundant, right. But it is a main information about the sbl modification. The admin's name is imho not so important. But we don't need to discuss about that small point. For me the current syntax is no problem. :-) -- seth 19:12, 3 September 2008 (UTC)

tool for log searching

The simplest way to improve searchability is to write a tool that searches the logs for you. I'm in the middle of doing so, and I'll have a working prototype in a few days. The way this would work is it would load all the pages (really does not matter where the pages are), and apply a few regexes to them. This means we really don't have to merge Nakon's stuff, I can just add that page to the tool. As long as the logs keep the same pattern of one entry per line, a tool is not difficult.

I don't really think logging removals is smart, we never remove entries from the logs anyway. Simplest way is to keep the logs write only (only new entries), and have a tool list all matches. (I'm writing the tool in a manner where you will be able to put the domain in "plain", as in google.com, and it will find all the relevant entries, even if it has \bgoogle\.com\b, or some other weirdness.) —— nixeagle 20:23, 6 August 2008 (UTC)
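A rough sketch of the "plain domain" log search described above (Python, not nixeagle's actual tool): it assumes one entry per line, as stated in the comment, and additionally assumes that the regex fragment is the first whitespace-delimited token on each line. The function name `search_log` and the sample log lines are hypothetical.

```python
import re

def search_log(lines, domain):
    """Return log lines whose leading regex fragment matches the plain domain."""
    hits = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and pure comment lines
        fragment = stripped.split()[0]  # assumed: the regex comes first on the line
        try:
            pattern = re.compile(fragment, re.IGNORECASE)
        except re.error:
            continue  # skip fragments that Python's re module cannot parse
        if pattern.search(domain):
            hits.append(stripped)
    return hits

# Made-up log lines for illustration only.
sample_log = [
    r"\bgoogle\.com\b   # SomeAdmin # example reason",
    r"example\.org      # SomeAdmin # another reason",
    "# a comment line",
]
print(search_log(sample_log, "google.com"))
```

Because the stored regex is run against the plain domain, rather than the other way round, entries written as `\bgoogle\.com\b` are found without the searcher having to guess the escaping.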

lol, by accident i started writing a similar tool 2 hours ago. but i write a cli-perl-script only. until now it greps all sbl-entries (in meta-blacklist, de-blacklist and de-whitelist), which would match a given url. -- seth 01:35, 7 August 2008 (UTC)

Seth, nixeagle: actually, having a tool that searches all blacklists and logs (i.e. cross-wiki) to see if it is blacklisted somewhere, and if there is a log for that would be great. IMHO, it should be 'easy' to write a tool that extracts all regexes from the page, and tests whether each is positive against a certain url that we search (and it could then be incorporated into the {{linksummary}} to easily find it ..). Or is this just what you guys are working on ;-) .. --Dirk BeetstraTC (en: U, T) 09:50, 7 August 2008 (UTC)

one question, can you make it add 'http://' by itself (as we only put the domain in the linksummary as to prevent the blacklist to block it ..). --Dirk BeetstraTC (en: U, T) 14:48, 7 August 2008 (UTC)
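The two requests above (extract every regex from a blacklist page and test it against a URL, adding 'http://' when only a domain is given) could be sketched like this in Python. This is not seth's Perl script. The assumptions are that everything after a '#' on a line is a comment and that each entry is matched somewhere in the host or path after the protocol; the wrapper regex and the names `as_url` and `matching_entries` are illustrative only.

```python
import re

def as_url(domain_or_url):
    """Prefix 'http://' when the input is a bare domain without a scheme."""
    if re.match(r"[a-z][a-z0-9+.-]*://", domain_or_url, re.IGNORECASE):
        return domain_or_url
    return "http://" + domain_or_url

def matching_entries(blacklist_text, domain_or_url):
    """Return the blacklist entries that match the given domain or URL."""
    url = as_url(domain_or_url)
    hits = []
    for raw in blacklist_text.splitlines():
        entry = raw.split("#", 1)[0].strip()  # assumed: '#' starts a comment
        if not entry:
            continue
        try:
            # Assumed wrapper: the entry must match after the protocol part.
            pattern = re.compile(r"https?://+[a-z0-9_\-.]*(?:" + entry + ")", re.IGNORECASE)
        except re.error:
            continue  # skip entries Python's re module cannot parse
        if pattern.search(url):
            hits.append(entry)
    return hits

# Made-up blacklist text for illustration only.
sample = "\\bexample\\.com\\b  # some comment\n# comment only\nspam\\.example\\.org\n"
print(matching_entries(sample, "example.com"))
```

Running the same function over several lists in a fixed order (meta first, then the larger wikis) would give the progressive behaviour suggested further down.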

That's about what I was writing. I was putting it in the framework of http://toolserver.org/~eagle/spamArchiveSearch.php where the tool retrieves the section/page and links you directly to where the item was mentioned. For logs I was working on displaying the line entry in the log as one of the results, so you would not even have to view the log page. —— nixeagle 15:11, 7 August 2008 (UTC)

if you want to combine my script with that framework, i can give you the source code. but it is perl-code and it is ugly, about 110 lines. -- seth 17:00, 7 August 2008 (UTC)

i had to cope with a bug in en-sbl. but now it seems to work. further suggestions? (the more lists i include, the slower the script will get.)-- seth 16:44, 7 August 2008 (UTC)

I would suggest doing it progressively: first the meta and en blacklists, the rest later (roughly in order of wiki-size), similar to what luxo's does. --Dirk BeetstraTC (en: U, T) 17:07, 7 August 2008 (UTC)

i used a hash, and those don't care about the order of declaration. now it should be sorted. -- seth 22:11, 7 August 2008 (UTC)

User page advertising

Another "thinking aloud" one!

I guess I come across a commercial orientated user page on Commons once a day on average. The past week has brought a "Buying cars" page, an "Insurance sales" page, a "Pool supplies" page as well as blog/software/marketing pages. I do usually run vvv's SUL tool but quite often there is nothing immediately (the Pool supplies one cropped up on en wp a couple of days after Commons). I know en wp are often reluctant to delete such pages out of hand (which I find incredible).

I think I am probably saying should we open up a section here to allow others to watch/comment/block/delete or whatever across wikis? --Herbytalk thyme 09:51, 10 August 2008 (UTC)

I agree, this is a great idea, as I have also noticed spammers like this go cross-wiki to multiple projects (Wikinews/Commons, etc.) Cirt 11:40, 10 August 2008 (UTC)

Agree. Others may be interested in watching only that part of our work - perhaps a transcluded subpage so it may be watched separately? — Mike.lifeguard | @en.wb 14:03, 10 August 2008 (UTC)

I can't instantly find the insurance sales one & I am sure another user produced a page the same as Theamazingsystem. We could do with working out the best way of presenting the info - whether the standard template is needed or whether just an SUL link would allow us a quick check on cross wiki activity?

It would be good to know if the COI bot excludes User: space and whether that may need rethinking?

So far as I know, it watches only the mainspace. But Beetstra above said this could be changed. — Mike.lifeguard | @en.wb 14:40, 10 August 2008 (UTC)

Not sure what else is in the works but I think an SUL link to check activity cross-projects would be sufficient. Anything else would be above and beyond but would also be nice. Cirt 15:06, 10 August 2008 (UTC)

The standard {{ipsummary}} template is pretty good but (I think) lacks the SUL link which for this kind of stuff would be useful (luxo would be a help tho I guess).

The other thing I guess would be to get agreement to lock the blatantly commercial accounts just so that they do not do a "JackPotte" on us I think. I'll maybe point a couple of people to this section. --Herbytalk thyme 16:04, 10 August 2008 (UTC)

As it happens I was just trying to lock an account that wasn't SUL yet. I think the concept is sound, these accounts prolly should be locked and hidden. Not sure about mechanics of implementation. ++Lar: t/c 18:39, 10 August 2008 (UTC)

IPs can't have a unified account, so the SUL tool is useless. We have luxo's for that. — Mike.lifeguard | @en.wb 16:49, 10 August 2008 (UTC)

Yeah - this type really needs an SUL link I think. And we do need to look at the best way we can lock overtly commercial accounts I think. --Herbytalk thyme 16:51, 10 August 2008 (UTC)

Today I also saw some spamming, by 3 accounts on Commons:Talk:Main Page, I have to say that I do agree with Herby about this here, really a nice idea on how to stop spamming at least some of it. --Kanonkas 18:29, 10 August 2008 (UTC)

Good idea, Herby! If you want I can set up a tool similar to SUL:, i.e. list user pages and blocks, for IPs. Of course, other tools are possible as well. --Erwin(85) 19:33, 10 August 2008 (UTC)

Today :) user:Restaurant-lumiere - restaurant spam - [14]. User page advert, series of images all with plenty of information about the restaurant in the "description". --Herbytalk thyme 07:07, 11 August 2008 (UTC)

Well .. enough is enough then. The linkwatchers are from now on also parsing the user namespace. --Dirk BeetstraTC (en: U, T) 10:23, 11 August 2008 (UTC)

Bot is adapted for the new task. Had to tweak en:User:XLinkBot for that, but well, do I also have to add the 'Wikipedia:' namespace? --Dirk BeetstraTC (en: U, T) 10:36, 11 August 2008 (UTC)

Everything that the linkwatchers parse is now getting into the database, and may trigger the XWiki functionality mechanism. We may get more work from this... some more manpower is still necessary (as there are things that I can autocatch which have been excluded this far ..). --Dirk Bee