Like all current dynamic web sites, wikis are a common target for spammers wishing to promote products or web sites. MediaWiki offers a number of features designed to combat vandalism in general; on this page we deal specifically with wiki spam.

Normally a combination of various methods will be used, in an attempt to keep the number of spam, robot and open-proxy edits to a minimum while limiting the amount of disruption caused to legitimate users of the site.

Note that many of these features are not activated by default. If you are running a MediaWiki installation on your own server or host, you are the only one who can make the necessary configuration changes! By all means ask your users to help watch out for wiki spam (and do so yourself), but these days spam can easily overwhelm small wiki communities, so it helps to raise the bar a little. Note, however, that none of these solutions can be considered completely spam-proof. Always visit 'Recent changes' (Special:RecentChanges) periodically!

One of the more common methods of weeding out automated submissions is to use a CAPTCHA, a system that tries to distinguish humans from automated systems by asking the user to solve a task that is difficult for machines. The ConfirmEdit extension for MediaWiki provides an extensible CAPTCHA framework which can be triggered on a number of events, including:

all edits,

edits adding new, unrecognized external links,

user registration.

The extension ships with a default test, but this is a reference implementation, and is not intended for production use. Wiki operators installing ConfirmEdit on a public wiki are advised to use one of the CAPTCHA modules contained within the extension (there are five in total).

The most robust CAPTCHAs available today are custom QuestyCaptcha questions, provided you tailor them tightly to your wiki's audience and update them frequently. reCAPTCHA is nowadays beaten by most spammers[1]; the Asirra CAPTCHA, which asks the user to distinguish cats from dogs, is particularly obnoxious to users but may be effective.
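The idea behind question-based CAPTCHAs such as QuestyCaptcha can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the extension's code: the questions, the `pick_question`/`check_answer` names and the normalization rule are all hypothetical.

```python
import random

# Hypothetical wiki-specific questions mapped to their accepted answers.
QUESTIONS = {
    "What software does this wiki run on?": ["MediaWiki"],
    "Type the word 'spam' backwards:": ["maps"],
}

def normalize(answer: str) -> str:
    """Case-fold and collapse whitespace so trivial typing differences pass."""
    return " ".join(answer.casefold().split())

def pick_question(rng: random.Random) -> str:
    """Choose a question to display on this attempt."""
    return rng.choice(sorted(QUESTIONS))

def check_answer(question: str, answer: str) -> bool:
    """Accept the answer only if it matches one expected for that question."""
    accepted = {normalize(a) for a in QUESTIONS.get(question, [])}
    return normalize(answer) in accepted
```

Rotating the entries in QUESTIONS regularly is what keeps this approach effective; a fixed question is soon learned by spammers.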

CAPTCHAs have disadvantages in terms of accessibility and inconvenience to your real human users. They can block more than undesirable bots: if a script is unable to pass a CAPTCHA, then so is a screen reader or other software or aid used by the blind or visually impaired. The reCAPTCHA widget includes an alternative audio CAPTCHA for such cases, but some computer users fail both hearing and reading tests, so this is not a complete solution. For this reason it is recommended not to use CAPTCHAs on every edit, but only on account creation and on anonymous edits that insert links (these are the default settings for ConfirmEdit, used by Wikimedia Foundation projects). You should consider the implications of such a barrier and possibly provide an alternative means for affected users to create accounts and contribute, which is a legal requirement in some jurisdictions.
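The triggering policy just described (CAPTCHA on account creation and on anonymous edits that add links) can be sketched in Python as below. This is an illustration, not ConfirmEdit's implementation: the function names and the crude link regex are assumptions, and a real wiki would rely on the parser's own link extraction rather than a regex.

```python
import re

# Crude pattern for bare external URLs; an assumption, not MediaWiki's parser.
EXTERNAL_LINK = re.compile(r'https?://[^\s\[\]<>"]+', re.IGNORECASE)

def external_links(wikitext: str) -> set:
    """Collect the distinct external URLs found in a piece of wikitext."""
    return set(EXTERNAL_LINK.findall(wikitext))

def needs_captcha(action: str, is_anonymous: bool,
                  old_text: str = "", new_text: str = "") -> bool:
    """CAPTCHA on account creation, and on anonymous edits that add
    external links not already present in the previous revision."""
    if action == "createaccount":
        return True
    if action == "edit" and is_anonymous:
        return bool(external_links(new_text) - external_links(old_text))
    return False
```

Comparing the link sets of the old and new text means an anonymous user fixing a typo next to an existing link is not challenged.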

A CAPTCHA will also not make your wiki completely spam-proof; according to Wikipedia, "Spammers pay about $0.80 to $1.20 for each 1,000 solved CAPTCHAs to companies employing human solvers in Bangladesh, China, India, and many other developing nations." For this reason CAPTCHAs should be combined with other mechanisms.

Under the default configuration, MediaWiki adds rel="nofollow" to external links in wiki pages, to indicate that these are user-supplied, might contain spam, and should therefore not be used to influence page ranking algorithms. Popular search engines such as Google honour this attribute.
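As a minimal illustration of the markup involved, the Python sketch below renders a user-supplied external link with and without the attribute. The function name and the class="external" attribute are assumptions for illustration, not MediaWiki's parser code; in MediaWiki itself the behaviour is controlled by the $wgNoFollowLinks setting.

```python
from html import escape

def render_external_link(url: str, label: str, nofollow: bool = True) -> str:
    """Render a user-supplied external link; rel="nofollow" asks search
    engines not to let it influence page ranking."""
    rel = ' rel="nofollow"' if nofollow else ""
    href = escape(url, quote=True)  # escape &, <, > and quotes in the URL
    return f'<a{rel} class="external" href="{href}">{escape(label)}</a>'
```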

Use of the rel="nofollow" attribute alone will not stop spammers from attempting to add marketing to a page, but it will at least prevent them from benefiting through increased page rank; we know for sure that some spammers check for it. Nonetheless, it should never be relied upon as the primary method of controlling spam, as its effectiveness is inherently limited: it does not keep spam off your site.

See also NoIndexHistory. Note that putting rel="nofollow" on all external links is a rather heavy-handed anti-spam tactic, which you may decide not to use (by switching off the nofollow option). See Nofollow for a debate about this. It is good to have it as the installation default, though: it means that lazy administrators who are not thinking about spam problems will tend to have the option enabled. For more information, see Manual:Costs and benefits of using nofollow.

Every spammer is different, even though they all look boringly similar. If the general countermeasures are not enough, then before taking extreme steps, make use of the tools that allow you to deal with the specific problems you have.

Often, the same page will be hit repeatedly by spambots. Common patterns observed in spambot-created page names include talk pages, often outside the main namespace (e.g. Category_talk: pages are little-used, so they make common targets), and other discussion pages.

As most abusive edits on wikis that do not require registration to edit come from anonymous sources, restricting edits to these specific pages to established users can prevent re-creation of deleted spam-dump pages. Typically, any page that already shows up regularly in Special:Log/delete on an individual wiki is a good candidate for page protection.
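To find such candidates, you could count how often each title appears in the deletion log. The Python sketch below assumes you have already fetched the deleted titles (for example from Special:Log/delete, or from the API's list=logevents with letype=delete); the function name and the default threshold of three deletions are hypothetical.

```python
from collections import Counter
from typing import Iterable

def protection_candidates(deleted_titles: Iterable[str], threshold: int = 3) -> list:
    """Return titles deleted at least `threshold` times, most-deleted first.
    The default threshold is an arbitrary illustrative value."""
    counts = Counter(deleted_titles)
    return [title for title, n in counts.most_common() if n >= threshold]
```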

Extension:AbuseFilter allows privileged users to create rules to target the specific type of spam your wiki is receiving, and automatically prevent the action and/or block the user. It can examine many properties of the edit, such as the username, user's age, text added, links added, and so on. It is most effective when you have one or more skilled administrators who are willing to help you fight spam. The abuse filter can be effective even against human-assisted spammers, but it requires continual maintenance to respond to new types of attacks.
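AbuseFilter rules are written in the extension's own filter language, not Python; the sketch below only mimics the logic of one plausible rule ("brand-new account adds several external links"). The field names are modelled on AbuseFilter variables (user_editcount, user_age, added_links), but the class, functions and thresholds are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Edit:
    # Field names modelled on AbuseFilter variables; values are illustrative.
    user_name: str
    user_editcount: int
    user_age: int  # seconds since registration
    added_links: list = field(default_factory=list)

def new_account_linkspam(edit: Edit) -> bool:
    """Flag accounts under a day old, with few edits, adding 3+ links."""
    return (edit.user_editcount < 5
            and edit.user_age < 24 * 3600
            and len(edit.added_links) >= 3)

def decide(edit: Edit) -> str:
    """Return the action such a filter might take."""
    return "disallow" if new_account_linkspam(edit) else "allow"
```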

The above approach will become too cumbersome if you attempt to block more than a handful of spammy URLs. A better approach is to have a long blacklist identifying many known spamming URLs.

A popular extension for MediaWiki is the SpamBlacklist extension which blocks edits that add blacklisted URLs to pages: it allows such a list to be constructed on-wiki with the assistance of privileged users, and allows the use of lists retrieved from external sources (by default, it uses the extensive m:Spam blacklist).
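The core matching idea can be sketched as follows: each non-comment line of the blacklist is a regular-expression fragment, and the fragments are joined into one pattern that is tested against every URL an edit adds. This Python approximation is not SpamBlacklist's actual code; the sample entries and helper names are hypothetical, and the real extension builds its expressions somewhat differently.

```python
import re

# Hypothetical local blacklist: one regex fragment per line, '#' starts a comment.
BLACKLIST_TEXT = r"""
# casino spam
online-casino\.example
cheap-pills\.example   # added after repeated attacks
"""

def load_fragments(text: str) -> list:
    """Strip comments and blank lines, keeping the regex fragments."""
    fragments = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            fragments.append(line)
    return fragments

def build_blacklist(fragments: list):
    """Match any fragment inside the host part of an http(s) URL."""
    return re.compile(r"https?://[a-z0-9_\-.]*(?:" + "|".join(fragments) + ")",
                      re.IGNORECASE)

def blocked_urls(added_urls, pattern) -> list:
    """Return the added URLs that hit the blacklist."""
    return [u for u in added_urls if pattern.search(u)]
```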

The TitleBlacklist extension may also be useful, as a means to prevent re-creation of specific groups of pages which are being used by the 'bots to dump linkspam.

Open proxies are a danger mostly because they are used as a way to circumvent countermeasures targeted at specific abusers; see also m:No open proxies.

Some bots exist, e.g. on Wikimedia wikis, to detect and block open-proxy IPs, but their code is often not public. Most such blocks are made manually, when the abuse is noticed. It is therefore important to be able to tell whether an abusive IP is an open proxy or something else in order to decide how to deal with it; even more so if it is an IP used by a registered user, retrieved with the CheckUser extension.

The following measures are for more technically savvy sysadmins who know what they are doing: they are harder to set up and monitor properly, and if implemented badly, or if they are too old to still be effective, they may even be counterproductive for your wiki.

MediaWiki provides a means to filter the text of edits in order to block undesirable additions, through the $wgSpamRegex configuration variable. You can use this to block additional snippets of text or markup associated with common spam attacks.

Typically it is used to exclude URLs (or parts of URLs) which you do not want to allow users to link to. Users are presented with an explanatory message indicating which part of their edit text is not allowed. Extension:SpamRegex allows editing of this variable on-wiki.

For example, a pattern can be written to prevent any mention of 'online-casino', 'buy-viagra', 'adipex' or 'phentermine'; an '/i' at the end of the pattern makes the search case-insensitive. The same pattern can also block edits that attempt to add hidden or overflowing elements, a common "trick" used in many mass-edit attacks to hide the spam from viewers.
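A Python equivalent of such a pattern is sketched below; in LocalSettings.php, $wgSpamRegex holds a PHP (PCRE) pattern string, and here re.IGNORECASE plays the role of the trailing '/i'. The exact pattern is an illustrative assumption, not a recommended list; note that a rule such as display:none may also catch legitimate markup.

```python
import re

# Illustrative pattern in the spirit of $wgSpamRegex.
SPAM_REGEX = re.compile(
    r"online-casino|buy-viagra|adipex|phentermine"       # spam keywords
    r"|display\s*:\s*none"                                # hidden elements
    r"|overflow\s*:\s*auto\s*;\s*height\s*:\s*[0-4]px",   # tiny overflowing boxes
    re.IGNORECASE,
)

def check_edit(text: str):
    """Return None if the edit is acceptable, else the offending snippet,
    which can be shown to the user as the reason for rejection."""
    match = SPAM_REGEX.search(text)
    return match.group(0) if match else None
```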

By setting $wgBlockOpenProxies to true in your LocalSettings.php, MediaWiki will automatically scan each editing IP for open HTTP proxies. Such scans may be interpreted as hostile by some system administrators, and so this measure is not recommended.

In addition to changing your MediaWiki configuration, if you are running MediaWiki on Apache, you can make changes to your Apache web server configuration to help stop spam. These settings are generally either placed in your virtual host configuration file, or in a file called .htaccess in the same location as LocalSettings.php (note that if you have a shared web host, they must enable AllowOverride to allow you to use an .htaccess file).

In an Apache access-log line (in the usual "combined" format), the user agent is the last quoted string on the line; for some spambots it is an empty string. Some spammers use user agent strings used by real browsers, while others use malformed or blank user agent strings. If they are in the latter category, you can block them with a rule in your .htaccess file that matches their user agent.
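As a sketch of where that string sits, the Python snippet below pulls the last quoted field out of an access-log line and flags blank agents. It assumes the Apache "combined" log format; the sample lines in the usage are made up, and since Apache writes "-" when a request header is absent, both forms are treated as blank.

```python
import re

# In the "combined" format the user agent is the last double-quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

def user_agent(log_line: str):
    """Return the user-agent string, or None if the line has no quoted field."""
    match = LAST_QUOTED.search(log_line)
    return match.group(1) if match else None

def has_blank_agent(log_line: str) -> bool:
    """Treat an empty string and Apache's '-' placeholder as blank."""
    agent = user_agent(log_line)
    return agent is not None and agent.strip() in ("", "-")
```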

Such a rule returns a 403 Forbidden error to any IP connecting with a user agent matching the specified regular expression. Take care to escape all necessary regexp characters in the user agent string, such as . ( ) -, with backslashes (\). To match blank user agents, use "^$".
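The matching logic of such a rule can be emulated in Python to test a user-agent list before deploying it. This is not Apache configuration: the helper names are hypothetical, re.escape() performs the escaping described above, and an empty string in the list yields the equivalent of "^$".

```python
import re

def build_agent_rule(literal_agents):
    """Compile one anchored pattern from exact user-agent strings."""
    alternatives = [re.escape(agent) for agent in literal_agents]
    return re.compile("^(?:" + "|".join(alternatives) + ")$")

def status_for(agent: str, rule) -> int:
    """403 Forbidden for matching agents, 200 otherwise."""
    return 403 if rule.search(agent) else 200
```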

Even if the spammer's user agent string is used by real browsers, if it is old or rarely encountered, you can use rewrite rules to redirect clients sending it to an error page advising them to upgrade their browser.

A persistent spammer, or one with a broken script, may continue trying to spam your wiki after being blocked, needlessly consuming resources. By adding a "Deny from" directive with their address to your .htaccess file, you can prevent them from loading pages at all, returning a 403 Forbidden error instead.
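To check which clients a set of deny entries would actually catch, the Python sketch below tests addresses against single IPs and CIDR ranges with the standard ipaddress module. The entries in DENY are hypothetical examples; 198.51.100.0/24 is a documentation address range.

```python
import ipaddress

# Hypothetical deny list: single addresses and CIDR ranges.
DENY = ["123.45.67.89", "198.51.100.0/24"]
DENY_NETS = [ipaddress.ip_network(entry, strict=False) for entry in DENY]

def is_denied(client_ip: str) -> bool:
    """True if the client falls inside any denied address or range.
    An IPv6 client never matches an IPv4 network (and vice versa)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in DENY_NETS)
```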

Much of the most problematic spam received on MediaWiki sites comes from addresses long known by other webmasters as bot or open proxy sites, though there's only anecdotal evidence for this. These bots typically generate large numbers of automated registrations to forum sites, comment spam to blogs and page vandalism to wikis: most often linkspam, although existing content is sometimes blanked, prepended with random gibberish characters or edited in such a way as to break existing Unicode text.

A relatively simple CAPTCHA may significantly reduce the problem, as may blocking the creation of certain often-spammed pages. These measures do not eliminate the problem, however, and at some point tightening security for all users will inconvenience legitimate contributors.

It may be preferable, instead of relying solely on CAPTCHAs or other precautions that affect all users, to specifically target those IPs already known by other site masters to be havens of net.abuse. Many lists are already available; for instance, stopforumspam.com has a list of "All IPs in CSV" which (as of February 2012) contained about 200,000 IPs of known spambots.
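A downloaded list can be loaded into a set for fast lookups. The Python sketch below assumes the IP is in the first column of the CSV file, which you should check against the file you actually download; rows that are not valid addresses (headers, blank lines) are skipped. The function name is hypothetical.

```python
import csv
import io
import ipaddress

def load_ip_set(csv_text: str) -> set:
    """Read IPs from the first column of a CSV export, skipping rows
    that are not valid addresses."""
    ips = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # blank line
        try:
            ips.add(str(ipaddress.ip_address(row[0].strip())))
        except ValueError:
            continue  # header or junk
    return ips
```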

Note that when many checks are performed on attempted edits or page views, bots may easily overload your wiki, disrupting it more than they would if it were unprotected. Keep an eye on the resource cost of your protections.

You can set MediaWiki to check each editing IP address against one or more DNSBLs (DNS-based blacklists), which requires no maintenance but slightly increases edit latency. For example, enabling $wgEnableDnsBlacklist and listing suitable blacklist servers in $wgDnsBlacklistUrls in your LocalSettings.php will block many open proxies and known forum spammers.

Bad Behavior is a first line of defense that blocks requests from known spammers, identified via HTTP headers, IP address, and other metadata; it is available as a MediaWiki extension, see Extension:Bad Behavior.

For maximum effectiveness, it should be combined with an http:BL API key, which you can get by signing up for Project Honey Pot, a distributed spam tracking project. To join Project Honey Pot you will need to add a publicly accessible file to your web server, then add extension code to your LocalSettings.php (or an included PHP file) that embeds a link to it in every page.

Set $wgHoneyPotPath to the path of the honeypot page in your LocalSettings.php (e.g. "/ciralix.php"). You may change the form of the link to any of the alternatives suggested by Project Honey Pot; you may need to log in to Project Honey Pot to see those alternative ways of making honeypot links invisible to humans.[1][2]
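In Python terms, the job of that extension code amounts to something like the sketch below: build an invisible link to the honeypot file and insert it before the closing body tag. The function names, anchor text and display:none hiding method are assumptions for illustration; Project Honey Pot offers several other forms.

```python
from html import escape

def honeypot_link(path: str) -> str:
    """An invisible link to the honeypot file; humans never see it,
    but address-harvesting bots that parse the HTML will follow it."""
    return ('<div style="display: none;">'
            f'<a href="{escape(path, quote=True)}">contact</a></div>')

def add_honeypot(page_html: str, path: str) -> str:
    """Insert the link just before </body>, or append it if there is none."""
    link = honeypot_link(path)
    if "</body>" in page_html:
        return page_html.replace("</body>", link + "</body>", 1)
    return page_html + link
```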

Once you're signed up, choose Services→HTTP Blacklist to get an http:BL API Key, and put your key in Bad Behavior's settings.ini.

You may want to save the commands you use to download and apply such an IP list in a file called, e.g., updateBannedIPs.sh, so you can run them periodically.
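If you prefer Python to shell commands for this step, the sketch below turns a raw list of addresses into de-duplicated, sorted "Deny from" lines for an .htaccess file. It assumes Apache 2.2-style access control (on Apache 2.4, "Deny from" requires mod_access_compat); the function name is hypothetical.

```python
import ipaddress

def deny_lines(ips) -> list:
    """Turn raw IP strings into sorted, de-duplicated 'Deny from' lines,
    silently dropping anything that is not a valid address."""
    valid = set()
    for raw in ips:
        try:
            valid.add(ipaddress.ip_address(raw.strip()))
        except ValueError:
            pass
    # Sort by version first: IPv4 and IPv6 addresses cannot be compared directly.
    ordered = sorted(valid, key=lambda a: (a.version, a))
    return [f"Deny from {a}" for a in ordered]
```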

You can also use a PHP-only solution to download the IP list from stopforumspam; a PHP script is available for this purpose.

You have just banned one hundred forty thousand spammers, all hopefully without any disruptive effect on your legitimate users, and said «adieu» to a lot of the worst of the known spammers on the Internet. Good riddance! That should make things a wee bit quieter, at least for a while…

140,000 dead spammers. Not bad, but any proper BOFH at this point would be bored and eagerly looking for the 140,001st spam IP to randomly block. And why not?

Fortunately, dynamically updated lists of spambots, open proxies and other problem IPs are widely available. Many also allow usernames or e-mail addresses (for logged-in users) to be checked automatically against the same blacklists.

One form of blacklist which may be familiar to MediaWiki administrators is the DNSBL. Hosted on a domain name server, a DNS blacklist is a database of IP addresses; an address lookup determines whether an IP attempting to register or edit is an already-known source of net abuse.

A wiki gets an edit or new-user registration request from some random IP address (for example, '123.45.67.89').

The four IP address bytes are placed in reverse order, followed by the name of the desired DNS blacklist server.

The resulting address is requested from the domain name server (in this example, with two blacklist servers, '89.67.45.123.zen.spamhaus.org.' and '89.67.45.123.dnsbl.tornevall.org.').

The server returns "not found" (NXDOMAIN) if the address is not on the blacklist. If it is on either blacklist, the edit is blocked.
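The lookup steps above can be sketched in Python as follows. A resolver function is passed in so the logic can be tested without network access; the function names are hypothetical. A production client should also inspect the returned 127.0.0.x address, since some lists use special return codes to signal errors rather than listings.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the four octets of an IPv4 address and append the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # ValueError if not IPv4
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zones, resolve=socket.gethostbyname) -> bool:
    """True if any zone has an A record for the query name. NXDOMAIN
    ('not found') surfaces as socket.gaierror and means 'not listed'."""
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
            return True
        except socket.gaierror:
            continue
    return False
```

Called as is_listed("123.45.67.89", ["zen.spamhaus.org", "dnsbl.tornevall.org"]), it performs the two live lookups from the example above.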

The lookup in an externally-hosted blacklist typically adds no more than a few seconds to the time taken to save an edit. Unlike $wgProxyKey settings, which must be loaded on each page read or write, the use of the DNS blacklist only takes place during registration or page edits. This leaves the speed at which the system can service page read requests (the bulk of your traffic) unaffected.

While the original SORBS was primarily intended for dealing with open web proxies and e-mail spam, there are other lists specific to web spam (forums, blog comments, wiki edits) which therefore may be more suitable:

.opm.tornevall.org. operates in a very similar manner to SORBS DNSBL, but targets open proxies and web-form spamming. Much of its content is consolidated from other existing lists of abusive IPs.

.dnsbl.httpbl.org. specifically targets 'bots which harvest e-mail addresses from web pages for bulk mail lists, leave comment spam or attempt to steal passwords using dictionary attacks. It requires the user to register with projecthoneypot.org for a 12-character API key. If this key were (for example) 'myapitestkey', a lookup which would otherwise look like '89.67.45.123.http.dnsbl.sorbs.net.' or '89.67.45.123.opm.tornevall.org.' would need to be 'myapitestkey.89.67.45.123.dnsbl.httpbl.org.'
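A sketch of the http:BL variant, with the key prepended, plus a decoder for the answer. The decoding assumes Project Honey Pot's published 127.days.threat.type answer format, where the last octet is a bitmask of visitor types; treat that as an assumption and verify it against the current http:BL documentation. The function names are hypothetical.

```python
def httpbl_query_name(api_key: str, ip: str) -> str:
    """http:BL prepends the access key to the reversed octets."""
    return f"{api_key}.{'.'.join(reversed(ip.split('.')))}.dnsbl.httpbl.org"

# Assumed meaning of the bits in the last octet of an http:BL answer.
VISITOR_TYPES = {1: "suspicious", 2: "harvester", 4: "comment spammer"}

def decode_httpbl(answer: str) -> dict:
    """Decode an assumed 127.days.threat.type answer; type 0 = search engine."""
    first, days, threat, vtype = (int(part) for part in answer.split("."))
    if first != 127:
        raise ValueError("not an http:BL answer")
    kinds = [name for bit, name in VISITOR_TYPES.items() if vtype & bit]
    return {"days": days, "threat": threat, "types": kinds or ["search engine"]}
```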

Web-based blacklists can identify spammers' e-mail addresses and user information beyond a simple IP address, but there is no standard format for the reply from an HTTP blacklist server. For instance, a request for http://botscout.com/test/?ip=123.45.67.89 would return "Y|IP|4" if the address is blacklisted ('N' or blank if OK), while a web request for http://www.stopforumspam.com/api?ip=123.45.67.89 would return "ip yes 2009-04-16 23:11:19 41" if the address is blacklisted (the date, time and count can be ignored) or blank if the address is good.
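Because each service has its own reply format, a client needs one small parser per service. The Python sketch below handles only the two plain-text replies quoted above; the function names are hypothetical, and the services' current reply formats may differ.

```python
def botscout_listed(body: str) -> bool:
    """BotScout-style reply: 'Y|IP|4' means listed; 'N...' or blank means OK."""
    return body.strip().upper().startswith("Y")

def stopforumspam_listed(body: str) -> bool:
    """StopForumSpam-style reply: 'ip yes <date> <time> <count>' means listed;
    blank means OK. Date, time and count are ignored."""
    fields = body.split()
    return len(fields) >= 2 and fields[0] == "ip" and fields[1] == "yes"
```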

With no single standard format by which a blacklist server responds to an enquiry, the stock MediaWiki package has no built-in support for most online lists of known spambots. The built-in $wgEnableDnsBlacklist and $wgDnsBlacklistUrls options were also once limited by the inability to specify more than one blacklist server; since rev:58061, however, MediaWiki has been able to check multiple DNSBLs by defining $wgDnsBlacklistUrls as an array.

As most blacklist operators provide very limited software support (often targeted at non-wiki applications, such as phpBB or WordPress), third-party adaptations of these clients have been built and deployed on some wikis to check spambots. As the same spambots create similar problems on most open-content websites, the worst offenders attacking MediaWiki sites will also be busily targeting thousands of non-wiki sites with spam in blog comments, forum posts and guestbook entries.

Automatic querying of multiple blacklist sites is therefore already in widespread use protecting various other forms of open-content sites, and the spambot names, ranks and IP addresses are by now all too well known. A relatively small number of spambots appear to be behind a large percentage of the overall problem. Even where admins take no prisoners, they have duly noted a pattern in which the same spambot IP that posted linkspam to the wiki a second ago is spamming blog comments somewhere else now, and will be spamming forum posts on a site half a world away a few seconds from now. One shared external blacklist entry can silence one problematic 'bot on thousands of sites.

This greatly reduces the number of individual IPs which need to be blocked manually by local administrators, one wiki and one forum at a time.

Some anti-spam sites, such as projecthoneypot.org, provide code which you are invited to include in your own website pages. Typically, the pages contain one or more unique, randomised and hidden e-mail addresses or links, intended not for your human visitors but for spambots. Each time the page is served, the embedded addresses are automatically changed, allowing individual pieces of spam to be directly and conclusively matched to the IP address of bots which harvested the addresses from your sites. The IP address which the bot used to view your site is automatically submitted to the operators of the blacklist service. Often a link to a fake 'comment' or 'guest book' is also hidden as a trap to bots which post spam to web forms. See Honeypot (computing).

Once the address of the spammer is known, it is added to the blacklists (see above) so that you and others will in future have one less unwanted robotic visitor to your sites.

While honeypot scripts and blacklist servers can automate much of the task of identifying and dealing with spambot IPs, most blacklist sites also provide links to web pages on which one can manually search for information about an IP address or report an abusive IP as a spambot. It may be advisable to include some of these links on the Special:Blockip page of your wiki for the convenience of your site's administrators.

Typically, feeding the address of any bot or open proxy into a search engine will return many lists on which these abusive IPs have already been reported. In some cases, the lists will be part of anti-spam sites; in others, a site advocating the use of open proxies will list not only the proxy which has been abused to spam your wiki installation but hundreds of other proxies like it which are also open for abuse.

While any plain-text lists of open proxies must still be imported into your wiki manually, a Spambot Search Tool may be configured as an automated script to query any of the following databases:

fSpamlist - fspamlist.com

StopForumSpam - stopforumspam.com

Sorbs - sorbs.net

Spamhaus - spamhaus.org

SpamCop - spamcop.net

ProjectHoneyPot - projecthoneypot.org

Bot Scout - botscout.com

DroneBL - dronebl.org

AHBL - ahbl.org

s5h spam - all.s5h.net

It is also possible to block wiki registrations from anonymised sources such as Tor proxies (Tor Project - torproject.org), from bugmenot users or from e-mail addresses (listed by undisposable.net) intended solely for one-time use.
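A disposable-address check reduces to comparing the e-mail domain against a list. In this Python sketch the domain set and the function name are hypothetical stand-ins for a real list such as the one from undisposable.net.

```python
# Hypothetical one-time e-mail domains; a real list would be downloaded.
DISPOSABLE_DOMAINS = {"disposable.example", "throwaway.example"}

def is_disposable(email: str) -> bool:
    """Check the domain part (after the last '@') against the set,
    ignoring case and a trailing dot."""
    domain = email.rsplit("@", 1)[-1].strip().rstrip(".").lower()
    return domain in DISPOSABLE_DOMAINS
```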

See also Blacklists Compared - 1 March 2008 and spamfaq.net for lists of blacklists. Do keep in mind that lists intended for spam e-mail abatement will generate many false positives if installed to block comment spam on wikis or other web forms. Automated use of a list that blacklists all known dynamic user IP address blocks, for instance, could render your wiki all but unusable.

You can also link to IP blacklist sites from the Special:Blockip page of your wiki, as a convenience to admins wishing to check manually whether a problem address is an already-known 'bot.

Such links can invite admins to "check this IP at: Domain Tools, OpenRBL, Project Honeypot, Spam Cop, Spamhaus, Stop Forum Spam" on the page from which they block an IP. An IP address is sufficient information to make comments on Project Honeypot against spambots; Stop Forum Spam is less suited to reporting anonymous-IP problems, as it requires the username, IP and e-mail under which a problem 'bot is attempting to register on your sites. The policies and capabilities of other blacklist-related websites may vary.

Note that blocking the address of the spambot posting to your site is not the same as blocking the URLs of specific external links being spammed in the edited text. Do both. Used in combination, and as a supplement to (not a replacement for) other anti-spam tools such as title or username blacklists and tests that attempt to determine whether an edit is made by a human or a robot (CAPTCHA, Bad Behavior or Akismet), the two approaches can be a very effective means of separating spambots from real, live human visitors.

This page lists features which are currently included, or available as patches, but on the discussion page you will find many other ideas for anti-spam features which could be added to MediaWiki, or which are under development.

SimpleAntiSpam: adds an invisible input field to the edit view and checks whether the field was filled in; if it was, the extension disallows the edit. It does not affect human users in any way. This functionality has been part of MediaWiki core since 1.22.
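The trap-field technique can be sketched as follows. The field name and helper functions are hypothetical, not MediaWiki core's implementation: the idea is simply that humans never see the field, while naive bots fill in every input they find.

```python
HONEYPOT_FIELD = "antispam_field"  # hypothetical name of the invisible input

def render_trap() -> str:
    """Markup for an input hidden from humans via CSS."""
    return ('<div style="display: none;">'
            f'<input type="text" name="{HONEYPOT_FIELD}" value="" /></div>')

def accept_edit(form: dict) -> bool:
    """Reject the submission if the invisible field came back non-empty."""
    return not form.get(HONEYPOT_FIELD, "").strip()
```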