When someone tries to save a page, SpamBlacklist checks the text against a (potentially very large) list of blacklisted host names.
If there is a match, the extension displays an error message to the user and refuses to save the page.
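On MediaWiki 1.25 and later, the extension is loaded from LocalSettings.php. The sketch below shows that line, assuming the extension's files have been placed in an extensions/SpamBlacklist directory:

```php
// LocalSettings.php: load the SpamBlacklist extension (MediaWiki 1.25 and later).
// Assumes the extension lives in $IP/extensions/SpamBlacklist.
wfLoadExtension( 'SpamBlacklist' );
```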

Once the extension is installed, navigate to Special:Version on your wiki to verify that it was loaded successfully.

To users running MediaWiki 1.24 or earlier:

The instructions above describe the new way of installing this extension, using wfLoadExtension(). To install it on earlier versions (MediaWiki 1.24 and earlier), use require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php"; instead of wfLoadExtension( 'SpamBlacklist' );.

The default additional source for SpamBlacklist's list of forbidden URLs is the Wikimedia spam blacklist on Meta-Wiki, at m:Spam blacklist. The extension uses this list by default and reloads it once every 10-15 minutes. For many wikis, this list alone will be enough to block most spamming attempts. However, the Wikimedia blacklist is used by a diverse group of large wikis with hundreds of thousands of external links, so it is comparatively conservative in the links it blocks.

Only administrators can edit the Wikimedia spam blacklist, but you can suggest modifications to it at m:Talk:Spam blacklist.

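You can add other bad URLs on your own wiki. List them in the global variable $wgSpamBlacklistFiles in LocalSettings.php. Place this setting AFTER the wfLoadExtension( 'SpamBlacklist' ) call, or after the require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php" line on MediaWiki 1.24 and earlier; see examples below.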

$wgSpamBlacklistFiles is an array, with each value containing either a URL, a filename or a database location.

If you set $wgSpamBlacklistFiles in LocalSettings.php, the default value of "[[m:Spam blacklist]]" will no longer be used. If you still want that blacklist to be checked, you have to add it to the array manually; see examples below.

Specifying a database location allows you to draw the blacklist from a page on your wiki.

The format of the database location specifier is "DB: [db name] [title]". [db name] should exactly match the value of $wgDBname in LocalSettings.php, and you should create the page [title] in the default namespace of your wiki. If you do this, it is strongly recommended that you protect the page from general editing. The obvious danger is that someone may add a regex that matches everything. In addition, an attacker who can input arbitrary regular expressions may be able to trigger segfaults in the PCRE library.

For instance, you might want to use the English-language Wikipedia's spam blacklist in addition to the standard Meta-Wiki one. To do so, you could add the following to LocalSettings.php, AFTER the require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php" or wfLoadExtension( 'SpamBlacklist' ) call:
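The sketch below shows one way to write this. It assumes the English Wikipedia blacklist page is fetched as plain wikitext through MediaWiki's action=raw URL; treat that exact URL form as an assumption rather than a documented requirement. Because setting the array replaces the default, the Meta-Wiki list is listed again explicitly.

```php
// LocalSettings.php, after loading SpamBlacklist.
// Setting $wgSpamBlacklistFiles replaces the default, so the Meta-Wiki list is re-added.
$wgSpamBlacklistFiles = array(
    "[[m:Spam blacklist]]", // Wikimedia's shared list on Meta-Wiki
    // English Wikipedia's list, fetched as raw text (assumed URL form)
    "https://en.wikipedia.org/w/index.php?title=MediaWiki:Spam-blacklist&action=raw",
);
```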

Here's an example of an entirely local set of blacklists: the administrator is using the update script to generate a local file called "wikimedia_blacklist" that holds a copy of the Meta-Wiki blacklist, and has an additional blacklist on the wiki page "My spam blacklist":
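The following is a sketch of such a configuration. The file path and the database name wikidb are placeholders: the path is wherever your update script writes its copy, and the database name must match your $wgDBname. The page title is written with underscores.

```php
// LocalSettings.php: an entirely local set of blacklists.
$wgSpamBlacklistFiles = array(
    // Local copy of the Meta-Wiki list, kept current by the update script
    // (this path is a placeholder).
    "$IP/extensions/SpamBlacklist/wikimedia_blacklist",
    // The page "My spam blacklist" on the wiki whose $wgDBname is "wikidb" (placeholder).
    "DB: wikidb My_spam_blacklist",
);
```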

If you encounter issues with the blacklist, you may want to increase PCRE's backtrack limit. However, this can reduce your protection against denial-of-service (DoS) attacks, because the backtrack limit is a performance limit that caps how much work a single regex match may do:
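The value below is only illustrative. It is ten times PHP's default pcre.backtrack_limit of 1000000. Choose the smallest increase that resolves your problem:

```php
// LocalSettings.php: raise PCRE's backtracking limit for the large blacklist regex.
// Higher values let more complex matches finish, but also let a hostile
// pattern or input consume more CPU per match.
ini_set( 'pcre.backtrack_limit', '10000000' );
```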

It is questionable how effective the Wikimedia spam blacklists are at keeping spam off of third-party wikis. Some spam might be targeted only at Wikimedia wikis, or only at third-party wikis, which would make Wikimedia's blacklist of little help to said third-party wikis in those cases. Also, some third-party wikis might prefer that users be allowed to cite sources that are not considered reliable on Wikipedia, or that Wikipedia has considered so ideologically offensive as to warrant blacklisting. Sometimes what one wiki considers useless spam, another wiki might consider useful.

Because of this, you may want to customize the message shown to editors whose edits are blocked by editing MediaWiki:Spamprotectiontext on your wiki, for example:

The text you wanted to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site. {{SITENAME}} maintains [[MediaWiki:Spam-blacklist|its own blacklist]]; however, most blacklisting is done by means of [[metawikimedia:Spam-blacklist|Meta-Wiki's blacklist]], so this block should not necessarily be construed as an indication that {{SITENAME}} made a decision to block this particular text (or URL). If you would like this text (or URL) to be added to [[MediaWiki:Spam-whitelist|the local spam whitelist]], so that {{SITENAME}} users will not be blocked from adding it to pages, please make a request at [[MediaWiki talk:Spam-whitelist]]. A [[Project:Sysops|sysop]] will then respond on that page with a decision as to whether it should be whitelisted.

This extension examines only new external links added by wiki editors. To check user agents, add Bad Behavior or Akismet. To check an editor's IP address against lists of known spambots, supplement this with Check Spambots. Because the various tools for combating spam on MediaWiki use different methods to spot abuse, the safeguards work best in combination.

The Extension:SpamBlacklist/update script is a cron script that can automate updates from shared blacklists. If you are using memcached, you will also have to delete the spam_blacklist_regexes key (for example, using maintenance/mcc.php).

There is no way to let some users override the spam blacklist. See bugzilla:34928.

The extension builds a single regular expression that looks like /https?:\/\/[a-z0-9\-.]*(line 1|line 2|line 3|....)/Si, with all slashes within the lines escaped automatically.
It saves this in a small "loader" file to avoid loading all the code on every page view.
Page view performance will not be affected even if you're not using a bytecode cache, although using a cache is strongly recommended for any MediaWiki installation.
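The following hypothetical sketch in plain PHP makes the shape of that regex concrete. build_blacklist_regex() is an illustrative name, not a SpamBlacklist function. The comment stripping and slash escaping are simplified assumptions about how list lines are prepared.

```php
<?php
// Hypothetical sketch, not the extension's own code: combine blacklist lines
// into one regex of the shape /https?:\/\/[a-z0-9\-.]*(line 1|line 2|...)/Si.

function build_blacklist_regex(array $lines): ?string
{
    $parts = [];
    foreach ($lines as $line) {
        // Drop "#" comments and surrounding whitespace; skip blank lines.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '') {
            continue;
        }
        // Escape "/" (the regex delimiter), first un-escaping any "\/" the
        // list author already wrote so it is not escaped twice.
        $parts[] = str_replace('/', '\/', str_replace('\/', '/', $line));
    }
    if ($parts === []) {
        return null; // an empty "()" group would match every http(s) link
    }
    return '/https?:\/\/[a-z0-9\-.]*(' . implode('|', $parts) . ')/Si';
}

$regex = build_blacklist_regex([
    '# casino spam',
    'casino-',
    'example\.org/spam',
]);
var_dump(preg_match($regex, 'Visit http://www.casino-royale.test/ now')); // int(1)
var_dump(preg_match($regex, 'See http://example.org/ham'));               // int(0)
```

The leading [a-z0-9\-.]* lets a pattern such as "casino-" match anywhere in the host name, so subdomains are caught without extra entries. The null return for an empty list avoids producing a regex that would block every link.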

The regex match itself generally adds an insignificant overhead to page saves (on the order of 100ms in our experience).
However, loading the spam file from disk or the database, and constructing the regex, may take a significant amount of time depending on your hardware.
If you find that enabling this extension slows down saves excessively, try installing a supported bytecode cache.
The SpamBlacklist extension will cache the constructed regex if such a system is present.

If you're sharing a server and cache with several wikis, you may improve your cache performance by modifying getSharedBlacklists and clearCache in SpamBlacklist_body.php to use $wgSharedUploadDBname (or a specific DB if you do not have a shared upload DB) rather than $wgDBname. Be sure to get all references! The regexes from the separate MediaWiki:Spam-blacklist and MediaWiki:Spam-whitelist pages on each wiki will still be applied.

In its standard form, this extension requires that the blacklist be constructed manually. Regular expression wildcards are permitted, and a blacklist originating on one wiki may be reused by many others. Even so, some effort is still required to add new patterns in response to spam or to remove patterns that generate false positives.

Much of this effort may be reduced by supplementing the spam regex with lists of known domains advertised in spam e-mail. The regex will catch common patterns (like "casino-" or "-viagra") while the external blacklist server will automatically update with names of specific sites being promoted through spam.

In the filter() function in SpamBlacklist_body.php, approximately halfway between the file start and end, are the lines:

# Do the match
wfDebugLog('SpamBlacklist',"Checking text against ".count($blacklists)." regexes: ".implode(', ',$blacklists)."\n");

Directly above this section (which does the actual regex test on the extracted links), one could add additional code to check the external RBL servers:
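The sketch below assumes the usual DNS blacklist convention, in which a listed domain resolves as an A record under <domain>.<zone>. The function names rbl_candidate_domains(), rbl_query_name() and find_rbl_listed_link() are hypothetical and not part of SpamBlacklist. The $links argument stands for the links the filter has already extracted.

```php
<?php
// Hypothetical sketch, not code from SpamBlacklist: look up the hosts of
// extracted links in DNS-based URI blacklists before the regex checks run.

/** Host plus each parent domain down to two labels, since lists hold base domains. */
function rbl_candidate_domains(string $host): array
{
    $labels = explode('.', rtrim(strtolower($host), '.'));
    $candidates = [];
    for ($i = 0; count($labels) - $i >= 2; $i++) {
        $candidates[] = implode('.', array_slice($labels, $i));
    }
    return $candidates;
}

/** Fully qualified DNS name to query for $domain in blacklist zone $zone. */
function rbl_query_name(string $domain, string $zone): string
{
    return $domain . '.' . trim($zone, '.') . '.';
}

/** Returns the first link whose domain is listed in any zone, or null if none is. */
function find_rbl_listed_link(array $links, array $zones): ?string
{
    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            continue; // unparsable link: leave it to the regex checks
        }
        if (filter_var(trim($host, '[]'), FILTER_VALIDATE_IP) !== false) {
            continue; // IP-address links are not handled in this sketch
        }
        foreach ($zones as $zone) {
            foreach (rbl_candidate_domains($host) as $domain) {
                // Any A record for <domain>.<zone> is treated as "listed".
                if (checkdnsrr(rbl_query_name($domain, $zone), 'A')) {
                    return $link;
                }
            }
        }
    }
    return null;
}
```

A caller would pass the extracted links and a list of zones, such as multi.surbl.org, and report any returned link to the user as blocked. Querying each parent domain roughly approximates the base-domain lookup these lists expect, without needing a public-suffix list. Some lists also answer with special codes, for example when queried through large public resolvers. A production check should therefore inspect the returned address, for instance with dns_get_record(), rather than rely on checkdnsrr() alone.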

This ensures that, if an edit contains URLs from already-blacklisted spam domains, an error is returned to the user indicating which link cannot be saved due to its appearance on an external spam blacklist. If nothing is found, the remaining regex tests are allowed to run normally, so that any manually-specified 'suspicious pattern' in the URL may be identified and blocked.

Note that the RBL servers list just the base domain names, not the full URL path. So http://example.com/casino-viagra-lottery.html will trigger the RBL check only if "example.com" itself is blacklisted by name on the external server. The regex, however, can match any of the text in the URL and path, from "example" to "lottery" and everything in between. Both approaches carry some risk of false positives. The regex can misfire because of its wildcard expressions. The external RBL servers are often created for other purposes, such as controlling abusive e-mail spam, and may list domains that do not engage in forum, wiki, blog or guestbook comment spam as such.

This extension is being used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it's installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.