Wikipedia, Cleanfeed & Filtering

The IWF classified a Wikipedia page as containing a pornographic image of a child. As a result, UK ISPs that participate in the cleanfeed program are now blocking access to the Wikipedia page of the band the Scorpions because of a controversial album cover that is potentially child pornography and thus illegal under UK law. The IWF states:

A Wikipedia web page, was reported through the IWF’s online reporting mechanism in December 2008. As with all child sexual abuse reports received by our Hotline analysts, the image was assessed according to the UK Sentencing Guidelines Council (page 109). The content was considered to be a potentially illegal indecent image of a child under the age of 18, but hosted outside the UK. The IWF does not issue takedown notices to ISPs or hosting companies outside the UK, but we did advise one of our partner Hotlines abroad and our law enforcement partner agency of our assessment. The specific URL (individual webpage) was then added to the list provided to ISPs and other companies in the online sector to protect their customers from inadvertent exposure to a potentially illegal indecent image of a child.

But why didn’t they just block access to the specific URL of the offending image? Instead they block the entire page, the text (and other images) of which are completely legal. There is no technical reason why they cannot block URLs to specific offending images in exactly the same way as they can block a specific Wikipedia page and not the entire Wikipedia site.

The IWF collects URLs that are potentially illegal for containing child pornography and sends them to participating ISPs in the U.K. as part of the cleanfeed program. The ISPs then block access to these URLs. These URLs may be shared with other agencies through the INHOPE network and possibly with commercial filtering companies as well. Canada has a similar cleanfeed program in which Cybertip collects the potentially illegal URLs and sends them to Canadian ISPs, who then block access to them. One of the main reasons why cleanfeed has been successful and replicated in other countries is that it was supposed to elegantly avoid the pitfall of overblocking, the key objection that was consistently raised by civil libertarians and others with respect to filtering. This is why filtering at the URL level is so important: one offending page can be blocked while the rest of the site remains available.
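To make the overblocking point concrete, here is a minimal sketch of the difference between host-level and URL-level filtering. It is not how any ISP's cleanfeed system is actually implemented; the blocklist entry, the example.org URLs and the function names are all made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entry: the real list is not public.
BLOCKED_URLS = {"http://example.org/wiki/Offending_Article"}

# A cruder filter would block every host that appears on the list.
BLOCKED_HOSTS = {urlsplit(url).hostname for url in BLOCKED_URLS}


def blocked_by_host(url: str) -> bool:
    """Host-level filtering: anything served from a listed host is blocked."""
    return urlsplit(url).hostname in BLOCKED_HOSTS


def blocked_by_url(url: str) -> bool:
    """URL-level filtering: only an exact match against the list is blocked."""
    return url in BLOCKED_URLS


for url in (
    "http://example.org/wiki/Offending_Article",
    "http://example.org/wiki/Unrelated_Article",
):
    print(url, "| host-level:", blocked_by_host(url), "| URL-level:", blocked_by_url(url))
```

With host-level filtering the unrelated article is lost along with the offending one; with URL-level filtering only the listed address is blocked.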

One of the questions I’ve often raised (in the Canadian context) concerns what precisely is blocked. We know that cleanfeed systems can block at the URL level, so why block access to the web page containing the offending image and not the URL of the offending image itself? There is no technical reason for not doing so. If the IWF had added the URL of the specific offending image embedded in the Scorpions Wikipedia page, the text of the article, which is perfectly legal, along with all the other legal images, would still be available. Only the one offending image would have been blocked.
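A page view is really several requests: one for the HTML document and one for each embedded image, each with its own URL. The sketch below, which assumes a simple exact-match blocklist and uses invented example.org URLs and a hypothetical helper name, shows what a reader ends up seeing when the list holds the page URL versus the image URL.

```python
# Hypothetical URLs standing in for an article and the images it embeds.
PAGE = "http://example.org/wiki/Some_Album"
COVER = "http://upload.example.org/images/album_cover.jpg"
BAND_PHOTO = "http://upload.example.org/images/band_photo.jpg"


def visible_to_reader(page, embedded_images, blocklist):
    """Return the resources a reader actually sees when opening `page`.

    If the page itself is blocked, the browser never receives the HTML,
    so none of the embedded images are requested through it either.
    """
    if page in blocklist:
        return []
    return [page] + [img for img in embedded_images if img not in blocklist]


# Page-level entry: the legal text and the legal photo are lost too.
print(visible_to_reader(PAGE, [COVER, BAND_PHOTO], {PAGE}))

# Image-level entry: only the one image goes missing.
print(visible_to_reader(PAGE, [COVER, BAND_PHOTO], {COVER}))
```

Note that in this model a page-level entry does not block the image's own URL, so anyone who requests that address directly still gets the image.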

For a system that was designed not to overblock, I find it hard to understand why they don’t block the specific offending images. If an entire website was devoted to showing images of child abuse then it would be understandable, but Wikipedia?

6 comments.

I hope it’s not too off-topic to note here that (a) we’re an educational charity (b) we live off public donations (c) we have a funding drive on right now :-)

We have no money to spend on expensive legal assistance on issues like this. (We do have Mike Godwin, the world’s most famous Internet lawyer, as our general counsel, which is just unbelievably cool and very effective.) So all we have is people talking about these issues and giving us their spare bucks to keep the sites running. Thank you :-)

There is a good technical reason for blocking pages and not individual images. With dynamic HTML, either client-side or server-side, the URL for the image could change periodically or even every time the page is accessed. Those responsible for creating the blocklist can’t be expected to know how each website works or which ones are the ‘good guys’ and wouldn’t do something like that. The immutable rule will be to block the URL for the page – that’s most likely to be static and linked to from elsewhere.

“But why didn’t they just block access to the specific URL of the offending image?”

My *impression* from the garbled things they’ve said, is that they’re used to dealing with images where URLs have tracking data in the file path, hence “specific URL” is variable. If you look at sophisticated spam in HTML, you can see what I *conjecture* is their reasoning. So they block the most static-looking way of accessing the image.

The IWF analysts determine whether to block an entire website or an individual page depending on the context. The analyst has a choice about how fine-grained the blocking should be, based on that context. If the URLs to the images are dynamic etc… then the analyst may use his/her discretion.

(BTW, SecureComputing is currently blocking only the URL to the image, not the full Wikipedia page).

@PA – Are you saying that the IWF analysts cannot tell the difference between Wikipedia and a site devoted to delivering child porn that dynamically changes the paths to the images to avoid filtering?

@4, nart said, “Are [you] saying that the IWF analysts cannot tell the difference between Wikipedia and a site devoted to delivering child porn that dynamically changes the paths to the images to avoid filtering?”

That is a distinct possibility. And, I’m not referring to the importance of applying a process consistently, without fear or favour. At the start of this year, when I first had a go at contributing to Wikipedia, I was perplexed at the use of the hostname upload.wikipedia.org. In all the years I’ve been using computers, the convention has been that the name ‘upload’ is reserved for an area that has special properties with regard to uploading – normally for the temporary storage of files. It’s the directory on an FTP server that the anonymous account can write to. It can be the directory on a photo printing service that doesn’t have a quota, so you can build up a temporary print job of any size. As a subdomain, it’s normally the access point for maintenance of a website. Location names with the word upload in them are not usually intended to be a way of permanently referring to something.

The IWF couldn’t ask the Wikimedia Foundation how their websites work. Do the IWF have the time or expertise to perform their own technical analysis of each website hosting a potentially illegal image? I have no idea why they didn’t at least attempt to block both the page and the image. Some of the URLs on the blocklist with image extensions could be there because they are the exact URLs as submitted by members of the public. It has become clear to many that the IWF’s procedures need to be revised. Maybe the work needs to be brought within a UK police force, so that it will be possible to call on more resources to make better decisions and to give those decisions greater legitimacy.