Filters that work

Summary: The architecture for David Cameron’s filtering plans is wrong and has a negative consequences, however there are alternative architectures which might work.

There has been much news coverage about David Cameron’s plans for opt-out filters for all internet users in the UK. With opt-in systems barely anyone will opt-in and with opt-out systems barely anyone will opt-out and so this is a proposal for almost everyone to have a filter on their internet traffic. Enabling households to easily filter out bad content from their internet traffic is useful in that there are many people who do want to do this (such as myself[1]). However the proposed architecture has a number of significant flaws and (hopefully unintended) harmful side effects.

Here I will briefly recap what those flaws and side-effects are and propose an architecture which I claim lacks these flaws and side-effects while providing the desired benefits.

All traffic goes through central servers which have to process it intensively. This makes bad things like analysing this traffic much easier. It also means that traffic cannot be so efficiently routed. It means that there can be no transparency about what is actually going on as no one outside the ISP can see.

There is no transparency or accountability. The lists of things being blocked are not available and even if they were it is hard to verify that those are the ones actually being used. If an address gets added which should not be (say that of a political party or an organisation which someone does not like) then there is no way of knowing that it has been or of removing it from the list. Making such lists available even for illegal content (such as the IWF’s lists) does not make that content any more available but it does make it easier to detect and block it (for example TOR exit nodes could block it). In particular it means having found some bad content it is easier to work out if that content needs to be added to the list or if it is already on it.

Central records must be kept on who is and who is not using such filters, really such information is none of anyone else’s business. They should not know or be able to tell, and they do not need to.

I am not going to discuss whether porn is bad for you though I have heard convincing arguments that it is. Nor will I expect any system to prevent people who really want to access such content from doing so. I also will not use a magic ‘detect if adult’ device to prevent teenagers from changing the settings to turn filters off.

Most home internet systems consist of a number of devices connected to some sort of ISP provided hub which then connects to the ISP’s systems and then to the internet. This hub is my focus as it is provided by the ISP and so can be provisioned with the software they desire and configured by them but is also under the control of the household and provides an opportunity for some transparency. The same architecture can be used with the device itself performing the filtering, for example when using mobile phones on 3G or inside web browsers when using TLS.

So how would such a system work? Well these hubs are basically just a very small Linux machine, like a Raspberry Pi and it is already handling the networking for the devices in the house, probably running a NAT[0] and doing DHCP, it should probably also be running a DNS server and using DNSSEC. It already has a little web server to display its management pages and so could trivially display web pages saying “this content blocked for you because of $reason, if this is wrong do $thing”. Then when it makes DNS requests for domains to the ISP’s servers then they can reply with additional information about whether this domain is known to have bad content and where to find additional information on that which the hub can then look up and use to as input to apply local policy.
Then the household can configure to hub that applies the policy they want and it can be shipped with a sensible default and no one knows what policy they chose unless they snoop their traffic (which should require a warrant).
Now there might want to be a couple of extra tweaks in here, for example there is some content which people really do not want to see but find very difficult not to seek out, for example I have friends who have struggled for a long time to recover from a pornography addiction. Hence providing the functionality whereby filter settings can be made read only such that a user can choose to make ‘impossible’ to turn off can be useful as in a stronger moment they can make a decision that prevents them being able to do something they do not want to in a weaker moment. Obviously any censorship system can be circumvented by a sufficiently determined person but self blocking things is an effective strategy to help people break addictions, whether to facebook in the run up to exams or to more addictive websites.

So would such a system actually work? I think that it is technically feasible and would achieve the purposes it is intended to and not have the same problems that the current proposed architecture has. However it might not work with currently deployed hardware as that might not have quite enough processing power (though not by much). However an open, well specified system would allow incremental roll out and independent implementation and verification. Additionally it does not provide the services for which David Cameron’s system is actually being built which is to make it easier to snoop on all internet users web traffic. This is just the Digital Economy bill all over again but with ‘think of the children’ rather than ‘think of the terrorists’ as its sales pitch. There is little point blocking access to illegal content as that can always be circumvented, much better to take the content down[2] and lock up the people who produced it, failing that, detect it as the traffic leaves the ISP’s network towards bad places and send round a police van to lock up the people accessing it. Then everything has to go through the proper legal process in plain sight.

[0]: in the case of Virgin Media’s ‘Super Hub’ doing so incredibly badly such that everything needs tunnelling out to a sane network.
[1]: Though currently I do not beyond using Google’s strict safe search because there is no easy mechanism for doing so, the only source of objectionable content that actually ends up on web pages I see is adverts, on which more later.
[2]: If this is difficult then make it easier, it is far too hard to take down criminal website such as phishing scams at the moment and improvements in international cooperation on this would be of great benefit.

3 Responses to “Filters that work”

Do you have a citation for Flaw 1? I had been assuming that the new filters would be architecturally similar to the existing implementation of IWF blacklisting, which (on most ISPs) appears only to send IP addresses known to host blacklisted content through the filtering proxy. Traffic to other addresses doesn’t see any application-layer processing.

Also, the home gateway/hub/router/CPE tends to be *either* under the control of the ISP *or* the end user, which could be a problem. If it’s under the control of the ISP (a la Virgin Media) then you lose the transparency your solution might provide: you have to trust the ISP that they never see the preferences you set on your hub and can’t see its logs (both of which are probably false).

Unfortunately, we’re in the situation of not being able to trust our ISPs or our government. There’s only so much a technical solution can do about this.

I don’t have a citation, however that was what I understood from discussion in the security group meeting and by the phrase ‘all traffic gets routed through the servers even if the filter is not enabled’ which I heard somewhere. I could go digging.

Yes the current situation with hubs does not allow for the solution I propose but if implementing this solution then we could fix all the other problems with the hubs which mean that they are not transparent for the end user.

You are correct, technical solutions are not really going to solve the problem.

The architecture of the filtering system applied to your connection is likely to be up to your ISP. One example is BT’s Cleen Feed system, which Richard Clayton discusses in Chapter 7 of his 2005 PhD thesis (UCAM-CL-TR-653). In a nutshell, it works at two levels: the censor provides a list of URLs to be filtered. That is converted into a list of IP addresses that the ISP’s routers will redirect to a proxy server, which will then filter only those URL requests where the IP address uses does suggest that the request is on the list.