Transparent Proxy as Adblock using Tinyproxy and Dansguardian

As I mentioned in my last post about the migration from Gentoo to Kubuntu, I’ll write about how to set up iptables, Tinyproxy and Dansguardian as an ad blocker. That said, the setup might be better with Squid instead of Tinyproxy. Why not have a caching transparent proxy around? I don’t do that because I installed all of this on my notebook, and caching there wouldn’t make much sense as the applications cache anyway (by default).

I won’t tell you how to install them; the project sites cover that. And it’s really easy: in Kubuntu you just need to select the packages, they are all available.

Tinyproxy Configuration

I left most of the configuration unchanged as the default values should do. The listen port is set to 8888. You should check the “Allow” setting for security reasons, and you might give your filter a name through “ViaProxyName”. Maybe you want to set “ConnectPort” to 0, which disables the CONNECT method and with it HTTPS tunnelling through the proxy. Or you may want to filter HTTPS traffic as well, but I doubt it.

Then go ahead and start it, probably with something like “/etc/init.d/tinyproxy start”. You may also want to make it start automatically at boot.

Dansguardian Configuration

The dansguardian configuration is a bit trickier, or let’s say more time-consuming. You will need to play around and adjust the filter settings after some testing, or maybe you can reuse existing rules from Firefox Adblock (I think these are regexps too?). First, have a look through dansguardian.conf (everything should be in “/etc/dansguardian/”). Here you can adjust a lot of stuff; you can even add ClamAV, an anti-virus scanner. I didn’t, but I guess it’s not too hard to do.

dansguardian.conf

For a start you should set the log levels quite high so you can see in the log file what happens. The important options are

“filterip” – I left that empty as I use my notebook in a safe environment. You may want to set this though…

“filterport” – set to 8080, but you can choose basically any free port you want. On this port the filter listens for incoming requests from browsers etc.

“proxyip” – I have this set to 127.0.0.1 (localhost), as my proxy (tinyproxy) sits on the same machine.

“proxyport” – set to 8888 (as in the tinyproxy configuration); dansguardian sends its requests to the proxy on this port.

I deactivated “weightedphrasemode”. I think it can be really useful (probably mainly for dansguardian’s main purpose, which is web content filtering for children), but I haven’t used it yet.

There are a lot of options you can tune to get better performance or better results; to get started, the default configuration should be suitable. Just one more thing that I turned off is “virusscan”, as mentioned above.
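Since dansguardian’s “proxyport” has to match the port tinyproxy listens on, a quick check can save some head-scratching. The following is only a minimal sketch: both file paths are assumptions (they differ between distributions), and the helper names dg_value, tp_value and check_ports are made up for this example.

```shell
#!/bin/sh
# Sanity check: dansguardian must forward to the port tinyproxy listens on.
# Both paths are assumptions; adjust them to your distribution.
DG_CONF="${DG_CONF:-/etc/dansguardian/dansguardian.conf}"
TP_CONF="${TP_CONF:-/etc/tinyproxy.conf}"

# dansguardian.conf uses "key = value" lines, for example:
#   filterport = 8080
#   proxyip = 127.0.0.1
#   proxyport = 8888
# dg_value FILE KEY prints the first value found for KEY.
dg_value() {
    sed -n -e "s/[[:space:]]*\$//" \
           -e "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# tinyproxy.conf uses "Key value" lines, for example "Port 8888".
# tp_value FILE KEY prints the first value found for KEY.
tp_value() {
    sed -n -e "s/[[:space:]]*\$//" \
           -e "s/^[[:space:]]*$2[[:space:]][[:space:]]*//p" "$1" | head -n 1
}

# check_ports DG_FILE TP_FILE succeeds only if both ports are set and equal.
check_ports() {
    dg_port=$(dg_value "$1" proxyport)
    tp_port=$(tp_value "$2" Port)
    if [ -n "$dg_port" ] && [ "$dg_port" = "$tp_port" ]; then
        echo "OK: dansguardian forwards to tinyproxy on port $dg_port"
    else
        echo "MISMATCH: proxyport='$dg_port' but tinyproxy Port='$tp_port'"
        return 1
    fi
}

# Only run the check when both files are readable.
if [ -r "$DG_CONF" ] && [ -r "$TP_CONF" ]; then
    check_ports "$DG_CONF" "$TP_CONF"
fi
```

Run it after editing either file; a MISMATCH line means dansguardian is talking to a port where no proxy answers.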

dansguardianf1.conf

All the filter files are defined here, and the defaults should be all right. There is also the “Temporary Denied Page Bypass” setting, called “bypass”; I activated it and set it to 300 (5 minutes). For this to work properly (so that you can click a link in your browser to unblock the blocked content for these 5 minutes) you need to modify “/etc/dansguardian/languages/%YOURLANGUAGE%/template.html”. It’s just HTML with a few placeholders, very easy to adjust. The important part that shows the unblock link is ‘…<a href="-BYPASS-">…’.

banned*

I will only mention the files that I changed and that seem important to me. The names of those files are pretty self-explanatory and the files contain examples, so just go ahead and have a look!

The banned* files contain the rules that block content from being delivered.

bannedregexpurllist

In this file I have three lines (you can create as many as you want, as these are regular expressions). Don’t ask why I have three lines. I created that stuff a few years ago and it still serves me well!
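Before putting a new expression into bannedregexpurllist, it can help to try it against a few URLs on the command line. This is a rough sketch only: the two patterns are hypothetical illustrations (not the three lines mentioned above), the helper url_is_banned is made up, and grep -E only approximates how dansguardian itself matches, so the dansguardian log remains the final word.

```shell
#!/bin/sh
# url_is_banned URL LISTFILE
# Succeeds if any regular expression in LISTFILE matches URL.
# Blank lines and lines starting with "#" are ignored.
url_is_banned() {
    patterns=$(grep -Ev '^[[:space:]]*(#|$)' "$2" || true)
    [ -n "$patterns" ] || return 1
    # grep treats newline-separated patterns as alternatives.
    printf '%s\n' "$1" | grep -Eq -e "$patterns"
}

# Hypothetical pattern list, one expression per line.
list=$(mktemp)
cat > "$list" <<'EOF'
# host or path component named ad, ads, ad2, ads3, ...
(^|[/.])ads?[0-9]*\.
# anything served from a banner directory
/banners?/
EOF

# Sample URLs (made up) to see which ones the patterns would catch.
for url in \
    'http://ads.example.com/pixel.gif' \
    'http://www.example.com/banners/top.png' \
    'http://downloads.example.org/file.zip'
do
    if url_is_banned "$url" "$list"; then
        echo "BLOCK $url"
    else
        echo "pass  $url"
    fi
done

rm -f "$list"
```

The third URL is there on purpose: a pattern as naive as “ads” alone would also hit “downloads”, which is exactly the kind of thing you want to catch before it ends up in the filter.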

contentregexplist

In fact I don’t even use this, but it can be very handy. You can rewrite anything inside the document with a real regular expression replacement. That might be helpful for JavaScript stuff or just to censor pages. Why not replace “Google” with “They” everywhere? It should look like this:

"Google"->"They"

Yes, I know, that’s not even a real regexp that I wrote. 😉

exception*

exceptionsitelist

Here you put the sites that are allowed to send you all the ads and crap they want. So why would you want to do that? Well, I have just one line in there:
peterzahlt.de

That’s a site from which you can call to and/or from Germany for free for half an hour. In return you have to watch their ads. So yeah, you need to display that stuff.

And that’s all about the dansguardian configuration. Start it, check the logfile for errors and solve them (if there are any).
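The iptables rules that the next paragraphs refer to are not reproduced here. Based only on that description, two such rules could look roughly like the sketch below. Treat it as a reconstruction under assumptions, not the original rules: the variable names and the run wrapper are made up, and the exact matches and targets in the original may have differed. The script only prints the commands unless APPLY=1 is set, because applying them needs root.

```shell
#!/bin/sh
# Reconstruction sketch of the two NAT rules described in the text.
PROXY_USER="${PROXY_USER:-nobody}"   # user tinyproxy runs as (assumption)
FILTER_PORT="${FILTER_PORT:-8080}"   # dansguardian "filterport"

# Print each command; execute it only when APPLY=1 (requires root).
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

proxy_rules() {
    # Rule 1: redirect locally generated traffic to port 80 to the filter
    # port on localhost, except traffic owned by the proxy user.
    run iptables -t nat -A OUTPUT -p tcp --dport 80 \
        -m owner ! --uid-owner "$PROXY_USER" \
        -j REDIRECT --to-ports "$FILTER_PORT"

    # Rule 2: rewrite the source address of the redirected packets
    # to 127.0.0.1.
    run iptables -t nat -A POSTROUTING -p tcp -d 127.0.0.1 \
        --dport "$FILTER_PORT" \
        -j SNAT --to-source 127.0.0.1
}

proxy_rules
```

To apply the rules for a proxy running as a different user, something like “APPLY=1 PROXY_USER=tinyproxy sh rules.sh” as root would do; the file name and the user name here are just placeholders.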

One very important thing is to change the “nobody” in line 1 to the user under which tinyproxy is running. This user needs to be allowed to talk directly to the outside world; otherwise tinyproxy’s own requests to port 80 would be redirected back into the filter and we would end up in an infinite loop!

So rule 1 redirects everything that we produce locally and that is going to port 80 somewhere to port 8080 on localhost.

Rule 2 sets the source address of these packets to 127.0.0.1. That’s needed to get this working properly.

Now it should be working. You just have to play around with your filters and check in the logfiles whether everything works as it is supposed to!

Conclusion

I had this setup running on my Linux router for a few years. The setup was a bit different as I used a caching proxy (Squid) and didn’t filter the traffic from the local box. With a setup like that you can easily filter all computers in your network with no hassle, independent of platform. For Windows machines this is even more helpful as you often have software (like the ICQ client) that shows ads. These are often requested through port 80, so it’s easy to block all of that!

And careful, this setup is not meant to be a web content filter for children! If you want that, you need to change your configuration and maybe check the dansguardian website!

Another thing you should consider is that many websites live off ads and banners and all that stuff. So if you block it, they no longer get money for your visits. Depending on the sites you visit, you may want to add them to the greylists or exception lists…

Could you please tell me how to modify my dansguardian.pl script, which shows the deny page to the client? It is probably the same mod as the template.html. I cannot find anything on the net about how to modify this script so that the client can temporarily bypass the deny page to get to the web site. I have modified the dansguardianf1.conf file already. I am a network admin for a private school and I need one teacher to be able to access certain sites. Thank you.

I’m not sure if the bypass is really what you want, as everyone will be able to click on it, and you probably need to content filter without exception for the pupils?
If you want to try changes, just modify the template.html, or do you have problems with that?

You might want to exclude certain computers that can’t be used by the kids from the filter?
