How Tos, Tutorials, Tips and Tricks

Proxy Server

I have not blogged for a long time, mainly because I was busy authoring a book, Squid Proxy Server 3.1: Beginner’s Guide, for Packt Publishing. The book is an introductory guide to Squid (especially the new features in the Squid-3 series), covering the basics as well as the in-depth details for advanced users. The book focuses on learning by doing and provides example scenarios for the concepts discussed throughout. Access control configuration, reverse proxying, interception proxying, authentication and other features are discussed in detail with examples.

A lot of people (especially working people with mobile devices like notebooks/netbooks) need to use different proxy servers at home and at the office. There are several Firefox extensions available to achieve this, but IMHO Multiproxy Switch (Mozilla Addon Page) is the best because

It’s simple and easy to use. It does what it should. No fancy/extraterrestrial stuff. Just switch proxies 🙂

It has an easy, Firefox-like interface for specifying different proxies. Many extensions add their own fancy interfaces for specifying proxies, which eventually suck big time.

I am a fan of this one: the No-Proxy list. I could never understand those regular-expression-based no-proxy lists in FoxyProxy. Multiproxy Switch has a Firefox-like No-Proxy list, which rocks and is easy to understand 🙂

If you happen to come across a better proxy switcher for Firefox, do let us know 🙂

There is a vulnerability in the way the Google authentication service works. Whenever you log in to any of Google’s online services like GMail, Orkut, Groups, Docs, Youtube, Calendar etc., you are redirected to an authentication server which authenticates the entered username and password and redirects back to the requested service (GMail, Youtube etc.), setting the session variables.

Now, if you are able to grab the URL used to set the session variables, you can log in as the user to whom that URL belongs from any machine on the Internet (it need not even be on the same subnet), without entering that user’s username and password.

Proxy servers in organizations can be used to exploit this vulnerability. Squid is the most popular proxy server in use. In the default configuration, squid strips the query terms of a URL before logging, so this vulnerability can’t be exploited. But if you turn off the stripping mechanism by adding the line shown below, squid will log the complete URL.

strip_query_terms off

So, after turning the stripping mechanism off, the log will contain URLs which look like this

Replace .co.in with the TLD specific to your country. If you paste this URL in any browser, it’ll directly log you in and you can do whatever you want with that account. Remember that all such URLs remain valid for only two minutes; after that, the URL leads nowhere.

At the time of writing this post Orkut, Google Docs, Google Calendar, Google Books and Youtube are vulnerable.

After spending a lot of time on youtube cache, I am now trying to devote some time to updating intelligentmirror with the features and enhancements that youtube cache already enjoys. In that direction, here is version 0.5 of intelligentmirror.

Improvements

Added a max_parallel_downloads option to control the maximum number of threads fetching from upstream to cache the packages.

Fine-grained control over logging via the max_logfile_size and max_logfile_backups options.

Added setup script to help you install intelligentmirror. No need to execute commands one by one for installation. Just run

[root@localhost]# python setup.py install [ENTER]

Added an update script (update-im). So, in case you decide to change the locations for caching rpm/deb packages, just run

[root@localhost]# update-im [ENTER]

OR

[root@localhost]# /usr/sbin/update-im [ENTER]

A download scheduler similar to the one in youtube cache has been added to facilitate download queuing in case of a large number of requests.

More informative logging.

cache.log is no longer flooded with XMLRPC logs and python tracebacks.
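The max_parallel_downloads cap described above can be pictured as a semaphore bounding the fetcher threads. Here is a minimal sketch of that idea (this is not the actual intelligentmirror code; the fetch function and package names are made up for illustration):

```python
import threading

MAX_PARALLEL_DOWNLOADS = 2   # plays the role of the max_parallel_downloads option

slots = threading.Semaphore(MAX_PARALLEL_DOWNLOADS)
lock = threading.Lock()
completed = []

def fetch(url):
    # Block until a download slot is free, then "download" the package.
    with slots:
        with lock:
            completed.append(url)

threads = [threading.Thread(target=fetch, args=(u,))
           for u in ('a.rpm', 'b.rpm', 'c.rpm', 'd.rpm')]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(completed))  # all four fetches finish, at most two running at a time
```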

Warning: This version of IntelligentMirror is compatible only with squid-2.7 as of now. It is NOT compatible even with squid-3.0.

IntelligentMirror Version 1.0.1

I have been following squid development regularly (at least the parts I am interested in), and a new directive known as StoreUrlRewrite (storeurl_rewrite_program) has been introduced in squid-2.7. Using this directive, you can instruct squid to cache URL A (http://abc.com/foo/bar/version/crap.rpm) as URL B (http://proxy.fedora.co.in/intelligentmirror/crap.rpm). In simple words, you can direct squid to cache any URL as any other URL without any extra effort.

So keeping the above directive in mind, I have worked out a different version of intelligentmirror especially for squid-2.7.

IntelligentMirror : Old method of operation

IntelligentMirror gets a client request for a URL.

Check: if the URL is not for an RPM or metadata file

Then it’s none of our business.

Let proxy handle it the normal way.

Done and exit.

Check: if RPM/metadata is available in cache

Stream the RPM/metadata from cache.

Done and exit.

Check: if RPM/metadata is not available in cache

Download in parallel for caching in some dir and stream.

Done and exit.
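The old decision flow above can be sketched roughly as follows (this is an illustrative sketch, not the real intelligentmirror code; the return labels are made up):

```python
import posixpath

def old_method(url, cache):
    """Rough sketch of the old IntelligentMirror decision flow described above."""
    name = posixpath.basename(url)
    if not name.endswith('.rpm'):
        return 'pass-through'        # none of our business; proxy handles it normally
    if name in cache:
        return 'stream-from-cache'   # serve the RPM from the local cache
    cache.add(name)                  # download in parallel into the cache dir
    return 'download-and-stream'

cache = set()
print(old_method('http://abc.com/foo.rpm', cache))     # download-and-stream
print(old_method('http://xyz.com/foo.rpm', cache))     # stream-from-cache
print(old_method('http://abc.com/index.html', cache))  # pass-through
```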

IntelligentMirror : New method of operation

IntelligentMirror gets a client request for a URL.

Check: if request for rpm

Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm

Check: if request for deb

Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<debname>.deb

Done and exit.

So your squid will see every request for an rpm package as a request for http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm. So, if you happen to request the same rpm from a different mirror, it’ll still be served from the cache 🙂
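The rewrite rule above is simple to sketch: map any rpm/deb URL to a canonical store URL under one fixed host, and leave everything else alone. A minimal illustration (the helper below is a sketch, not the shipped intelligentmirror code):

```python
import posixpath

STORE_HOST = 'proxy.fedora.co.in'  # the fixed <same_host_all_the_time> host

def store_url(url):
    """Map any rpm/deb URL to its canonical store URL; leave other URLs unchanged."""
    name = posixpath.basename(url)
    if name.endswith(('.rpm', '.deb')):
        return 'http://%s/intelligentmirror/%s' % (STORE_HOST, name)
    return url

print(store_url('http://abc.com/fc9/yum-3.2.18-1.fc9.i386.rpm'))
# http://proxy.fedora.co.in/intelligentmirror/yum-3.2.18-1.fc9.i386.rpm
```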

Improvements

No need to check whether the URL supplied by squid is for an rpm or not, because storeurl_rewrite_program has an acl controller attached which will invoke intelligentmirror only for URLs ending in .rpm.

No need to check if the url is already cached or not. No need to worry about the directory where you are going to store the packages. No human intervention is needed in maintaining the cache. Almighty squid is doing everything for us.

No need to worry if the target package has changed because of re-signing or whatever, because squid will handle that for you.

No need to actually download the package in parallel for caching because squid is already doing that.

No need to worry about the hashing algorithms and storage optimizations for the cached content.

IntelligentMirror version 0.4 is available now. There have been significant improvements in IntelligentMirror since the last release.

Improvements

Fixed the defunct-process problem. You will not see defunct python processes hanging around anymore. Previously, every forked daemon used to go defunct because the parent never waited for the forked child to finish.

IntelligentMirror now supports caching of Debian packages just like rpms. So IntelligentMirror is now best suited for shared environments where people have different tastes.

IntelligentMirror now uses url_rewrite_program instead of redirect_program. This boosts the efficiency of IntelligentMirror by a significant factor, as url_rewrite_program has an acl controller, url_rewrite_access. Using url_rewrite_access, only requests for rpm/deb packages are passed to IntelligentMirror, so IM need not process each and every incoming request. There is also a redirector_bypass directive which will bypass IM in case all the instances of IM are busy serving requests, so squid will not die with a fatal error under a heavy load of requests.

Options to enable/disable caching for rpm and Debian packages have been added.

Options to control the total size of caching directories and the size of individual package to be cached have also been introduced.

Proxy authentication is also supported now just the way it is supported in yum.

Packages are not checked for last-modified time anymore, because in principle two rpms A and B can have the same name only if they have the same contents. So, the delay in response time in case of hits has been reduced.
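The url_rewrite_access filtering mentioned above might look roughly like this in squid.conf (the acl name is illustrative; the directives are standard squid-2.6+ ones):

```
# Pass only rpm/deb requests to IntelligentMirror.
acl packages urlpath_regex -i \.(rpm|deb)$
url_rewrite_access allow packages
url_rewrite_access deny all
# Don't die with a fatal error if all rewriter instances are busy.
redirector_bypass on
```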

Note : A newer version of intelligentmirror is available now. Please check this.

Intelligent Mirror is basically a tool, or squid plugin (redirector), to cache rpm packages so that subsequent requests for the same package can be served from the local cache, which eventually saves a lot of bandwidth and downloading time.

Who needs Intelligent Mirror?

If you are on a shared network where a lot of people use Linux distros with RPM as their package manager, then you need this. Universities come under this category.

If you have a set of systems running Red Hat derivatives with almost identical OS versions, you need this. LAN setups at home come under this category.

If you can’t afford to or don’t want to mirror entire fedora repo for local access due to bandwidth limitations, you need this.

What does it do?

As described above, Intelligent Mirror just caches rpms which are requested by clients on a shared network, and subsequent requests for those rpms are served from the cache. For a detailed description, check the project page.

Why not use Squid in caching mode?

Squid caching is based on URL hashing. Let me explain with an example how Intelligent Mirror is actually intelligent compared to squid while caching rpms.

Let us say there is an rpm yum-3.2.0-1.fc7.i386.rpm, and you executed “yum update yum”. Let us say the newer version of yum is yum-3.2.18-1.fc9.i386.rpm, which was fetched from one of the fedora mirrors, say http://abc.com/. Now someone on the same network launched “yum update yum” and got the same rpm yum-3.2.18-1.fc9.i386.rpm, but this time the rpm was fetched from another mirror, say http://xyz.com/.

Case I : Squid caching

Squid will cache http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm. And when http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm is requested, it’ll result in a cache miss and squid will download the same package again and cache it as well. Now there are two problems

Squid is not able to serve from the cache, though the package was the same.

Additional storage space is wasted in caching the same package twice. And this can really hurt if, unluckily, a different mirror is picked in all the subsequent queries.

Case II : IntelligentMirror caching

Intelligent Mirror will cache the package yum-3.2.18-1.fc9.i386.rpm without bothering about its origin. And even if yum picks a different mirror for a subsequent request, the package will be served from the cache and not fetched from upstream. Hence the obvious advantage of saving bandwidth and downloading time.
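The difference between the two cases boils down to the cache key: squid keys its cache on the full URL, while IntelligentMirror keys on the package file name. A quick illustration in Python:

```python
import posixpath

url_a = 'http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm'
url_b = 'http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm'

# Squid's cache key is the full URL, so the two mirrors never share a cache entry.
print(url_a == url_b)  # False

# IntelligentMirror's cache key is just the package name, so they do.
print(posixpath.basename(url_a) == posixpath.basename(url_b))  # True
```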

Download

Intelligent Mirror source tarball, rpm, source rpm are available for download from here.

Brief Introduction

IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say, a software package) from the Internet, it is stored/cached on a local machine on your network, and subsequent downloads of that particular software package are served from the storage/cache of the local machine. This facilitates efficient usage of bandwidth and also reduces the average download time. IntelligentMirror can also pre-fetch RPM packages from fedora repositories spread all over the world, and can pre-populate the local repo with popular packages like mplayer, vlc and gstreamer, which are normally accessed immediately after a fresh install.

Definition for a layman

Think of the Internet as a hard disk, your proxy server as a cache and your Intranet as a CPU. Now, whenever your CPU needs to process something, it needs data from the cache. If the data is not in the cache, it’ll be fetched from RAM and/or the hard disk. IntelligentMirror sits on your proxy server and keeps caching packages in a browsable manner, so they can be served via HTTP for subsequent requests.

Mission

Use Cases

When the number of machines is more than the number of IP addresses you can afford to buy.

When you want to help this holy world in saving some IPV4 addresses 😛

Assumptions

You have a machine connected directly to the Internet that you are going to use as a proxy server for the other machines on your network.

The machines on your network use 192.168.0.0/16 as the private address space. You can use any one (or several) of the available private address spaces, but for this howto we assume 192.168.0.0/16 as the local network.

The local IP address of the machine which will run squid proxy server is 192.168.36.204. You can have any IP, but for this howto we assume this.

How to proceed

First of all, ensure that you have squid installed. After installing squid, you need to set up access control in the squid configuration file, which resides in /etc/squid by default. Open /etc/squid/squid.conf and add/edit the following lines according to your preferences. A few lines already exist in the configuration file; you can add the rest.
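A minimal access-control sketch, assuming the 192.168.0.0/16 network and port 8080 used throughout this howto (the acl name is illustrative):

```
# Listen on port 8080 (clients will use 192.168.36.204:8080 as their proxy).
http_port 8080
# Define the local network and allow it to use the proxy; deny everyone else.
acl localnetwork src 192.168.0.0/16
http_access allow localnetwork
http_access deny all
```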

Also, if you want squid to be started every time you boot the machine, execute the following command

chkconfig --level 345 squid on

You now have a squid proxy server running. You can ask clients to configure their browsers to use 192.168.36.204 as the proxy server with 8080 as the proxy port. Command-line utilities like elinks, lynx, yum, wget etc. can be told to use the proxy by exporting the http_proxy variable as below. Users can also add these lines to their ~/.bashrc file to avoid exporting it every time.
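For example, assuming the proxy address and port used throughout this howto:

```shell
# Tell command-line tools to use the Squid proxy at 192.168.36.204:8080.
export http_proxy=http://192.168.36.204:8080/
echo "$http_proxy"
```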

Mission

To write a custom Python program which can act as a plugin for Squid to redirect a given URL to another URL. This is useful when the existing redirector plugins for Squid don’t suit your needs, or you want everything of your own.

Use Cases

When you want to redirect URLs using a database like mysql or postgresql.

When you want to redirect based on mappings stored in simple text files.

When you want to build a redirector which can learn by itself using AI techniques 😛

How to proceed

The redirector program must read URLs (one per line) on standard input and write rewritten URLs or blank lines on standard output. Note that the redirector program cannot use buffered I/O. Squid writes additional information after the URL, which a redirector can use to make a decision.

The format of the line read from standard input by the program is as follows:

URL ip-address/fqdn ident method

The implementation sounds very simple, and it is indeed very simple. The only thing that needs care is the unbuffered I/O: you should flush standard output immediately once the decision is taken.

For this howto, we assume we have a method called modify_url() which returns either a blank line or a modified URL to which the client should be redirected.


#!/usr/bin/env python
import sys

def modify_url(line):
    # The first whitespace-separated field is the URL.
    fields = line.split(' ')
    old_url = fields[0]
    # Take the decision and modify the URL if needed.
    # Do remember that new_url must end with a '\n'.
    new_url = '\n'
    if old_url.endswith('.avi'):
        new_url = 'http://fedora.co.in/errors/accessDenied.html' + new_url
    return new_url

while True:
    # The format of the line read from stdin is
    # URL ip-address/fqdn ident method
    # for example
    # http://saini.co.in 172.17.8.175/saini.co.in - GET -
    line = sys.stdin.readline()
    if not line:
        break  # squid closed the pipe; exit cleanly
    # new_url is a simple URL only
    # for example
    # http://fedora.co.in
    new_url = modify_url(line.strip())
    sys.stdout.write(new_url)
    sys.stdout.flush()

Save the above file somewhere; for this example we save it as /etc/squid/custom_redirect.py. Now we have the function for redirecting clients, and we need to configure squid to use custom_redirect.py. Below is the squid configuration that tells squid to use the above program as the redirector.


# Add these lines to the /etc/squid/squid.conf file.
# Replace /usr/bin/python with the path to the python executable if you installed it somewhere else.
redirect_program /usr/bin/python /etc/squid/custom_redirect.py
# Number of instances of the above program that should run concurrently.
# 5 is enough for a small setup; go for at least 10 on a busy proxy.
redirect_children 5

Now, start/reload/restart squid. That’s all we need to write and use a custom redirector plugin for squid.