Transparent Web Proxying with Cisco, Squid, and WCCP

Introduction

There are a number of good reasons for companies to deploy proxies for user access to the Internet. Amongst these are:

Monitoring of web sites and traffic volumes

Restricting web access - by user, web sites, time of day, etc.

Using caching to reduce traffic volumes

Managing bandwidth

There are also a number of challenges faced when implementing
proxies. Probably the top one is the job of configuring all of the web
browsers to use the proxy, and then comes the problem of what to do if
the proxy fails.

This article presents a web proxying solution which is
transparent to the end-user - it requires no browser configuration. It
is also resilient to failure, in that if the proxy server fails then
web access continues to be provided without disruption.

A Basic Network and Web Proxy

The network drawing below shows a basic network with access to the
Internet. This is a very common configuration for small business
networks.

Figure 1: A basic network and proxy

In larger security-conscious organisations it is necessary to protect
the proxy server against attack and misuse. This is usually performed
by connecting the proxy into a DMZ network as shown in the next
drawing.

Figure 2: A basic network with DMZ-protected proxy

A common solution for transparent proxying is to have all outbound
traffic pass through a server which will detect web access and
redirect the request to an internal proxy. This has a number of
problems - not least of which is that it can't support multiple
proxies, and when the server fails then all web access fails along
with it.

WCCP Overview

Most Cisco routers support a protocol called Web Cache Communication
Protocol, or WCCP. This protocol is used by a proxy server, such as a
Linux server running the Squid proxy, to tell the router that it is
alive and ready to process web access requests. WCCP uses the UDP
protocol on port 2048. The conversation is driven by the proxy, which
announces itself to the router and is acknowledged in return.

Figure 3: WCCP between the proxy and router

WCCP has a number of advantages when used between a proxy and the gateway router.

You can have multiple proxy servers. In fact, you can have
almost any number if your router is big enough to handle them. This
means that in a large organisation the load is spread amongst them,
improving performance.

Access is resilient to failure. If a proxy fails, the router
stops hearing its WCCP announcements and switches to another proxy
(if you've got more than one configured); otherwise it stops using
proxies and forwards requests directly to the Internet. The router can
also be configured to block Internet web access if there are no
running proxies available.

Optimised hashing of destinations. When you have more than one proxy,
the router hashes each request to choose a proxy, so requests for the
same site consistently go to the same one. A page cached by a proxy
after one user's request is therefore served from that proxy's cache
the next time any user requests it.

One caveat to note here: WCCP is patented by Cisco, and is
generally only available on Cisco routers and some high-end Cisco
switches. A few other vendors such as BlueCoat also support WCCP, but
not many.

WCCP proxy traffic flows are a little bit unusual, and can be
very confusing to begin with. The following drawing shows the main
flows for a WCCP proxy:

Figure 4: WCCP traffic flows

There are some interesting things to note about the traffic flows here.

The Squid proxy sends a WCCP packet to the router every 10 seconds
to tell the router that the proxy is alive and ready to receive web
requests. You can now see here that it is easy to have multiple proxy
servers that can work with the router.

When a client makes a request for an Internet web page, it sends
it directly to the Internet via the router, as shown in (1)
above.

The router captures the request, encapsulates it in a GRE packet,
and forwards it to the proxy as shown in (2) above.

The Linux system un-encapsulates the GRE packet and sends the
request to the Squid proxy by performing a Destination NAT operation
on the packet - note that Squid now receives the original packet with its
original source and destination IP addresses.

The Squid proxy now fetches the web page from the Internet server
in the normal fashion shown in (3) above - it uses its own IP
address as the source and the original destination IP address for the
destination. Note that the router does not intercept and attempt to
proxy this request.

Once Squid has downloaded the page, it saves the data in its own
cache, then replies directly back to the client on the internal
network. And this is the tricky thing right here - when Squid replies
it uses the IP address of the Internet server as the source in the
packet, and the client IP address as the destination. This is shown
in (4) above.

So, while the client thinks it is interacting with the remote web
server via the Internet router, in actual fact it is interacting with
the Squid proxy which is caching pages behind the scenes. If another
user on another client makes a request for the same page they go
through the same flow, but because the page is cached there is no need
for Squid to fetch the page from the Internet server again.

In the remainder of this paper I will briefly show the Cisco,
Linux, and Squid configurations required to get this working.

Cisco Configuration

In this example, I will have two proxies configured on the internal
network (192.168.1.0/24) with IP addresses of 192.168.1.252 and
192.168.1.253. The first step is to define an access list containing
the addresses of the proxies, and assign this as the list of WCCP
proxies:
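
A minimal sketch of this step, assuming standard access list number 10
(the number is arbitrary and chosen only for illustration) and WCCP
version 2:

```
! Standard ACL naming the two proxies allowed to register (ACL number is an assumption)
access-list 10 permit 192.168.1.252
access-list 10 permit 192.168.1.253
!
! Use WCCPv2 and accept registrations only from hosts in ACL 10
ip wccp version 2
ip wccp web-cache group-list 10
```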

Next we define another access-list to define direct or WCCP-proxied
internet access. The proxies on 192.168.1.252 & 253 are denied access
to WCCP, all other hosts on 192.168.1.0/24 are proxied when going to
port 80, all others are denied. Denial implies direct access to the
remote web server.
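
One way to express these rules, assuming extended access list number
120 and a proxy group list numbered 10 (both numbers are illustrative
assumptions, not fixed values):

```
! Traffic from the proxies themselves is never redirected
access-list 120 deny   ip host 192.168.1.252 any
access-list 120 deny   ip host 192.168.1.253 any
! Port 80 traffic from the internal network is redirected to a proxy
access-list 120 permit tcp 192.168.1.0 0.0.0.255 any eq www
! Everything else goes direct
access-list 120 deny   ip any any
!
! Attach the redirect list to the web-cache service
ip wccp web-cache group-list 10 redirect-list 120
```

The deny entries for the proxies come first so that Squid's own fetches
from the Internet are not looped back to a proxy.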

At this point, once a Squid proxy registers with the router, port 80
traffic is redirected to it. Until the Linux side is fully configured,
client browsers which are not configured to use the Squid proxy
explicitly may not be able to reach Internet web sites. If this is an
issue for the users then the best way to disable and enable WCCP
proxying is to remove the configuration from the interface
(FastEthernet0 in this case):

int f0
!
no ip wccp web-cache redirect in

and to enable it:

int f0
!
ip wccp web-cache redirect in

Squid Configuration

Now we need to configure a Squid proxy on a Linux server. I won't
cover the basic installation - just the configuration part, so I
assume you know a little bit about configuring Squid. To start with,
check that Squid is installed and is working as a proxy by setting it
up in your browser and fetching a few web pages through it.

Next, check that your Squid has been built ready for WCCP proxying.
Run squid -v and verify that the following options are included:

--enable-linux-netfilter
--enable-wccpv2

If those options aren't there then you'll have to download the squid
source code and build it from scratch with these options included in
the ./configure build command.
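
The check can be scripted. The following is a small sketch, not part of
Squid itself: the function name check_squid_build_flags is made up for
this example, and it simply searches the text printed by squid -v for
the two options named above.

```shell
#!/bin/sh
# Report whether the build options named in this article appear in the
# output of `squid -v`. The function name is hypothetical.
check_squid_build_flags() {
    # $1: the text printed by `squid -v`
    missing=0
    for flag in --enable-linux-netfilter --enable-wccpv2; do
        # -F matches a fixed string; -e is needed because the pattern starts with dashes
        if printf '%s\n' "$1" | grep -qF -e "$flag"; then
            echo "found:   $flag"
        else
            echo "missing: $flag"
            missing=1
        fi
    done
    # Non-zero exit status if any option was not found
    return "$missing"
}

# Typical use on the proxy host:
#   check_squid_build_flags "$(squid -v)"
```

A non-zero return value means at least one of the options was not
found in the build string.
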
Now to configure WCCP for your Squid proxy. In this example I add a
new listening port (port 3127) to Squid for transparent proxying,
leaving the default port of 3128 available for normal proxying. Add
the following lines to /etc/squid/squid.conf:
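
A minimal sketch of such lines, assuming the router's inside address is
192.168.1.1 (a placeholder, substitute your own) and a Squid version
that uses the intercept keyword; older releases use transparent in its
place and numeric values (1 for GRE) for the two method directives:

```
# Extra listener for redirected traffic; the existing http_port 3128
# line stays as it is for normal proxying
http_port 3127 intercept

# Register with the router using WCCPv2 (router address is an assumption)
wccp2_router 192.168.1.1
# The router forwards redirected packets to Squid inside GRE
wccp2_forwarding_method gre
wccp2_return_method gre
# Standard web-cache service (HTTP on port 80)
wccp2_service standard 0
```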

Restart the Squid proxy once the changes have been made, and verify the following:

Squid is listening on port 3128 & serving normal proxy requests

Squid is listening on 3127

There are no errors in the Squid logs
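
These checks might look like the following on the proxy host. The log
path is the usual RedHat location but is an assumption here, and on
older systems netstat -ltn can stand in for ss -ltn:

```
# Both listeners should appear: 3128 (normal) and 3127 (transparent)
ss -ltn | grep -E ':312[78]'

# Look for startup or WCCP errors in the Squid log (path is an assumption)
tail -n 50 /var/log/squid/cache.log
```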

You can now go back to your Cisco router and check that the Squid
proxy has registered with WCCP, with the show ip wccp command.

Linux Network Configuration

Now that Squid is working, we need to get requests redirected from the
Cisco router to the proxy. This is done by the router encapsulating
the request packet within a GRE packet, then forwarding it to the IP
address of the Squid proxy. On the router side, this is automatic. But
we need to configure the Linux system to receive these
GRE-encapsulated packets, un-encapsulate them, and forward them to the
listening proxy.

I'm using a RedHat Linux system here, so the configuration files are those used by RedHat.

Create a new interface, gre0, for the GRE tunnel by creating the file /etc/sysconfig/network-scripts/ifcfg-gre0 with the following contents:
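
A minimal sketch of such a file. The address and netmask are arbitrary
placeholders (the interface only needs some address so that it can be
brought up), and the gre0 device is assumed to exist already, which
requires the ip_gre kernel module to be loaded:

```
# /etc/sysconfig/network-scripts/ifcfg-gre0
DEVICE=gre0
BOOTPROTO=static
# Placeholder addressing, not taken from the network described above
IPADDR=172.16.1.6
NETMASK=255.255.255.252
ONBOOT=yes
IPV6INIT=no
```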

Run "ifdown gre0" and "ifup gre0" to test it, then run "ifconfig gre0" and verify the IP addressing.

Next, enable IP forwarding, disable the reverse-path packet filters, and configure DNAT in iptables.
Run the following commands:
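
A sketch of these commands for the proxy at 192.168.1.252, assuming its
LAN interface is eth0 (an assumption, adjust to your system). They must
be run as root:

```
# Allow the kernel to forward packets
sysctl -w net.ipv4.ip_forward=1

# Disable reverse-path filtering so the decapsulated packets are not dropped
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.eth0.rp_filter=0
sysctl -w net.ipv4.conf.gre0.rp_filter=0

# Hand port 80 traffic arriving over the GRE tunnel to Squid's transparent port
iptables -t nat -A PREROUTING -i gre0 -p tcp --dport 80 \
    -j DNAT --to-destination 192.168.1.252:3127
```

On the second proxy the same commands apply with 192.168.1.253 as the
DNAT destination.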

You'll need to run these at system boot time, so add the commands to the
start section of the /etc/init.d/squid script.

Testing

tcpdump is your friend when testing this configuration. Check
the flows in the order shown in Figure 4 above and verify that each
one works before moving on to the next. Remember that the Squid proxy
will use the IP address of the Internet web server when replying back
to the client, so be aware of this. If your proxy is behind a firewall
you will probably have to disable anti-spoofing mechanisms to allow
the proxy to spoof the web server's IP address.

Most problems seem to occur in the Linux GRE & NAT
configuration. And don't forget to check the Squid logs for errors.
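
As a starting point, captures along these lines on the proxy host can
confirm each flow in turn. The interface name eth0 is an assumption;
run each command as root in its own terminal:

```
# WCCP announcements between proxy and router (UDP port 2048)
tcpdump -n -i eth0 udp port 2048

# Flow (2): GRE-encapsulated requests arriving from the router (IP protocol 47)
tcpdump -n -i eth0 ip proto 47

# The same requests after un-encapsulation on the tunnel interface
tcpdump -n -i gre0 tcp port 80

# Flows (3) and (4): Squid's fetches and its spoofed replies to clients
tcpdump -n -i eth0 tcp port 80
```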

Closing Notes

In this paper I've described a method of transparently caching web
requests using a Squid proxy and WCCP-enabled Cisco router. As
described in the introduction this solution can be used to implement
security controls and bandwidth management without having to
reconfigure client systems to explicitly use a proxy server.