This article shows you, step by step, how to cache your entire Tomcat web application with a Squid reverse proxy without writing any Java code.

What is Squid

Squid is a free proxy server for HTTP, HTTPS and FTP which saves bandwidth and improves response time by caching frequently requested web pages. While Squid can be used as a forward proxy when users download pages from the internet, it can also be used as a reverse proxy by putting Squid between the user and your webapp. All user requests first hit Squid. If the requested page already exists in Squid’s cache, it is served directly from the cache without hitting your webapp. If the page does not exist in Squid’s cache, it is fetched from your web application and stored in the cache for future requests.

Squid reduces hits to your server by caching response pages. You don’t have to worry about building page-level caching into every application that you write; Squid takes care of that part.

When should I use Squid

Ideally you should use Squid for pages which have a high ratio of reads to writes. In other words, a page that changes less frequently but is accessed very often. Here are some scenarios:

A dynamic web page which displays news, is updated once an hour, and receives hundreds of hits during that hour.

When should I not use Squid

In most cases, if the request URL is the only factor which determines the response, then you can safely use Squid. The cases below are ones where Squid is a poor fit:

Apps which are very dynamic throughout, where pages become invalid almost as soon as they are generated.

Apps which require login. Unfortunately, this covers a large number of applications. Such applications need to resort to back-end caching, for example using a caching framework like Ehcache to cache reusable page fragments, database queries, or other performance bottlenecks.

Apps which heavily use browser cookies. Squid relies on URLs to cache pages. If the page served is computed from URLs + cookies, then you should not cache those pages in Squid.
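To see why cookie-dependent pages are a problem, here is a deliberately simplified Python model of a cache that keys on the URL alone. It is not Squid's implementation; the class UrlKeyedCache, the function personalised_app and the /home URL are all hypothetical names made up for this illustration.

```python
class UrlKeyedCache:
    """Toy cache that, like the setup described here, keys on the URL only."""

    def __init__(self, origin):
        self.origin = origin  # callable(url, cookies) -> page body
        self.store = {}       # url -> cached page body

    def get(self, url, cookies):
        if url in self.store:
            # Cache hit: the web app is not contacted at all.
            return self.store[url]
        # Cache miss: ask the web app, then remember the answer by URL.
        page = self.origin(url, cookies)
        self.store[url] = page
        return page


def personalised_app(url, cookies):
    # Hypothetical web app whose output depends on a cookie as well as the URL.
    return f"Hello {cookies.get('user', 'guest')}, you requested {url}"


cache = UrlKeyedCache(personalised_app)
first = cache.get("/home", {"user": "alice"})
second = cache.get("/home", {"user": "bob"})
print(first)
print(second)  # bob receives the page that was generated for alice
```

The second request is answered from the cache, so the cookie is never looked at. That is the failure mode to avoid when a page is computed from URL + cookies.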

How does the overall setup work

[Figure: Apache Squid Tomcat architecture]

Apache receives requests on port 80 and passes each request to Squid. Squid checks its cache to see if it has the response cached from before. If it does, and the response has not expired, Squid returns the cached response. In this case, Squid writes the following header to the response:

X-Cache: HIT from www.vineetmanohar.com


If the response is not found in Squid’s cache, Squid makes a call to Tomcat on port 8082, where Tomcat’s proxy connector is listening. Tomcat processes the request and sends the response back to Squid. Squid saves the response in its cache, unless caching is disabled for that URL. Squid then returns the final response to Apache, which sends it back to the user.
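When testing the setup, it helps to check which of the two paths a request took. The Python sketch below is one way to do that from a set of response headers. The helper name is_cache_hit is hypothetical, and the code assumes the header value begins with HIT or MISS, as in the X-Cache example above.

```python
def is_cache_hit(headers):
    """Return True if the X-Cache response header reports a cache hit."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    x_cache = normalised.get("x-cache", "")
    # Squid writes values such as "HIT from <hostname>" or "MISS from <hostname>".
    return x_cache.strip().upper().startswith("HIT")


if __name__ == "__main__":
    print(is_cache_hit({"X-Cache": "HIT from www.vineetmanohar.com"}))   # True
    print(is_cache_hit({"X-Cache": "MISS from www.vineetmanohar.com"}))  # False
```

You can pass in the headers from any HTTP client as a plain dictionary. A response with no X-Cache header at all is treated as a miss.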

What if I don’t want to use Apache

Apache is not required in order to use Squid. You can run Squid on port 80 and point your users directly to Squid. If that is the case, skip Step 1 and jump directly to Step 2 below.

Step 1/3: Apache Httpd Config

If you are using Apache as a front end, you need to instruct Apache to forward requests to Squid at port 3128. See the following code snippet. Change the server name and paths to reflect your real values.

Step 2/3: Squid Config

First make sure that Squid is installed on your server. You can download Squid from here.

The Squid config file on Linux/Unix is located at:

/etc/squid/squid.conf


The config file is pretty long. Follow these instructions and set the values appropriately.

# leave the port at 3128
http_port 3128

# how much memory cache do you want? It depends on how much memory you have on the machine
cache_mem 200 MB

# what's the biggest page that you want stored in memory? If your home page is 100 KB and
# you want it stored in memory, you may set it to a number bigger than that.
maximum_object_size_in_memory 100 KB

# how much disk cache do you want? It is 6400 MB in the following example; change it as per
# your needs. Make sure you have that much disk space free.
cache_dir ufs /var/spool/squid 6400 16 256

# this is probably the most important config section. Here you can configure the cache life for
# each URL pattern.

# Time is in minutes
# 1 day = 1440, 2 days = 2880, 7 days = 10080, 28 days = 40320

# do not cache url1
refresh_pattern ^http://127.0.0.1:8082/url1/ 0 20% 0

# cache url2 for 1 day
refresh_pattern ^http://127.0.0.1:8082/url2/ 1440 20% 1440 override-expire override-lastmod reload-into-ims ignore-reload

# cache css for 7 days
refresh_pattern ^http://127.0.0.1:8082/css 10080 20% 10080 override-expire override-lastmod reload-into-ims ignore-reload

# by default cache the whole website for 1 minute
refresh_pattern ^http://127.0.0.1:8082/ 1 20% 1 override-expire override-lastmod reload-into-ims ignore-reload

# how long errors should be cached for. For example 404s, HTTP 500 errors
negative_ttl 0 seconds

# the httpd_accel_* directives below are Squid 2.5 syntax; Squid 2.6 and later
# replaced them with http_port accelerator options and cache_peer

# On which host does Tomcat run. Set 127.0.0.1 for localhost
httpd_accel_host 127.0.0.1

# this is the proxy port as defined in Tomcat server.xml. By default it is "8082"
httpd_accel_port 8082

# set this to "on". Read more documentation if you want to change this.
httpd_accel_single_host on

# To access Squid stats via the manager interface, you need to enter a password here
cachemgr_passwd your_clear_text_password all

# Say "off" if you want the query string to appear in the squid logs.
strip_query_terms off
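Since the refresh_pattern times are in minutes, it is easy to mistype a value when you think in days. As a small aid, the Python sketch below builds a refresh_pattern line from a URL pattern and a number of days, using the same 20% and option list as the config above. The function name refresh_pattern_line is hypothetical, and it only covers the "cache for N days" form shown here.

```python
MINUTES_PER_DAY = 24 * 60  # refresh_pattern min and max values are in minutes

# Same options as the cached patterns in the config above.
OPTIONS = "override-expire override-lastmod reload-into-ims ignore-reload"


def refresh_pattern_line(url_regex, days, percent=20):
    """Build a refresh_pattern line that caches url_regex for the given days."""
    minutes = int(days * MINUTES_PER_DAY)
    # min and max are set to the same value, as in the config above.
    return f"refresh_pattern {url_regex} {minutes} {percent}% {minutes} {OPTIONS}"


if __name__ == "__main__":
    print(refresh_pattern_line("^http://127.0.0.1:8082/url2/", 1))
    print(refresh_pattern_line("^http://127.0.0.1:8082/css", 7))
```

The two calls reproduce the "url2" and "css" lines from the config, with 1440 and 10080 minutes respectively.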


Squid Manager Interface

You can access the Squid config and stats via the Squid Manager HTTP interface. Make sure that the “cachemgr.cgi” file which ships with the Squid installation is in your cgi-bin directory. More documentation on setting that up here.

Store Directory Stats shows you how much disk space is used by the disk cache.

Cache Client List shows you the cache HIT/MISS ratio as a percentage. You should monitor this frequently and tune your cache to get a higher hit percentage.
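If you prefer to estimate the hit ratio yourself, one option is to count result codes in Squid's access log. The Python sketch below assumes Squid's native log format, where the fourth whitespace-separated field looks like TCP_HIT/200 or TCP_MISS/200. The function name hit_ratio_percent is hypothetical, the sample log lines are invented for illustration, and treating every result code that contains HIT as a hit is an approximation.

```python
def hit_ratio_percent(log_lines):
    """Return the percentage of logged requests whose result code is a HIT."""
    hits = 0
    total = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Fourth field is "<result code>/<HTTP status>", e.g. "TCP_HIT/200".
        result_code = fields[3].split("/")[0]
        total += 1
        if "HIT" in result_code:
            hits += 1
    return 100.0 * hits / total if total else 0.0


# Invented sample lines in the assumed native access.log layout.
sample = [
    "1286536309.450 12 10.0.0.1 TCP_HIT/200 1024 GET http://example.com/a - NONE/- text/html",
    "1286536310.120 85 10.0.0.1 TCP_MISS/200 2048 GET http://example.com/b - DIRECT/127.0.0.1 text/html",
    "1286536311.300 3 10.0.0.2 TCP_MEM_HIT/200 1024 GET http://example.com/a - NONE/- text/html",
    "1286536312.010 4 10.0.0.3 TCP_HIT/200 1024 GET http://example.com/a - NONE/- text/html",
]

if __name__ == "__main__":
    print(hit_ratio_percent(sample))  # 75.0
```

In practice you would pass in the lines of your own access log file instead of the sample list.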

Reload Squid Config without restarting

Edit the Squid config using “vi” or your favorite editor:

vi /etc/squid/squid.conf


Once you are done editing, reload the new config without restarting Squid:

/usr/sbin/squid -k reconfigure

Clearing Squid Cache

To clear Squid cache:

1) Set the memory cache to 8 MB (or a lower number)

cache_mem 8 MB


2) Set the disk cache to 20 MB (or a lower number). The disk cache must be larger than the memory cache.

cache_dir ufs /var/spool/squid 20 16 256


3) Reload the Squid config without restarting, as described in the previous section

4) You may need to wait a few hours for the cache to get cleared. Once the cache is clear, you may restore the previous cache sizes and reload the new config again. You can monitor the cache size through the Squid Manager HTTP interface.

Bypassing Squid

If for some reason you need to bypass Squid, reconfigure Apache to send requests directly to Tomcat by editing the Apache config file /etc/httpd/conf/httpd.conf.

Conclusion

Squid is a very powerful tool for caching, but it is not for all applications. Please examine the needs of your application and use Squid appropriately. I’ve used Squid for several years for caching the output from a Java data mashup application and am very satisfied with the ease of use and benefits. I hope you found this tutorial useful. Feel free to post a comment or share your experience with Squid.