Another Java Servlet Filter Most Web Applications Should Have

The final step in testing the response header filter is to browse to a resource in a web application with and without client-side caching and to compare the results. You can use any resource you like, but for this article we will use the simplest thing possible: an HTML page that contains some text and an image tag referencing logo.png. Save such a page in your web application under whatever name you please (we'll assume test.html for this article). Recall that only logo.png is important, since our response header filter is mapped to it.

If you are using Tomcat, a convenient place to save this file is in the base directory of the ROOT web application.

Next, save an image named logo.png in the same directory as the above HTML file. Any image will do. Finally, open your favorite web browser and browse to test.html; e.g., http://127.0.0.1/test.html. The page should render as expected with some text and a graphic. Check your HTTP request/response log to see what went on behind the scenes to make this happen. In Tomcat's log you'll note two instances of an HTTP GET: one for the content of test.html and one for the content of logo.png. For example, Tomcat's logs include the following:

And, if you look closely at all the headers your container set for the response, you'll find that logo.png has the response header Cache-Control set with a value of max-age=3600.

So what does the above information mean? When your web browser retrieved the content for test.html, it found a reference to logo.png in the document and automatically made a second HTTP request for it. In total, your web browser needed both test.html and logo.png to display the page, and it sent HTTP requests for both of those resources to your web application. Recall we are pretending that logo.png is an image that appears on all of the pages in your web application, and we are trying to use an HTTP header to have the web browser cache logo.png so that it is only downloaded once. The previous request will have done exactly that, because we set the Cache-Control header.

Now you can test whether the browser successfully cached the image by browsing to another page in the web application that uses logo.png; we'll just reuse test.html. Browse to test.html again. (Don't use the Refresh button!) Again, check the HTTP request/response log your container generates. This time, notice there is only an HTTP request for test.html. In Tomcat's log, some new lines are appended that include the following:

No HTTP request is made for logo.png. This is because we told your web browser to cache logo.png locally for an hour. Go ahead, try browsing to the page again within the hour and you'll notice the same results. You can even try browsing to any other page (try making one up) that uses logo.png, and the local cache will still be used until it expires. After an hour passes, you can browse back to test.html and once again you'll see logo.png retrieved from the server once and cached for another hour.
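The decision the browser makes here can be sketched in a few lines of Java. This is only an illustration of the idea, not how any particular browser is implemented: the class name MaxAgeFreshness and its methods are hypothetical, and real caches also account for things such as the Age and Date headers, which this sketch ignores.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;
import java.util.OptionalLong;

// Hypothetical helper for illustration; not part of the article's filter.
final class MaxAgeFreshness {

    /** Returns the max-age value (in seconds) from a Cache-Control header value, if present. */
    static OptionalLong parseMaxAge(String cacheControl) {
        if (cacheControl == null) {
            return OptionalLong.empty();
        }
        // Directives are comma-separated, e.g. "private, max-age=300".
        for (String directive : cacheControl.split(",")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith("max-age=")) {
                try {
                    return OptionalLong.of(Long.parseLong(d.substring("max-age=".length())));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty(); // malformed value: treat as absent
                }
            }
        }
        return OptionalLong.empty();
    }

    /** True if a copy stored at storedAt may be reused at now without contacting the server. */
    static boolean isFresh(String cacheControl, Instant storedAt, Instant now) {
        OptionalLong maxAge = parseMaxAge(cacheControl);
        if (!maxAge.isPresent()) {
            return false; // simplification: no max-age means "ask the server"
        }
        long ageSeconds = Duration.between(storedAt, now).getSeconds();
        return ageSeconds < maxAge.getAsLong();
    }

    public static void main(String[] args) {
        Instant storedAt = Instant.parse("2024-01-01T00:00:00Z");
        // Ten minutes after download: still within the hour, so no request is needed.
        System.out.println(isFresh("max-age=3600", storedAt, storedAt.plusSeconds(600)));  // true
        // A full hour later: the copy has expired and must be fetched again.
        System.out.println(isFresh("max-age=3600", storedAt, storedAt.plusSeconds(3600))); // false
    }
}
```

With max-age=3600, the second visit to test.html falls in the "fresh" case, which is why no request for logo.png shows up in the log.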

Before moving on, let's tie up one loose end. I explicitly said that you should not use your browser's Refresh button to browse back to test.html. Understanding why is important so that there is no confusion about how things are being cached. HTTP has a good system for caching content. Just as we told the web browser to cache logo.png, the web browser can explicitly choose not to use its cache, and it can even ask the server to refresh its own cache. A web browser's Refresh button is almost always the shortcut that causes this. If you are trying this example and you use the Refresh button to revisit web application resources, you will likely notice that client-side caching just doesn't seem to work.

Ensuring Content is Not Cached by the Client's Browser

The HTTP response filter is not only good for having a client cache content. It is equally helpful for having a web browser not cache content, and the technique is as simple as the previous use of the Cache-Control header. Instead of setting the max-age value to an hour, try setting it to zero. The content is then stale the moment it arrives, which in practice causes a web browser to check with the server before reusing it. However, the more technically correct way to keep a client's browser from reusing cached information is to set the Cache-Control header's value to no-cache, which requires a cache to revalidate the content with the server before each reuse. You may also use the private value to specify that the HTTP content should not be stored in any shared cache, such as a proxy. You can even use the no-store value to ask that the content not be stored at all and be removed from memory as quickly as possible, so that there is little chance it would ever appear in something such as a tape backup of the server.

The following deployment would use all of these values to ensure an HTTP 1.1-compliant web browser doesn't cache content.

Mapping the declaration above to resources in a web application will ensure they are not cached by web browsers. If you are really concerned about HTTP 1.0 browsers, you can look into also setting the Expires header and Pragma header, which are the old way of accomplishing the same thing. If you would like to test the HTTP response filter's ability to prevent caching, try it out using the same techniques we did earlier in this article. Instead of seeing the requests prevented, you'll see the cache-preventing headers set and a request for every resource every time you browse to the page. We won't walk through such a test in this article, but it should be a straightforward exercise if you wish to do it.
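For reference, the combination of headers discussed in this section can be collected as a small Java sketch. The class name NoCacheHeaders is hypothetical, and the exact set of directives is an assumption based on the values named above, not the article's original deployment descriptor. In a servlet filter, each pair would be passed to HttpServletResponse.setHeader(name, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; the directive list is an assumption.
final class NoCacheHeaders {

    /** Header name/value pairs that ask both HTTP 1.1 and HTTP 1.0 clients not to cache. */
    static Map<String, String> noCacheHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        // HTTP 1.1: the Cache-Control directives discussed in this section.
        headers.put("Cache-Control", "no-cache, no-store, private, max-age=0");
        // HTTP 1.0 fallbacks: Pragma, plus an Expires date that is already in the past.
        headers.put("Pragma", "no-cache");
        headers.put("Expires", "Thu, 01 Jan 1970 00:00:00 GMT");
        return headers;
    }

    public static void main(String[] args) {
        // Print the headers in the form they would appear in a response.
        noCacheHeaders().forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```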

Does Client-Side Cache Manipulation Really Help?

Client-side cache manipulation absolutely helps, and it can benefit most web applications you build. In some situations, you need to ensure that a web browser doesn't cache content; say, when sensitive information is being passed to a web browser. Using HTTP headers is your only good method to accomplish this.

However, preventing a cache isn't nearly as interesting as having a client cache information. A key part of building an efficient web application is getting content to a client as quickly as possible and with as little burden on your server(s) as possible. Client-side caching is ideal for this. We looked at the example of caching common graphics (e.g., your company's logo) that appear at the top or bottom of every web page. However, don't think this technique only works for graphics. You could also have a client's browser cache style sheets, script files, or any other resource your web application uses. Additionally, you can benefit from briefly caching dynamic content. We used an example where the cache was valid for an hour. Why not set the cache to expire in five minutes and apply the filter to every resource in your web application that changes slowly (e.g., news feeds or directory pages)? Being able to control the HTTP caching mechanism is a very helpful tool to have, and it can be as simple to accomplish as using the filter presented in this article.
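One way to picture such a policy is as a lookup from resource path to Cache-Control value. The sketch below is purely illustrative: the class name CachePolicy, the suffixes, and the /news.jsp path are all hypothetical, and in practice the same effect comes from mapping differently configured instances of the filter in web.xml rather than from code like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical policy table for illustration; paths and lifetimes are assumptions.
final class CachePolicy {

    // Path suffix -> cache lifetime in seconds, checked in insertion order.
    private static final Map<String, Integer> MAX_AGE_BY_SUFFIX = new LinkedHashMap<>();
    static {
        MAX_AGE_BY_SUFFIX.put(".png", 3600);     // shared graphics: one hour
        MAX_AGE_BY_SUFFIX.put(".css", 3600);     // style sheets: one hour
        MAX_AGE_BY_SUFFIX.put(".js", 3600);      // script files: one hour
        MAX_AGE_BY_SUFFIX.put("/news.jsp", 300); // slowly changing dynamic page: five minutes
    }

    /** Returns the Cache-Control value for a path, or empty if no header should be set. */
    static Optional<String> cacheControlFor(String path) {
        for (Map.Entry<String, Integer> entry : MAX_AGE_BY_SUFFIX.entrySet()) {
            if (path.endsWith(entry.getKey())) {
                return Optional.of("max-age=" + entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(cacheControlFor("/images/logo.png")); // Optional[max-age=3600]
        System.out.println(cacheControlFor("/news.jsp"));        // Optional[max-age=300]
        System.out.println(cacheControlFor("/account.jsp"));     // Optional.empty
    }
}
```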

Before moving on from the topic of HTTP caching, it is only fair to point out that a gray area exists between the two extremes of caching on the server side and caching on the client side. My previous article introduced the benefits of caching on the server side; this article introduced the benefits of using a client-side cache. However, HTTP provides several other caching opportunities, some of which happen automatically via HTTP headers. Two sets of HTTP headers in particular are worth mentioning, as they fall in this gray area of caching.

The first set has to do with keeping track of when content was generated. By default, most web browsers and web servers take advantage of the If-Modified-Since HTTP request header to keep track of how current content is and to cache content when possible. The process works as follows: when a browser first retrieves content from a server, the server includes a timestamp for that content (the Last-Modified response header). On subsequent requests, the browser requests the content again but sends that timestamp back in the If-Modified-Since header to indicate that an older version of the content is cached. An HTTP server, upon receiving such a request, can then check whether the content has changed since the browser last saw it. If so, new content is sent (the HTTP 200 response). If not, the server sends back a short response (HTTP 304) indicating that the old content should be reused. In the end, if the browser's cache is valid, the content is not resent by the server.

However, this scheme does not prevent any HTTP requests from occurring; it merely reduces the amount of information that needs to be sent per request. This differs significantly from the Cache-Control header that was used earlier in the article. Using the Cache-Control header can prevent an HTTP request from ever being needed: the browser already knows the content is good, so there is no need to check a timestamp against the server. This difference matters because an HTTP server can only process a certain number of requests, regardless of how much content is returned per request. If you expect optimal performance from your web server, it is important to avoid unneeded HTTP requests.
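The server's side of the If-Modified-Since exchange boils down to a single comparison, sketched below. The class name ConditionalGet and its method are hypothetical; a real container performs this check for static resources on your behalf, and a servlet can take part in it by overriding HttpServlet.getLastModified.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration of the 200-versus-304 decision.
final class ConditionalGet {

    static final int SC_OK = 200;
    static final int SC_NOT_MODIFIED = 304;

    /**
     * Chooses the response status for a GET.
     *
     * @param lastModified    when the content last changed on the server
     * @param ifModifiedSince timestamp sent by the browser, or null if the header was absent
     */
    static int statusFor(Instant lastModified, Instant ifModifiedSince) {
        if (ifModifiedSince == null) {
            return SC_OK; // first visit: nothing cached, send the full content
        }
        // HTTP dates carry whole seconds only, so compare at that resolution.
        Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
        return modified.isAfter(ifModifiedSince) ? SC_OK : SC_NOT_MODIFIED;
    }

    public static void main(String[] args) {
        Instant modified = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(statusFor(modified, null));                     // 200
        System.out.println(statusFor(modified, modified));                 // 304
        System.out.println(statusFor(modified, modified.minusSeconds(60))); // 200
    }
}
```

Note that even the 304 case still costs a round trip to the server, which is exactly the request that a max-age value avoids.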

The second gray area is that of HTTP 1.0 cache control. The Cache-Control header is new as of HTTP 1.1. In HTTP 1.0, you could get the same effect, but you would have to use the Pragma and Expires headers. The Pragma header with a value of no-cache works the same as the Cache-Control header with a value of no-cache. Note that in HTTP 1.1, you should no longer use the Pragma header for this purpose; the Cache-Control header is the official replacement. But it might prove handy to use the Pragma header if you are restricted to HTTP 1.0. The Expires header works in a similar way to the Cache-Control header's max-age directive, except that it takes an absolute date rather than a number of seconds. In HTTP 1.0, you could set the Expires header to a future date to signify how long content should be considered a valid cache, or to a date before the current date to explicitly invalidate a cache. As with the Pragma header, the Expires header's use for cache control is intended to be replaced with the Cache-Control header, but it is handy to know about the Expires header if you are working with HTTP 1.0.
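To make the relationship between max-age and Expires concrete, here is a minimal sketch that turns a lifetime in seconds into the date string an Expires header expects. The class name ExpiresHeader is hypothetical. In a servlet you would more likely call HttpServletResponse.setDateHeader("Expires", millis) and let the container do the formatting.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the article's filter.
final class ExpiresHeader {

    // HTTP dates are always expressed in GMT, e.g. "Mon, 01 Jan 2024 01:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Formats the Expires value equivalent to "max-age=maxAgeSeconds" as seen at time now. */
    static String expiresAfter(Instant now, long maxAgeSeconds) {
        return HTTP_DATE.format(now.plusSeconds(maxAgeSeconds));
    }

    public static void main(String[] args) {
        // One hour from now, matching the max-age=3600 example used earlier.
        System.out.println("Expires: " + expiresAfter(Instant.now(), 3600));
    }
}
```

Passing a negative lifetime yields a date in the past, which is the HTTP 1.0 way of marking content as already expired.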

Summary and Conclusion

HTTP headers are helpful. The Servlet API lets you manipulate any HTTP header, but it is a poor place to learn about all of the HTTP headers you can manipulate. Realize that you can change HTTP headers to make your web application work better, and use the latest HTTP specification to determine what HTTP headers are helpful for you to use. In this article, we took a specific look at the HTTP response Cache-Control header. This header is helpful for caching things on the client side (saving your server some work) and/or ensuring content is not cached on the client side (making sure a web browser has the latest version of your content).

Take the HTTP response filter that was provided in this article, and use it to aid in your web application development. You have the entire source code, and you may modify it as you see fit. Or, if you simply want to drop a .jar file into the WEB-INF/lib directory of your web application and start deploying the filter, you may get the appropriate .jar file at http://www.jspbook.com/jspbook.jar. The code is actively maintained, and if you like the example, be sure to take a look at my book Servlets and JavaServer Pages; the J2EE Web Tier. It covers up to the latest JSP and Servlet specifications and provides many more helpful code examples for you to use.