Monday, May 7, 2012

Making Apache and Tomcat Work Together

Last year I was approached by a Swiss customer looking for help. Starting with an already working Tomcat application, the company wanted to configure the Apache HTTP Server in front of it as a reverse proxy, virtual host broker, and URI translator.
Why use Apache as a front-end server? It might seem more convenient
to keep things simple, letting a back-end server like Tomcat serve
clients directly – and in some cases it is, depending on the type of
back-end server and the specific requirements of the project.
If your main server software is secure and fast enough for your
environment and provides all the features you need, you probably don’t
need a proxy layer. It usually doesn’t make sense to run Apache in front
of lighttpd, for example, especially considering that a layer of proxying adds performance overhead and maintenance load.
Yet in a lot of cases you will be running a specialized back-end
server to provide functionality that Apache isn’t capable of. You might
deploy Tomcat, for example, so you can run Java Servlets. In these cases
it is often beneficial to pay the performance and complexity costs of
adding Apache to the mix.
Apache is a secure, flexible, and fast general-purpose HTTP server.
It comes with a huge variety of modules that provide functionality for
all kinds of special purposes, from LDAP authentication to request
compression. Its configuration is straightforward, using the traditional
near-flat Unix configuration syntax instead of XML. Specialized
back-end servers are usually slower than Apache when it comes to serving
static files such as images or office software files.

Choosing a Communication Protocol

There are several ways to make Apache and Tomcat talk to each other, of which three are most popular:

mod_proxy is the most basic way to make Apache pass
on requests to a back-end server and relay its responses back to the
client, a process called reverse proxying. It works not only with Tomcat but with any back-end server that speaks HTTP.

mod_proxy_ajp – The Apache JServ Protocol
(AJP) is a simple binary packet format that offers greater speed
compared to plain HTTP(S) back-end communication. mod_proxy_ajp adds
support for the AJP protocol to mod_proxy. It is part of Apache’s
default distribution.

mod_jk is another way to make Apache talk to a Servlet back-end server, and the recommended way to make Apache talk to Tomcat, but
it is more complex to configure than the other two due to its flexibility. mod_jk is maintained by the Tomcat community.

Obsolete modules that are no longer maintained and thus not recommended for Apache-Tomcat communication include mod_webapp (also called warp), mod_jserv, and jk2.
One good way to proceed is to set up everything using mod_proxy
first; configuration is simple and all communication is clear-text HTTP.
If you need additional speed later, you can add mod_proxy_ajp to the
mix within minutes. If you want to take advantage of mod_jk’s
performance, stability, and flexibility, you can invest more time to
install and configure it properly.
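
To give a sense of how small the later switch to AJP can be, here is a minimal sketch of a top-level mapping over AJP. It assumes that mod_proxy and mod_proxy_ajp are loaded and that Tomcat’s AJP connector is enabled on port 8009; that port is Tomcat’s customary default, not something taken from the project described here.

```apache
# Assumption: Tomcat's AJP connector listens on localhost:8009.
# Only the URL scheme and port differ from an HTTP mapping.
ProxyPass / ajp://localhost:8009/

# A ProxyPassReverse line is usually not needed with AJP as long as the
# public and back-end paths are identical, because the original Host
# header is passed through to Tomcat.
```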

Basic Setup

Let’s assume that Tomcat is running on local port 8080, serving our
application, and can be reached there directly. Apache is running on port 80 with all
modules in the standard distribution available, including mod_proxy and
mod_rewrite.
As long as you stay with standard HTTP proxying using mod_proxy and
don’t switch to AJP, any back-end server is fine for initial testing of
the proxying chain. But at some point you need to switch to your actual
Tomcat setup, since it will have its own requirements that you need to
cater to using proxying and rewrite rules.
With that in mind, the basic starting point for our two-way proxying calls for configuration settings that look like this:
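
A minimal sketch of such a directive pair, assuming Tomcat’s HTTP connector is reachable at localhost:8080 as described above:

```apache
ProxyPass        / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
# Assumption: the back end is on the same machine (localhost:8080).
# ProxyRequests stays at its default of Off; a reverse proxy does not
# need forward proxying enabled.
```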

These configuration line examples, and the ones that follow, are
snippets that need to go into your Apache’s global server context or
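virtual host section. The first line above relays all client requests to
/ and below to a back-end server running on port 8080 using HTTP. The
second line handles the reverse direction: it rewrites the Location, Content-Location, and URI headers in the back end’s responses, so that redirects issued by Tomcat point at Apache rather than at the back end itself.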
Try this first to make sure that everything works as intended on a basic
level.

Path Translation and Exceptions

Let’s add a common requirement. We’re still going to serve our application to the client at the top level /, but the corresponding
back-end path will be /application/. Servlet applications running under
Tomcat are often set up in this way to provide multiple applications in
one server instance. Our setup now looks like this:
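
A sketch of what that mapping might look like, again assuming the back end lives at localhost:8080:

```apache
# Public path / maps to the back-end path /application/.
# The trailing slashes on both sides should match to avoid doubled or
# missing slashes in the translated URLs.
ProxyPass        / http://localhost:8080/application/
ProxyPassReverse / http://localhost:8080/application/
```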

This seemingly small change has larger implications than you might
expect, because your back-end application is not aware of
this translation and still hands out its usual paths. In some cases the
application can be adjusted to match the new paths; often this isn’t
possible, or changing all paths in CSS and JS files would be too
cumbersome and error-prone.
In this case we need to find all the spots that need additional
proxying rules. Broken images or CSS files show up plainly enough, and Firebug
or similar tools can help you spot broken script files. Other issues
may come up when AJAX requests are made to a non-existent location, so
it’s important to do a lot of testing and verification.
For each spot that needs additional proxying rules you can either add
a rule for mod_rewrite to rewrite it or add mod_proxy directives. In
our particular project we decided to use the latter approach to keep
special proxy mappings apart from other rewrite rules.
Let’s presume that our back-end application refers to some static
image files served by a second application running in Tomcat at the
location /application2/images/, and another set of static JavaScript
files at /application2/js/. We don’t want to expose the whole tree under
/application2/ for security reasons, so we use tighter mappings as
follows:
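
One way to write these tighter mappings is sketched below. It assumes the public paths are identical to the back-end paths and that the second application runs in the same Tomcat instance on localhost:8080.

```apache
# Expose only the images and js subtrees of the second application;
# everything else under /application2/ gets no mapping of its own.
ProxyPass        /application2/images/ http://localhost:8080/application2/images/
ProxyPassReverse /application2/images/ http://localhost:8080/application2/images/
ProxyPass        /application2/js/     http://localhost:8080/application2/js/
ProxyPassReverse /application2/js/     http://localhost:8080/application2/js/
```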

The first matching proxying rule terminates the mod_proxy decision
process, so we have to add those special case lines in front of the
previous directive block.
Proxying exceptions provide another way to influence the proxy
decision process. Suppose that our Apache instance serves a couple of
static PDFs at /productpresentation/; we don’t need or want to relay
those requests to Tomcat. A special form of the ProxyPass directive lets
us specify locations that must not be proxied:

ProxyPass /productpresentation !

Again, this needs to go in front of our other rules.
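
Putting the ordering rules together, the whole block might be arranged as in the following sketch, from most specific to least specific. The back-end address localhost:8080 is an assumption carried over from the basic setup.

```apache
# 1. Exceptions: served by Apache itself, never proxied.
ProxyPass /productpresentation !

# 2. Narrow special-case mappings.
ProxyPass        /application2/images/ http://localhost:8080/application2/images/
ProxyPassReverse /application2/images/ http://localhost:8080/application2/images/
ProxyPass        /application2/js/     http://localhost:8080/application2/js/
ProxyPassReverse /application2/js/     http://localhost:8080/application2/js/

# 3. Catch-all mapping last, because the first matching rule wins.
ProxyPass        / http://localhost:8080/application/
ProxyPassReverse / http://localhost:8080/application/
```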

Cookie Support

Chances are that your application uses cookies to track client
sessions across requests. Cookies include a hostname and a path to let
the client know when to send the cookie back as part of a request. Due
to our path rewrites, however, the cookie paths are no longer valid and
need to be rewritten as well. Your application may act in funny ways,
such as keeping you stuck on one page, if its cookies are not being sent
back, especially if user login tracking is involved.
You can determine the cookies sent by your back-end application in a
variety of ways; browser inspection tools or cookie-specific extensions
let you see them, but a cURL request will do fine as well.
The cookie path is set by the application. Once more we might not be
able to change the back-end application’s behavior, but mod_proxy can be
instructed to rewrite these paths as well so the client will send the
cookie later as desired:

ProxyPassReverseCookiePath /application/ /

The first argument is a match specification for a cookie path that
has to be rewritten. The second argument specifies what the result needs
to look like after rewriting.
You might also have issues with the cookie domain. A similar directive exists to adjust this, called ProxyPassReverseCookieDomain.
Suppose your Apache instance takes requests for virtual host
publicvhost.com, but your back-end application thinks its hostname is
backendhostname.local. The internal domain comes first, followed by the public one:

ProxyPassReverseCookieDomain backendhostname.local publicvhost.com

Error Handling

Another common requirement is a unified look among error pages that
doesn’t give away the back-end server software’s name and version. To
accomplish this we use the ProxyErrorOverride directive, telling Apache to let its own error-handling mechanism take over:
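
A minimal sketch of such a setup follows. The /errors/ location and the file names are hypothetical; they stand for static pages served by Apache itself.

```apache
# Let Apache replace error responses coming from the back end.
ProxyErrorOverride On

# Hypothetical local error pages.
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html
ErrorDocument 503 /errors/503.html

# Keep the error pages themselves out of the proxy. Like any exception,
# this line has to come before the catch-all ProxyPass rule.
ProxyPass /errors !
```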

Enter mod_rewrite

We used a couple of mod_rewrite rules in the project I was working
on. Those rules tend to be highly project-specific, so we won’t go
through all of them. The most important thing to know is how mod_rewrite
interacts with Tomcat proxying.
The detailed flow diagram of Apache rewrite processing
shows that rewrite rules are applied early in the request
handling process. The order of proxying rules among themselves matters,
as does the order of rewrite rules among themselves, but it doesn’t
matter where we place the proxying and rewrite blocks in relation to each other.
Let’s say we have a rewrite map generated by some other process that
needs to be applied. We can implement this using the following rewrite
directive block anywhere in the section where our proxying rules reside:
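
The sketch below shows one way such a block could look. The map name, the file path, and the choice to answer with a permanent redirect are all assumptions made for illustration, not details of the original project.

```apache
RewriteEngine On

# Hypothetical plain-text map with one "old-path new-path" pair per line.
# RewriteMap is only allowed in server or virtual host context.
RewriteMap legacyurls txt:/etc/apache2/legacy-urls.map

# Act only when the requested path has an entry in the map.
RewriteCond ${legacyurls:%{REQUEST_URI}} !^$
# Redirect the client to the mapped path. Paths without an entry fall
# through untouched and are handled by the proxying rules.
RewriteRule ^ ${legacyurls:%{REQUEST_URI}} [R=301,L]
```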

What we discussed here barely scratches the surface of what you can
do with Apache as a reverse proxy for Tomcat, but we did cover the most
important building blocks to start with, which may save you precious
hours of initial setup. Once you have this setup running, you can spend
more time on the gritty bits of your project-specific setup, such as
load balancing or complex address rewriting and redirection.