A Day in the Life of #Apache
Examples of RewriteMap in Action

Editor's note: Rich Bowen is back with another installment in his ongoing
series based on conversations on #apache. This week, he provides examples
of RewriteMap in action. Rich is a coauthor of O'Reilly's Apache
Cookbook.

#apache is an IRC channel on the freenode IRC network (irc.freenode.net).
To join this channel, you need to install an IRC client (XChat, mIRC, and BitchX are
popular clients) and enter the following commands:

/server irc.freenode.net
/join #apache

Day Twelve

A huge number of the questions on #apache have to do with mod_rewrite. And,
fairly frequently, I find myself thinking that the problem being discussed
would be so much easier to solve if we could just write a Perl script to deal
with it.

Of course, you can, using the RewriteMap directive, but good examples of
this are moderately hard to come by, either in the documentation or
elsewhere online.

As some of you may know, I'm working on the documentation, and, hopefully,
it will soon contain some good examples of using RewriteMap. Until then,
this article provides one simple example and one not-so-simple example.

I'll go ahead and give the caveat here, since you'd be really irritated with
me if you got to the end and only then discovered this little fact. Although
you can use a rewrite map anywhere (including in .htaccess files), you can
only define one in the server configuration: the main Apache configuration
file or a <VirtualHost> block. This has to do with the fact that maps are set
up when the server starts, so putting one in a .htaccess file wouldn't
really work.

We'll start with the simplest RewriteMap example, so that you can see how
the syntax works and how you'd use it in a basic mapping scenario. In this
simplest form, RewriteMap lets you create a one-to-one map from literal keys
to values, such as URLs. You can often use it to replace a lengthy list of
RewriteRules with a single map file.

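We'll start by creating the map file. We'll call it fish.map and
put it in /usr/local/apache/conf. Each line holds a fish name and the URL it
should map to, separated by whitespace, such as this entry for guppies:

guppy http://guppies.org/about.html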
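To make the txt: format concrete, here is a small Python sketch that reads a map laid out this way and builds a lookup table from it. It is only an approximation of how mod_rewrite reads a txt: file (one key and one value per line, separated by whitespace, with lines starting with # treated as comments); parse_txt_map and FISH_MAP are hypothetical names, and only the guppy entry comes from this article.

```python
# Hypothetical contents of /usr/local/apache/conf/fish.map.
# Only the guppy entry is taken from the article.
FISH_MAP = """\
# fish name   where to send the visitor
guppy         http://guppies.org/about.html
"""


def parse_txt_map(text):
    """Approximate how a txt: RewriteMap file is read (hypothetical helper)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) >= 2:
            # First field is the key, second is the value.
            mapping[fields[0]] = fields[1]
    return mapping


if __name__ == "__main__":
    fishmap = parse_txt_map(FISH_MAP)
    print(fishmap["guppy"])
```

Each additional fish is just one more line in the file, which is why a map scales so much better than a separate RewriteRule per fish.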

In the next step, we create a map name for that file, so that we can use it
in RewriteRules.

RewriteMap fishmap txt:/usr/local/apache/conf/fish.map

And, finally, we'll use it in an actual RewriteRule. In this
case, we want to redirect some requests for various fishes to sites about those
fishes.

RewriteEngine On
RewriteRule ^/fish/(.*) ${fishmap:$1} [R]

Now, when someone visits http://myserver/fish/guppy, they will
be redirected to http://guppies.org/about.html instead.

There's still one small problem, though. If someone requests http://myserver/fish/salmon,
the rule will run and the fish will be looked up in the map, but nothing will
match, so the lookup expands to an empty string and the redirect has nowhere
sensible to go. If we want to provide a default destination for when nothing
matches, we can add it to the RewriteRule after a | character:

RewriteRule ^/fish/(.*) ${fishmap:$1|http://no.fish.com/} [R]
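The lookup-with-default behavior can be emulated in a few lines of Python. This is an illustration, not how mod_rewrite is implemented: the dictionary stands in for fish.map, Python's re module stands in for Apache's regex engine, and redirect_target and expand_map are hypothetical names. It covers the three cases: a key that is in the map, a key that is missing (so the value after | is used, or an empty string when there is no default), and a path the pattern doesn't match at all.

```python
import re

# Hypothetical in-memory stand-in for fish.map (only guppy is from the article).
FISHMAP = {"guppy": "http://guppies.org/about.html"}

# Same pattern as the RewriteRule.
FISH_RULE = re.compile(r"^/fish/(.*)")


def expand_map(mapping, key, default=""):
    """Mimic ${map:key|default}: the mapped value if the key exists,
    otherwise the default, or an empty string when none is given."""
    return mapping.get(key, default)


def redirect_target(path, default=""):
    """Return where an [R] rule would send this path, or None if the
    rule's pattern doesn't match and the request is left alone."""
    m = FISH_RULE.search(path)
    if m is None:
        return None
    return expand_map(FISHMAP, m.group(1), default)


if __name__ == "__main__":
    for path in ("/fish/guppy", "/fish/salmon", "/about.html"):
        print(path, "->", redirect_target(path, "http://no.fish.com/"))
```

With the default supplied, /fish/salmon lands on http://no.fish.com/ instead of an empty redirect.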

All right, that's pretty simple, you say, but how does this help me if my needs
are more complex than a simple one-to-one mapping? Well, that's where the prg: type
of RewriteMap comes in. Whereas many rewrite rules can be expressed
as a single regular expression, some require several RewriteRule statements
in a row, and others just seem more complex than one really wants to
encode in an Apache configuration file. But you could write it in a few lines
of Perl, right?

In fact, in a recent Apache class I taught, one of my students was rather
irate that I left RewriteMap to the end. If I'd covered it first, he said,
the rest of the material would have been unnecessary. I don't know if I'd go
that far, but let's look at a couple of small examples to illustrate.

In my first example, I want to replace all dashes (-) with underscores (_)
in a URL. Now, you could do this with standard RewriteRule directives,
using the [N] flag. But that gets icky, and people tend to get it wrong. However,
it's pretty simple in Perl, so let's do it that way instead.

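First of all, we need the Perl program that does the transformation. It is
just a while (<STDIN>) loop that reads one URL per line, replaces every dash
with an underscore (s/-/_/g), and prints the result. (This gets fired up when
Apache starts, so you're not launching Perl with every request, or anything
silly like that.)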
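Because a prg: map talks to Apache over plain standard input and output, the program doesn't have to be Perl. As a hedged sketch, here is the same dash-to-underscore filter written in Python; the file name dash2score.py, the shebang line, and the serve helper are assumptions, not part of the article's setup. Like the Perl script described here, it handles one line at a time and flushes after every answer, which is the Python counterpart of turning off buffering.

```python
#!/usr/bin/env python3
"""dash2score: a prg: RewriteMap program, sketched in Python.

Apache starts this once, then writes one lookup key per line to its
standard input; for each line it expects exactly one line back.
"""
import sys


def dash2score(key):
    """Replace every dash in the key with an underscore."""
    return key.replace("-", "_")


def serve(infile, outfile):
    """Answer lookups one line at a time until the input is closed."""
    while True:
        line = infile.readline()
        if not line:  # EOF: Apache has closed the pipe
            break
        outfile.write(dash2score(line.rstrip("\n")) + "\n")
        outfile.flush()  # never leave an answer sitting in a buffer


if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```

To use it instead of the Perl version, make it executable and point the map at it, for example RewriteMap dash2score prg:/usr/local/apache/conf/dash2score.py (a hypothetical path).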

We turn off buffering in the script (in Perl, by setting $| = 1) because
Apache waits for each answer before it continues; if the answer sits in the
script's output buffer instead of being written out, the rewriting process
can hang indefinitely.

We'll save this script as dash2score.pl in /usr/local/apache/conf,
alongside the other map, just for consistency. Make sure the script is
executable. Then we'll give the map a name:

RewriteMap dash2score prg:/usr/local/apache/conf/dash2score.pl

Now we can use it in a RewriteRule:

RewriteEngine On
RewriteRule (.*-.*) ${dash2score:$1} [PT]

The pattern I've used, (.*-.*), matches any requested URL that contains at
least one dash, and because it captures everything, the entire URL is passed
to the conversion script. The script does the conversion in one step and
returns the result, and the [PT] flag passes that resulting URL back to
Apache's URL-mapping engine to see what happens next.
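To see why the whole URL ends up in $1, here is a small Python emulation of the rule. It is an illustration under assumptions, not mod_rewrite's actual code: rewrite and dash_to_underscore are hypothetical names, and Python's re module stands in for the PCRE engine Apache uses (the two agree on a simple pattern like this one).

```python
import re

# The same pattern as the RewriteRule: any URL containing a dash.
DASH_RULE = re.compile(r"(.*-.*)")


def dash_to_underscore(key):
    """Stand-in for the dash2score map program."""
    return key.replace("-", "_")


def rewrite(url, lookup):
    """Emulate RewriteRule (.*-.*) ${dash2score:$1} [PT].

    Returns the rewritten URL, or None when the pattern doesn't match
    and the rule leaves the request alone.
    """
    m = DASH_RULE.search(url)
    if m is None:
        return None
    # $1 is the whole URL: the greedy .* on each side of the dash
    # absorbs everything before and after it.
    return lookup(m.group(1))


if __name__ == "__main__":
    for url in ("/my-page-name.html", "/index.html"):
        print(url, "->", rewrite(url, dash_to_underscore))
```

Note that a rewritten URL contains no dashes, so the pattern no longer matches it when it comes back around.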

The more complex example involves database access. I came up with this example
when trying to persuade WordPress to give
me a particular kind of URL. I should note that, since then, some helpful WordPress
developers have pointed out easier ways to do this. However, the technique
itself was interesting enough that it inspired me to think of doing this article
in the first place. So here it is.

In this case, we're going to look up the information we want in a database,
using another Perl map program that queries the database for each key it
receives.

Here, a URL like http://servername/perm/wooga will cause
a database lookup using the keyword "wooga".
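Here is a hedged sketch of what such a database-backed map can look like, written in Python with the standard-library sqlite3 module standing in for whatever database you actually use. Everything specific in it is hypothetical: the map name permmap, the database path, the permalinks table with its keyword and url columns, and the configuration lines in the docstring. Two details carry over regardless of language: the program keeps one connection open for its whole life, and it answers with the string NULL when a keyword isn't found, which is how a prg: map reports a failed lookup (so a |default in the ${...} expansion can take over).

```python
#!/usr/bin/env python3
"""A database-backed prg: RewriteMap, sketched with SQLite.

Everything specific here is hypothetical: the database path, a table
named permalinks with columns keyword and url, and a configuration like

    RewriteMap permmap prg:/usr/local/apache/conf/permmap.py
    RewriteRule ^/perm/(.*) ${permmap:$1} [PT]
"""
import sqlite3
import sys

DB_PATH = "/usr/local/apache/conf/permalinks.db"  # hypothetical location


def lookup(conn, keyword):
    """Return the URL stored for keyword, or the string NULL if none."""
    row = conn.execute(
        "SELECT url FROM permalinks WHERE keyword = ?", (keyword,)
    ).fetchone()
    # A prg: map reports "no such key" by answering with NULL.
    return row[0] if row is not None else "NULL"


def serve(connect, infile, outfile):
    """Answer one lookup per input line until the input is closed.

    The connection is opened on the first lookup and then reused for
    the life of the process, never reopened per request.
    """
    conn = None
    while True:
        line = infile.readline()
        if not line:  # EOF: Apache has closed the pipe
            break
        if conn is None:
            conn = connect()
        outfile.write(lookup(conn, line.strip()) + "\n")
        outfile.flush()  # send each answer immediately


if __name__ == "__main__":
    serve(lambda: sqlite3.connect(DB_PATH), sys.stdin, sys.stdout)
```

The query uses a ? placeholder rather than pasting the keyword into the SQL, since the keyword comes straight from a visitor's URL.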

One final word about how this works, and why it's not monstrously inefficient.
The Perl script named in the RewriteMap directive starts when the
Apache server starts, and keeps running for the life of the Apache server
process. That's why the script needs a while (<STDIN>) loop: it answers one
lookup per line for as long as Apache keeps sending them, so Apache never has
to relaunch it for each request. If RewriteMap were permitted in .htaccess
files, the program would have to be launched on every request, which would be
hugely inefficient.
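To see why the long-running design matters, here is a rough Python emulation of Apache's side of the conversation. PrgMap and MAP_PROGRAM are hypothetical names, and the real mod_rewrite code is C and differs in detail; the sketch only shows the shape of the protocol: the map program is started once, each lookup is one line written to its standard input followed by one line read back from its standard output, and a missing flush on either side leaves the other one waiting.

```python
import subprocess
import sys

# A tiny map program, run by a fresh Python interpreter via -c.
# It plays the part of dash2score: one line in, one line out, flushed.
MAP_PROGRAM = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(line.replace("-", "_"))
    sys.stdout.flush()
"""


class PrgMap:
    """Start a map program once and reuse it for every lookup
    (an illustration of the idea, not mod_rewrite's code)."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def lookup(self, key):
        self.proc.stdin.write(key + "\n")
        self.proc.stdin.flush()  # send the key now, not when a buffer fills
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()  # EOF ends the program's read loop
        return self.proc.wait()


if __name__ == "__main__":
    dash2score = PrgMap([sys.executable, "-c", MAP_PROGRAM])
    for url in ("/my-page.html", "/a-b-c/d-e", "/index.html"):
        print(url, "->", dash2score.lookup(url))
    dash2score.close()
```

Every lookup in the loop goes to the same child process; only the first call pays the cost of starting an interpreter.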

I hope that this little tutorial will help you use RewriteMap for
those cases when the RewriteRules are getting just a little too
hairy.

See you on #apache.

Rich Bowen is a member of the Apache Software Foundation, working primarily on the documentation for the Apache Web Server. DrBacchus, Rich's handle on IRC, can be found on the web at www.drbacchus.com/journal.