Apache Web-Serving with Mac OS X, Part 6

Editor's Note: Following the first five Apache Web-Serving with Mac OS X articles, Kevin Hemenway returns with a "put your legs up" sixth tutorial. This time he walks you through the various Apache modules that come with your Mac OS X installation and shows you what they can do.

It's quiet at GatesMcFarlaneCo. Yes, stereotypically too quiet. The clickety-clack, snickety-snack of typing continues uninterrupted, save for a gurgle here and there from the bubbler down the hall. Lewis and Carroll, normally jibber-jabbering like banshees on a hot summer day, are calm, cool, and, unbelievably, collected. What is going on!?

Complacency.

Ever since you showed off Apache under Mac OS X a few short months ago, things have been simmering down -- no longer are emotions flaring or reboots frequent. Frantic service pack and security patches are but a memory. People no longer shake uncontrollably when they make changes to the Web server. Work has become a much more enjoyable watering hole. Soon, there will be no more cat-and-dog debates, and you'll be sharing slide shows of your vacations.

In these dull times, we've got plenty of freedom to fall deeper into the arms of Apache, tracing our fingers around the features we get free of charge. As our knowledge expands, we'll mine further into Apache's default modules, learning the tricks they provide and how to apply them to our desires.

There are a decent number of modules shipped with Apache, so thus begins a two-part article concerning them, as well as the mojo-jojo that can be yours. We'll keep these articles updated as time goes on, so they'll always be a handy reference, whether you're using Apache on OS X, Linux, or Windows.

Due to Apache's popularity, a large collection of user-created modules exists, all available for your downloading pleasure (some may not work or install correctly under Mac OS X, however). There are modules that enhance Apache's authentication, provide support for new languages, throttle bandwidth, check security, process ecommerce transactions, and much more.

Below, we'll focus on the modules that come with a default installation of Apache on Mac OS X (10.1.4 was used as the basis for this article). If you've been a fervent follower of this series, you know how to enable and disable modules -- you ran through those steps when you enabled PHP. I'll recap what you already know quickly:

If you examine your /etc/httpd/httpd.conf file, you'll notice two large blocks of module-related text -- a large LoadModule block, and a similar AddModule block. Any module listed under the LoadModule block has a similar line under AddModule. The reverse holds true as well.

Modules are enabled by two directives: LoadModule and AddModule. If you comment or uncomment (by inserting or removing a # character) a LoadModule line, you've got to do the same for the matching AddModule line and vice versa.
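
To make that concrete, here's the PHP pair used later in this article, shown in both states. It's only an illustration: in the real httpd.conf, the LoadModule and AddModule lines sit in their own separate blocks rather than side by side.

```apache
# Enabled: both halves of the pair are live.
LoadModule php4_module libexec/httpd/libphp4.so
AddModule mod_php4.c

# Disabled: both halves are commented out with a leading "#".
#LoadModule php4_module libexec/httpd/libphp4.so
#AddModule mod_php4.c
```

Leaving one half commented and the other live is the thing to avoid, so always flip the two lines together.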

The actual modules are installed under /usr/libexec/httpd/ on your hard drive (the same libexec/httpd/ path you'll see in the LoadModule lines). Module files generally follow the naming scheme of mod, an underscore, and then their purpose or description, like mod_php and mod_mime_magic. There are exceptions to this rule, like libperl for mod_perl.

With those basic thoughts out of the way, let's revisit some familiar faces. As is usual, any fiddling you do with the Apache configuration file will involve stopping and starting the Web server before the changes take effect. As explained in Part 4, you can use most of the directives below in a properly configured .htaccess file, removing the need for an Apache restart.

Modules We've Already Used

As we've progressed through the previous five articles, we've actually been using multiple modules along our way to GatesMcFarlaneCo stardom. To stroll down memory lane, we'll say "hi" to some of our old friends and then move on to our other neighbors. I'll show you the matching LoadModule and AddModule lines, utter some compliments, and then point to the article that used its features.

The first few should be familiar as they're "major" features of Apache -- most of the "little" features we've played with are smaller parts of other modules, and as such, we'll expand our coverage whenever possible.

CGI Scripts

CGI scripts are programs installed on the Web server that perform specific tasks -- such as mailing the results of a form, allowing someone to comment in a guest book, interactive message boards, and so on and so forth. mod_cgi grants us this capability and was covered in Apache Web-Serving with Mac OS X, Part 2. There's not much more this module offers besides what we've discussed previously -- just some logging directives that can be helpful in debugging broken scripts.
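
As a rough sketch of those logging directives, the lines below turn on mod_cgi's script error log. The log location is an assumption (pick any file Apache is able to write to), and the size values are simply examples. These belong in httpd.conf rather than an .htaccess file.

```apache
# Record the details of failing CGI scripts (headers, output, errors).
# The path is hypothetical; adjust it to suit your machine.
ScriptLog /var/log/httpd/cgi_log

# Stop logging once the file grows past this many bytes.
ScriptLogLength 1048576

# Limit how much of a PUT or POST request body is written to the log.
ScriptLogBuffer 2048
```

This is meant as a debugging aid, so comment it back out once your script behaves.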

Server Side Includes

The Server Side Include module allows us to dynamically insert other files or scripts into our Web pages before they're actually displayed to our visitor. They also allow conditional statements, can perform simple file tests, and more. In Part 2 of our series, we enabled SSI as well as demonstrated how to include the results of a CGI script. Clever use of SSIs can create mini-applications, like this image gallery.
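
As a quick reminder, enabling SSI in Apache 1.3 generally comes down to three directives. This is a sketch of the usual arrangement, assuming the default Mac OS X document root and the .shtml extension:

```apache
# Allow includes for files served from the document root.
<Directory "/Library/WebServer/Documents">
    Options +Includes
</Directory>

# Serve .shtml files as HTML, and have Apache parse them for SSI commands.
AddType text/html .shtml
AddHandler server-parsed .shtml
```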

PHP Processing

The following lines load this module:

LoadModule php4_module libexec/httpd/libphp4.so
AddModule mod_php4.c

PHP is a full-fledged programming language that is very popular on Web servers, and often installed by default. It can do everything a CGI script could do, offers fast and easy connectivity to powerful databases, and is well supported by the Web-developer community. We enabled PHP in Part 3 and tested its database connectivity with MySQL in Part 5.

Aliasing Directories

Remember when we were talking about turning on CGIs waaaay back in Part 2? If you do, you may recall that we described ScriptAlias as a "way to map a URL to a location on our hard drive." mod_alias is the magical module that offers this ability, as well as the Redirect and RedirectMatch examples we saw in Part 4.

You can read more about the other capabilities of mod_alias at the Apache Web site, but there's nothing that will surprise you -- just different ways of doing similar tasks. Here's an example of making /Users/aku/Pictures/ accessible as http://127.0.0.1/~aku/pictures/:

Alias /~aku/pictures/ "/Users/aku/Pictures/"

When creating aliases like this, you want to be careful about "permissions." Mac users have never had to deal with permissions before, so they can be an interesting thing to muddle through. We won't get into a detailed description here, but in a simplified nutshell:

User directories like Pictures, Library, Music, and so forth are not normally viewable by the Apache Web server -- the permissions are too restrictive.

Simply creating an Alias will probably not work. Sure, you're telling Apache to serve files from that location, but that directory is still protected from other users (one of which is www, the user Apache runs as). Again, the permissions are still too restrictive.

In this case, to give Apache permission to access the /Users/aku/Pictures/ directory, we need to say chmod 755 /Users/aku/Pictures in our Terminal. This loosens the permissions, and allows Apache to "read" (but not "write") files from that directory.
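
mod_alias can also map a whole family of URLs with a regular expression. Here's a minimal sketch using its AliasMatch directive, with penfold as a hypothetical user and the Developer Tools documentation as the target:

```apache
# Anything requested under /~penfold/docs/ is captured by (.*)
# and appended to /Developer/Documentation/ via $1.
AliasMatch ^/~penfold/docs/(.*) "/Developer/Documentation/$1"
```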

Here we're taking every file and directory requested under /~penfold/docs, and instead serving them from /Developer/Documentation. Accessing http://127.0.0.1/~penfold/docs/Carbon/carbon.html would serve /Developer/Documentation/Carbon/carbon.html -- likewise, http://127.0.0.1/~penfold/docs/Carbon/ would get you an index of the entire /Developer/Documentation/Carbon/ directory.

Directory Indexes

The following lines load this module:

LoadModule dir_module libexec/httpd/mod_dir.so
AddModule mod_dir.c

mod_dir controls DirectoryIndex, which we talked about two installments ago. Briefly, it controls which files should be served by default when a directory (rather than a specific file) is requested. There's nothing more to add here besides the clarification that CGI scripts (index.cgi, for example) can be used as well. Move along, please.
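
For instance, a line like the following (the particular filenames are just an example) tells Apache to look for each file in turn and serve the first one it finds:

```apache
# Tried left to right; the first file that exists in the directory wins.
DirectoryIndex index.html index.php index.cgi
```

For index.cgi to actually run rather than be sent as text, CGI execution has to be enabled for that directory, as covered earlier in the series.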

Directory AutoIndexes

mod_autoindex controls the generated directory listings we talked of in Part 4. There's a lot more power behind this module than we've discussed. For instance, you can control the initial sorting order, the descriptions of the files shown, and even include headers or footers (in HTML with optional server side includes, or plain text).

Take the following example. This will add a descriptive element to all our JPEG images and a different description to all our PHP files. When Apache auto generates the index, it'll display our blurbage for each matching file.
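
A minimal sketch of what that could look like, with made-up descriptions standing in for your own:

```apache
# Descriptions only appear in "fancy" listings.
IndexOptions FancyIndexing

# One blurb for every JPEG, another for every PHP file.
AddDescription "A photograph from the GatesMcFarlaneCo company picnic" *.jpg
AddDescription "A PHP script that talks to our MySQL database" *.php
```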

There's one problem, however, and that's length. With the look and feel of Apache's auto index, the description is either cut off arbitrarily, or else the browser will scroll the data off screen. A wee voice pops in your head (not a Keanu Reeves sort of voice -- more scary godmotherish): "If only we could add some HTML and make things smaller!"

Ahhh, but we can, young grasshopper, and that's where HeaderName and ReadmeName come in. These directives tell Apache what files to use as the header (controlled by HeaderName) and footer (controlled by ReadmeName) of a directory listing. By default, these files are HEADER.txt (or .html) and README.txt (or .html).
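
Here's one way to pull those pieces together. Treat it as a sketch: the descriptions are invented, and HEADER.html and README.html are files you'd write yourself and drop into the directory being listed.

```apache
# Fancy listings, no Apache-generated <html><head> preamble,
# and no limit on the width of the description column.
IndexOptions FancyIndexing SuppressHTMLPreamble DescriptionWidth=*

# Our own header and footer files.
HeaderName HEADER.html
ReadmeName README.html

# HTML is allowed inside a description.
AddDescription "A <u>photograph</u> from the GatesMcFarlaneCo company picnic" *.jpg
AddDescription "A PHP script that talks to our MySQL database" *.php
```

Because SuppressHTMLPreamble stops Apache from writing the opening of the page, HEADER.html needs to supply it: the html, head, and body tags, plus whatever font-shrinking markup you fancy.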

Besides the fact that we've now added our own HTML header that makes the font smaller (via HEADER.html), we've also told Apache not to spit out its normal header code (with SuppressHTMLPreamble). Our descriptions will no longer be truncated since we've given ourselves unlimited length via DescriptionWidth (they may still scroll off the end of the browser window, though).

You may also notice that we've added an underline to one of our descriptions. Including HTML within the AddDescription comes with no restrictions, but you do want to be careful about truncating. If you're not, the HTML code could be cut in half, distorting the rest of your page (above, there's nothing to worry about since we've turned off truncating with DescriptionWidth).

There are many other options available. With a little ingenuity, a user wanting to offer a large collection of downloadable files could have a complete Web site in ten minutes. Think of it -- thousands of MP3s sorted and described, using only two HTML files and some AddDescription lines. Need to add a new song? Just stick it in the directory, add a description, and you're done. No muss, no fuss, and you didn't need any database or programming knowledge.

Of course, you may not like the idea that millions of anonymous Internet users could leech your MP3 collection. With the tips described soon in the "Username-Based Access Control" section, you'll have no speed bumps in your world-conquesting travels (narf!).

Hostname or IP Access Control

The access module controls who can visit your Apache Web server, and we gave a few examples of doing so at the end of Part 3. Past what we've talked about, there's not much more to discuss, except for the following powerful collaboration.

A little bit later, we're going to talk about "environment variables." An environment variable is just a magical term for floating bits of data passed around every time Apache serves a Web page. For instance, when you access this site using a browser, an invisible piece of data named HTTP_USER_AGENT is created. This HTTP_USER_AGENT contains a value, like "Internet Explorer" or "Mozilla" or "Opera" or what have you. Below, I've included some common environment variables and what they could be assigned:
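
The values here are purely illustrative (yours will differ from request to request), but a handful of the usual suspects look something like this:

```text
HTTP_USER_AGENT   Mozilla/4.0 (compatible; MSIE 5.14; Mac_PowerPC)
HTTP_REFERER      http://127.0.0.1/index.html
REMOTE_ADDR       127.0.0.1
REQUEST_METHOD    GET
```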

Astute readers may remember running a script called test-cgi back in Part 2, which actually prints out a number of common environment variables. To see a full list created during each and every browser request to your local Apache server, run the printenv script by going to http://127.0.0.1/cgi-bin/printenv.

You're asking why all this matters. Well, with an Apache module called mod_setenvif (which we'll describe more in-depth later), you can create your own environment variables. I hear your shouts of "So!?" I know. Bear with me.

By creating your own environment variables, you can use the Allow and Deny directives from mod_access to restrict visitors based on more than just IPs or hostnames (as we've previously demonstrated). Let me show you an example:
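
A minimal sketch of such a setup, assuming the bot identifies itself with a User-Agent beginning with "EmailWolf" and that we're protecting the default Mac OS X document root:

```apache
# If the User-Agent header starts with "EmailWolf", set a variable
# named shelbyville. (The exact string the bot sends is an assumption.)
SetEnvIf User-Agent "^EmailWolf" shelbyville

<Directory "/Library/WebServer/Documents">
    # Allow rules are evaluated first, then Deny rules override them.
    Order Allow,Deny
    Allow from all
    Deny from env=shelbyville
</Directory>
```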

With the above simplicity, we're now denying access to our site from any User-Agent with the name "Email Wolf." This User-Agent, along with many others, is often labeled a "bad robot" as it sniffs around for email addresses to add to spam databases. Here we're detecting whether the access is coming from a known bad bot, and if so, we set an environment variable named shelbyville. Our Deny from env=shelbyville says, "Hey! If there's an environment variable named shelbyville, deny them access!"

Username-Based Access Control

The following lines load this module:

LoadModule auth_module libexec/httpd/mod_auth.so
AddModule mod_auth.c

You'll remember mod_auth from Part 4 of our omnibus, where we chatted about password protecting certain directories. We walked through creating an .htpasswd file, which contained all our usernames, and then we created an .htaccess file like so:
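
A typical file of that sort, sketched with an assumed realm name and an assumed location for the .htpasswd file:

```apache
# The realm name shown in the browser's login dialog (hypothetical).
AuthName "GatesMcFarlaneCo Staff Only"
AuthType Basic

# Where the usernames and passwords live (hypothetical path).
AuthUserFile /Library/WebServer/.htpasswd

# Let in anyone who appears in that file and gets their password right.
require valid-user
```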

With the above .htaccess file sitting in a directory, we're restricting access to that directory with a password. If any valid-user from the AuthUserFile enters the correct username and password, then we let them in -- everyone else is denied. As in our previous article, if you want to use features like AuthGroupFile or the other require directives, then I'm going to push you rudely to Apache's Web site -- they give a decent tutorial there.

There are two more authentication modules related to mod_auth, and they're normally commented in your Apache configuration file. The relevant lines look like so:
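
As a sketch, assuming the module and file names used by stock Apache 1.3 builds and the libexec/httpd/ path seen elsewhere in this article, the commented lines read:

```apache
# From the LoadModule block:
#LoadModule dbm_auth_module libexec/httpd/mod_auth_dbm.so
#LoadModule digest_module libexec/httpd/mod_digest.so

# From the AddModule block:
#AddModule mod_auth_dbm.c
#AddModule mod_digest.c
```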

We won't be touching on these here -- they're commented for a reason. Most of the time, if you have special needs for authentication involving different file structures, this is where you'd look. The mod_auth_dbm module covers storing the password information in a DBM file, whereas the mod_digest module "implements an older version of the MD5 Digest Authentication specification which will probably not work with modern browsers."

There may be some interest in anonymous access control, though. This feature allows you to "authenticate" users, but to do so without knowing who the user is. You can use anonymous control in conjunction with other access control methods (like Allow, Deny, and AuthUserFile).

Why would you want something like this? Perhaps you've got a large amount of documents you don't want indexed by search engines -- you could try a robots.txt file, but some engines don't listen to them. With anonymous access control, you can allow anyone who'll take the time to read your directions.

The actual module lines for anonymous access control are commented in your configuration file, so to follow along with these examples, you'll need to uncomment them and then restart Apache. I also assume you're throwing the examples in an .htaccess file (instructions on how to use .htaccess files are in Part 4). The module lines look like so:
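
As a sketch, assuming the names used by stock Apache 1.3 builds (anon_auth_module and mod_auth_anon), the commented pair looks like this. Remove the leading # from both lines to enable the module:

```apache
# From the LoadModule block:
#LoadModule anon_auth_module libexec/httpd/mod_auth_anon.so

# From the AddModule block:
#AddModule mod_auth_anon.c
```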

You should recognize the first three directives, as they're typical of what we've seen before (before we gave AuthName a whimsical name -- in this case, we're using it as mini-instructions for the visitor). The Anonymous directive controls what usernames should be considered "anonymous" -- in this case, we've got "orko" and "bender," but we could just as easily have chosen "overtkill," "charizad," and "slimer" too.
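
Put together, the directives just described might look like this in an .htaccess file. It's a sketch: the AuthName wording is made up, and the final line is there for completeness.

```apache
# AuthName doubles as instructions in the login dialog (wording is hypothetical).
AuthName "Log in as orko or bender"
AuthType Basic
require valid-user

# Usernames that count as "anonymous."
Anonymous orko bender

# Don't hand failed logins to any other authentication scheme.
Anonymous_Authoritative on
```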

The Anonymous_Authoritative directive controls whether we want to pass unauthorized usernames and passwords off to another authentication scheme for processing. If we say "on," then anonymity is king -- either the visitor logs in with "orko" or "bender" or they're not allowed access.

On the other hand, if we say "off" then we can add in some of what we already know -- authentication via passwords. Take a look at the configuration below. If the user does not log in with "heenie" or "retrogirl" then the username is passed off to the AuthUserFile, where it's also checked against. If it doesn't exist in that file either, then the user is denied.
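
A minimal sketch of that arrangement, again assuming a made-up realm name and .htpasswd location:

```apache
AuthName "Log in as heenie or retrogirl, or use your own account"
AuthType Basic
require valid-user

# Anonymous usernames are checked first...
Anonymous heenie retrogirl

# ...and anything else falls through to the next scheme.
Anonymous_Authoritative off

# The fallback: ordinary password authentication (hypothetical path).
AuthUserFile /Library/WebServer/.htpasswd
```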

As is typical, you can be as serene or as complicated as you want. The following configuration will allow any user to get a directory listing. Any user listed in the AuthUserFile can get access to all .jpg files, as well as any anonymous user logging in with "mrs_decepticon" or "spiderj" assuming they enter a valid email address (one with a "@" and "."). Finally, only the "eustace" user can download MP3 files:
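
One way to express those rules, offered as a sketch rather than gospel. The realm name and .htpasswd path are assumptions, and the file patterns are simple regular expressions on the extension:

```apache
AuthName "Your username, or an anonymous name plus your email address"
AuthType Basic
AuthUserFile /Library/WebServer/.htpasswd

# Anonymous logins, with the email-as-password checked for "@" and ".".
Anonymous mrs_decepticon spiderj
Anonymous_VerifyEmail on
Anonymous_Authoritative off

# No "require" at this level, so directory listings stay open to everyone.

# JPEGs: any valid user, anonymous or from the AuthUserFile.
<Files ~ "\.jpg$">
    require valid-user
</Files>

# MP3s: eustace and eustace alone.
<Files ~ "\.mp3$">
    require user eustace
</Files>
```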

You can, of course, get even more convoluted, restricting by IP address or hostname, environment variables, and "oh, the madness!" Just make sure you comment your craziness -- it's certainly easy to get confused as to who has access to what.

Huh? What's That Noise?

Something startles you out of your reverie -- a rather rude employee down the hall slamming doors or doing some other bit of mundania. Shaking your head and murmuring about "15 minutes to get back to the zone," you stand up to stretch your unused limbs of movement. Some ornament on the wall blinks and jumps off a badly painted cliff. You can't believe the time -- not even half the day has gone by.

With your module exploration nearly finished, you figure you'll be done by the end of the workday. Keep watch for "Apache Web-Serving with Mac OS X, Part 7," where we'll finish our spelunking and increase our Web-serving knowledge in places where most haven't delved.

If you're feeling especially adventurous, hunt down the 20 comic book and cartoon characters spread throughout this article. Each character is from a comic book or cartoon I personally read or watch (some easy, like "scary godmotherish" being from Jill Thompson's "Scary Godmother"; some delightfully esoteric). Your task: identify which book or show they come from. The person who emails me the most correct entries will be mentioned in the next article, as well as receiving something random in the mail from yours truly. Good luck!