Writing Modules for mod_perl

Discover the flexibility and power of writing mod_perl modules instead of CGI programs.

CGI programs are a common, time-tested
way to add functionality to a web site. When a user's request is
meant for a CGI program, the web server fires up a separate process
and invokes the program. Anything sent to the STDOUT file
descriptor is sent to the user's browser, and anything sent to
STDERR is filed in the web server's error log.
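
To make the STDOUT/STDERR split concrete, here is a minimal sketch of a CGI program in Perl. The subroutine name cgi_response and the log message are illustrative choices, not part of any standard.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the complete CGI response: a header, a blank line, then the body.
sub cgi_response {
    my $now = localtime;    # scalar context yields a readable timestamp
    return "Content-type: text/html\n\n"
         . "<html><body><p>Hello from CGI at $now</p></body></html>\n";
}

# Anything printed to STDOUT is sent to the user's browser.
print STDOUT cgi_response();

# Anything printed to STDERR ends up in the web server's error log.
print STDERR "hello.pl: handled a request\n";
```

Run from the command line, the program simply prints the header and the HTML. Under a web server, the server starts a new process for the program on every request.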

While CGI has been a useful standard for web programming, it
leaves much to be desired. In particular, the fact that each
invocation of a CGI program requires its own process turns out to
be a large performance bottleneck. It also means that if you use a
language like Perl where the code is compiled upon invocation, your
code will be compiled each time it is invoked.

One way to avoid this sort of problem is by writing your own
web server software. Such a project is a significant undertaking,
though. While the first web server I used consisted of 20 lines of
Perl, most servers must now handle a great many standards and error
conditions, in addition to simple requests for documents.

Apache, a highly configurable open-source HTTP server, makes
it possible to extend its functionality by writing modules. Indeed,
modern versions of Apache depend on modules for most functionality,
not just a few add-ons. When you compile and install Apache for
your computer system, you can choose which modules you wish to
install.

One of these modules is mod_perl, which embeds an entire Perl
interpreter inside your web server. This allows you to modify
Apache's behavior using Perl, rather than C.

Even if you plan to use approximately the same code with
mod_perl as you would with CGI, it is useful to know that mod_perl
has some built-in smarts that cache compiled Perl code. This gives
an extra speed boost, on top of the efficiency gained by not
creating a child process in which to run the CGI program.

Over the last year, this column has looked at some of the
most popular ways of using mod_perl, namely the Apache::Registry
and HTML::Embperl modules. The former allows you to run almost all
CGI programs untouched, while benefiting from the speed gains
built into mod_perl. HTML::Embperl is a template system
that allows us to combine HTML and Perl in a single file.

Both Apache::Registry and HTML::Embperl offer a great deal of
convenience and let programmers take advantage of some of mod_perl's
power and speed. However, these modules do not give us direct
access to Apache's guts, so we cannot use them to turn Apache into a
program that handles our specific needs better than the generic
server does.

This month, we will look at how to write modules for
mod_perl. As you will see, writing such modules is more complicated
than writing CGI programs. However, it is not significantly more
complicated and can give you tremendous flexibility and
power.

Keep in mind that while CGI programs can be used, often
without modification, on a variety of web servers, mod_perl works
only with the Apache server. This means that modules written for
mod_perl will work on other Apache servers, which constitute more
than half of the web servers in the world, but not on other types
of servers, be they free or proprietary.

If portability across different servers is a major goal in
your organization, think twice before using mod_perl. But if you
expect to use Apache for the foreseeable future, I strongly suggest
looking into mod_perl. Your programs will run faster and more
efficiently, and you will be able to create applications that would
be difficult or impossible with CGI alone.

Perl*Handlers

CGI programmers have a limited view of HTTP, the hypertext
transfer protocol used for nearly all web communication. Normally,
a server receiving a request from an HTTP client (most often a web
browser) translates the incoming URL into a path on the local file
system, checks to see if the file exists and returns a response code
along with the file's contents or an error message, as appropriate.
CGI programs are invoked only halfway through this process, after
the translation has taken place, the file has been found and a new
process has been fired off.

mod_perl, by contrast, allows you to examine and modify each part
of the HTTP transaction, from the client's initial contact through
the logging of the transaction on the server's file system. Each
HTTP server divides an HTTP transaction into a series of stages;
Apache has more than a dozen such stages.

Each stage has a “handler”, which is given the
opportunity to act on that stage of the HTTP transaction.
For example, a TransHandler translates URLs into files on the
file system, a LogHandler takes care of logging events to the
access and error logs, and a TypeHandler checks and returns the
MIME type associated with each document. Additional handlers are
called when important events, such as startup, shutdown and restart,
occur.

Each of these Apache handlers has a mod_perl counterpart,
known by the collective name of “Perl*Handlers”. As you can guess
from this nickname, each Perl*Handler begins with the word “Perl”
and ends with the word “Handler”.
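
As a rough illustration of this naming scheme, a mod_perl 1.x configuration might attach Perl modules to individual stages with directives like those below. The directive names are mod_perl's own; the My::* module names are hypothetical placeholders, and the sketch assumes mod_perl was built with these hooks enabled.

```
# httpd.conf fragment (sketch; the My::* module names are hypothetical)
PerlTransHandler  My::Translate
PerlTypeHandler   My::MimeType
PerlLogHandler    My::Logger
```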

A generic Perl*Handler, known simply as PerlHandler, is also
available and is quite similar to CGI programs. If you want to
receive a request, perform some calculations and return a result,
use PerlHandler. Indeed, most applications that are visible to the
end user can be done with PerlHandler. The other Perl*Handlers are
more appropriate for changing Apache's behavior from a Perl module,
such as when you want to add a new type of access log, alter the
authorization mechanism, or add some code at startup or
shutdown.
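
A minimal sketch of such a PerlHandler, written against the mod_perl 1.x request API, might look like the following. The module name Apache::Hello is hypothetical. The MockRequest package is only a stand-in, not part of mod_perl, so that the handler can be tried outside the server.

```perl
package Apache::Hello;    # hypothetical module name
use strict;
use warnings;

# Inside mod_perl 1.x you would normally write
#     use Apache::Constants qw(OK);
# OK has the value 0 there; it is defined locally so that this sketch
# also runs under plain perl.
use constant OK => 0;

# Apache calls handler() once per request, passing in the request object.
sub handler {
    my $r = shift;
    $r->content_type('text/html');
    $r->send_http_header;
    $r->print("<html><body><p>Hello from a PerlHandler</p></body></html>\n");
    return OK;    # tell Apache the request was handled successfully
}

# Stand-in for the Apache request object (an assumption for illustration).
package MockRequest;

sub new {
    my $class = shift;
    return bless { type => '', out => '' }, $class;
}
sub content_type     { my ($self, $type) = @_; $self->{type} = $type }
sub send_http_header { my $self = shift; $self->{out} .= "Content-type: $self->{type}\n\n" }
sub print            { my ($self, @text) = @_; $self->{out} .= join '', @text }
sub output           { my $self = shift; return $self->{out} }

package main;

# Drive the handler by hand, the way Apache would for a single request.
my $request = MockRequest->new;
my $status  = Apache::Hello::handler($request);
print STDOUT $request->output;
```

Inside a real server, the mock is unnecessary: Apache supplies the request object, and a configuration section using SetHandler perl-script together with PerlHandler Apache::Hello would route matching requests to the module.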

I realize the distinction between Perl*Handlers (meaning all
of the possible handlers available to Perl programmers) and
PerlHandlers (meaning modules that take advantage of Apache's
generic “handler”) can be confusing. Truth be told, confusing the
two isn't that big a deal, since the majority of programs are
written for PerlHandler and not for any of the other
Perl*Handlers.

As I mentioned above, mod_perl compiles Perl code once, caches
the result, then runs that compiled code during subsequent
invocations. This means that, in contrast to CGI programs, changes
made in our program will not be reflected immediately on the server.
Rather, we must tell Apache to reload our program in some way. The
easiest way to do this is to send the server a HUP signal
(killall -1 -v httpd on my Linux box), but there are other ways as
well. One is to use the Apache::StatINC module, which keeps track of
modules' modification dates, loading new versions as necessary.
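
For reference, enabling Apache::StatINC under mod_perl 1.x is usually a matter of a couple of configuration lines. This is a sketch, assuming the module is installed and that reloading is wanted server-wide rather than for a single location.

```
# httpd.conf fragment (sketch)
PerlModule      Apache::StatINC
PerlInitHandler Apache::StatINC
```

Because this checks modification times on every request, it is generally better suited to development than to a busy production server.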
