Filtering I/O in Apache 2.0

One of the holy grails of the Apache developers has always been filtered
or layered I/O: the ability for one module to modify the data that was
generated by an earlier module. This ability was originally slated for
inclusion in Apache 2.0, but when work began in earnest on 2.0, the
feature was pushed aside and marked for inclusion in 2.1 or 3.0. Two
months ago, however, the Apache developers had a small meeting and
designed filtered I/O for Apache 2.0. The work has been started, and
some filters have already been written. Over the next few months, I will
explain how this feature works and how your modules can take advantage
of it.

The general premise of the filtered I/O design in Apache 2.0 is that all
data served by a web server can be broken into chunks. Each chunk of
data comes from a single source: a file, a CGI program, or a module that
generates it. We also knew that all of the data could always be
represented as a string of characters, although that string may not be
human-readable. Armed with that knowledge, we sat down to design the
filtering system. One of our overriding goals was that the filtering
logic needed to be performance-aware. It didn't matter if filters chose
to ignore performance issues, but it did matter if the design hindered
filters from knowing about performance issues. This meant that we needed
to know more about the data than just what the data was; we also needed
to know where the data comes from and what its lifetime is.

This meta-data is important when actually writing the response to the
network. For example, if we have a very simple request that is just a
page from disk, then we want to use sendfile. (sendfile is provided by
APR and is available on all platforms; if the platform doesn't have a
native sendfile, then APR loops reading the file and writing to the
network.) If we take this example a step further and make the whole
response an SSI page, where one element is a file from disk and the rest
is generated, such as date strings, then we want to use a single
sendfile call if possible. APR's sendfile provides an opportunity to
include both header and trailer information with the file, which are
sent using writev. In this example, we can send the HTTP headers, the
full file, and the date string with one APR call. (The number of system
calls will differ depending on platform.) Keeping the meta-data
accessible is what makes this kind of optimization possible.
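
The fallback path described above (send a header, loop over the file,
then send a trailer) can be sketched in portable C. This is only an
illustration under stated assumptions: it uses plain stdio streams in
place of a socket, and the function name send_with_header_trailer is
hypothetical. It is not APR's sendfile interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a sendfile call that carries a header and a
 * trailer. It emulates the non-native path: write the header, loop
 * reading the file and writing it out, then write the trailer.
 * Returns the number of bytes written, or -1 on error. */
long send_with_header_trailer(FILE *out, const char *header,
                              FILE *file, const char *trailer)
{
    char buf[4096];
    size_t n, len;
    long total = 0;

    assert(out != NULL && header != NULL && file != NULL && trailer != NULL);

    /* Header first, e.g. the HTTP response headers. */
    len = strlen(header);
    if (fwrite(header, 1, len, out) != len)
        return -1;
    total += (long)len;

    /* File body: read a chunk, write a chunk, until end of file. */
    while ((n = fread(buf, 1, sizeof buf, file)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    if (ferror(file))
        return -1;

    /* Trailer last, e.g. a generated date string. */
    len = strlen(trailer);
    if (fwrite(trailer, 1, len, out) != len)
        return -1;
    total += (long)len;

    return total;
}
```

The caller makes one call and does not care how many reads and writes
happen underneath, which is the property the article attributes to the
APR call.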

In order to keep the meta-data available, the Apache developers needed
to find some way to pass everything from one filter to the next. The
data structure that was designed to do this is called a bucket_brigade.
Each bucket brigade is composed of multiple buckets. The buckets contain
the data that we are sending to the client, and the type of each bucket
supplies the meta-data.
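
To make the shape of this structure concrete, here is a minimal sketch
of a brigade as a linked list of typed buckets. All names (sketch_bucket,
sketch_brigade, and the two functions) are hypothetical and chosen for
illustration; they are not the Apache bucket API, and only the two
bucket types named in this article are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bucket types; the type is the meta-data. */
enum sketch_bucket_type { SKETCH_HEAP, SKETCH_TRANSIENT };

/* One chunk of response data plus a tag saying where it lives. */
struct sketch_bucket {
    enum sketch_bucket_type type;
    const char *data;
    size_t length;
    struct sketch_bucket *next;
};

/* A brigade is an ordered list of buckets passed between filters. */
struct sketch_brigade {
    struct sketch_bucket *head;
    struct sketch_bucket *tail;
};

/* Add a bucket to the end of the brigade. */
void sketch_brigade_append(struct sketch_brigade *bb,
                           struct sketch_bucket *b)
{
    assert(bb != NULL && b != NULL);
    b->next = NULL;
    if (bb->tail != NULL)
        bb->tail->next = b;
    else
        bb->head = b;
    bb->tail = b;
}

/* Total number of data bytes held by all buckets in the brigade. */
size_t sketch_brigade_length(const struct sketch_brigade *bb)
{
    const struct sketch_bucket *b;
    size_t total = 0;

    assert(bb != NULL);
    for (b = bb->head; b != NULL; b = b->next)
        total += b->length;
    return total;
}
```

A filter that receives such a brigade can walk the list, inspect each
bucket's type, and decide how to handle the data without copying it.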

Currently, we have a small number of bucket types, but the bucket API
was designed to be extensible. The current bucket types are:

AP_BUCKET_HEAP

This bucket type is designed to store data allocated off the heap. This
data will be available as long as the bucket is available. If the data
needs to be modified and there is space in the bucket, it is acceptable
to modify the data in place when using this bucket.

AP_BUCKET_TRANSIENT

This bucket stores data allocated off the stack. This means that once a
filter function returns, the data can no longer be assumed valid the
next time that function is called. If the data has not yet been written
to the network, then it must be converted to a heap bucket so that it is
still available the next time the current filter is called.
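
The conversion from a transient bucket to a heap bucket amounts to
copying the bytes somewhere that outlives the current call. The sketch
below shows that idea with hypothetical names (demo_bucket,
demo_make_heap, demo_destroy); it assumes heap buckets own memory
obtained from malloc, which is a simplification and not a description of
how Apache actually allocates bucket storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket types for this sketch only. */
enum demo_type { DEMO_HEAP, DEMO_TRANSIENT };

struct demo_bucket {
    enum demo_type type;
    char *data;
    size_t length;
};

/* Convert a transient bucket into a heap bucket by copying its bytes.
 * Heap buckets are left untouched. Returns 0 on success, -1 if the
 * allocation fails. */
int demo_make_heap(struct demo_bucket *b)
{
    char *copy;

    assert(b != NULL);
    if (b->type == DEMO_HEAP)
        return 0;

    /* Allocate at least one byte so an empty bucket still converts. */
    copy = malloc(b->length ? b->length : 1);
    if (copy == NULL)
        return -1;

    memcpy(copy, b->data, b->length);
    b->data = copy;
    b->type = DEMO_HEAP;
    return 0;
}

/* Release the storage owned by a heap bucket; transient data is not
 * owned by the bucket, so it is left alone. */
void demo_destroy(struct demo_bucket *b)
{
    assert(b != NULL);
    if (b->type == DEMO_HEAP) {
        free(b->data);
        b->data = NULL;
        b->length = 0;
    }
}
```

A filter holding unsent transient data would call something like
demo_make_heap before returning, so that the bytes survive until the
filter is invoked again.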