Writing Apache 2.0 Output Filters

09/13/2001

The last article covered the basics of Apache 2.0 filters. It gave enough information to get started, but not enough to write a functioning filter. In this article, we will finish discussing output filters. After reading it, you should be able to write your own Apache filters.

When the filter interface was first designed, it was written for the Apache
developers and very little attention was paid to making it easy for other
people to write their own filters. Since that time, the developers have
looked at the API again and have added a simple layer on top of the
original interface. The original API required that the filter writer take
into account how much data they were passing to the next filter to take
full advantage of the filters. For example, if you were writing a filter
that swapped every other word, you had two choices -- either convert the
file to a series of buckets and move the pointers in the bucket list around, or
copy each word into a large block of memory.

Each of these approaches has problems that must be solved, but with the new
API the task becomes simple. All the developer must do is split the file into
individual words and write the words to the next filter. Apache itself will
take care of copying the data when it should, and it will take care of passing
the data to the next filter when appropriate. This module
implements this swap filter.
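
To make the idea concrete, here is a minimal, self-contained sketch of the word-swapping logic in plain C. It is not the linked module and it does not use the Apache API: sink_t and sink_write are hypothetical stand-ins for "the next filter", and words are assumed to be separated by whitespace. It only illustrates splitting the input into words and writing them out one at a time in swapped order.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "the next filter": collects bytes in a buffer. */
typedef struct {
    char   data[256];
    size_t len;
} sink_t;

static void sink_write(sink_t *s, const char *p, size_t n)
{
    assert(s->len + n < sizeof(s->data));  /* sketch only: fixed capacity */
    memcpy(s->data + s->len, p, n);
    s->len += n;
    s->data[s->len] = '\0';
}

/* Swap each pair of adjacent words and write them to the sink.
 * Output words are separated by a single space. */
static void swap_words(const char *in, sink_t *out)
{
    const char *held = NULL;  /* first word of the current pair */
    size_t held_len = 0;
    int wrote = 0;            /* has anything been written yet? */

    while (*in) {
        while (*in && isspace((unsigned char)*in)) in++;
        if (!*in) break;
        const char *w = in;
        while (*in && !isspace((unsigned char)*in)) in++;
        size_t wlen = (size_t)(in - w);

        if (!held) {          /* hold the first word until its partner arrives */
            held = w;
            held_len = wlen;
            continue;
        }
        if (wrote) sink_write(out, " ", 1);
        sink_write(out, w, wlen);         /* second word first */
        sink_write(out, " ", 1);
        sink_write(out, held, held_len);  /* then the held word */
        wrote = 1;
        held = NULL;
    }
    if (held) {               /* odd word at the end has no partner */
        if (wrote) sink_write(out, " ", 1);
        sink_write(out, held, held_len);
    }
}
```

In a real filter, each sink_write call would instead be a write to the next filter, and Apache would decide when the accumulated data is actually passed along.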

The new API very closely resembles the buffered file I/O API from POSIX.
In fact, the developers used that API as the model when redesigning filters.
There are five functions used to send data from the current filter to the
next. All five functions share the same first two parameters: the filter,
which is the next filter in the filter chain, and the bucket brigade used
to store the data, if necessary. All of
the functions also share the same characteristics. Data is copied into the
last bucket in the brigade until the brigade has more than 8K of data.
Once 8K is reached, the entire brigade is sent to the next filter in
the chain.

ap_fwrite(ap_filter_t *f, apr_bucket_brigade *bb, const char *data, apr_size_t nbyte)
Write a specified number of characters from the data string to the next filter. The nbyte parameter specifies how many characters should be written.

In addition to those five standard functions, there is one more that is
important. Because these functions buffer data until there is enough to send,
it is vital that filter writers be able to dictate that the data must be sent
immediately. This is done using the final function, ap_fflush(ap_filter_t *f, apr_bucket_brigade *bb). This function just takes the current brigade and sends it to the next filter.
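
The buffering and flushing behavior described above can be modeled in a few lines of plain C. This is only a sketch under stated assumptions: model_filter, model_write, and model_flush are hypothetical names that stand in for the filter, the write functions, and the flush function. None of this is Apache's implementation, and the threshold check is simplified to "8192 bytes or more".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FLUSH_THRESHOLD 8192  /* the 8K limit described in the text */

/* Hypothetical model of "next filter plus brigade"; not Apache's types. */
typedef struct {
    char   buf[2 * FLUSH_THRESHOLD];  /* data waiting to be sent */
    size_t used;                      /* bytes currently buffered */
    size_t passes;                    /* times data was sent downstream */
    size_t bytes_sent;                /* total bytes sent downstream */
} model_filter;

/* Send whatever is buffered to the next stage right away. */
static void model_flush(model_filter *f)
{
    if (f->used == 0) return;
    f->bytes_sent += f->used;
    f->passes++;
    f->used = 0;
}

/* Copy data into the buffer; send it on once the threshold is reached. */
static void model_write(model_filter *f, const char *data, size_t nbyte)
{
    while (nbyte > 0) {
        size_t room = sizeof(f->buf) - f->used;
        size_t n = nbyte < room ? nbyte : room;
        memcpy(f->buf + f->used, data, n);
        f->used += n;
        data += n;
        nbyte -= n;
        if (f->used >= FLUSH_THRESHOLD) model_flush(f);
    }
}
```

Small writes simply accumulate, a write that crosses the threshold triggers a pass to the next stage, and an explicit flush sends any remainder without waiting for more data.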

Now that you know how to send data, there is only one more thing that you
must know before you can write your own filter. Apache filters are called as
many times as necessary to process all of the data produced by the handler.
This means that it is possible and even likely that at some point your
filter will begin to process data and find that it doesn't have enough to
finish processing. In some cases, you can just save a state and return
to the previous filter to wait for more information. However, more often you
will need to save some of the data that you have already parsed for the next
time your filter is called. This is done using:

This function accepts the current filter pointer as the first argument. The
second argument is the bucket brigade used to save the data. If the save_to
brigade is NULL, it will be created inside the save_brigade function. The third
parameter is the current brigade. This brigade should contain the data that
you want to save for the next time the filter is called. Finally, this
function accepts a pool which is used to allocate any required data. When
this function returns, the save_to brigade contains a complete copy of all
the data to be sent to the next filter. This brigade can then be saved to
the filter's ctx pointer for use the next time the filter is called.
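
The need to carry data between calls can be illustrated with a small stand-alone model. This sketch does not use bucket brigades or the Apache API. The word_ctx structure is a hypothetical analogue of the data hung off a filter's ctx pointer, and filter_chunk plays the role of one invocation of the filter. It assumes words are separated by single spaces and are shorter than 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filter state, kept alive between calls. */
typedef struct {
    char   partial[64];  /* unfinished word carried over from earlier calls */
    size_t partial_len;
    char   last[64];     /* most recent complete word */
    size_t words;        /* complete words seen so far */
} word_ctx;

/* Close off the word collected so far, if there is one. */
static void end_word(word_ctx *ctx)
{
    if (ctx->partial_len == 0) return;
    memcpy(ctx->last, ctx->partial, ctx->partial_len);
    ctx->last[ctx->partial_len] = '\0';
    ctx->words++;
    ctx->partial_len = 0;
}

/* One "filter call": consume a chunk and keep any trailing partial word. */
static void filter_chunk(word_ctx *ctx, const char *chunk, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (chunk[i] == ' ') {
            end_word(ctx);
        } else {
            assert(ctx->partial_len < sizeof(ctx->partial) - 1);
            ctx->partial[ctx->partial_len++] = chunk[i];
        }
    }
}
```

If the chunks "hello wor" and "ld foo" arrive in separate calls, the fragment "wor" is held in the context after the first call and completed as "world" during the second. A caller would invoke end_word once more at end of stream to finish the last word.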

When writing filters, it is important to realize that filters are called as
often as necessary to process all the data. This is a good thing because
it allows Apache to stream information to the clients as soon as it is
available. However, this also has some drawbacks for filter writers.

It is
important to realize that there are some things that can be done the first
time a filter is called that can never be done again. For example, the
first time a filter is called, it is possible to modify the headers associated
with a response. It is also possible to add a special error bucket to the
brigade. If an error bucket is added to the brigade, then one of Apache's
core filters will find the bucket and issue an error response instead of
sending the data that has been generated.
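
A common way to handle first-call-only work is to keep a flag in the filter's saved state. The sketch below models that pattern in plain C. The model_response and model_ctx types and the model_filter_call function are hypothetical and only imitate the idea of changing a header on the first invocation; they are not Apache structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical response: one header plus a "headers already sent" flag. */
typedef struct {
    char content_type[32];
    int  headers_sent;
} model_response;

/* Hypothetical per-filter state. */
typedef struct {
    int    first_call_done;
    size_t calls;
} model_ctx;

/* One filter invocation. Returns 1 if the header was changed on this call,
 * 0 otherwise. */
static int model_filter_call(model_ctx *ctx, model_response *r)
{
    int changed = 0;

    ctx->calls++;
    if (!ctx->first_call_done) {
        /* Only safe here: nothing has been sent to the client yet. */
        strcpy(r->content_type, "text/plain");
        ctx->first_call_done = 1;
        changed = 1;
    }
    /* After the first pass, data (and so the headers) may already be
     * on the wire, so later calls leave the headers alone. */
    r->headers_sent = 1;
    return changed;
}
```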

Between the description of the filter functions in this article, and the
example output filter linked to above, you should have enough information
to be able to write your own Apache filters. Many things can be done with Apache filters, and I encourage everybody to experiment with
the new filtering abilities in Apache 2.0. In the next article in this series, we will explore Apache
2.0 input filters. Input and output filters share some characteristics, but
they are different enough that spending one article specifically on input
filters is a valuable exercise.

Finally, I want to catch everybody up on the current status of Apache 2.0. We
are still tracking down a show-stopper problem with the threaded MPM that
could take down the web server in edge cases. An initial patch has been
applied, and as soon as it is complete, the Apache developers will tag and roll another distribution of Apache 2.0.
This distribution should become the next beta release of Apache.

Ryan Bloom
is a member of the Apache Software Foundation, and the Vice President
of the Apache Portable Run-time project.