Filtering HTTP Requests with .NET

Introduction

ASP.NET has a number of extensibility points that developers can use. One such
point is response filtering, accessible via the Filter property of the HttpResponse
class. Filters intercept content destined for the client and have an opportunity
to modify that content prior to sending it out. Filters are unique in that
they can access the raw byte stream that is going to be sent to the client.
This article will show you how to create and install a set of simple filters
and will expose some of the gotchas that accompany the technology.

I highly recommend you download the code examples for this article and play
around with them.

Building Blocks

The examples and code for this article work with either the 1.0 or 1.1 versions
of the .NET framework. This article assumes you are familiar with creating
virtual directories and using custom HttpModules and HttpHandlers inside of an
ASP.NET app. If you're not, it would be a good idea to read up on them before
proceeding or installing the examples. The examples must be rooted in a web
application to work. The easiest way I know of to do so is to use the "Sharing
and Security" option available by right-clicking the folder where the examples
live. On the "Web Sharing" tab, simply share the folder under a name of your
choosing. For this article, I'll name the virtual directory ourfirstfilter.

Additionally, the ASP.NET worker process will require write permissions to
the example directory. The samples perform file-based logging to illustrate
how filters and the processing pipeline interact; therefore, the ASP.NET process
has to be able to write out the files. Giving write access to the ASP.NET worker
process is not a fantastic idea; I would highly recommend that you lock down
the virtual directory to authenticated users.

Lastly, a word on scope. ASP.NET response filters will only see content flowing
through the ASP.NET processing pipeline. This may seem obvious, but it catches
a lot of people off guard. By default, static HTML files, static images, PHP,
JSP, or any other technology that does not use the ASP.NET processing model
will not be filtered. If you'd like to filter static HTML content, you could
map the .html and .htm extensions onto the ASP.NET ISAPI extension and configure
ASP.NET appropriately, but this comes at some performance cost. Whether the
cost is too great is something only your situation can decide.

What is Filtering?

So what is filtering and why would you want to use it? Filtering allows you
to intercept the content being written back to the client and do interesting
things with it. Our goal with any web application is to respond to an HTTP
request. This request could be formulated using a number of different methods:
a user with a browser, an automated script using a library, a user at a command
prompt using curl, or any number of other techniques. ASP.NET exposes the request
and response via two objects, HttpRequest and HttpResponse. You can find this
pair in the System.Web namespace of the System.Web.dll assembly. The
HttpResponse object exposes a Filter property. That one property is what the
rest of this article will discuss.

First, why would you want to filter? Why not just bake your logic into the
application some other way? Truthfully, filtering is hard, and I've only seen
a few really useful things done with it. One is my own HttpCompressionModule.
This is an HttpModule that adds standard compression to any ASP.NET app (and
is fraught with its own perils).

Another case that some people seem to pursue involves string replacement.
With a filter, you could parse the output and replace certain tokens with
something else. For example, you could write a censoring filter that finds and
removes naughty words from a user-supplied comment. This approach can work, but
it too is fraught with peril.

A Basic Filter

A filter has to derive from System.IO.Stream and follow a simple policy: take
ownership of the current filter, store it, and write filtered content into
that held original filter. Filters are write-only streams, so your Stream
subclass must support writing, at the very least. The filtering model uses
chaining to accomplish its work; you can easily install multiple filters and
have each perform some work as long as you follow the model. Here's a short
example of what I mean, along with some code to install the filter multiple
times:
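
A minimal sketch of such a filter follows. The class name PassThroughFilter is
an illustrative assumption, not something taken from the sample download, and
the filter does no real work: it only shows the ownership-and-forwarding
policy described above.

```csharp
using System;
using System.IO;

// Hypothetical pass-through filter: it takes ownership of the filter that was
// installed before it and forwards every write to that stream.
public class PassThroughFilter : Stream
{
    private readonly Stream inner;

    public PassThroughFilter(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    // Response filters are write-only, forward-only streams.
    public override bool CanRead  { get { return false; } }
    public override bool CanSeek  { get { return false; } }
    public override bool CanWrite { get { return true; } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // A real filter would inspect or transform the bytes here.
        inner.Write(buffer, offset, count);
    }

    public override void Flush() { inner.Flush(); }

    public override void Close()
    {
        inner.Close();
        base.Close();
    }

    // Members that make no sense on a write-only stream.
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

Installing it more than once is then a matter of repeating the assignment
Response.Filter = new PassThroughFilter(Response.Filter); each new instance
wraps whatever filter was installed before it, so writes flow through the
chain from the most recently installed filter to the oldest.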

Installing a Filter

The ASP.NET folks have given us a variety of ways to plug the filter into
the HTTP processing pipeline. Anywhere we can access ASP.NET's Response object,
we can insert a filter. Keep in mind that the filter must be installed before
any content is written back to the client, which means before anything is
flushed. That said, the three primary places we're going to install a filter
are: 1) from within a custom HttpHandler, 2) from within a custom HttpModule,
and 3) from within an ASP.NET Page object. I'll cover these in reverse order.

Installing a filter from within a Page object is great if you want to scope
the filter to one page on your site. Remember, you have to install the filter
before flushing any content back to the client. The best place to do this with
a Page is inside of OnInit(). The code looks like this:
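
As a sketch of Page-level installation, assuming a hypothetical base class
named FilteredPage with a hypothetical CreateFilter hook (neither comes from
the sample download), the override might look like this:

```csharp
using System;
using System.IO;
using System.Web.UI;

// Hypothetical base page: pages that derive from it get a filter installed
// before any of their content is rendered.
public abstract class FilteredPage : Page
{
    // Hypothetical hook: return a Stream-derived filter that wraps 'inner'.
    protected abstract Stream CreateFilter(Stream inner);

    protected override void OnInit(EventArgs e)
    {
        // Hand the current filter to the new one so the chain stays intact.
        Response.Filter = CreateFilter(Response.Filter);
        base.OnInit(e);
    }
}
```

A concrete page would derive from this class and return its own filter from
CreateFilter. Keeping the hook abstract is just one way to avoid tying the
page to a particular filter type.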

When you want to filter more than a single page, HttpModules fit this niche
nicely. An HttpModule can sink appropriate events from the HttpApplication and
install the filter as needed. A common place to install the filter is in a
handler for the BeginRequest event. This ensures that the filter is installed
before any content is written to the client. By using an HttpModule, you can
easily filter content without having to change your content-generating pages.
Additionally, any other requests that flow through the ASP.NET HTTP pipeline
will also use the filter. This includes web services and any custom
HttpHandlers you may be using. Check out the FilteringModule in the
downloadable example code for an example of HttpModule-level filtering. This
is probably the most common way to install a filter.
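
A rough sketch of module-level installation is shown below. It is not the
FilteringModule from the download; the class name FilterModuleBase and the
CreateFilter hook are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Web;

// Hypothetical module base class: installs a filter on every request that
// flows through the ASP.NET pipeline.
public abstract class FilterModuleBase : IHttpModule
{
    // Hypothetical hook: return a Stream-derived filter that wraps 'inner'.
    protected abstract Stream CreateFilter(Stream inner);

    public void Init(HttpApplication application)
    {
        // BeginRequest fires before any handler has produced content.
        application.BeginRequest += new EventHandler(OnBeginRequest);
    }

    public void Dispose() { }

    private void OnBeginRequest(object sender, EventArgs e)
    {
        HttpResponse response = ((HttpApplication)sender).Response;
        response.Filter = CreateFilter(response.Filter);
    }
}
```

A concrete subclass supplies the filter and is registered in the httpModules
section of web.config, like any other HttpModule.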

HttpHandlers have the easiest time installing a filter. You can set it directly
into the Response exposed by the HttpContext passed into the ProcessRequest
method. This gives you the ability to scope the filter to one handler, instead
of an entire site. This is very similar to Page-based installation, but you
don't have to worry about
the Page processing model. Realistically, you're probably not going to do this;
you're already controlling the entire process of responding to an HTTP request
and there are usually simpler methods to accomplish the same result. Of course,
most of the examples in the provided code are written against custom HttpModules.
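
For completeness, here is a sketch of handler-level installation. The class
name FilteredHandlerBase, the CreateFilter hook, and the text written to the
response are all illustrative assumptions.

```csharp
using System.IO;
using System.Web;

// Hypothetical handler base class: the filter is scoped to this handler only.
public abstract class FilteredHandlerBase : IHttpHandler
{
    // Hypothetical hook: return a Stream-derived filter that wraps 'inner'.
    protected abstract Stream CreateFilter(Stream inner);

    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Install the filter first, before anything is written.
        context.Response.Filter = CreateFilter(context.Response.Filter);

        context.Response.ContentType = "text/plain";
        context.Response.Write("Hello from a filtered handler");
    }
}
```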

Considerations

So far, filtering is sounding pretty nice. You can easily
install a filter, it doesn't require any setup inside of IIS, and it can fit
nicely into an xcopy-based deployment. In many ways, filtering inside of ASP.NET
is a great alternative to writing an ISAPI filter. But with any technology,
there are drawbacks. With filters, the drawbacks center on state management,
inefficiency, and a number of strange interactions with other parts of the
ASP.NET framework.

Simply put, writing a good filter can be very hard. Take a look at the Catch22Filter
in the provided sample. It's a pretty simple replacement filter. This filter
looks for words in the output buffer and wraps them in some nice HTML that
makes them appear as censored text in the final output. Hit censor.ashx and
give it a shot. Notice that with the default word list, the filter censors not
only words in the body of the document but also words in the title, and
possibly inside of tags. To properly filter an HTML document, you have to keep
track of where you are in the document and only apply the filter at the
appropriate time. This may not seem too hard, but remember two things: 1) you
have to take into account different text encodings, and 2) looking at each
character imposes a performance penalty, especially in terms of working set.

You may not realize it, but the byte[] being passed into the
Write(byte[] buffer, int offset, int count) override has already been encoded
into the target text encoding. That's right, it's already UTF-8 with the
default ASP.NET install. If you're looking for any kind of token in the output
stream, you have to encode the token using the same encoding and search for
its byte sequence. You could convert the entire buffer back into a string, but
that can create a large amount of garbage and negatively impact performance
and working set. Be sure to profile your memory usage if you take this route.
Additionally, for situations like the Catch22 filter where you're trying to
find given words, you have to fully understand the character comparison rules
for your target language. In the face of internationalization, it can get rather
messy.
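
To make the byte-level search concrete, here is a small sketch that encodes a
token with a given encoding and looks for its byte sequence in the range
handed to Write. The class and method names are hypothetical, the search is a
plain linear scan, and it makes no attempt to handle a token that straddles
two separate Write calls, which a real filter would have to deal with.

```csharp
using System;
using System.Text;

// Hypothetical helper for locating an encoded token inside a write buffer.
public sealed class ByteTokenSearch
{
    private ByteTokenSearch() { }

    // Returns the index of the first occurrence of 'token' within
    // buffer[offset .. offset + count), or -1 if it does not occur.
    public static int IndexOf(byte[] buffer, int offset, int count, byte[] token)
    {
        if (token.Length == 0) return offset;
        int last = offset + count - token.Length;
        for (int i = offset; i <= last; i++)
        {
            int j = 0;
            while (j < token.Length && buffer[i + j] == token[j]) j++;
            if (j == token.Length) return i;
        }
        return -1;
    }

    // Encodes 'token' with the same encoding as the response, then searches.
    // The caller is assumed to pass the encoding the response actually uses.
    public static int Find(byte[] buffer, int offset, int count,
                           string token, Encoding encoding)
    {
        return IndexOf(buffer, offset, count, encoding.GetBytes(token));
    }
}
```

Inside a filter's Write override, the call might be
ByteTokenSearch.Find(buffer, offset, count, "word", encoding), where encoding
is whatever the response is using. In a long-lived filter you would encode the
token once and reuse the bytes rather than re-encoding on every write.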

Filtering sometimes feels more like a tacked-on feature than a first-class
citizen in the processing pipeline. Filtering doesn't work if your code calls
the End method on the HttpResponse, either directly or indirectly. When End is
called, the HTTP pipeline bypasses the filter and sends whatever is currently
in the real output buffer to the client. Unfortunately, Server.Transfer calls
End as part of its processing. Therefore, if you make a call to
Server.Transfer, your filter is going to be bypassed, and any work you
expected it to perform will be skipped. To prove the point, hit the
transfer.ashx HttpHandler in the provided samples. If you take a look at the
log file, you'll see that a TracingFilter is constructed,
but then nothing further happens. The writes by the transferred-to page are
not passed into the filter, and the document escapes uncensored. A workaround
is to use Server.Execute instead of Server.Transfer. The semantics are a bit
different, as Server.Execute returns execution to the original page after the
called page is executed, but it does work with filters.
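
The workaround can be sketched as a small handler. The class name and the
target path target.aspx are placeholders, not files from the sample download.

```csharp
using System.Web;

// Hypothetical handler that hands off to another page without losing the filter.
public class ExecuteInsteadOfTransferHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Server.Transfer ends the response as part of its work, so an
        // installed filter would be bypassed:
        // context.Server.Transfer("target.aspx");

        // Server.Execute runs the target and then returns control here.
        context.Server.Execute("target.aspx");   // placeholder path
    }
}
```

Because control comes back after Execute, anything the calling code writes
afterwards also ends up in the response, so the caller usually has nothing
left to do once the call returns.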

The biggest problem with using an HttpModule is figuring out where to add
yourself into the processing pipeline. There are a number of events you could
sink, but which ones really make sense? For example, what if you want to only filter
HTML content? To do this, you would probably check the ContentType header and
then decide whether or not to install the filter. From looking at the docs,
it would seem to make sense to sink the ReleaseRequestState event,
check the ContentType, and then install the filter if we have a ContentType
we like. The code looks something like this:
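
A sketch of that first attempt, assuming a hypothetical module base class and
CreateFilter hook, and assuming text/html is the only content type of
interest:

```csharp
using System;
using System.IO;
using System.Web;

// Hypothetical module: installs the filter late, once the content type is known.
public abstract class NaiveHtmlFilterModule : IHttpModule
{
    // Hypothetical hook: return a Stream-derived filter that wraps 'inner'.
    protected abstract Stream CreateFilter(Stream inner);

    public void Init(HttpApplication application)
    {
        application.ReleaseRequestState += new EventHandler(OnReleaseRequestState);
    }

    public void Dispose() { }

    private void OnReleaseRequestState(object sender, EventArgs e)
    {
        HttpResponse response = ((HttpApplication)sender).Response;

        // Only filter HTML responses; leave everything else alone.
        if (String.Compare(response.ContentType, "text/html", true) == 0)
        {
            response.Filter = CreateFilter(response.Filter);
        }
    }
}
```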

This works wonderfully until you call the Flush method on the HttpResponse
object from within your page. When you flush, content is written to the client
before ReleaseRequestState is fired, so the filter is installed after some
content has been written. To fix this, we need to also sink PreSendRequestHeaders
(which should be called PreSendResponseHeaders) and remember whether the filter
has already been installed.
Here's the fixed code:
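
A sketch of a module that sinks both events follows. The class name, the
CreateFilter hook, and the Items key are illustrative assumptions; the text/html
check stands in for whatever content type test you need.

```csharp
using System;
using System.IO;
using System.Web;

// Hypothetical module: installs the filter from whichever event fires first
// and records that fact on the current request.
public abstract class HtmlFilterModule : IHttpModule
{
    // Hypothetical key used to mark the request as already filtered.
    private const string InstalledKey = "HtmlFilterModule.Installed";

    // Hypothetical hook: return a Stream-derived filter that wraps 'inner'.
    protected abstract Stream CreateFilter(Stream inner);

    public void Init(HttpApplication application)
    {
        application.ReleaseRequestState += new EventHandler(InstallIfNeeded);
        application.PreSendRequestHeaders += new EventHandler(InstallIfNeeded);
    }

    public void Dispose() { }

    private void InstallIfNeeded(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;

        // Already installed for this request by the other event: do nothing.
        if (context.Items.Contains(InstalledKey)) return;

        HttpResponse response = context.Response;
        if (String.Compare(response.ContentType, "text/html", true) == 0)
        {
            response.Filter = CreateFilter(response.Filter);
            context.Items[InstalledKey] = true;   // scoped to this request only
        }
    }
}
```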

Notice that we remember if the filter has been installed by setting an item
in the HttpContext's Items collection. We can't reliably use a member variable,
because a single HttpModule instance is reused across many requests. We
really need to scope the "installed" memento to the current request, and
the HttpContext is a great way to do that.

Alternatives

If the limited abilities of ASP.NET's HttpResponse filter don't meet your
needs, there are a few other options. If you have your heart set on filtering
the output of the ASP.NET HTTP pipeline, your only real option is to write
an ISAPI filter. The ISAPI filter can do anything you desire and it is completely
decoupled from ASP.NET. An ISAPI filter would work for any content flowing
through IIS, including normal HTML files, PHP, JSP, and any other technology
that can sit inside IIS. The downside is that you have to write an ISAPI filter,
no small task.

Another alternative is to redesign your application to include the functionality
you're trying to provide from within the existing ASP.NET framework. Be sure
you understand all of the extensibility points of the framework before you
latch onto the response filter as a golden hammer. There are often better ways
to include common functionality. Check out user controls, the Regex classes,
and custom HttpHandlers to make sure you can't accomplish your goal some other
way. This is a bit of a cop-out, as you're not really filtering; you're
refactoring.

Ben Lowery is a developer at FactSet Research Systems, where he works on all
things great and small.