Introduction

Inspired by the crapness of
Netscape
(specifically my inability to make it use the
colours and fonts that I want)
and HTML authors' insistence on making all the text on their pages
far too small,
and the irritation caused by animated GIFs and blinking text,
I decided to write an HTTP proxy that would remove these problems
and therefore make the web slightly less of a steaming pile of
foetid dingo's turds.
Thanks to the wonders of Perl
and libwww-perl,
I managed to write a fairly functional program in about a day.
Some additional hacking has occurred since then...

Installation

You need to install recent versions of
Perl and
libwww-perl.
There's a slight bug in perl 5.004_04 which confuses LWP's
handling of POST requests. Here is a very little patch to
LWP::Protocol::http.pm that works around it.
More recent versions of LWP and/or perl don't have this problem.

Then get the source and put it somewhere
useful. You'll probably want to edit the configuration a bit; it
should be moderately obvious which variables to change in which
way. If not, look at the libwww-perl
manpages. Particular things that you can configure are:

which HTML tags to strip (e.g. <blink>, <font>)

which HTML attributes to strip (e.g. bgcolor)

which HTTP headers to strip or add (e.g. Cookie:)

which requests to forbid (e.g. getting advertisements)

how to treat frames and images

which port & IP address to listen on

whether to work as a proxy or an httpd

whether to connect to other places via a parent proxy

how much logging to do

which hosts may use htmlf
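
To give a feel for what the first two options do, here is a rough sketch of tag and attribute stripping. It is written in Python rather than htmlf's own Perl, and the names STRIP_TAGS, STRIP_ATTRS and filter_html are made up for illustration; they are not htmlf's configuration variables. Stripped tags lose their markup but keep their text.

```python
import html
from html.parser import HTMLParser

# Hypothetical settings for illustration; not htmlf's real variable names.
STRIP_TAGS = {"blink", "font"}
STRIP_ATTRS = {"bgcolor"}


class Filter(HTMLParser):
    """Re-emit HTML, dropping unwanted tags and attributes but keeping text."""

    def __init__(self):
        # Keep entities as written so they can be passed through untouched.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, close=""):
        if tag in STRIP_TAGS:
            return  # drop the tag itself; its contents still go through
        parts = [tag]
        for name, value in attrs:
            if name in STRIP_ATTRS:
                continue
            if value is None:  # attribute without a value, e.g. "checked"
                parts.append(name)
            else:
                # The parser unescapes attribute values, so escape them again.
                parts.append(f'{name}="{html.escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + close + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag not in STRIP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def filter_html(text):
    """Return text with the configured tags and attributes removed."""
    f = Filter()
    f.feed(text)
    f.close()
    return "".join(f.out)
```

For example, filter_html('<body bgcolor="#000"><blink>Hi</blink></body>') gives '<body>Hi</body>'.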

You probably want to run htmlf with something like
`cd ~/etc/htmlf && htmlf`.
Then configure your browser to use a proxy on
localhost:8080 (or whatever you configured
htmlf to use). You should now be able to browse more
comfortably.
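
Scripts can be pointed at the proxy in the same way as a browser. The following is a minimal Python sketch, assuming htmlf is listening on localhost:8080 as above; the function names are invented for the example.

```python
import urllib.request

# Assumption: htmlf is listening on the address quoted in the text.
PROXY = "http://localhost:8080"


def make_opener(proxy=PROXY):
    """Build a urllib opener that sends plain-HTTP requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, timeout=10):
    """Fetch url through the proxy and return the (filtered) body as bytes."""
    with make_opener().open(url, timeout=timeout) as response:
        return response.read()
```

Calling fetch_via_proxy("http://example.com/") only works while htmlf is running; the URL is just a placeholder.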

The program runs as a daemon that forks for each connection,
and feeds data from the server to the client as it arrives. This
means that it shouldn't slow down browsing too much (although it
does slow things down quite a lot). Given the overheads of Perl,
it's probably best to use htmlf on an individual or
small-group basis.
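
The fork-per-connection, pass-it-on-as-it-arrives design can be sketched roughly as follows. This is Python, not htmlf's Perl, and it is a toy: the handler just sends the client's own bytes back upper-cased instead of talking to a real server. The names relay, ToyHandler and serve are made up for the example.

```python
import socketserver


def relay(upstream, client, transform=lambda chunk: chunk, chunk_size=4096):
    """Copy upstream to client as data arrives; return the bytes written."""
    total = 0
    while True:
        # read1() returns whatever is available (up to chunk_size) rather
        # than waiting for a full buffer, so the client sees data promptly.
        chunk = upstream.read1(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        out = transform(chunk)
        if out:
            client.write(out)
            client.flush()
            total += len(out)
    return total


class ToyHandler(socketserver.StreamRequestHandler):
    """Toy stand-in for a proxy handler: echoes the input, upper-cased."""

    def handle(self):
        relay(self.rfile, self.wfile, transform=bytes.upper)


def serve(host="127.0.0.1", port=8080):
    """Fork one child process per connection (Unix only)."""
    with socketserver.ForkingTCPServer((host, port), ToyHandler) as server:
        server.serve_forever()
```

Because each chunk is written out as soon as it has been transformed, the client never waits for the whole response to be buffered. A transform that works on raw chunks has to cope with tags split across chunk boundaries, which this sketch ignores.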

Planned Improvements

As you can tell from the version number, htmlf is fairly
complete now -- I've fixed animated GIFs! -- but there are a couple
of features that I might also add:

Bugs

The current libwww-perl doesn't do persistent
connections. I'm also not sure how it deals with byte ranges: my
current code discards Content-Length: headers to avoid
confusing clients; I don't know how badly it gets byte ranges
wrong.
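
Why discarding Content-Length: is the safe option can be shown with a small sketch. It is Python rather than htmlf's Perl, and client_headers is a made-up name. Once the body has been rewritten, the length the server announced no longer matches what the client receives. With the header gone and no persistent connections, the client simply reads until the connection closes.

```python
def client_headers(headers, body_rewritten):
    """Return the headers to send on, as a list of (name, value) pairs.

    If the body was rewritten, the server's Content-Length is stale,
    so it is dropped rather than passed on.
    """
    if not body_rewritten:
        return list(headers)
    return [(name, value) for name, value in headers
            if name.lower() != "content-length"]


# Illustration with made-up data: stripping a tag changes the body length.
original = b"<blink>Hi</blink>"
filtered = b"Hi"
server_headers = [("Content-Type", "text/html"),
                  ("Content-Length", str(len(original)))]

outgoing = client_headers(server_headers, body_rewritten=(filtered != original))
```

Here outgoing contains only the Content-Type header, because the announced length of 17 bytes no longer matches the 2 bytes actually sent.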