Apache logs only attempted client requests. If someone connects to port 80
and doesn't send any data, the connection won't be logged. At a minimum, a
client needs to send some text followed by one carriage return. Also, the
pattern above assumes that the client transmits a proper request line. If a
client transmits only garbage, the regex will fail to match.
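
One way to cope with entries that do not match is to set them aside rather
than let the parser stop. The sketch below is only an illustration: the
REQUEST_LINE pattern is a simplified, hypothetical stand-in for the pattern
above (it checks only for a quoted "METHOD URI PROTOCOL" request line), and
split_entries is a name made up for this example.

```python
import re

# Hypothetical stand-in for the article's pattern: it only looks for a
# quoted request line of the form "METHOD URI HTTP/x.y".
REQUEST_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>HTTP/\d\.\d)"'
)

def split_entries(lines, pattern=REQUEST_LINE):
    """Return (parsed, rejected).

    parsed   -- one dict of named groups per line that matched
    rejected -- the raw lines that did not match (e.g. garbage requests)
    """
    parsed, rejected = [], []
    for line in lines:
        match = pattern.search(line)
        if match:
            parsed.append(match.groupdict())
        else:
            rejected.append(line.rstrip("\n"))
    return parsed, rejected
```

Keeping the rejected lines, instead of discarding them, lets you count how
often clients send malformed requests.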

The timestamps refer to the time the request was started, but Apache writes
each entry only after the request finishes. As a result, it is possible to
see log file entries that are out of order, especially if you are serving a
mix of large files and small files.
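
If an analysis tool needs entries in the order the requests began, it can
sort them on the timestamp before processing. The following is a minimal
sketch that assumes the timestamp is in Apache's default %t layout, such as
[10/Oct/2005:13:55:36 -0700], and that each entry has already been split
into a (timestamp, rest of line) pair. The function names are invented for
this example.

```python
from datetime import datetime

# Layout of Apache's default %t field once the square brackets are removed.
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"

def parse_time(stamp):
    """Convert '[10/Oct/2005:13:55:36 -0700]' into a timezone-aware datetime."""
    return datetime.strptime(stamp.strip("[]"), APACHE_TIME)

def in_request_order(entries):
    """Sort (timestamp_string, rest_of_line) pairs by request start time.

    sorted() is stable, so entries that share a timestamp keep the
    order in which they appeared in the log.
    """
    return sorted(entries, key=lambda entry: parse_time(entry[0]))
```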

The %T and %D directives report the time it takes to handle the entire
transaction, including the time the client takes to transmit data. If you
see wide variances in the time it takes to serve the same file, the cause
may be a network problem on the client side.
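
To spot such variances, you could group the %D values by requested file and
compare the spread. This sketch assumes the log has already been reduced to
(uri, microseconds) pairs; timing_spread is a hypothetical helper name, and
the standard deviation is just one possible measure of spread.

```python
from collections import defaultdict
from statistics import mean, pstdev

def timing_spread(records):
    """Summarize service times per URI.

    records -- iterable of (uri, microseconds) pairs, e.g. taken from %D
    returns -- {uri: (request_count, mean_time, standard_deviation)}
    """
    by_uri = defaultdict(list)
    for uri, micros in records:
        by_uri[uri].append(micros)
    return {
        uri: (len(times), mean(times), pstdev(times))
        for uri, times in by_uri.items()
    }
```

A file whose standard deviation is large compared to its mean is a
candidate for a closer look at which clients requested it.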

Listed below are a few ideas for potential analysis tools that you can
develop from the Blackbox log.

Performance Graphing

Long-term collection of Blackbox data is better suited to a graphing
environment like RRDTool. You can graph metrics including bytes in and out,
maximum clients per second, and child process lifetime.

To do this, you need to write a program that scans the Blackbox log at
short intervals (every 5 or 10 minutes), grabs all new entries, and then
imports the data into the RRD file.
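
The "grab all new entries" step can be sketched as a function that
remembers how far into the file it read last time. This is only one
possible approach: the name read_new_entries is hypothetical, the handling
of log rotation (restart from the top when the file shrinks) is an
assumption, and the import into the RRD file is left out.

```python
import os

def read_new_entries(log_path, offset=0):
    """Return (lines, new_offset) for data appended since byte `offset`.

    Only complete lines are returned. A partially written last line is
    left in place and picked up on the next call.
    """
    if os.path.getsize(log_path) < offset:
        offset = 0  # file shrank: assume it was rotated or truncated

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    end = data.rfind(b"\n") + 1  # 0 when no complete line is available
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A scheduled job would store the returned offset between runs, for example
in a small state file, and pass it back in on the next scan.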

RRDTool collects time-based sampling data for any duration. One data file
can keep data for durations of a week, a month, or even a year. You can merge
data files into a single graph if you want to report on the performance of a
group of load-balanced servers.

Flight Recorder

You have almost enough data in the Blackbox format to see exactly how a
single client handled a series of requests. You can extract a full HTTP session
by filtering on the remote port and IP.
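
The filtering step could look like the sketch below. It assumes each log
entry has already been parsed into a dictionary; the keys "ip" and "port"
and both function names are hypothetical. Keep in mind that a client may
reuse a port number later on, so a long log can contain more than one
session under the same key.

```python
from collections import defaultdict

def group_sessions(records):
    """Group parsed entries by (remote IP, remote port), keeping log order."""
    sessions = defaultdict(list)
    for record in records:
        sessions[(record["ip"], record["port"])].append(record)
    return dict(sessions)

def extract_session(records, ip, port):
    """Return every entry logged for one remote IP and port pair."""
    return group_sessions(records).get((ip, port), [])
```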

You could also program an HTTP client to replay the exact same data if you
wanted to try to simulate the client's actions. The downside is that you
won't have a record of all of the client headers passed, such as cookies or
authentication data. Also, the timestamp resolution is one second, so the
replay timing may not exactly match when the original client transmitted
each request.

Final Thoughts

Everyone has been logging web server traffic as long as the web has been
around, but the emphasis has always been on the content served, not the
server itself. There are tools and modules out there that can monitor
performance, but most of them generate reports of data you can already find
just by logging it.

The Blackbox format is a simple alternative since it doesn't require
additional modules. Further, you can examine the data without a running web
server. Getting it up and running requires only the addition of Apache
logging directives and an optional patch to the source code.

Chris Josephes works as a system administrator for Internet Broadcasting.