I search the Internet for answers to computer technical problems that I encounter. When the Internet does not supply me with an answer, I research one on my own. Then, I document the answer here so others can find it.
Also, for your benefit, this stupid blog name makes a great mnemonic.

2009-11-08

Apache access log quick & dirty busy report (from awk to Perl).

This is my second awk snippet that I've clumsily rewritten in Perl in my attempt to improve my Perl chops. I'll refactor for more elegant Perl later.

This script prints a report of the number of requests in Apache access logs (default common LogFormat), broken down by day and hour.
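To make the "day and hour" part concrete, here is a minimal sketch of pulling those two fields out of one log line. The sub name day_and_hour and the sample line are made up for illustration, and the regex assumes the usual common-format timestamp such as [06/Nov/2009:09:15:42 -0600]; it is not the exact code from my script.

```perl
use strict;
use warnings;

# Hypothetical helper: return (day, hour) from one common-format log line,
# or an empty list if the line has no recognizable timestamp.
sub day_and_hour {
    my ($line) = @_;
    if ($line =~ m!\[(\d\d/\w\w\w/\d{4}):(\d\d):\d\d:\d\d !) {
        return ($1, $2);
    }
    return;
}

# Made-up sample line in the default common LogFormat.
my $sample = '127.0.0.1 - - [06/Nov/2009:09:15:42 -0600] "GET / HTTP/1.0" 200 2326';
my ($day, $hour) = day_and_hour($sample);
print "$day $hour\n";    # prints "06/Nov/2009 09"
```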

Even my rough draft is a wee bit more elegant than the awk syntax. I did not need an external program to perform the sorting, thanks to Perl's built-in sort.

Also, the data structure that holds the totals is a bit different. Instead of using the full string "06/Nov/2009:09" as an array index like I did in awk, I broke the data down into a hash of hashes. In the Perl version, "06/Nov/2009" is the key to the outer hash, and "09" is the key to the inner hash. This made sorting during output a lot easier.
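Here is a rough sketch of that hash of hashes, with the counting and the sorted output together. The names count_hits and report_lines, the sample pairs, and the output format are all assumptions for illustration. Note that the plain string sort on the day keys only comes out in calendar order while all the days fall in the same month.

```perl
use strict;
use warnings;

# Made-up (day, hour) pairs, as if already extracted from log lines.
my @hits = (
    ['06/Nov/2009', '09'],
    ['06/Nov/2009', '09'],
    ['06/Nov/2009', '14'],
    ['07/Nov/2009', '00'],
);

# Count requests into a hash of hashes: outer key is the day, inner key is the hour.
sub count_hits {
    my %total;
    $total{ $_->[0] }{ $_->[1] }++ for @_;
    return \%total;
}

# Build report lines, sorting days and then hours as plain strings.
sub report_lines {
    my ($total) = @_;
    my @lines;
    for my $day (sort keys %$total) {
        for my $hour (sort keys %{ $total->{$day} }) {
            push @lines, sprintf('%s %s:00 %d', $day, $hour, $total->{$day}{$hour});
        }
    }
    return @lines;
}

print "$_\n" for report_lines(count_hits(@hits));
# prints:
# 06/Nov/2009 09:00 2
# 06/Nov/2009 14:00 1
# 07/Nov/2009 00:00 1
```

Because the hours are zero-padded strings, the inner sort needs no numeric comparison.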

At first, I tried using an array instead of a hash to hold the hours of the day, but this turned out to be problematic. I think the problem had something to do with "09" being treated as an illegal octal digit in the array index. Treating the "09" as a string in the hash key was just easier and more flexible.
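For what it's worth, here is a small sketch of the octal behavior as I understand it. It is not a reconstruction of my original bug. Perl parses a bare numeric literal with a leading zero as octal, so a literal 09 in source code fails to compile, while the string "09" is just two characters. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Compiling the bare literal 09 fails; a string eval captures the error in $@.
my $ok  = eval q{ my $n = 09; 1 };
my $err = $@;
print "literal 09 rejected\n" if !$ok && $err =~ /Illegal octal digit/;

# As a hash key, "09" stays a string and is distinct from "9".
my %hours_hash;
$hours_hash{"09"}++;
$hours_hash{"9"}++;
print join(",", sort keys %hours_hash), "\n";    # prints "09,9"

# As an array index, the string "09" is converted to the number 9.
my @hours_array;
$hours_array["09"]++;
print scalar(@hours_array), "\n";                # prints "10" (indexes 0 through 9)
```

So the error shows up when 09 is written as a number in the code itself, not when it arrives as text from the log. Either way, the hash key keeps the leading zero, which is handy for sorting and printing.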