mkent has asked for the
wisdom of the Perl Monks concerning the following question:

Hey, guys, thanks!!! This is a wonderful resource, and I incorporated some suggestions into the revised script below. I still have some questions, though!

BrowserUk, I decided against using Date:Manip even though I really like that module. That's because the module instructions warn that it's slower than other time modules and this script will be used most often when the web server is overloaded with requests; thus, speed is essential.

Abigail-II, a database would be nice, but the server is producing regular logs, so that's what I have to use.

In the following script, here are my questions:

1) Using strict produces errors that I don't have a global module loaded; what module is that?

2) The simulated $month switch statement doesn't work as expected; instead of values 0 through 11, it gives everything a value of 1. Getting it changed to a number makes timelocal accurate.

3. At the end, I pack all the referrers into an array; what I need to do is count each referrer as an unique URL, so that www.you.com is counted x times and www.me.com is counted y times so I can then tell the top referrer in the time period stipulated by the web page (which just has hours and minutes to enter). That will let me create output like
www.you.com 22
www.me.com 19
etc
How can I count an unknown value and produce this output
And is an array the best way to do it?

Any and all ideas welcome, and thanks in advance. I really appreciate the help!

For the switch, use eq instead of ==. == operator is for comparing numbers, and eq is for comparing strings. Also better use a hash, instead of this switch, that makes your code more neat. Key for the hash would be 'Jan' etc, and value would be 0 etc.

For question 3, instead of array, use hash. The key for the hash is referer, and value is the counter.

In many cases it's often better to make a big regex for the entire line. A fully-expanded line-matching regex can break out all the variables you need without any leftover material that needs to be stripped off with substitutions.

BrowserUk, I decided against using Date:Manip even though I really like that module. That's because the module instructions warn that it's slower than other time modules and this script will be used most often when the web server is overloaded with requests; thus, speed is essential.

Given you are doing a filesystem read between every call to Date::Manip, I seriously doubt it would be a bottleneck. In any case it would be running faster than your broken code:).

If you really find that Date::Manip is too slow for your application, then you could move to using one of the (several dozen) Date::* modules on CPAN rather than attempting to roll your own. Several of these are mentioned in the Date::Manip pod: Date::Parser, Date::Calc. The later of which is an interface to a C library I believe and its reputedly very fast. The main reason I didn't suggest these is that you need to compile them which is sometimes a problem.

A quick look at your code and its glaringly obvious what the problem is. And if you hadn't commented out the use strict and omitted the -w/use warnings it would be glaringly obvious to you too, as the compiler would be screaming the cause at you over and over. 11 times I think.

Come back, if you need to, once you have made your code compile cleanly with -w and use strict and you might get some further help.

I'm really sorry if I offended you. That wasn't my intention at all and I really appreciate the help.
I was reluctant to install Date::Manip, being new to this stuff, but I've done it and now I'm trying to figure out where to specify the log files on my server to test your code.
Again, thanks and I didn't mean to offend you.

Oh, my. I didn't mean to offend Abigail-II or anyone else, for that matter. I admit to being a total newcomer to both Perl and this list and I really appreciate the help being offered. I apologize deeply if I've been offensive!

Right. Lets get down to it. You (and I say you in a friendly, you're-not-the-only-one-everyone-does-this way) have a problem of bad priorities, so, lets list the priorities you have stated so far:

Tight deadline

Performance

Working code

Now, a few of these don't match. Even more don't add up when you add priorities that you *should* have, notably "secure".

So, lets detail some of the issues we have with the code thus far:

Reading in entire log file (into ram no less) every time the script is run

No use of taint mode

No use of warnings

Bad choice of time comparison algorithm

Iffy regexen

Under these circumstances, without access to help, you should be getting your priorities in order. While it depends on your particular position, the following is a good start:

Working code

Working, secure code

Working, secure code by deadline

Working, secure, maintainable code by deadline

Working, secure, maintainable code that performs well, by deadline

What does this mean? Well, the first thing it means is that if a well known, already-debugged module is around that can do any part of the problem you're trying to solve, use it. Performance is waaay down on the list, just use it. If it really turns out too slow in use, *then* you worry about it.

The second thing it means is a quick investigation of language features that can help you out with debugging and security. Taint mode, Strict and Warnings should all be on, and you shouldn't move an inch until your code is working with all of them.

Now, if you had done this, most of the code above, that isn't working, wouldn't even exist. You'd have something a fraction of the size, that worked fine. Maybe it'd be a bit slow, maybe not, but if it was, you'd be posting to perlmonks saying "Hey, here's my working code, how can I make it go a bit faster", and then people like Abigail-II would have been able to tell you how exactly a database would aid your cause, even when the logs are in text file format. But we're not there yet.

This is by no means perfect, I didn't have enough log examples to work from to be sure the regex will work well, but it contains all the properties of a good first effort. All the necessary warnings and taints turned on, the relevant modules used (there is, I think, an apache log parsing module too, but I wasn't sure you were using apache), all user input is verified and no more code than strictly necessary is used.

The end result...no weird month name/number conversions, about 30 seconds of debugging, and a much better product. I highly recommend, prior to your next project, sitting down and listing your priorities as above, it will help focus you on what really matters first, and what can be tweaked second.

Thanks. I've installed Date::Manip and tried to run your code as a test, but I keep getting "Premature end of script headers" errors. I've checked the path on my server but can't figure out why it's generating these errors. Can you tell me why?
Thanks.

It's a problem with the output of the script; which is more than likely quite easy to fix. You print your headers using a call to CGI.pm (see the print header; line?), followed by a blank line, and then the page data (<html> etc.).

You get the "premature end of script headers error" if you didn't output the blank line. That means your problem may be something to do with your print header; line.

When putting a smiley right before a closing parenthesis, do you:

Use two parentheses: (Like this: :) )
Use one parenthesis: (Like this: :)
Reverse direction of the smiley: (Like this: (: )
Use angle/square brackets instead of parentheses
Use C-style commenting to set the smiley off from the closing parenthesis
Make the smiley a dunce: (:>
I disapprove of emoticons
Other