Using Amazon Elastic MapReduce and the LogAnalyzer application, you can generate usage reports from Amazon CloudFront access logs. The reports cover total traffic volume, object popularity, and a breakdown of traffic by client IP and edge location. They are formatted as tab-delimited text files and delivered to the Amazon S3 bucket that you specify.

Running the Analyzer

To run the application using the AWS Management Console:
1. Click the "Create New JobFlow" button, select Sample Applications, and choose CloudFront LogAnalyzer (Custom Jar). Click "Continue".
2. In the Jar Arguments text box, replace <yourbucket> with the name of the Amazon S3 bucket where you want the generated reports to be placed. Make sure that the output path does not already exist in your S3 bucket; otherwise, the job will fail. Click "Continue".
3. Choose the number of instances to use, then click "Continue".
4. Review your parameters and click "Create Job Flow" to launch the application.

After the job flow has finished, your reports should be available in the Amazon S3 bucket that you provided.

If you already have the Ruby Client installed, you can generate reports by running

In this command, replace <yourbucket> with the name of the Amazon S3 bucket where you want the generated reports to be placed. Make sure that the output path does not already exist in your S3 bucket; otherwise, the job will fail.
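Because the job fails if the output path already exists, it can help to check the bucket before launching. The following is a minimal sketch, assuming you use the boto3 SDK for Python (which is not part of this application). The function name `output_prefix_exists` is hypothetical. It treats the path as existing if any object key starts with `<path>/`.

```python
def output_prefix_exists(s3_client, bucket: str, prefix: str) -> bool:
    """Return True if any object key in `bucket` lies under `prefix`/.

    `s3_client` is assumed to behave like a boto3 S3 client; only its
    list_objects_v2 method is called.
    """
    # Normalize so "reports" and "reports/" are checked the same way and
    # "reports" does not accidentally match keys like "reports-old/...".
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    # One matching key is enough to know the path is already in use.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0
```

For example, `output_prefix_exists(boto3.client("s3"), "yourbucket", "reports")` returns `True` if anything is already stored under `reports/`. Amazon S3 has no real directories, so checking for a key prefix is only an approximation of "the path exists".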

Reports Generated

This sample application produces four sets of reports based on Amazon CloudFront access logs. The Overall Volume Report displays the total amount of traffic delivered by CloudFront over whatever period you specify. The Object Popularity Report shows how many times each of your objects is requested. The Client IP Report shows the traffic from each client IP address that requested your content. The Edge Location Report shows the total amount of traffic delivered through each edge location. Each report measures traffic in three ways: the total number of requests, the total number of bytes transferred, and the number of requests broken down by HTTP response code.
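To make the three measures concrete, here is a minimal, single-machine sketch of the aggregation these reports describe. It is plain Python, not the LogAnalyzer's Cascading code. It assumes the tab-separated CloudFront web access log layout whose first fields are date, time, x-edge-location, sc-bytes, c-ip, cs-method, cs(Host), cs-uri-stem, sc-status, and cs(Referer). The names `Traffic`, `build_reports`, and `to_tsv`, and the output column layout, are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# ASSUMPTION: 0-based positions of fields in a tab-separated CloudFront web
# access log line; check them against the #Fields header of your own logs.
EDGE_LOCATION, SC_BYTES, C_IP, URI_STEM, SC_STATUS = 2, 3, 4, 7, 8


@dataclass
class Traffic:
    """The three measures every report uses."""
    requests: int = 0
    bytes_sent: int = 0
    by_status: Counter = field(default_factory=Counter)

    def add(self, nbytes: int, status: str) -> None:
        self.requests += 1
        self.bytes_sent += nbytes
        self.by_status[status] += 1


def build_reports(lines):
    """Return (overall, by_object, by_client_ip, by_edge_location)."""
    overall = Traffic()
    by_object = defaultdict(Traffic)
    by_client_ip = defaultdict(Traffic)
    by_edge = defaultdict(Traffic)
    for line in lines:
        # Skip blank lines and the #Version / #Fields header lines.
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        nbytes, status = int(f[SC_BYTES]), f[SC_STATUS]
        # Each request counts once toward the total and once per dimension.
        for traffic in (overall, by_object[f[URI_STEM]],
                        by_client_ip[f[C_IP]], by_edge[f[EDGE_LOCATION]]):
            traffic.add(nbytes, status)
    return overall, dict(by_object), dict(by_client_ip), dict(by_edge)


def to_tsv(report):
    """Render a per-key report as tab-delimited rows: key, requests, bytes."""
    return "\n".join(f"{key}\t{t.requests}\t{t.bytes_sent}"
                     for key, t in sorted(report.items()))
```

Conceptually, each report is the same group-by with a different key (object, client IP, or edge location). In a MapReduce job, the map step can emit that key with the byte count and status, and the reduce step can sum per key. This is what lets the work spread across instances.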

Customizing the Application

The LogAnalyzer is implemented using Cascading (http://www.cascading.org) and is an example of how to construct an Amazon Elastic MapReduce application. To customize the reports generated by the LogAnalyzer, download the source code from this page. Then follow the instructions in the README for building the application and uploading it to Amazon S3 for use with Amazon Elastic MapReduce.
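Consider a "top referrers" report as an illustration of a custom report you might add. The sketch below counts requests per referrer. It is plain Python rather than Cascading. It assumes that cs(Referer) is the tenth tab-separated field of each log line and that CloudFront writes "-" when no referrer was sent. `top_referrers` is a hypothetical name.

```python
from collections import Counter

# ASSUMPTION: 0-based position of cs(Referer) in a tab-separated
# CloudFront web access log line.
REFERER = 9


def top_referrers(lines, n=10):
    """Count requests per referrer and return the n most common (referrer, count) pairs."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and the #Version / #Fields header lines.
        if not line.strip() or line.startswith("#"):
            continue
        referrer = line.rstrip("\n").split("\t")[REFERER]
        # ASSUMPTION: "-" marks a request that carried no Referer header.
        if referrer != "-":
            counts[referrer] += 1
    return counts.most_common(n)
```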

How to Run this Application

You can run this application using the AWS Management Console or the command line tools.

Comments

Great, kind of wish it could aggregate referrers

This is quite a good example of exactly what MapReduce is good at (and certainly a gateway drug for people who haven't used Elastic MapReduce much, like myself).
I do wonder why this doesn't support a 'top referrers' statistic. Referrers are included in CloudFront logs, but maybe that's a recent addition. Anyway, it would be appreciated, and in the meantime I'll see if I can hack it in.
Thanks for the great code!

If you try to compile this on OS X and get the error "class file has wrong version XX.0, should be 49.0", the cause is compiling with the wrong Java version (in my case 1.5); LogAnalyzer needs 1.6.
To fix the problem:
1. Make sure the latest version of Java is installed. It is generally installed through Software Update, but it can also be downloaded from Apple.com.
2. In Terminal, change to /System/Library/Frameworks/JavaVM/Versions and run the command "sudo ln -fhsv 1.6.0 CurrentJDK"
3. Compile using "ant jar"
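The number in that error message is a class file major version. Per the Java Virtual Machine Specification, major version 49 corresponds to Java 5 (1.5) and 50 to Java 6 (1.6), so "should be 49.0" indicates a Java 5 compiler. The following Python sketch is a hypothetical helper, not part of LogAnalyzer. It reads the 8-byte class file header to show which Java release a .class file targets. That header is the magic number 0xCAFEBABE followed by the minor and major versions as big-endian 16-bit integers.

```python
import struct
import sys

# Class file major version -> Java release (from the JVM specification).
JAVA_RELEASES = {48: "1.4", 49: "5 (1.5)", 50: "6 (1.6)", 51: "7", 52: "8"}


def class_file_version(data: bytes):
    """Return (major, minor) parsed from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


if __name__ == "__main__":
    # Print the version of every .class file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            major, minor = class_file_version(fh.read(8))
        print(f"{path}: {major}.{minor} (Java {JAVA_RELEASES.get(major, 'unknown')})")
```

For example, saving this as a (hypothetically named) classversion.py and running `python classversion.py SomeClass.class` on a class compiled by Java 6 would report major version 50.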