Feb 23, 2009

Analyzing Eclipse Update Site Traffic with AWStats

Knowledge is Power. I don't think it is necessary to explain why analyzing your site's visitors is important. As a software vendor, you need to know how many people download your software. If you are developing an Eclipse plugin, you probably distribute it to your customers (paying or not) through an Eclipse Update Site. The update site serves both new users installing your software for the first time and existing users updating to newer versions. That means tracking downloads from an update site is somewhat different from tracking downloads from a regular web site.

The most popular tool for web site traffic analysis is Google Analytics. Like many other tools, it relies on page tagging: every page carries a small tracking JavaScript snippet that runs when the page loads in the visitor's browser and sends the visit information to a central server. This works great for web sites, but it is completely useless for Eclipse Update Sites. Eclipse fetches the update site files directly, not through a full web browser, so the JavaScript never executes.

The simple solution in this case is to take a different approach to tracking visitors: analyzing the server logs. A web server can log every request it handles, and those logs can then be processed by a log analysis tool. The most popular tool for that is AWStats.

AWStats is a free, open source tool (licensed under the GPL) for analyzing server logs. It can present its reports on site visitors through a web interface. Setting up AWStats is fairly simple, and there are many tutorials and articles on the subject, so I won't cover that part here. Before you continue, make sure you have AWStats up and running.

Configuring AWStats for Update Site Analytics

In its default configuration, AWStats will not present correct Update Site statistics. The main reason is the client's identity. When Eclipse connects to your site, it identifies itself as a "Java" agent, not as a browser. This causes AWStats to assume your client is a bot or a worm and ignore its traffic. Another reason is that, by default, XML files are excluded from the page statistics. Since an update site consists mostly of JARs and XML files, this defeats the purpose.
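To see why AWStats gets confused, it helps to look at what the server actually records. The sketch below assumes Apache's "combined" log format; the log line is invented for illustration, and `parse_line` and `is_java_client` are hypothetical helper names. Java's standard HTTP client (`java.net.HttpURLConnection`) sends a `Java/<version>` user agent unless the application overrides it, which matches the "Java" identity described above: there is no browser or operating system name in it for AWStats to recognize.

```python
import re
from typing import NamedTuple, Optional

# Apache "combined" log format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    path: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format log line; return None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return Hit(m["host"], m["path"], int(m["status"]), m["agent"])


def is_java_client(agent: str) -> bool:
    # java.net's default User-Agent is "Java/<java.version>", e.g. "Java/1.6.0_11"
    return agent.startswith("Java/")


# Made-up request for an update site's site.xml
sample = ('192.0.2.10 - - [23/Feb/2009:10:15:32 +0200] '
          '"GET /updates/site.xml HTTP/1.1" 200 1532 "-" "Java/1.6.0_11"')
hit = parse_line(sample)
print(hit.path, hit.agent, is_java_client(hit.agent))
# /updates/site.xml Java/1.6.0_11 True
```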

To change the settings, open your AWStats configuration file (there's a separate file for each site you analyze) and make the following modifications:

Find the setting called LevelForRobotsDetection. The default is 2; change it to 0. This turns off robot detection, so requests from Java clients will no longer be classified as robots. The trade-off is that genuine robots are no longer filtered out either.

Find the setting called NotPageList, which lists the file types that AWStats does not count as pages. Remove xml from that list, so that requests for files like site.xml and content.xml show up in the reports.
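If you maintain several configuration files, the two changes can be scripted. This is a minimal sketch, assuming the settings appear as plain `LevelForRobotsDetection=...` and `NotPageList="..."` lines (check your own file); `patch_awstats_config` is a made-up name. It rewrites only those two lines and leaves commented-out lines alone.

```python
import re


def patch_awstats_config(text: str) -> str:
    """Return a copy of an AWStats config with update-site friendly settings."""
    out = []
    for line in text.splitlines(keepends=True):
        if re.match(r"\s*LevelForRobotsDetection\s*=", line):
            # 0 = no robot detection, so "Java/..." clients are counted
            line = "LevelForRobotsDetection=0\n"
        elif re.match(r"\s*NotPageList\s*=", line):
            quoted = re.search(r'"([^"]*)"', line)
            if quoted:
                # Drop xml so site.xml and content.xml are counted as pages
                exts = [e for e in quoted.group(1).split() if e.lower() != "xml"]
                line = 'NotPageList="%s"\n' % " ".join(exts)
        out.append(line)
    return "".join(out)
```

To use it, read your config file, pass its contents to `patch_awstats_config`, and write the result to a new file so you can compare the two before replacing the original. Note that any inline comment on the two rewritten lines is dropped.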

Getting the Most from the AWStats Report

The basic AWStats report is useful enough. It works at the monthly level, but you can break it down by day or by hour using the databasebreak=day or databasebreak=hour option. For example, if you view AWStats reports directly in your browser (as opposed to generating the reports from the command line), add the following to the URL: &databasebreak=day&day=23, where 23 is the day of the month. There is no way to select the day other than typing it into the URL. The report is the same as the monthly report, except that the data shown belongs to a specific day. This option is still "experimental", but it seems to work just fine. You can read more about it in the AWStats FAQ.
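Since the day can only be chosen by editing the URL, a tiny helper can build it for you. This is a sketch, assuming the report is served by the awstats.pl CGI script and selected with the usual `config` parameter; the host, path, and config name are placeholders, `awstats_day_url` is a hypothetical name, and the `month`/`year` keys in the comment are assumed parameter names. The day is passed through exactly as given.

```python
from urllib.parse import urlencode


def awstats_day_url(base_url, config, day, extra=None):
    """Build a URL for a single-day AWStats report (databasebreak=day)."""
    params = {"config": config, "databasebreak": "day", "day": day}
    if extra:
        # e.g. {"month": "02", "year": "2009"} (assumed AWStats parameter names)
        params.update(extra)
    return base_url + "?" + urlencode(params)


print(awstats_day_url("http://www.example.com/cgi-bin/awstats.pl", "updatesite", 23))
# http://www.example.com/cgi-bin/awstats.pl?config=updatesite&databasebreak=day&day=23
```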

I found two sub-reports particularly useful:

The "Unknown OS" report shows the Java versions your users are running, because the Java user agent carries a Java version instead of an operating system name. Unfortunately, this is not a summary report, just a list of the most recent visits.

The full list of URLs will show all the JARs that were downloaded.
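When you need totals rather than the most recent visits, you can summarize the raw log yourself. The sketch below is hypothetical (the `summarize` name, log lines, and paths are invented) and assumes Apache's "combined" format with a `Java/<version>` user agent. It counts distinct (host, Java version) pairs, a rough approximation of clients since several users can share an IP address, and counts successful (HTTP 200) downloads per JAR path. It keys on the full path because an update site's features/ and plugins/ directories often contain JARs with identical file names.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

REQUEST = re.compile(r'"GET (\S+\.jar) [^"]*" (\d{3}) ')  # JAR request + status
AGENT = re.compile(r'"([^"]*)"\s*$')                    # last quoted field
JAVA = re.compile(r"^Java/(\S+)")                       # e.g. Java/1.6.0_11


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (Java version -> distinct hosts, JAR path -> successful downloads)."""
    jars = Counter()
    clients = set()  # distinct (host, Java version) pairs
    for line in lines:
        host = line.split(" ", 1)[0]
        agent = AGENT.search(line)
        java = JAVA.match(agent.group(1)) if agent else None
        if java:
            clients.add((host, java.group(1)))
        req = REQUEST.search(line)
        if req and req.group(2) == "200":  # skip 304 Not Modified etc.
            jars[req.group(1)] += 1
    versions = Counter(version for _, version in clients)
    return versions, jars


# Invented log lines for illustration
SAMPLE = [
    '192.0.2.10 - - [23/Feb/2009:10:15:32 +0200] "GET /updates/site.xml HTTP/1.1" 200 1532 "-" "Java/1.6.0_11"',
    '192.0.2.10 - - [23/Feb/2009:10:15:40 +0200] "GET /updates/plugins/com.example.core_1.0.0.jar HTTP/1.1" 200 90112 "-" "Java/1.6.0_11"',
    '198.51.100.7 - - [23/Feb/2009:11:02:10 +0200] "GET /updates/plugins/com.example.core_1.0.0.jar HTTP/1.1" 304 - "-" "Java/1.5.0_16"',
    '203.0.113.5 - - [23/Feb/2009:12:00:00 +0200] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
versions, jars = summarize(SAMPLE)
print(dict(versions))
print(dict(jars))
```

For a real log, pass an open file object instead of the sample list; each line's trailing newline is handled by the regular expressions.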

To conclude, I should point out that I'm not an expert in web analytics. I just needed a simple solution, and AWStats seems to do the job. It is not perfect, but it is good enough. I would be happy to hear about other tools you use for this. I will keep posting updates on my AWStats experience and on any other solutions I come across.
