2007 Web Analytics Shootout – Interim Report

Introduction to the 2007 Analytics Shoot Out – by Jim Sterne

Every web analytics tool measures clickthroughs and page views a little differently. They’re all using slightly different yardsticks and getting slightly different results. The disparity is driving us to distraction.

Just how different are they?

To get the answer to that, you would have to install some of the most popular tools side by side on the same sites and compare the results. That’s what the team at Stone Temple Consulting did and the results are informative.

Yes, it’s interesting to see which tools, looking at the same data, count higher and which count lower. But the eye opener for these very complex yardsticks is just how many variables there are in the implementation process. Very interesting reading.

Overview of the 2007 Analytics Shoot Out

The 2007 Analytics Shoot Out is targeted at evaluating the performance, accuracy, and capabilities of 7 different analytics packages as implemented across 4 different sites. The goals of the project are as follows:

Evaluate ease of implementation

Evaluate ease of use

Understand the basic capabilities of each package

Solve specific problems on each web site

Discover the unique strengths of each package

Discover the unique weaknesses of each package

Learn about the structural technology elements of each package that affect its capabilities

Learn how to better match a customer’s needs to the right analytics package

How the results of the Shoot Out will be delivered

The results of the Shoot Out will be delivered in two stages. The first stage is this interim report, which was officially released at the Emetrics Summit in San Francisco on May 6, 2007. The second stage is the final report, described below.

What you get in this interim report

An analysis of how the user deletion / non-acceptance rates of third party cookies and first party cookies differ.

Comparative data showing:

Visitors

Unique Visitors

Page Views

Specific segments, as defined individually for 2 of the sites

An analysis of the comparative data and discussion of the following topics:

What the numbers tell us

Range of results

Does one package always report lower numbers than the others?

Does one package always report higher numbers than the others?

Contents of the Final Report

The final report will contain the same basic types of data as the interim report but with more numerical detail, and a more extensive analysis of that data.

In addition to enhancing the above data from the interim report, the final report will also:

Highlight major strengths of each analytics package, and outline specific scenarios where that will matter to a web site owner / web master.

Highlight specific weaknesses (as well as workarounds and plans by each company to address these weaknesses) of each analytics package, and outline specific scenarios where that will matter to a web site owner / web master.

Methodology

The major aspects of the Shoot Out methodology are as follows:

For each package, except WebTrends, we installed JavaScript on the pages of the participating sites. WebTrends was already installed on one of the sites participating in the project, and the implementation used a combination of JavaScript tags and log file analysis.

All the JavaScript was added to web site pages through include files. As a result, any errors of omission (an untagged page, for example) apply equally to every analytics package.

All packages were run concurrently.

All packages used first party cookies.

A custom analytics plan was tailored for the needs of each site.

Visitors, Unique Visitors, and Page Views were recorded daily for each site.

Content Groups and Segments were set up for each site. Numbers related to these were recorded daily.

Detailed ad hoc analysis was done with each analytics package on each site.

Critical strengths and weaknesses of each package were noted, and reviewed with each vendor for comment.

Each vendor was given an opportunity to present their product’s strongest features and benefits.

Interim Results

The next few sections present the interim results from the Shoot Out. Note that the time frames for the data from each site have been masked, and the time frame used for each site was different, but the numbers are real.

First Party Cookies vs. Third Party Cookies

Using WebSideStory’s HBX Analytics running on CityTownInfo.com, we ran the software for a fixed period of time using third party cookies (TPCs). We then ran the software for the same amount of time using first party cookies (FPCs).

During that same period we ran 3 of the other analytics packages (Clicktracks, Google Analytics, and IndexTools), all using first party cookies.

The results were then compared by examining the relationship of HBX reported volumes to the average of the volumes of the three other packages, and then seeing how that relationship changed when we switched from third party cookies to first party cookies. In theory, this should give us an estimate of how the user deletion / non-acceptance rates of third party cookies compares to user deletion / non-acceptance rates of first party cookies.
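To make that arithmetic concrete, here is a small sketch (not any vendor's code) of the comparison, using the third party cookie visitor counts reported in the next section:

```python
# HBX's reported visitor volume as a percentage of the average volume
# reported by the other three packages (third-party-cookie period).
others = {"Clicktracks": 72224, "Google Analytics": 66866, "IndexTools": 67365}
hbx = 48990

average = sum(others.values()) / len(others)
pct_of_average = 100 * hbx / average

print(f"Average of the other packages: {average:,.0f}")   # 68,818
print(f"HBX as % of average: {pct_of_average:.2f}%")      # 71.19%
```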

Here are the results we obtained while HBX Analytics was running third party cookies:

| | Visitors | Uniques | Page Views |
| --- | --- | --- | --- |
| Clicktracks | 72,224 | 66,335 | 120,536 |
| Google Analytics | 66,866 | 64,975 | 118,230 |
| IndexTools | 67,365 | 65,212 | 123,279 |
| WebSideStory's HBX Analytics | 48,990 | 47,813 | 102,534 |
| Average of all but HBX Analytics | 68,818 | 65,507 | 120,682 |
| HBX Analytics % of Average | 71.19% | 72.99% | 84.96% |

Visitor and unique visitor totals for HBX Analytics are 71 – 73% of the average of the other 3 packages. On the other hand, page views are roughly 85% of the average of the other 3 packages.

Now let’s take a look at the same data when HBX Analytics was making use of first party cookies:

| | Visitors | Uniques | Page Views |
| --- | --- | --- | --- |
| Clicktracks | 71,076 | 65,314 | 114,966 |
| Google Analytics | 65,906 | 64,030 | 112,436 |
| IndexTools | 67,117 | 64,621 | 119,049 |
| WebSideStory's HBX Analytics | 55,871 | 54,520 | 96,453 |
| Average of all but HBX Analytics | 68,033 | 64,655 | 115,484 |
| HBX Analytics % of Average | 82.12% | 84.32% | 83.52% |
| Relative Traffic Growth with FPCs | 13.32% | 13.44% | n/a |

With first party cookies, the visitor and unique visitor totals for HBX Analytics are now 82 – 84% of the average of the other 3 packages. The page views relationship did not change significantly, and was roughly 84%.

Analysis and Commentary

By observing how the traffic reported by HBX Analytics increased with respect to the average of the other 3 packages, we can estimate how the third party cookie deletion / non-acceptance rate differs from the first party cookie rate.

According to this data, the third party cookie deletion / non-acceptance rate exceeds the first party cookie deletion / non-acceptance rate by a little more than 13%. WebSideStory also reported to STC that it saw a 15-20% third party cookie deletion / non-acceptance rate across sites that they monitor during a 2 week period in January, and about a 2% first party cookie deletion /non-acceptance rate.
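The report's tables do not spell out the "Relative Traffic Growth with FPCs" formula, but one calculation that reproduces the reported figures is the change in HBX's share of the three-package average, expressed relative to the first party cookie share. A sketch of that arithmetic, using the visitor counts reported above:

```python
# Change in HBX's share of the three-package average when switching
# from third-party cookies (TPCs) to first-party cookies (FPCs),
# expressed relative to the FPC share.
tpc_share = 100 * 48990 / ((72224 + 66866 + 67365) / 3)   # ~71.19%
fpc_share = 100 * 55871 / ((71076 + 65906 + 67117) / 3)   # ~82.12%

growth = 100 * (fpc_share - tpc_share) / fpc_share
print(f"Relative traffic growth with FPCs: {growth:.2f}%")  # ~13.32%
```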

This data is fairly consistent with past industry data that estimates the third party cookie deletion / non-acceptance rate at about 15%.

Note that comScore recently reported more than 30% of cookies are deleted or not accepted overall, and also seemed to show that the difference between TPC and FPC deletions / not acceptances was significantly smaller. What remains to be seen is the methodology they used. Nonetheless, our data above should provide a reasonable indication of how TPC deletions and non-acceptances differ from FPC deletions and non-acceptances.

Cookie deletion and acceptance rates are of great concern when evaluating web analytics. Every time a cookie is deleted or not accepted, it affects the tool's visitor and unique visitor counts. Counting of unique visitors is particularly affected: if a user visits a site in the morning, deletes (or never accepted) their cookies, and then visits again in the afternoon, the totals for that day will show 2 daily unique visitors, when in fact one user made multiple visits and should be counted only once.

It should be noted that the packages use different methods for setting their cookies. For example, HBX Analytics requires you to set up a CNAME record in your DNS configuration that remaps a sub-domain of your site to one of their servers.

While this requires someone who is comfortable configuring CNAME records, it does provide some advantages. Simple first party cookie implementations still pass data directly back to the analytics vendor's servers, and memory resident anti-spyware software can intercept and block those communications.

Using the CNAME record bypasses this problem, because all the memory resident anti-spyware software will see is a communication with a sub-domain of your site, and the process of redirecting the data stream to the HBX Analytics server happens at the DNS server.
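As an illustration (the host names below are invented, not the actual HBX configuration), such a setup amounts to a single zone file entry:

```
; Illustrative zone-file entry: a sub-domain of your site is remapped to
; the analytics vendor's collection server, so tag requests and cookies
; stay within your own domain as far as the browser can tell.
metrics.example.com.    IN    CNAME    collector.analytics-vendor.example.
```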

Unica takes a similar approach, offering a choice between a CNAME record based first party cookie implementation and a simpler first party cookie implementation.

Other analytics packages used in this test (Clicktracks, Google Analytics, and IndexTools) have chosen an approach to initial configuration which requires no special configuration, and that allows a less technical user to set them up and get started.

Visitors, Unique Visitors, and Page Views (aka “traffic numbers”)

Notes

The Uniques column is the summation of Daily Unique Visitors over a period of time. The resulting total is therefore not an actual unique visitor count for the time period (because some of the visitors may have visited the site multiple times, and have been counted as a Daily Unique Visitor for each visit).

This was done because not all of the packages readily permitted us to obtain Unique Visitor totals over an arbitrary period of time. For example, in some packages it is not trivial to pull a Unique Visitor count for a specific 12 day period.

Regardless, the Uniques data in the tables below remains a meaningful measurement of how the analytics packages compare in calculating Daily Unique Visitors.
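A toy example (visitor names invented) of why this summation overstates true uniques:

```python
# Daily unique visitors sum to 7 over the three days, but only 4 distinct
# people visited: repeat visitors are counted once per day they appear.
daily_visitors = {
    "day 1": {"alice", "bob"},
    "day 2": {"alice", "carol"},
    "day 3": {"bob", "carol", "dave"},
}

summed_daily_uniques = sum(len(v) for v in daily_visitors.values())
true_uniques = len(set().union(*daily_visitors.values()))

print(summed_daily_uniques, true_uniques)   # 7 4
```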

The rows that show % information for each analytics package refer to that package's percentage of the average result. Note that the average does not necessarily mean "correct". This information is intended simply to help identify which analytics packages yielded results that differed substantially from the others.

The time period is not disclosed, in order to obscure the actual daily traffic numbers of the participating sites. In addition, the time period used for each site differed.

Analysis and Commentary

1. There were significant differences in the traffic numbers revealed by the packages. While we are all conditioned to think that this is a purely mechanical counting process, it is in fact a very complex process.

There are dozens (possibly more) of implementation decisions made in putting together an analytics package that affect the method of counting used by each package. The differing first party cookie implementations described above are just one example.

Other examples include: whether or not configuration of the package is done primarily in the JavaScript or the UI, and how a unique visitor is defined (e.g. is a daily unique visitor defined as over the past 24 hours, or for a specific calendar day?).

If we look at the standard deviations in the above data, the distribution appears to be roughly normal. For a normal distribution, 68% of scores should fall within 1 standard deviation of the mean, and 95% within 2 standard deviations, and our data roughly bears this out.
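As a sketch of that check, using the third party cookie Visitors figures from CityTownInfo.com as one example (with only four packages, the sample is tiny, so this is illustrative rather than conclusive):

```python
from statistics import mean, pstdev

# Visitors reported by the four packages during the TPC period.
visitors = [72224, 66866, 67365, 48990]

m, sd = mean(visitors), pstdev(visitors)
within_one_sd = sum(abs(v - m) <= sd for v in visitors)

print(f"mean={m:,.0f}, sd={sd:,.0f}, "
      f"within 1 sd: {within_one_sd} of {len(visitors)}")
```

Here 3 of the 4 values fall within one standard deviation of the mean (75%, against the 68% rule of thumb), which is at least consistent with a roughly normal distribution.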

Here is a summary of the raw data:

1.1. While HBX Analytics tended to report the lowest numbers of all the packages, this was certainly not always the case. For example, on AdvancedMD.com, HBX was higher than 2 packages for visitors and unique visitors.

1.2. Clicktracks reported the highest numbers on AdvancedMD.com. Google Analytics reported the second highest numbers for this site. Google Analytics reported the highest numbers on ToolPartsDirect.com. Clicktracks reported the second highest numbers for this site.

AdvancedMD.com and ToolPartsDirect.com receive a large amount of their traffic from Pay Per Click campaigns. While it’s pure speculation on our part, perhaps this plays into some differences in the way that Clicktracks and Google Analytics count visitor, unique visitor, and page view data as compared to the other packages. We are seeking feedback from Clicktracks and Google Analytics on this point, and hope to provide more information in the Final Report.

1.3. On HomePortfolio.com, WebTrends reported significantly more visitors and unique visitors than the other vendors (about 20% more). This is the only site that we were able to look at WebTrends numbers for at this stage in the project.

Google Analytics reported the second highest numbers on this site.

1.4. On CityTownInfo.com, the highest numbers were reported by IndexTools.

2. As Jim Sterne is fond of saying, if your yardstick measures 39 inches instead of 36 inches, it’s still great to have a measurement tool. The yardstick will still help you measure changes with a great deal of accuracy. So if tomorrow your 39 inch yardstick tells you that you are at 1 yard and 1 inch (i.e., 40 inches), you know you have made some progress.

In evaluating the data presented above, you can see that the analytics packages are all reasonably close to one another. For purposes of evaluating the quality of a yardstick, we can conclude that each of these yardsticks is similar in its measurement quality.

However, referring back to the comScore study, if the first party cookie deletion rate is in fact greater than 30%, this would be of great concern. We will provide some more commentary on this issue in the Final Report.

To put this in perspective, classic marketing vehicles have no direct form of measurement whatsoever. How do TV ads drive sales? How about Radio ads? There is no direct measurement of the return on these marketing expenditures. The web is unique in its ability to provide direct measurement of user behavior to a high degree of accuracy. There is no other marketing vehicle like it.

For example, in web analytics you can conduct A/B testing at a remarkably granular level. Using our yardstick, we can tweak our marketing message in numerous ways, and get direct feedback on how it affects achievement of our business objective. We could, for example, measure the effects of such changes as:

Change marketing copy

Change layout

Change the colors used on landing pages

Change the actual offers made

There really is much more that you can do. But the kicker is that you can get close to real time feedback, and you can rapidly hone your pitch to the customer. On high volume sites, you can get meaningful feedback in less than a day.

Used in this fashion, web analytics software is very accurate. Given enough data, our 39 inch yardstick can easily measure the difference between a 1.5% conversion rate and a 2% conversion rate.
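To illustrate how much data that takes, here is a rough sketch using a standard two-proportion z-test (the visitor counts are invented for illustration, not drawn from the Shoot Out data):

```python
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is conversion rate B really different from A?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# 10,000 visitors per variation: 1.5% vs 2.0% conversion
z = z_score(150, 10_000, 200, 10_000)
print(f"z = {z:.2f}")   # |z| > 1.96 means significant at the 95% level
```

At this volume the half-point difference is clearly detectable, and, crucially, it remains detectable even if both measurements share the same systematic undercount. That is the 39-inch yardstick argument in a nutshell.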

The marketing person placing TV and radio ads might well sell their soul to have that 39 inch yardstick to measure their ad’s effectiveness with a similar level of accuracy.

Content Group Data

1. Here is the form completion and content group page view data for each of the analytics packages on CityTownInfo.com:

| | Form 1 | Form 2 | Form 3 | Group 1 Views | Group 2 Views | Group 3 Views |
| --- | --- | --- | --- | --- | --- | --- |
| Clicktracks | 169 | 567 | 69 | 45,646 | 3,833 | 9,423 |
| Google Analytics | 172 | 543 | 59 | 59,638 | 4,695 | 12,255 |
| IndexTools | 177 | 616 | 68 | 67,166 | 4,891 | 14,461 |
| Unica Affinium NetInsight | 172 | 572 | 70 | 60,699 | 4,713 | 12,291 |
| WebSideStory HBX Analytics | 162 | 560 | 69 | 54,889 | 4,274 | 14,763 |
| Average | 170 | 572 | 67 | 57,608 | 4,481 | 12,639 |
| Clicktracks % | 99.18% | 99.20% | 102.99% | 79.24% | 85.54% | 74.56% |
| Google Analytics % | 100.94% | 95.00% | 88.06% | 103.52% | 104.77% | 96.96% |
| IndexTools % | 103.87% | 107.77% | 101.49% | 116.59% | 109.14% | 114.42% |
| Unica Affinium NetInsight % | 100.94% | 100.07% | 104.48% | 105.37% | 105.17% | 97.25% |
| WebSideStory HBX Analytics % | 95.07% | 97.97% | 102.99% | 95.28% | 95.38% | 116.81% |

2. Here is the content group page view data for each of the analytics packages on HomePortfolio.com:

| | Group 1 Views | Group 2 Views | Group 3 Views | Group 4 Views |
| --- | --- | --- | --- | --- |
| Google Analytics | 4,878,899 | 514,704 | 448,355 | 11,823 |
| IndexTools | 4,844,642 | 520,521 | 457,857 | 11,540 |
| WebSideStory HBX Analytics | 2,222,843 | 161,922 | 317,307 | 10,787 |
| Average | 3,982,128 | 399,049 | 407,840 | 11,383 |
| Google Analytics % | 122.52% | 128.98% | 109.93% | 103.86% |
| IndexTools % | 121.66% | 130.44% | 112.26% | 101.38% |
| WebSideStory HBX Analytics % | 55.82% | 40.58% | 77.80% | 94.76% |

Analysis and Commentary

1. Interestingly, this data appears to be more consistent than the traffic data (we discuss the exception of HBX Analytics running on HomePortfolio.com below). This is largely because it is page view based, and page views are inherently easier to track accurately than visitors, since the analytics software can't detect when a user leaves your site.

For example, if a user comes to your site, then leaves to look at some other site, then immediately comes back to a page on your site by typing in the URL directly, or via a bookmark, the analytics software can’t determine that two different visits took place (it could have been simply a page reload request). The JavaScript only runs on the pages of your site, so it does not know that you left the site in between viewing two pages on your site.

In addition, if your visitor arrives on the site, then goes to lunch for an hour, then comes back and clicks on a link on the page they arrived on, the analytics software sees this as a new visit, even though they never left the page.

The reason for this is that the analytics industry has standardized on a definition of sessions that says that 30 minutes of inactivity ends a session (a visit). Some criteria for dealing with these situations needed to be selected, and this is what they chose.
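A minimal sketch of that 30-minute rule (timestamps invented for illustration):

```python
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a visit

def count_visits(hit_times):
    """Count visits in a sorted sequence of page-view timestamps (seconds)."""
    visits, last = 0, None
    for t in hit_times:
        # A gap longer than the timeout starts a new visit.
        if last is None or t - last > SESSION_TIMEOUT:
            visits += 1
        last = t
    return visits

# Three quick page views, then one more click 65 minutes later
# (the lunch-break scenario described above).
hits = [0, 120, 300, 300 + 65 * 60]
print(count_visits(hits))   # 2 visits under the 30-minute rule
```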

However, it’s easy for the analytics software to determine page views with greater accuracy, because each page view makes a request to the web server of the site, which causes the JavaScript for the analytics software to execute. All the actions being counted take place on the web site.

2. As an exception to this, the HBX Analytics content group data for HomePortfolio is quite a bit lower than that of the other packages. However, we discovered that this is due to an implementation error by our team.

Note that this is not a reflection of the difficulty of implementing HBX Analytics. Instead, it's a reflection of how important it is to understand exactly what you want the analytics software to do, to specify it accurately, and then to double check that you are measuring what you think you are measuring.

In the case of the problem above, through no fault of the software, we set it up to track people who initially entered at pages in the content group, rather than tracking all the page views for the content group, which is what we wanted.

There is a key lesson in this. Implementation of an analytics package requires substantial forethought and planning. And, when you are done with that, you have to check, and recheck your results, to make sure they make sense. Here is a summary of some of the issues you face in setting up your implementation correctly:

Tagging errors – an error in tagging your pages can really throw you for a loop. You need to do a comprehensive job of setting the software up for success.

Understanding the terminology – each package uses terms in different ways, and it’s important to understand them.

Learning the software, and how it does things – each software package has its own way of doing things.

Learning your requirements – learning your requirements will be a process all by itself. If you are implementing analytics for the first time it may be many months before you truly understand how to use it most effectively on your site.

Learning the requirements of others in your organization – these are not necessarily the same as your personal requirements.

Validating the data – even if you are not running more than one analytics package, you need to have a method of testing the quality of your data and making sure it makes sense. If it doesn’t, then perhaps some of the other steps above were not correctly executed.

One way to reduce many of these risks is to install more than one analytics package. A substantial difference between two packages provides a visible clue that something went wrong!

City Town Info – City Town Info provides a wealth of information on more than 20,000 U.S. cities and towns. Users come to the site to read and learn about the places where they live, are visiting, or want to move to. Visitors can also use the site's PlaceMatch tool to compare and contrast cities and towns, and to find cities and towns that are similar to one another.

HomePortfolio – HomePortfolio is a vertical search company focused on home design products and services. Consumers search the best in home design all in one place, find “similar” products using HomePortfolio’s unique system of attribute tags and attribute matching, collaborate visually with their designer or architect using shared Project Portfolios, and discover local retail showrooms where they can touch and feel the products they want before buying them.

Tool Parts Direct – Tool Parts Direct provides a comprehensive catalogue of parts for almost any type of tool, all conveniently accessible on the web. Can’t find it at the local store, or just want to shop from the convenience of your home or office? Look for it on Tool Parts Direct.