I’ve been using Alexa for over 2 years now, and it’s been interesting watching the myriad of different ways that people misuse and misunderstand the numbers. For work, I’ve had to use them in conjunction with Nielsen//NetRatings numbers, comScore numbers, and others.

To start, here’s what Alexa writes about its stats. As an aside, if you have quoted Alexa numbers or sent around charts without knowing the following info, shame on you :)

In addition to the Alexa Crawl, which can tell us what is on the Web,
Alexa utilizes web usage information, which tells us what is being seen
on the web. This information comes from the community of Alexa Toolbar
users. Each member of the community, in addition to getting a useful
tool, is giving back. Simply by using the toolbar each member
contributes valuable information about the web, how it is used, what is
important and what is not. This information is returned to the
community with improved Related Links, Traffic Rankings and more.

The gist of this is, all the information comes from a bunch of toolbars that monitor what users are seeing, and then send it back to Alexa.

So here are a couple of issues with Alexa numbers that you might not understand:

Alexa doesn’t give pageviews or uniques

When you pull up a particular domain and look at its traffic stats (say, Digg.com), it’s easy to think that a figure like 10,000 somehow means Digg is reaching 10,000,000 people (or whatever). It doesn’t. It means that out of every million toolbars, 10,000 visited the site in the last day. That’s what "Daily Reach (per million)" means.
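To make the arithmetic concrete, here’s a minimal Python sketch. The toolbar counts are made up, and `reach_per_million` and `reach_to_share` are hypothetical helpers for illustration, not anything Alexa publishes. The point is that "per million" is a share of toolbar users, not a headcount of people.

```python
def reach_per_million(visiting_toolbars: int, active_toolbars: int) -> float:
    """Daily reach: toolbars that loaded the site, scaled to a base of one million toolbars."""
    if active_toolbars <= 0:
        raise ValueError("active_toolbars must be positive")
    return visiting_toolbars * 1_000_000 / active_toolbars


def reach_to_share(reach: float) -> float:
    """Convert a 'per million' reach figure into the fraction of toolbar users who visited."""
    return reach / 1_000_000


# Hypothetical day: 25,000 of 2,500,000 active toolbars load the site.
reach = reach_per_million(25_000, 2_500_000)
print(reach)                  # 10000.0
print(reach_to_share(reach))  # 0.01 -> 1% of toolbar users, not 10,000,000 people
```

Nothing in that calculation tells you how many actual people are behind the toolbars, which is the whole problem.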

The number of toolbars per million that visit a particular site is, on its own, pretty meaningless. It’s certainly an accurate count of toolbars, but very little meaning can be assigned to it beyond relative comparisons. In fact, you can think of Alexa as giving you a pretty weird proxy for the stats you actually care about. It’s akin to asking "How many people are in the United States?" and hearing "There are 600,000,000 baseball hats in the United States." You’d hope the two would correlate, but sometimes they don’t.

There are a lot of smart people whose jobs are to know how many
pageviews a particular site has. Advertising media buyers need to know
in order to make sure they are buying ads on sites that are big enough
to be worth picking up the phone for. They don’t use Alexa. (More on
what they DO use later.)

Alexa’s numbers are based on a biased sample

Do you run an Alexa toolbar? If you are reading this blog, probably not. Instead, you probably run Firefox, are more likely to use a Mac, and are probably involved in the advertising or startup world. That’s the particular bias of this blog.

In fact, you might ask: who has an Alexa toolbar? I don’t know the answer, but it’s an important question, because those are the people generating the stats for the Alexa site. I would guess that the toolbar skews toward less savvy users who download it for incidental features like popup blocking, rather than reflecting a general distribution of users. More precisely, the sample is weighted toward Windows users who download the Alexa toolbar and find it useful enough to keep around. It probably also has a lot to do with whoever Amazon/Alexa has cut deals with in order to distribute its software.

Another issue: remember that all of this is tied to the toolbar, not to the user. If multiple people use one computer, the "uniques" count will come out lower. For this reason, you’d expect certain kinds of sites to be undercounted in the Daily Reach stat, since a group of people shows up as a single person. For example, students and younger folks might be more likely to share computers, as might heavy travelers who use internet cafes and the like. Everyone who shares a single browser may be counted as one person.
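Here’s a toy Python sketch of how shared machines shrink the "uniques" count. The log and the `count_uniques` helper are hypothetical; in particular, a real toolbar panel never sees the person column, which is exactly the problem.

```python
def count_uniques(sessions):
    """Given (toolbar_id, person) pairs, return (distinct toolbars, distinct people)."""
    toolbars = {toolbar for toolbar, _ in sessions}
    people = {person for _, person in sessions}
    return len(toolbars), len(people)


# Hypothetical log: three people share one library PC; one person has their own PC.
sessions = [
    ("pc-library", "ana"),
    ("pc-library", "ben"),
    ("pc-library", "carla"),
    ("pc-home", "dev"),
]
print(count_uniques(sessions))  # (2, 4): the panel reports 2 "uniques" for 4 actual visitors
```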

Sound trivial? Well, these are the kinds of precise numbers that media companies have to deal with every day. For example, a lot of Hispanic-targeted sites report lower numbers than expected; the general theory is that those niche sites have fewer at-home or at-work users than others. Many of their users come through libraries, cafes, and other shared computer environments, and that affects millions of dollars of ad revenue. It’s very hard to correct for behavioral effects like these.

Alexa gives false certainty

When one site seems to be significantly outperforming another, you really have no idea how the two compare on an apples-to-apples basis. If you try to compare them pageview-to-pageview, there’s all sorts of uncertainty that isn’t exposed in the precise, one-pixel line charts that Alexa spits out. In any real statistical treatment, you’d report confidence intervals and throw out estimates based on samples that are too small. (The 100,000 ranking mark in Alexa seems a bit arbitrary, don’t you think?)
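As a rough illustration of what a confidence interval would add, here’s a Python sketch using the Wilson score interval for a proportion. It assumes a simple random sample of 100,000 toolbars and a made-up count of 12 visitors. Alexa’s panel is neither random nor disclosed, so if anything this understates the real uncertainty.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion hits/n (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: a small site seen by 12 of 100,000 sampled toolbars in one day.
hits, n = 12, 100_000
low, high = wilson_interval(hits, n)
print(f"point estimate: {hits / n * 1e6:.0f} per million")
print(f"95% interval:   {low * 1e6:.0f} to {high * 1e6:.0f} per million")
```

A point estimate of 120 per million comes with an interval of roughly 70 to 210, about a threefold spread, which no one-pixel line chart is going to show you.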

I’m sure many startups (and others) are losing millions of dollars in valuation because some VCs look at their Alexa stats and are disappointed. Or they get compared to their competitors and look like they are being creamed. Either way, these are misinterpretations, because they assume that one site is objectively better than another, and that’s a false certainty. Remember, there are lies, damned lies, and statistics :)

Alexa is the equivalent of bad exit polling. Perhaps worse, because at least exit polls typically come with confidence intervals. Imagine comparing two small sites on the Internet, where one seems much higher than the other. Would you still draw that conclusion if you knew the difference wasn’t significant even at the 90% confidence level, meaning you could only use it directionally? Either way, be smart about how the numbers are used, so as not to treat a rank of 2,000 as a "sure thing."
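One way to check whether "one seems much higher than the other" survives scrutiny is a two-proportion z-test. The sketch below is a minimal Python version with hypothetical counts for two small sites. It treats the two counts as independent samples (a simplification, since both would come from the same panel) and relies on a normal approximation that gets shaky at counts this small.

```python
import math


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z statistic and two-sided p-value (normal approximation)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical: site A seen by 15 of 100,000 toolbars, site B by 10 of 100,000.
# A "looks" 50% bigger, but is the gap distinguishable from noise?
z, p_value = two_proportion_z(15, 100_000, 10, 100_000)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
print("significant at 90%?", p_value < 0.10)
```

Here z comes out around 1.0 and p around 0.32: a "50% bigger" site that you can’t honestly distinguish from its competitor.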

How do we get better data?

So one question is, if not Alexa, then where can you get better data? If you look at the media analytics business, dominated by Nielsen and comScore, you’ll see an industry where this kind of data needs to be methodologically sound enough to drive billions of dollars of advertising spend.

Right now, Nielsen has the cleanest reputation for these types of metrics. A year or two back, when we asked a lot of large New York brand agencies whether they preferred comScore or Nielsen, everyone said Nielsen. Nielsen is generally considered to have better methodology, so the data you get from them is more trustworthy. That said, their data is often sparse, so you go to comScore, which gives deeper data.

How comScore gets data – still biased, but better

Here’s how it works: comScore has little desktop apps too, which monitor user browsing behavior. They buy lots and lots of remnant advertising to get people to install them, building a massive "panel" of users whose searches, site visits, and so on they can track. The panel is large (I believe 500,000 or so users), and they report data on top of that. Note that even though this is a big group, it’s still inherently biased: who clicks on banner ads and installs random apps onto their computer? They can do some normalization to account for this, which makes it better than Alexa’s approach, but it’s still not great.

The upside to having this huge panel is that they can give lots of detailed data about the web: a wider index of the sites people are visiting, the queries they are searching, and so on. Sometimes you don’t care about relative strength; you just want this sort of raw data. For example, a lot of leadgen and advertising companies just want to mine the comScore data for popular queries, so they can buy cheap ads to drive more traffic to their sites. Or domainers might buy lots of mistyped domains to get AdSense traffic.

Nielsen’s old-school approach yields balanced numbers, but less detail

Nielsen has always tried to take a more balanced approach to its panels. This is a huge part of its brand and reputation. In their most traditional panel, they actually do old-school random-digit dialing to get people on the phone. They literally generate random phone numbers, talk to the people who answer, and get them to complete a survey in which they recall a bunch of sites. That’s the basis of their @plan service, which drives over a billion dollars of media buying online.

The only problem with this approach is that less detail can be collected. You wouldn’t expect people to recall dozens upon dozens of sites they’ve visited, or to tell you all their recent search terms. Plus, this might bias results toward larger sites whose names people remember because of their brands, even if they don’t actually visit them. Self-reporting sucks for lots of different reasons; just ask any usability or market researcher.

To address the detail issue, they also have a separate product, called MegaPanel, which uses a desktop-app-based approach (similar to Alexa and comScore). It’s pretty much the same thing I described above, and they have reached hundreds of thousands of households in their panel.

Normalizing biased samples to fix data

Both comScore and Nielsen do some "normalization" to fix up the data, since they are getting a biased panel of internet users. It works like this: if 80% of the panel users are women, but in real life the internet population is 50% women, then they down-weight the stats from every woman in the panel so that, together, women count for 50%. That way, women won’t be over-represented in the results.
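Here’s a minimal Python sketch of that kind of reweighting (usually called post-stratification), using the 80%/50% example. The panel and the `poststratify` and `weighted_share` helpers are hypothetical; this shows the general technique, not comScore’s or Nielsen’s actual procedure.

```python
from collections import Counter


def poststratify(panel_groups, population_shares):
    """Weight for each group = population share / panel share."""
    counts = Counter(panel_groups)
    n = len(panel_groups)
    return {group: population_shares[group] / (count / n) for group, count in counts.items()}


def weighted_share(panel_groups, weights, group):
    """Share of the weighted panel that belongs to `group`."""
    total = sum(weights[g] for g in panel_groups)
    return sum(weights[g] for g in panel_groups if g == group) / total


# Hypothetical panel: 80% women, 20% men; assumed internet population: 50/50.
panel = ["woman"] * 80 + ["man"] * 20
weights = poststratify(panel, {"woman": 0.5, "man": 0.5})
print(weights)                                  # women weighted down (~0.625), men up (~2.5)
print(weighted_share(panel, weights, "woman"))  # ~0.5 after weighting
```

Every stat a panelist contributes (pageviews, visits to a given site) then gets multiplied by that panelist’s weight before it’s totaled up.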

They do this across a bunch of different dimensions to normalize their data into something that fits an overall Internet demographic. That way, their data should come out more representative. If you don’t normalize your data, then lots of random groups will be over-represented, and you’ll see a lot of bias. In my experience, a lot of this data skews toward middle-aged women. Who knows why? I’d guess that they are the most likely to download these sorts of data collection programs without knowing any better :)
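Matching several dimensions at once is commonly handled with raking (iterative proportional fitting): adjust the weights to match one dimension’s targets, then the next, and repeat until they settle. The Python sketch below uses a made-up panel skewed toward middle-aged women and made-up gender and age targets. I don’t know exactly what the measurement companies run, so treat this as the generic technique only.

```python
def rake(rows, targets, iterations=50):
    """Raking / iterative proportional fitting over several categorical dimensions.

    rows:    list of dicts, one per panelist, e.g. {"gender": "woman", "age": "35+"}
    targets: {dimension: {category: population share}}, shares summing to 1
    Returns one weight per panelist.
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for dim, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each category along this dimension.
            current = {cat: 0.0 for cat in shares}
            for w, row in zip(weights, rows):
                current[row[dim]] += w
            # Rescale so each category's weighted total hits its target share.
            weights = [
                w * shares[row[dim]] * total / current[row[dim]]
                for w, row in zip(weights, rows)
            ]
    return weights


def weighted_share(rows, weights, dim, cat):
    """Fraction of total weight held by panelists with row[dim] == cat."""
    hit = sum(w for w, row in zip(weights, rows) if row[dim] == cat)
    return hit / sum(weights)


# Hypothetical panel skewed toward middle-aged women.
rows = (
    [{"gender": "woman", "age": "35+"}] * 50
    + [{"gender": "woman", "age": "18-34"}] * 30
    + [{"gender": "man", "age": "35+"}] * 10
    + [{"gender": "man", "age": "18-34"}] * 10
)
# Hypothetical population targets.
targets = {
    "gender": {"woman": 0.5, "man": 0.5},
    "age": {"18-34": 0.4, "35+": 0.6},
}
weights = rake(rows, targets)
for dim, shares in targets.items():
    for cat in shares:
        print(dim, cat, round(weighted_share(rows, weights, dim, cat), 3))
```

Notice that the few young men in the panel end up carrying large weights, so each of them has an outsized effect on the final numbers.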

This normalization works reasonably well, but big problems remain: estimating 100MM internet users based on 100,000 panelists is difficult, especially when you are making big adjustments based on small percentages here and there.

Behind the scenes, communication happens

Of course, because lots of money is at stake with comScore and Nielsen, they also do a lot of work behind the scenes with publishers and advertisers to make sure everyone is happy. When you have big outliers like the portals, ESPN, and so on, sometimes the data can seem off. Well, a billion here and a billion there, and soon enough you’re talking about a lot of pageviews.
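To see why projecting from a panel is touchy, here’s a back-of-the-envelope Python sketch using the 100MM/100,000 figures: each panelist stands in for 1,000 users, so a handful of panelists either way moves the estimate by thousands. The margin uses a simple-random-sample binomial approximation and ignores weighting, which typically widens the uncertainty further. The `project` helper and the 50-panelist count are hypothetical.

```python
import math


def project(panel_hits: int, panel_size: int, population: int, z: float = 1.96):
    """Scale a panel count to the population, with a rough margin of error.

    Assumes a simple random sample; real weighted panels carry more uncertainty.
    """
    scale = population / panel_size                 # users each panelist stands in for
    p = panel_hits / panel_size
    se_hits = math.sqrt(panel_size * p * (1 - p))  # binomial standard error, in panelists
    estimate = panel_hits * scale
    margin = z * se_hits * scale
    return estimate, margin


# Hypothetical: 50 of 100,000 panelists visited a site; population 100MM.
estimate, margin = project(50, 100_000, 100_000_000)
print(f"~{estimate:,.0f} users, give or take ~{margin:,.0f}")
```

That works out to about 50,000 users give or take roughly 14,000, i.e. close to 28% either way, before any of the weighting adjustments are layered on.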

So a lot of times, comScore or Nielsen will ask the companies how much they are seeing internally from their ad servers (which are the only REAL source of information) and then adjust their numbers manually based on that. This obviously completely circumvents the panel philosophy, but sometimes it’s needed to reflect what’s really out there.
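A manual adjustment like that can be as simple as a single ratio. The Python sketch below is hypothetical (made-up section names and pageview totals, and a `calibrate` helper invented for illustration); it scales a panel’s breakdown so that it sums to the publisher’s ad-server total. It is not a description of how comScore or Nielsen actually make these adjustments.

```python
def calibrate(panel_breakdown: dict[str, float], census_total: float):
    """Scale every line item by one ratio so the breakdown sums to census_total."""
    panel_total = sum(panel_breakdown.values())
    if panel_total <= 0:
        raise ValueError("panel breakdown must have a positive total")
    factor = census_total / panel_total
    adjusted = {name: value * factor for name, value in panel_breakdown.items()}
    return adjusted, factor


# Hypothetical monthly pageviews for a portal, as projected from a panel (sums to 1.6B).
panel = {"news": 900e6, "sports": 500e6, "mail": 200e6}
# Hypothetical total reported by the portal's own ad servers.
adjusted, factor = calibrate(panel, 2.0e9)
print(factor)  # 1.25
print(adjusted)
```

The single factor keeps the panel’s internal proportions while forcing the total to match the ad servers, which is exactly why it sidesteps the panel: the headline number no longer comes from the sample at all.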

In the end…

Either way, use Alexa numbers for what they are: rough proxies. Realize that they are a fairly flawed way to compare things and that, especially for lower-ranked sites, they are messy and biased. Don’t reach serious conclusions without first talking to the sites to see what stats are coming out of their analytics packages, and seriously consider caveating any speculation. If you’re in the business of making real decisions based on this data, you should probably shell out the $40k/year or so to get better data.