One of the major online audience measurement companies, comScore, Inc., has generously donated access to its Media Metrix and World Metrix data sets to the Foundation. comScore has an opt-in panel of two million internet users around the globe and uses a range of statistical techniques to create an internally consistent portrait of the global internet audience. As an example, below is a chart summarizing comScore's estimated audience over the past few years for wikipedia.org, both in the U.S. and worldwide.

comScore estimates that, during the month of January 2010, 365 million unique visitors (UVs) viewed our projects from a personal computer, which it estimates was a "reach" of 29.5% of the 1.24 billion worldwide PC-based web browser audience:

comScore's panelists report their age and sex, so comScore can generate detailed demographic estimates. These include raw data and an index that measures the extent to which a set of visitors to our sites is over- or under-represented compared to visitors to all sites on the internet. For January, comScore estimates our audience of 365 million is made up of 199 million men (29.7% of men online) and 166 million women (29.2% of women online). We index slightly higher with men (101) than with women (99). Here's a breakdown of different age groups:

| Age group | Worldwide unique visitors | Reach in age group | Index |
|---|---|---|---|
| Ages 15-24 | 98.6 million | 29.2% | 99 |
| Ages 25-34 | 85.3 million | 26.4% | 89 |
| Ages 35-44 | 76.5 million | 28.7% | 97 |
| Ages 45-54 | 58.4 million | 33.4% | 113 |
| Ages 55+ | 46.0 million | 34.0% | 115 |
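The reach and index figures can be cross-checked with a little arithmetic. The sketch below is not comScore's published method. It assumes that reach is unique visitors divided by the online audience, and that the index is a group's reach divided by our overall reach, scaled so that 100 means parity. Under that assumption the published indexes are reproduced after rounding. The function names are made up for illustration.

```python
# Figures quoted in the text for January 2010 (in millions / percent).
TOTAL_UVS_MILLIONS = 365.0
WORLD_AUDIENCE_MILLIONS = 1240.0
OVERALL_REACH_PCT = 29.5  # overall reach as quoted from comScore

# Reach within each demographic group, in percent, as quoted in the text.
GROUP_REACH_PCT = {
    "Men": 29.7,
    "Women": 29.2,
    "Ages 15-24": 29.2,
    "Ages 25-34": 26.4,
    "Ages 35-44": 28.7,
    "Ages 45-54": 33.4,
    "Ages 55+": 34.0,
}


def reach_pct(visitors, audience):
    """Share of an audience that visited, in percent."""
    return 100.0 * visitors / audience


def index(group_reach_pct, overall_reach_pct=OVERALL_REACH_PCT):
    """Assumed definition: group reach relative to overall reach, 100 = parity."""
    return round(100.0 * group_reach_pct / overall_reach_pct)


if __name__ == "__main__":
    # The rounded inputs give roughly 29.4%; comScore quotes 29.5%,
    # presumably computed from unrounded figures.
    print(f"Overall reach: {reach_pct(TOTAL_UVS_MILLIONS, WORLD_AUDIENCE_MILLIONS):.1f}%")
    for group, reach in GROUP_REACH_PCT.items():
        print(f"{group}: index {index(reach)}")
```

An index above 100 means the group is over-represented among our visitors relative to the internet as a whole, and an index below 100 means it is under-represented.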

We index highest for older users (ages 45-54 and 55+) and lowest for those 25-34 years old. I dug into this issue, at first thinking it was driven by a twenty-something preference for YouTube and Facebook. As far as I can tell, though, our comparatively weak performance in the 25-34 demographic is the result of our weakness in China, where comScore believes there is a huge audience in that age range. For example, the large China-based sites like Tencent, Baidu, SINA and Alibaba all index at around 120 globally for this demographic group.

comScore estimates the unique visitors to our sites from home and office users in China (excluding Taiwan and Hong Kong). In July 2008, comScore estimated 232,000 UVs to our sites in China. In August, the month of the 2008 Beijing Olympics, it estimated 1.3 million visitors. By March 2009, the audience estimate was 2.75 million. For January 2010, comScore estimates 3.4 million visitors, comprising 2.5 million UVs to one of the Chinese language Wikipedias and 1.0 million to the English Wikipedia. By contrast, comScore estimates the Baidu Encyclopedia had 47 million visitors from within China. Because comScore does not track internet usage from public locations (e.g. internet cafes), these estimates certainly undercount overall activity from China.

In India, comScore estimates 10.1 million unique visitors came to our sites, or 27% of internet users in India. Of these, 9.9 million visited the English Wikipedia while 320,000 visited one of the Indian language Wikipedias.

comScore also provides analysis of the site a user surfs just prior to visiting us. The percentage of these "entries" from Google and other search engines is often used as an indicator of reliance on the search engines for traffic. Other major sites like YouTube, eBay or Facebook typically see entries from Google at 10% to 15% of their traffic while we are typically over 50%. Here's a breakdown for us for December (this data is published later than other information so is a month behind):

The table below shows the calculations for the biggest few Wikipedias for December, the latest month with available editor counts. On the English Wikipedia, only about 0.02% of unique visitors actively edit. Put another way, that's one-fifth of one-tenth of one percent. If you include all logged-in users who made at least one edit, the rate is about fifteen times higher, at roughly one-third of one percent. Including anonymous editors would result in even higher participation rates, but to date Erik Zachte has not been able to analyze anonymous editing.
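The participation percentages are small enough to be easy to misread, so here is a short check of the arithmetic using exact fractions. It uses only the rounded rates quoted in the text (0.02% and the factor of fifteen), not the underlying visitor or editor counts. The variable and function names are illustrative.

```python
from fractions import Fraction

ONE_PERCENT = Fraction(1, 100)

# "About 0.02% of unique visitors actively edit."
active_rate = Fraction(2, 100) * ONE_PERCENT

# "One-fifth of one-tenth of one percent."
phrase_rate = Fraction(1, 5) * Fraction(1, 10) * ONE_PERCENT

# "About fifteen times higher" when counting every logged-in user with an edit.
any_edit_rate = 15 * active_rate


def visitors_per_editor(rate):
    """How many unique visitors there are for each editor at a given rate."""
    return 1 / rate


if __name__ == "__main__":
    print("Rates agree:", active_rate == phrase_rate)
    print("Visitors per active editor:", visitors_per_editor(active_rate))
    print("Any-edit rate in percent:", float(any_edit_rate / ONE_PERCENT))
    print("Visitors per logged-in editor:", float(visitors_per_editor(any_edit_rate)))
```

In other words, the quoted rates work out to roughly one active editor per 5,000 unique visitors, and fifteen times that rate is 0.3%, which the text rounds to one-third of one percent.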

Jay Walsh on the Foundation's staff is managing the comScore relationship overall, Erik Zachte is helping drive the statistical analysis, and a volunteer named Josh Holman has a lot of experience with comScore data. Feel free to reach out to me or any of them with questions. If there's interest, we'll try to update this page every month or two as new data comes out.

Finally, a quick thank you to comScore. The data they donated typically sells for thousands and thousands of dollars, so we're lucky to be able to review it. Speaking on behalf of all of us in the community, I want to thank them for their support.

comScore has a large and professional team dedicated to audience measurement. We are able to benefit from their insights with no coding, no servers, no hard drives stuffed full of log data, and almost no effort.

comScore reports "unique visitors", an estimate of the number of actual people using the internet. This puts things into more human terms than page views or an ambiguous "traffic rank" metric. Also, comScore works hard to exclude bots, crawlers, mirrors, click farms, etc.

comScore works to combine different domains and subdomains. This is particularly useful for international properties. For example, we are able to generate a single audience number for all five or so Chinese language Wikipedias and compare that, both worldwide and within China, to the audience using the English Wikipedia.
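One reason a combined number is useful is that unique visitors cannot simply be added across domains, because the same person may visit several of them. The sketch below illustrates the idea with made-up data. The panelist IDs, the grouping of hostnames, and the function names are all hypothetical, and comScore's actual procedure (which also projects from the panel to the whole population) is not described here.

```python
# Hypothetical panel observations: hostname -> panelist IDs seen on that host.
VISITS_BY_HOST = {
    "zh.wikipedia.org": {"p1", "p2", "p3"},
    "zh-yue.wikipedia.org": {"p3", "p4"},
    "en.wikipedia.org": {"p2", "p5"},
}

# Hypothetical rollups of hostnames into named groups.
ROLLUPS = {
    "Chinese language Wikipedias": ["zh.wikipedia.org", "zh-yue.wikipedia.org"],
    "English Wikipedia": ["en.wikipedia.org"],
}


def unduplicated_visitors(hosts, visits):
    """Union of visitor IDs across hosts, so each person is counted once."""
    seen = set()
    for host in hosts:
        seen |= visits.get(host, set())
    return seen


def audience_sizes(rollups, visits):
    """Unduplicated visitor count for each named rollup."""
    return {
        name: len(unduplicated_visitors(hosts, visits))
        for name, hosts in rollups.items()
    }


if __name__ == "__main__":
    sizes = audience_sizes(ROLLUPS, VISITS_BY_HOST)
    everyone = unduplicated_visitors(VISITS_BY_HOST.keys(), VISITS_BY_HOST)
    print(sizes)
    print("All hosts combined:", len(everyone))
    print("Naive sum of rollups:", sum(sizes.values()))
```

In this toy data the two Chinese language hosts have three and two visitors but only four between them, and the combined audience across all hosts is smaller than the sum of the two rollups, since one panelist appears in both.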

Because comScore does its analysis consistently for all websites, with the same statistical techniques and methodology, we can compare among our projects and to others.

With a panel of two million users globally, comScore has strong international coverage.

comScore panelists provide demographic data so we can see estimates of factors like age and sex.

Coverage of educational users -- comScore focuses on users 15 years old and older using the internet at home or work. Globally, it does not have coverage in schools (though it does have coverage in universities in a few countries). Given our strengths in education this will inevitably lead to significant underreporting of school use and thus our overall audience.

Coverage of worldwide usage -- comScore recruits a panel of users across the world, but their coverage can't be perfect. Given our strong international presence, this will likely also lead to some misreporting of our audience. Also, the dynamics of their panel make analysis less and less valuable the deeper you drill down. Statistics for specific smaller countries (e.g. Egypt) are typically not available or, if they are, might be less useful depending on the size and make-up of comScore's panel there.

Coverage outside home/work -- comScore does not measure people who go online from an internet café or other public/shared computers. This means total audience will be significantly underreported in countries with strong public/shared internet usage. Whether this has a big impact on percentage reach numbers depends on how home/work usage differs from public/shared usage (the difference might be meaningful in some countries where governments are believed to trace people's internet usage).

Coverage of the mobile audience -- This data set of comScore's is of the PC-based internet audience, so excludes access through mobile phones. A July 2008 Nielsen research report estimated there were about 40 million mobile web users in the U.S. alone, and most industry observers expect this number to grow rapidly. This is likely another source of underreporting of the total audience we reach.

comScore offers a sophisticated ability to combine domains and subdomains to better understand the audience for and performance of our projects. We've worked extensively with them to clean up their definition. We want to be inclusive and careful in defining the different Media titles ([M]), Channels ([C]), and Subchannels ([S]) so we can see what's happening with the different projects. That said, we don't need to be exhaustive and capture every single domain name: a domain that automatically redirects to one of our other sites ends up being counted after the redirect. If you see other changes we should request, start a thread on the Talk page.

Below is comScore's definition as of January 2010, which includes some still experimental efforts to identify edit pages: