Friday, December 10, 2010

Google: More and more Wikipedia, but surfers seem weary

My faithful readers know that I regularly conduct user studies on various search engines, including Google. The latest one contains a slew of interesting findings, one of which particularly caught my attention. I have pointed out several times the prominent place that Google (and other engines...) gives to Wikipedia in its results (see here, here [fr] or here).

The latest study shows the encyclopedia present at a level never reached before. It was conducted at the end of November following a protocol I have explained here. 226 users, all students at the University of Provence, were asked to enter two queries of their choice (in French) on each of 13 different themes (26 queries per user), and to rate the first organic link returned by the engine from 0 (totally dissatisfied with the result) to 5 (totally satisfied with the result). In passing, I would like to thank my colleagues who had their students take the test.

In total, 5876 queries could be analyzed. The presence of the encyclopedia in the first link has reached its highest level since the start of this series of tests: almost one third of the queries return Wikipedia as the first link (31.2% to be exact).

Proportion of Google results leading to Wikipedia (first link)
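As a quick sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The function names are mine and are not part of the study protocol. The number of Wikipedia first links is only an estimate derived from the rounded 31.2% share, not a figure reported by the study.

```python
def expected_queries(users: int, themes: int, queries_per_theme: int) -> int:
    """Number of queries produced if every user completes the whole protocol."""
    return users * themes * queries_per_theme


def estimated_count(total: int, share_percent: float) -> int:
    """Approximate count behind a rounded percentage (an estimate only)."""
    return round(total * share_percent / 100)


# 226 students, 13 themes, 2 queries per theme (figures from the post).
total = expected_queries(users=226, themes=13, queries_per_theme=2)

# Estimated number of queries whose first link was Wikipedia.
wikipedia_first = estimated_count(total, 31.2)

print(total, wikipedia_first)
```

The expected total matches the 5876 queries analyzed, which suggests that every submitted query could be used.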

Even more surprising, the score given to these results has been dropping gradually since 2008. For the first time, results outside Wikipedia are rated higher than results in the encyclopedia (3.52 compared to 3.47). Wikipedia results had once reached a peak of satisfaction (up to 4.48 in November 2007). A small but statistically significant erosion can also be seen in Google's overall score, which is the lowest in the whole series of tests (3.5 compared to 3.72 at its peak), mainly because of the drop for Wikipedia.

Results score (first link)
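The three scores quoted above can be cross-checked against each other, since the overall score should be the average of the two groups weighted by their share of first links. The sketch below assumes that 3.47 is the Wikipedia score and 3.52 the score for all other results, which is the reading under which non-Wikipedia results come out ahead. It also assumes the overall score is a plain per-query mean. The function name is hypothetical.

```python
def overall_score(wiki_share: float, wiki_score: float, other_score: float) -> float:
    """Share-weighted mean of the Wikipedia and non-Wikipedia scores."""
    return wiki_share * wiki_score + (1 - wiki_share) * other_score


# Assumed reading: Wikipedia results 3.47, other results 3.52, share 31.2%.
score = overall_score(wiki_share=0.312, wiki_score=3.47, other_score=3.52)

print(round(score, 2))
```

Under these assumptions the weighted mean comes out at about 3.50, in line with the overall score of 3.5 reported for Google.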

The reasons for Wikipedia's presence in the results, at whatever level, are unknown and we can only speculate. I doubt that these fluctuations are simply due to "PageRank", i.e., roughly speaking, the number of links that web users make to the encyclopedia. We have known for a while that many other factors come into play in the ranking of results, and I have no doubt that the sites most often returned by the engine are examined very closely by the teams at Google and very probably receive ad hoc settings.

One of the hypotheses I put forward is that Wikipedia is a very practical expedient in difficult times. We know that the web is a jungle that is hard to control, with intense spamming and SEO practices whose sole aim is to get around search engines' algorithms. These regularly put the engines in difficult positions, as with the mad invasion of splogs in the summer of 2005 (see here) or changes to the web itself (see here). It's a fight between the sword and the shield: engines react by making constant adjustments, both algorithmic and editorial. Wikipedia is an easy adjustment variable: interviews with users showed that until now the encyclopedia enjoyed a favorable prior opinion, even when the page returned did not quite correspond to the query. Thus, for example, the page of a politician or artist was perceived as a relevant result, even if the intention behind the query was to find news or make a purchase (CD, book, etc.). As users regularly mentioned, it's better to end up on a Wikipedia page than on one of those useless forums that are the plague of the web, or worse still on a page of spam.

Clearly, this positive perception is being eroded. Various factors are undoubtedly at work. First, it is likely that web users have become increasingly demanding. As they use search engines (and now other means of accessing information, such as social networks), the public is learning. For example, the new generations of students entering university are the first to have had a computer at home since they were born, and Google as a search engine throughout most of their schooling. It is therefore possible that the substitute effect mentioned above plays less of a role than before, and that it is gradually being replaced by a certain weariness among users with Wikipedia results, which are not always a direct response to their query.

It is also possible that the quality (or at least the perceived quality) of Wikipedia pages is lower on the whole. If Google returns more pages, it is logical, statistically speaking, that deeper pages, which are less developed and less closely monitored by the Wikipedia community, end up appearing. We could also ask whether the constant increase in the number of Wikipedia pages does not itself almost inevitably entail a drop in quality. This is a genuine question, not a polemical one (I am not one of those academics who snipe at Wikipedia or turn their noses up at it, far from it: I believe it is one of the most fascinating intellectual adventures of the start of this century).

I don't know whether the Google teams are aware of this erosion, nor whether it occurs in other languages. Whatever the reason, it is clearly affecting one of the levers the engine uses to control quality.