Sunday, March 05, 2006

Search: And the winner is...

Yes, that’s the result of the evaluation I carried out in December 2005 with my students in Aix, some aspects of which I have already revealed on this blog (see [fr] 1, 2, 3, 4, 5). The last part of the study (undoubtedly the most interesting) concerns the ranking of the different search engines according to relevance – or at least relevance as perceived by a panel of users. Let me first recap the methodology used. The full study is available here in PDF format if you want more detail.

Three American search engines were chosen for this study: Google, Yahoo and MSN. They were joined by three French ones: Exalead, Voilà (developed by France Telecom and available on the Wanadoo web portal) and Dir.com from the Iliad group, which is more of an experimental platform than a commercial search engine (Dir.com has just put a new, improved version online, but unfortunately it was not taken into account in the study). Other search engines, such as MozDex or AskJeeves, were not considered because they do not have a French-language version (or only a beta version in the case of AskJeeves).

Fourteen topics were selected in order to reflect a broad variety of uses (Animals, Cinema, Current Affairs, etc.). Each topic was assigned to a different student, who was then free to choose five search queries. The format (with or without quotation marks, one word or several) was also entirely up to the student. The study was intended to be a “blind test”, with users not knowing which search engine provided each set of results, so I entered the 70 search queries into the six search engines myself. For each query and each engine, the first page of 10 results not marked as sponsored was archived (4,200 results in total), then automatically stripped of all information other than the resulting URLs.

The query-URL pairs corresponding to each topic were then given to the student concerned, who had to evaluate the document indicated by each URL (see the detailed study) and, in particular, grade its relevance on a scale of 0 to 5: 0 for a document that was totally useless or off-topic, 5 for a document that provided a perfect answer to the question posed.
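To make the scoring concrete, here is a minimal sketch of how a per-engine ranking can be derived from such grades: average the 0-to-5 grades over all evaluated query-URL pairs and sort. The engine names come from the study, but the grades below are invented placeholders, not the study's data.

```python
from statistics import mean

# Invented placeholder grades (NOT the study's data).
# In the real study each engine had 700 graded results:
# 70 queries x the first 10 unsponsored URLs per query.
grades = {
    "Google": [5, 3, 0, 2, 4],
    "Yahoo":  [4, 3, 1, 2, 3],
    "Voila":  [0, 1, 0, 2, 1],
}

# Rank engines by mean perceived relevance, best first.
ranking = sorted(grades, key=lambda engine: mean(grades[engine]), reverse=True)

for engine in ranking:
    print(f"{engine}: {mean(grades[engine]):.1f}")
```

With these placeholder numbers the sketch prints Google first at 2.8, which incidentally shows how low even a "winning" average can sit on a 5-point scale.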

The ranking is as follows:

Google and Yahoo tied for first place, with a grade of 2.3, but the most striking result is undoubtedly the extremely low level of user satisfaction. None of the search engines even reached the pass mark (2.5 out of 5), and some of the grades were extremely low (1.2 for Voilà). Links graded 0 (totally useless) were astonishingly numerous: 53.1% for Voilà, and even the best engines did not do much better: 28.6% for Google and 27.7% for Yahoo. Conversely, results graded 5 (excellent) failed to reach even 16% for the top two search engines.

Even if we restrict the study to the first link on the results page (the link users click most often), the performance is little better: Google and Yahoo barely scrape a pass mark with 2.9 and 2.8 respectively. Curiously, the performance of Voilà is even worse on the first non-sponsored link, since its grade falls to just 0.5.

I pointed out in a previous post [fr] that the proportion of commercial links (not marked as sponsored) is high, varying between 7 and 16% depending on the search engine. In itself, the presence of commercial links does not necessarily have a negative impact on quality: for a search like “Harry Potter”, returning the page on Amazon where the book can be purchased may be relevant. However, as things stand, we can see a clear degradation of the results in terms of perceived relevance for commercial links, for all search engines: the grade given to commercial links is systematically lower than that given to other results. Google and Yahoo lose around one point for each commercial link, which is a lot on a 5-point scale, especially when the highest grade obtained is only 2.3.

I am sure that this study will provoke a few reactions. In any case, it seems to me that two conclusions can be drawn. Firstly, the good grades that some search engines give themselves are unjustified: clearly a great deal of research needs to be carried out in order to improve user satisfaction. We often forget that the underlying technologies are still young, and still taking their baby steps. Secondly, there is nothing in this study to explain why web surfers greatly prefer the Google search engine, since overall the performance of Google and Yahoo is more or less equivalent, and ahead of their competitors. We must therefore suppose that the reasons go beyond the criteria of relevance of results.

4 Comments:

Anonymous wrote...

Interesting study. I wouldn't be too disappointed, though, since for many queries there are seldom more than one or two "perfect" results, so averaging over ten would obviously degrade the final score.

An interesting follow-up would be a measurement of coverage for these search engines. First experiment: take the union of the "good" results for each query (say, those scoring 3.0 or above) and compare the result set of each search engine against this better set. If, for example, there are only 4 relevant results for a given query and one search engine shows only three of them, this would give it a score of 75%. An even more interesting, but more expensive, test would be finding the "perfect result set" for each query and comparing each search engine's result set with that one. One way to do it is to ask your students to find the best web pages answering their original queries (either by refining the queries or by searching on more specific sites, etc.).

One problem with non-English searches is, in fact, that for some topics there are no good pages available on the net to begin with.
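The coverage experiment proposed in the comment can be sketched in a few lines: pool every result graded 3 or above across all engines as the "good" set for a query, then score each engine by the fraction of that pooled set it actually returned. A hypothetical illustration (engine names, URLs and grades are all invented):

```python
# Invented graded results for a single query: {engine: {url: grade}}.
results = {
    "EngineA": {"u1": 4, "u2": 3, "u3": 5, "u9": 0},
    "EngineB": {"u1": 4, "u4": 3, "u5": 1},
}

# Union of "good" results (grade >= 3) found by any engine.
good = {url
        for urls in results.values()
        for url, grade in urls.items()
        if grade >= 3}

# Coverage: share of the pooled good set each engine returned.
coverage = {engine: len(good & set(urls)) / len(good)
            for engine, urls in results.items()}

for engine, score in coverage.items():
    print(f"{engine}: {score:.0%}")
```

This matches the commenter's example: the pooled good set has four URLs, EngineA returns three of them, hence 75%.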