Web CLIR: an exploratory study of Google's new tool

Haidar Moukdad

Summary

Last year, Google added a new tool to its growing inventory of language-specific applications. The tool lets users enter searches in a language of their choice and retrieve documents written in another language. This makes Google the first major search engine to give users access to cross-language information retrieval (CLIR) on the Web. Web CLIR is still in its infancy, but it has the potential to revolutionize the way people search for and retrieve Web documents. Beyond CLIR itself, Google has integrated its translation engine with the new tool, so that users can view each retrieved document both in its original language and translated into the query language. For example, if a user enters an English query against Arabic documents, Google retrieves a set of Arabic documents that satisfy the query and provides English translations of them.

The proposed poster reports on experiments that use Google's CLIR capabilities to explore the engine's performance when English queries are used to retrieve Arabic documents. One hundred single-term English queries, built from terms common in the information science field, were submitted to Google, and the top 10 documents retrieved for each query were saved in a local database. The saved documents were analyzed to determine how successfully Google retrieved the correct documents (documents that match the translated query terms) and to identify the causes of search failures. The poster will present the results of these analyses and identify areas for improvement. It will also recommend solutions to problems that hinder successful English-Arabic CLIR on the Web.