Google: huge database for public research

Submitted by Bernardo Parrella on Sat, 18/12/2010 - 17:33

A new landscape of possibilities for research and education in the humanities. [17dec10]

With little fanfare, Google has made a mammoth database culled from nearly 5.2 million digitized books available to the public for free downloads and online searches, opening a new landscape of possibilities for research and education in the humanities.

The intended audience is scholarly, but a simple online tool allows anyone with a computer to plug in a string of up to five words and see a graph that charts the phrase’s use over time. It is a diversion that can quickly become as addictive as the habit-forming game Angry Birds.

“The goal is to give an 8-year-old the ability to browse cultural trends throughout history, as recorded in books,” said Erez Lieberman Aiden, a junior fellow at the Society of Fellows at Harvard. Mr. Lieberman Aiden and Jean-Baptiste Michel, a postdoctoral fellow at Harvard, assembled the data set with Google and spearheaded a research project to demonstrate how vast digital databases can transform our understanding of language, culture and the flow of ideas.

Their study, published in the journal Science, offers a tantalizing taste of the rich buffet of research opportunities now open to literature, history and other liberal arts professors who may have previously avoided quantitative analysis. Science is taking the unusual step of making the paper available online to nonsubscribers.

So far, Google has scanned more than 11 percent of the entire corpus of published books, or about two trillion words. The data set analyzed in the paper covers about 4 percent of the corpus.