It should be the world's greatest scholarly resource, but some claim that Google Book Search's many huge - and often hilarious - errors raise major questions about its value to serious researchers.

Why does a link to a book on cosmology by a Napoleonic mathematician lead to a novel by Barbara Taylor Bradford? Could Sigmund Freud really be one of the authors of The Mosaic Navigator: The essential guide to the Internet Interface? And how did Barack Obama publish 29 books before he was born?

The journal Speculum is about the Middle Ages rather than gynaecological instruments, so why is it listed under "Health & Fitness"? And why on earth is a French translation of Hamlet classified under "Antiques & Collectibles"?

Even stranger, there seems to be something special about the year 1899, with Google claiming that a novel by Stephen King, a biography of Bob Dylan, a Portuguese version of the Beatles' film Yellow Submarine - and dozens of almost equally implausible titles - were all published then.

Such grotesque mistakes were pointed out by the linguist Geoffrey Nunberg, adjunct full professor at the University of California at Berkeley's School of Information, at the school's recent conference, "The Google Book Settlement and the Future of Information Access".

Mark Liberman, trustee professor of phonetics at the University of Pennsylvania, made a similar case. A self-proclaimed "enthusiast" for Google Books, he knew it would revolutionise his own discipline - the history of the English language - by hugely increasing the amount of textual material easily available for analysis, "with a potential effect comparable to the invention of the telescope or the microscope".

It remained crucial for scholars, however, that "basic bibliographic information - who wrote what, when - is almost always correct", he said. He added that he was sceptical about how soon the errors would be sorted out. Since such information "may not matter much to ordinary search customers, there is little incentive for Google to fix it", he said.

Professor Nunberg was even more outspoken in a blog post published on 29 August. With Google likely to become "the universal library for a long time to come", scholars need good metadata, he argued. Unfortunately, Google's information is "a train wreck: a mish-mash wrapped in a muddle wrapped in a mess".

The posting led to a long reply by Jon Orwant, who has the unenviable task of "managing the Google Books metadata team".

He cheerfully admits to some additional errors, such as an edition of Charles Dickens' A Christmas Carol dated to 1135 - three centuries before Johannes Gutenberg introduced the printing press to Europe.

He is also frank about the scale of the glitches still to be ironed out: "Geoff refers to us having hundreds of thousands of errors. I wish it were so. We have millions ... When you're dealing with a trillion metadata fields, one-in-a-million errors happen a million times over."
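Mr Orwant's arithmetic can be written out in one line, assuming "a trillion" means 10^12 metadata fields and "one-in-a-million" means an error rate of 10^-6 per field (both readings are assumptions about a figure of speech, not measured values):

```latex
\[
  \underbrace{10^{12}}_{\text{metadata fields}} \times \underbrace{10^{-6}}_{\text{errors per field}} = 10^{6}\ \text{errors}
\]
```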

The glut of books "published" in 1899 is traced to a Brazilian metadata provider, which strangely uses that year as a default setting when it does not know the true date.
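How a default year produces such a glut, and how a cataloguer might spot one, can be shown with a minimal Python sketch. Everything in it is hypothetical: the function names, the threshold and the sample records are invented for illustration and do not describe the provider's or Google's actual systems.

```python
from collections import Counter

DEFAULT_YEAR = 1899  # the placeholder year mentioned in the article


def normalise_year(raw_year):
    """Hypothetical provider logic: fall back to DEFAULT_YEAR when the date is unknown."""
    return raw_year if raw_year is not None else DEFAULT_YEAR


def suspicious_years(years, factor=5):
    """Return years whose record count exceeds `factor` times the median count per year."""
    counts = Counter(years)
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    return sorted(year for year, n in counts.items() if n > factor * median)


# Invented sample: one record for each year 1890-1909, plus 30 records with no date.
raw = list(range(1890, 1910)) + [None] * 30
years = [normalise_year(y) for y in raw]
flagged = suspicious_years(years)
print(flagged)
```

A spike of this kind suggests a placeholder value only; each flagged record would still need checking against another source before its date could be corrected.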