Thoughts on digitization & libraries while working on Hardin MD

Jon Orwant on Google Book Search at TOC

Jon Orwant, from Google Book Search, made a presentation at the O’Reilly Tools Of Change (TOC) for Publishing Conference in New York last week, which I did not attend. Apparently Orwant presented some numeric data about the use of Google Books, but the data has not yet been widely reported (see my comment on Peter Brantley’s blog about this). In the week since TOC I’ve been searching for discussion of Orwant’s talk and have found little, so I’m excerpting the three pieces that I have found. Only the first has any numeric data at all.

First, a piece by Jackie Fry on the BookNet Canada publishers’ blog. I’m putting it first because it’s the only report I’ve found with any numeric data from Orwant’s talk:

Conversion rates from Google Book Search results have been great for their partner publishers, mostly in the Textbook, Reference and STM channels, particularly in the shallow backlist (2003-2005 pubdates) with the highest Buy the Book clickthrus on 2004 titles. For some publishers, conversion to buy is as high as 89% for the titles they have made available.

30% of viewers looked at 10 or more pages when viewing the content of the book to make a buy decision.

The future is analytics! Google is thinking about what data they can pull out of their logs and provide anonymous aggregate data around consumer behaviour like what books were purchased that were like this one, search terms used most often for a category, most effective discounts, most effective referral sites etc.

More research [is needed] – Saw some good presentations with quantifiable research included – Brian O’Leary from Magellan, Joe Orwent (sic) from Google, and Neelan Choksi from Lexcycle were some of the few presenters who were able to quantify in any way what is going on in the marketplace. We need more …

Jon Orwant, from Google Book Search, stated at TOC that ‘the ultimate goal of Google Book Search is to convert images to “original intent” XML’. He explained the post-processing Google runs to continuously improve the quality of the scanned books, and to convert images to structured content. Retro-injecting structure accurately is no mean feat but when it’s done, Google will be able to transform the books into a variety of formats. The content becomes mutable and transportable, in a sense it isn’t yet, even though it is scanned, online and searchable. Orwant also presented three case studies – McGraw Hill, OUP, Springer – that demonstrated the benefits publishers can gain from having their books in GBS.

Highlighting the theme of discovery (to my mind), Tim O’Reilly interjected, at the end of these case studies, and made the point that O’Reilly used to own the top links to their own books in Google search results, but have now lost those links to GBS. Orwant, somewhat simplistically, responded that O’Reilly needed to improve their website to regain the top ranked link per title, as this spot was determined by Google’s search algorithms. This was not a convincing response, and dodged the issue, which I understood to be that the scale and in-house-ness of GBS could seriously inhibit the ability of the publisher to represent their own products online at the most common point of entry by the consumer, Google search results. There are many compelling reasons for publishers to own the top search result link, the most obvious being: offer unique additional content around the title, start a conversation with the reader, control the brand.

Thanks to a concept called blending, Google Book Search options remain in the top search results. An effort to direct traffic GBS’s way. …

There are 1.5 million free books, all public domain titles, available on Google. But if you want to access them, well, you’ll have to go to Google. Or you’ll have to have Google generate results at your site. Because the Google team are specialists in latency. They can do things with milliseconds that you couldn’t work out in your dreams.

As an experiment, Google recently unleashed Google Books Mobile, which means that you can now search Google Book Search from your smartphone … Orwant was careful to point out that Google is not in the handset manufacturing or carrier business. But he anticipated, just as many of the seer-like speakers at Tools of Change did based on sketchy inside information, a “rapid evolution.”

Tim O’Reilly tried to badger Orwant too. You see, O’Reilly used to have good webpage placement for many of his titles. But those days are gone, replaced by Google Book Search results above the O’Reilly pages. And that hardly seems fair …

There’s some comfort in knowing that 99% of the books at GBS have been viewed at least once. Even the sleep-inducing textbooks. Which is really quite something. Which brings us to the future, which is based on the past …

That snippet view you see with some titles? Orwant’s official position, pressed by Cory Doctorow, is that it’s fair use. But once the October 2008 settlement in Authors Guild v. Google is approved by the court, you’re going to see that snippet view jump to 20% of the book.

Allen Noren here. I manage oreilly.com. This isn’t necessarily true, at least not at the moment. Google regularly adjusts the placement of GBS results and how they’re displayed. For a while last year the GBS result was always in the top position, and they added a cover with every result. It was arresting. Even though the equivalent O’Reilly result was almost always in the second position, I imagine a lot of eyes never made it further down the page.

I remember when Google Books first came out I thought it was a great idea, but wondered how it was affecting publishers. It seems a little hard to believe that they have conversion rates as high as 89%. I would prefer to read a real book rather than one on the screen, but I don’t think that is a big issue. Who has a right to represent the books is also a big issue. Simply offering it to the best SEO-optimized page doesn’t seem like a very fair strategy in this case.