The OCLC Board of Trustees has convened a Record Use Policy Council, which will draw upon the fundamental values of the OCLC cooperative and engage with the global library community to develop the next generation of the WorldCat Record Use Policy. The intent is to recommend to the OCLC Board of Trustees a new policy that is aligned with the present and future information landscape. The new policy will replace the Guidelines for Use and Transfer of OCLC Derived Records that were developed in 1987.

The Council will also look into other issues, such as “development of a policy to enable expanding the role and value of WorldCat in the broad information ecosystem.” The projected timeline for all of this is mid-2010.

Josh Hadro discusses OCLC’s announcement in his recent article in Library Journal. Much of what Hadro highlights is a straightforward summary of the announcement. However, he inserts this small bit of news:

As has recently been noted by LJ and others, part of OCLC’s attempt to expose library data more involves an agreement with the team responsible for the Google Book Search project.

This really picks up the current conversation about the recent Google Book Search settlement and Nunberg’s disparaging view of Google’s metadata. What will such an agreement look like? If the data is more open, will Google still pick and choose metadata from records, which can lead to inconsistencies? How can such an agreement be worked out in terms of what happens to data created by librarians but used by other agencies? I think this last question raises some concerns about the Record Use Policy, since much of OCLC’s records derive from the contributions of hard-working librarians. What will happen to their work when this Policy goes into effect, especially if there is an additional agreement with Google?


Over the past week or so, there have been several discussions about Google as “The Last Library” and its weak metadata. This came about after a presentation by Geoff Nunberg, as well as articles, such as Cade Metz’s piece in The Register, recounting Nunberg’s assessment of Google’s metadata errors:

Geoff Nunberg, one of America’s leading linguistics researchers, laid this rather ominous tag on Google’s controversial book-scanning project amidst an amusingly-heated debate this afternoon on the campus of the University of California, Berkeley.
“This is likely to be The Last Library,” Nunberg said during a University conference dedicated to Google Book Search and the company’s accompanying $125m settlement with US authors and publishers. “Nobody is very likely to scan these books again. The cost of scanning isn’t going to come down. There’s no Moore’s Law for scanning.”

Cade continues with two important issues. First, who will have control over these scanned books? Second, now that Google is the major (if not the only) player of its size, how can it be the only library that researchers of the future must rely on if the metadata is unreliable and inconsistent? Unreliable metadata or not, the problem is that Google has had a head start, has more money than most libraries, and has control not just of scanned books but of much of the Internet as well.

Cade ends with this very ambiguous conclusion:

Google says that if it hadn’t scanned all those books, no one else would have. And now there’s less incentive to scan all those books. But Google insists it’s not The Last Library.

What is interesting is that ripples of Geoff Nunberg’s assessment have become waves of discussion. In his article, Google, the Last Library and Millions of Metadata Mistakes, Norman Oder focuses on the inconsistent metadata that Geoff Nunberg analyzed in the Google Book scanning project and that The Register article brought up. Oder’s article goes beyond Cade’s reporting in that it provides some responses from Google to Nunberg’s accusations:

First, we know we have problems. Oh lordy we have problems. Geoff refers to us having hundreds of thousands of errors. I wish it were so. We have millions. We have collected over a trillion individual metadata fields; when we use our computing grid to shake them around and decide which books exist in the world, we make billions of decisions and commit millions of mistakes. Some of them are eminently avoidable; others persist because we are at the mercy of the data available to us. The quality of our metadata now is a lot better than it was six months ago, and it’ll be better still six months from now. We will never stop improving it. “We have a cacophony of metadata sources—over a hundred—and they often conflict,” he added, contrasting that with library cataloging practices. “Without good metadata, effective search is impossible, and Google really wants to get search right.”

The unreliable metadata is not just Google’s fault; it also comes from the outside sources that provide metadata to Google. This point is taken up by Christine Schwartz over at her blog, Cataloging Futures, who reminds us that this is a source of worry not just for Google but for libraries as well. In her post, Christine mentions that digital projects constantly face metadata errors. She asks some great questions, such as:

At what point is the metadata created, and by whom?

Is metadata automatically extracted?

Is there human oversight or any quality control?

I think we can add the following to Christine’s list of questions:

What standards are being implemented?

How much are local policies changing or tweaking national or international standards?

How are unknowns, such as unknown creators or unknown dates, being dealt with?

Is the metadata creation and quality control being done consistently even as librarians come and go from digital projects?

One problem is that the metadata may be consistent within one digital collection at one library, but another collection, or another institution, does things differently — even when the same metadata set is used (such as Dublin Core, which comes in many local flavors).
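As a toy sketch of this problem (the records, names, and values below are invented for illustration, not drawn from any real collection), two Dublin Core records describing the same item can diverge simply because each collection follows its own local conventions:

```python
# Invented example: two collections describe the same item with Dublin Core
# elements, but local conventions for names, dates, and capitalization differ.
record_a = {
    "dc:title": "History of the Town Library",
    "dc:creator": "Smith, Jane",   # inverted, catalog-style name
    "dc:date": "1923",             # year only
}
record_b = {
    "dc:title": "History of the town library",
    "dc:creator": "Jane Smith",    # direct-order name
    "dc:date": "1923-01-01",       # local policy forces a full ISO date
}

def divergent_fields(a, b):
    """Return the Dublin Core elements whose values differ between records."""
    return sorted(k for k in a if a[k] != b.get(k))

print(divergent_fields(record_a, record_b))
# → ['dc:creator', 'dc:date', 'dc:title']
```

Every field disagrees here even though both records are “valid” Dublin Core — which is exactly why merging records across collections is so hard.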

However, the issue is more complex, as Karen Coyle points out in her post, GBS and Bad Metadata. The issue is that the Google book scanning project also gets much of its metadata for its books from libraries through OCLC. As Karen ventures, Google most likely has a contract with OCLC that restricts what Google can take. Karen writes:

This leaves us with a bit of a mystery, although I think I know the answer. The mystery is: why would Google only use limited metadata from the participating libraries? And why won’t they answer the question that I asked at the Conference: “Do you have a contract with OCLC? And does it restrict what data you can use?” Because if the answer is “yes and yes” then we only have ourselves (as in “libraries”) to blame. And Nunberg and his colleagues should be furious at us.

In a recent post on the NGC4LIB list, we got a very welcome answer from Chip Nilges of OCLC about Google’s use of WorldCat records:

To answer Karen’s most recent post, Google can use any WC metadata field. And it’s important to note as well that our agreement with Google is not exclusive. We’re happy to work with others in the same way. The goal, as I said in my original post, is to support the efforts of our members to bring their collections online, make them discoverable, and drive traffic to library services.

Regards,

Chip

Karen goes on in her post to discuss the types of metadata that Google should include in the book scanning project: Scholarship, Collection Development, Metasearch, Links to other related resources, and Computation. These types cover a broad spectrum of what researchers need when collecting and analyzing research materials, and the metadata has to be as clear as possible to make reliable connections, especially for linked data. The comprehensive approach Karen suggests seems to require more metadata rather than less. But perhaps, instead of taking everything from a record, the question is one of quality over quantity: rather than simply taking as much metadata as possible, getting qualitative metadata that helps uniquely identify an item and promotes knowledge discovery is, and will become, increasingly essential.
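One rough way to picture “quality over quantity” is a check that asks not how many fields a record carries, but whether it carries the ones that uniquely identify an item. This is only a sketch — the field set and the sample records are my own invention, not any standard scoring scheme:

```python
# Hypothetical "quality" check: score a record by the identifying Dublin Core
# elements it actually fills in, rather than by its raw field count.
IDENTIFYING_FIELDS = {"dc:identifier", "dc:title", "dc:creator", "dc:date"}

def quality_score(record):
    """Fraction of identifying elements present with non-empty values."""
    present = {k for k, v in record.items()
               if k in IDENTIFYING_FIELDS and v.strip()}
    return len(present) / len(IDENTIFYING_FIELDS)

# A small record that identifies its item well...
sparse_but_useful = {
    "dc:identifier": "urn:example:0001",   # made-up identifier
    "dc:title": "Example Title",
    "dc:creator": "Doe, John",
    "dc:date": "1999",
}
# ...versus a larger record that mostly carries generic values.
bulky_but_vague = {
    "dc:format": "text/html", "dc:language": "en", "dc:type": "Text",
    "dc:rights": "public domain", "dc:title": "Example Title",
}

print(quality_score(sparse_but_useful))  # → 1.0
print(quality_score(bulky_but_vague))    # → 0.25
```

The smaller record scores higher because it pins the item down — the kind of metadata that makes linked-data connections reliable.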

A big question that comes out of these conversations is: how can libraries help Google? Would Google even want help from libraries to improve the metadata in its book scanning project? Furthermore, can libraries improve their own metadata qualitatively and make it more interoperable across libraries and digital collections? Perhaps that is the first step for libraries before they can even approach Google.


I have been using the updated Chrome browser from Google for about a week now. I really enjoy how quick it is. With that in mind, there are a few bumps on the road to browser heaven:

I have had trouble displaying PDFs.

Chrome sometimes closes for no apparent reason.

Chrome will not open LC’s Cataloger’s Desktop.

I tend to use Chrome in particular for Classification Web. I find that it is much faster than either Explorer or Firefox. I also use Chrome for other sites such as the Internet Movie Database, LC’s Genre website, or other “text-based” websites where I don’t need to display or download any documents.

For the moment, I wouldn’t use Chrome for everything. But I do appreciate it for Class Web. Knowing Google, there will hopefully be another version out soon that addresses these issues. Until then, Chrome is not all it could be.