Open Bibliographic Data: The State of Play

Given the public role of libraries and the fact that bibliographic metadata (i.e. the material in library catalogues) doesn’t seem that exciting from a commercial point of view, you might think that, of all the types of data out there, bibliographic data would be the most open. You might even think, given the public-spiritedness of librarians, that this is the kind of area where not only could the data be openly available but it would be openly available (in nice little bzipped or gzipped dumps …).

In fact the situation is quite the opposite. Most libraries appear to exert rights, implicitly or explicitly, over their data, with some libraries licensing access to their catalogue data for substantial sums of money. The following are some of the examples (both closed and open) that we know of:

Library of Congress: public domain in the US (or at least free) but copyrighted outside the US. See [1] and the comments in the fred2.0 README, which state:

These data are works of the United States Government and as such are not subject to copyright within the United States. (17 U.S.C. §105).

The Library of Congress has copyrighted these data for use outside the United States. Contact the LC for permission prior to use or distribution of this data outside the United States. [See http://www.loc.gov/cds/mds.html, which quotes, for example, a price of $21,905 for the ‘Complete Service’.]

fred2.0 (fred2.0 CKAN package): an excellent example of an effort to make material available, but unfortunately it has the same restrictions as the Library of Congress data (from which the material is sourced).

British Library: closed (and its data are apparently sold for substantial sums).

LibraryThing: closed. It does not seem to make its data available, and the sources it draws on would likely make doing so problematic (from the About page):

LibraryThing uses Amazon and libraries that provide open access to their collections with the Z39.50 protocol. The protocol is used by a variety of desktop programs, notably bibliographic software like EndNote. LibraryThing appears to be the first mainstream web use.

As we continue to search for open sources of bibliographic data we’d love to hear from anyone who knows of examples not already on this list.

Christian: thanks for the pointer. In fact, you’ll be pleased to hear, we are already very much aware of RePEc (there’s even a RePEc CKAN page), but here we were trying to focus more on traditional library bibliographies rather than more general repositories, though one could argue that this distinction is rapidly being erased. Let me also add that I think RePEc is a wonderful resource and one of the more open ones out there, though I would note that, at least in terms of the definition of open we use, it isn’t fully open, as it partially restricts commercial reuse (see the RePEc CKAN page for more details).

What about the legal deposit libraries such as the Bodleian or Cambridge University Library? I would imagine that they would be closed as well, though the Bod is allowing Google to scan its books, I believe.

Iain: good suggestion. I did get in contact with Cambridge University Library, but nothing much came of it: apparently they had outsourced their database management, so getting a dump of the database was going to prove difficult; furthermore, the digital catalogue only went back to the 1970s (at that point I was specifically interested in trying to get statistics on information production in the 19th and early 20th centuries).