libraries, books, and other sundry nerdiness

Library Linked Data

Over the last year I’ve read and heard a fair amount about library linked data, and I have not yet been able to form a whole and coherent picture of what this means for libraries. I see examples, and I understand the idea and the benefits, but I don’t see how the examples and the idea and existing library records and the needs that we have for library data all come together into something cohesive. And to be honest, Saturday’s 8 am session on linked data didn’t 100% pull it together for me, either. Ross Singer and Eric Hellman gave some good examples, and perhaps I would have grasped this all a bit better if I’d been able to see during Ross Singer’s portion of the presentation, but I was sitting on the floor in the back and the slides were invisible to me. Overall, though, I still felt like the presenters were showing me how linked data worked without really showing me the why, or putting it into real, effective practice. The following is mostly transcribed from my notes; I apologize if it feels scattered and disconnected, but I kind of think most of our understanding of library linked data is scattered and disconnected right now.

As I currently understand it, linked data means that instead of having a database full of bibliographic records made up of textual data in tagged fields, we would have records made up of links to “authority” records in other databases on the web, in places like the Virtual International Authority File (VIAF) and LC Authorities. The links themselves would use various schemas to say what they mean. Each link describes a relationship: one link says, “This book was written by this person,” another says, “This book was published by this company,” and others say, “This book is about this subject.” The linking syntax is a very simple subject-predicate-object statement, or triple (Author Created Book). The URIs would have to be permanent and unambiguous, and should preferably use HTTP. And it will all create a nebulous network of information and magic or something.
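To make the subject-predicate-object idea a little more concrete, here is a minimal Python sketch of a book “record” that is nothing but links. The Dublin Core Terms properties used here (creator, publisher, subject) are real, but every example.org URI is a hypothetical placeholder for what would actually be a VIAF, LC Authorities, or publisher URI, and the tiny serializer only handles URI-to-URI triples rather than everything the N-Triples format allows.

```python
from typing import List, NamedTuple

# Real namespace for the Dublin Core Terms vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


class Triple(NamedTuple):
    subj: str  # the thing being described
    pred: str  # the relationship
    obj: str   # the thing it points to


def to_ntriples(triples: List[Triple]) -> str:
    """Write URI-only triples as N-Triples-style lines: <s> <p> <o> ."""
    return "\n".join(f"<{t.subj}> <{t.pred}> <{t.obj}> ." for t in triples)


# Hypothetical identifiers: example.org URIs stand in for real
# VIAF, LC Authorities, or publisher URIs.
BOOK = "http://example.org/book/1"
record = [
    Triple(BOOK, DCTERMS + "creator", "http://example.org/viaf-person-placeholder"),
    Triple(BOOK, DCTERMS + "publisher", "http://example.org/publisher-placeholder"),
    Triple(BOOK, DCTERMS + "subject", "http://example.org/lc-subject-placeholder"),
]

print(to_ntriples(record))
```

Note that there is no title string, author name, or subject heading anywhere in this “record”; every statement is a pointer to something defined somewhere else.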

Some examples of descriptive schemas include BIBO (the Bibliographic Ontology), SKOS (Simple Knowledge Organization System), GoodRelations (largely for e-commerce), and FOAF (Friend of a Friend). Schemas like these are generally written in RDF Schema and the Web Ontology Language (OWL). Schemas don’t involve any notion of validation; the only constraints are logical contradictions. Vocabularies can be mixed and matched to describe things as fully as possible.
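Here is a rough Python sketch of what mixing vocabularies and “no validation, only contradictions” might look like. The BIBO, FOAF, SKOS, Dublin Core, and RDF namespace URIs are the real ones; the example.org resources are placeholders, and the disjointness rule (“nothing can be both a Book and a Person”) is a hypothetical axiom declared here purely for illustration, not something I’m claiming those vocabularies define. A real system would use an OWL reasoner rather than this toy check.

```python
# Real namespace URIs for the vocabularies mentioned above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'bibo:Book' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


RDF_TYPE = expand("rdf:type")

# Placeholder resources; one description draws on four vocabularies at once.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/1"
TOPIC = "http://example.org/concept/1"

triples = [
    (BOOK, RDF_TYPE, expand("bibo:Book")),
    (BOOK, expand("dcterms:creator"), AUTHOR),
    (BOOK, expand("dcterms:subject"), TOPIC),
    (AUTHOR, RDF_TYPE, expand("foaf:Person")),
    (TOPIC, RDF_TYPE, expand("skos:Concept")),
]

# Hypothetical disjointness axiom, declared only for this illustration.
DISJOINT = [frozenset({expand("bibo:Book"), expand("foaf:Person")})]


def contradictions(triples):
    """Return resources whose rdf:type values include a disjoint pair.

    Nothing else is checked: unknown properties or missing fields are
    accepted, which is the 'no validation' point in miniature.
    """
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return [r for r, classes in types.items()
            if any(pair <= classes for pair in DISJOINT)]


print(contradictions(triples))  # → []
```

Adding a triple that also types the author as a bibo:Book would make `contradictions` report the author; adding some brand-new, unrecognized property would not be flagged at all.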

At one point Eric Hellman said he had asked library school students who were learning to work with MARC why we use MARC, and no one had a clue. The fact is we use MARC because it was cutting edge for its time, it did what we needed it to do 50 years ago, and we haven’t been able to move away from it because we are a lumbering, lumbering dinosaur. And we have no money. And we are very invested in this format, because our bibliographic data is vast and extensive and monstrous, and it will be a pain in the arse to convert it to something else. And people are still creating individual records in our own individual systems, so moving to something new involves way too many people changing way too many bits and bytes. And I could go on about this problem, but I’ll stop now. (I in no way think these should be excuses for why we don’t make a change, by the way.)

MARC is almost entirely textual data that we encode so machines can (kind of) read it. But only old machines can; new machines don’t speak this language. We no longer really need to store the textual data in our records, just links and identifiers; machines should be able to turn those into textual displays. MARC records describe manifestations, the specific published editions of a work, but linked data will make it easier for us to describe things using a FRBR-type hierarchy (works, expressions, manifestations, and items).
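The idea of storing identifiers and generating the display text later can be sketched in a few lines of Python. Everything here is hypothetical: the `AUTHORITY_LABELS` dictionary stands in for a lookup against an authority service such as VIAF or LC Authorities (in practice the labels would be fetched or cached), and the URIs and labels are placeholders.

```python
# Hypothetical lookup table standing in for an authority service;
# URIs and labels are placeholders, not real authority data.
AUTHORITY_LABELS = {
    "http://example.org/person/1": "Example Author, 1900-1980",
    "http://example.org/concept/1": "Example subject heading",
}

# The stored record holds only identifiers, no display text.
record = {
    "creator": "http://example.org/person/1",
    "subject": "http://example.org/concept/1",
}


def display(record: dict, labels: dict) -> str:
    """Build human-readable display lines from an identifier-only record."""
    lines = []
    for field, uri in record.items():
        label = labels.get(uri, uri)  # fall back to the raw URI if no label is known
        lines.append(f"{field.capitalize()}: {label}")
    return "\n".join(lines)


print(display(record, AUTHORITY_LABELS))
```

Because the record stores only the identifier, a corrected heading in the authority source would show up in every display without anyone editing the record itself.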

Records exist because we needed surrogates for our books, as the print books themselves were not possible to search. We don’t need surrogates for digital things (although there are problems with full-text searching that went unacknowledged in this session). Library content is no longer in print and is no longer in the library (although this isn’t exactly true, and I don’t think it will be completely true for a while). The transition to eBooks will be fast, and it will happen soon, so we need to figure out what this means for our data now (ok, I do agree with this one). Hellman claims that metadata isn’t as essential as it used to be, but I disagree; I just think it’s essential for different things. At this point it’s essential for access, preservation, and other administrative uses almost more than for discovery. In other words, it’s less important for searching for information, and more important for using information.

The good news for us (I think) is that libraries aren’t the only organizations using metadata anymore. More and more, our materials will come with metadata already attached. But will it be the right metadata? The information supplied by publishers and content creators will differ from the information needed by preservation bodies and by organizations like ours that are responsible for providing access. But we can use some of what is provided, and merely add our own bits and bytes.

Creating linked data-based records will open up our bibliographic (and circulation?) data to search engines. If our content doesn’t appear in Google, people won’t find it. Search Engine Optimization should be foundational to how we re-think our bibliographic records. We can prepare our records for search engines by using microdata. This goes back to the need to create structured data, using schemas that search engines can understand.
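As a rough illustration of what “using microdata” could mean for a catalog page, here is a minimal Python sketch that renders a book as HTML microdata with the schema.org vocabulary, which is one set of schemas the major search engines read. The `itemscope`, `itemtype`, and `itemprop` attributes are standard HTML microdata and `Book`, `Person`, `name`, `author`, and `isbn` are real schema.org terms; the function name and all the values are hypothetical placeholders.

```python
from html import escape


def book_microdata(title: str, author: str, isbn: str) -> str:
    """Render a minimal schema.org Book as HTML microdata.

    The author is nested as a schema.org Person rather than a bare string.
    Values are HTML-escaped so characters like '&' can't break the markup.
    """
    return "\n".join([
        '<div itemscope itemtype="https://schema.org/Book">',
        f'  <span itemprop="name">{escape(title)}</span>',
        '  by <span itemprop="author" itemscope itemtype="https://schema.org/Person">'
        f'<span itemprop="name">{escape(author)}</span></span>',
        f'  ISBN <span itemprop="isbn">{escape(isbn)}</span>',
        "</div>",
    ])


# Placeholder values only; a real catalog would fill these from its own records.
html = book_microdata("Example Title & Subtitle", "Example Author", "978-0-00-000000-0")
print(html)
```

The visible page stays the same for a human reader; the extra attributes just tell a crawler which bit of text is the title, which is the author, and so on.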

Most interesting thing in my notes: the Facebook “Like” button is the #1 use of the semantic web right now. It relies on something called the Open Graph Protocol. We don’t use things like this in libraries, even though I think we should. I saw a ton of people around me mouthing the word “privacy” to each other, but I think we need to let our patrons negotiate privacy for themselves. They can decide whether they want to click that “Like” button on our pages; providing the means to do it doesn’t force them to do it.
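For a sense of what the Open Graph Protocol asks of a page, here is a small Python sketch that writes the `<meta property="og:..." content="...">` tags the protocol uses. `og:title`, `og:type`, `og:url`, and `og:image` are the basic properties the protocol asks every page to declare, and `book` is one of its object types; the function name and the URLs are hypothetical placeholders. These tags only describe the catalog page; on their own they don’t add a button or collect anything about patrons.

```python
from html import escape


def open_graph_tags(props: dict) -> str:
    """Render Open Graph <meta> tags from a dict of og:* properties."""
    return "\n".join(
        f'<meta property="{escape(name)}" content="{escape(value)}" />'
        for name, value in props.items()
    )


# Basic Open Graph properties; every value here is a placeholder.
page = {
    "og:title": "Example Title",
    "og:type": "book",
    "og:url": "http://example.org/catalog/record/1",
    "og:image": "http://example.org/covers/1.jpg",
}

print(open_graph_tags(page))
```

These tags would go in the `<head>` of a catalog record page, which is what lets a service like Facebook treat that page as a distinct “thing” someone can like.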

Again, I apologize that my notes are a little all over the place. I know this is an area I’ll be giving more thought to in the coming weeks, and I have notes for an upcoming post on rethinking the ILS and rethinking bibliographic data that I’m hoping will bridge several of the ALA sessions I attended, including this one. I’d love to hear comments from anyone who can fill me in on details I’m missing, or who just wants to ask more questions to further explore the ideas with me.