Tag Archives: Linked data

In March, Mark Matienzo gave this presentation on re-purposing metadata. What really caught my attention was how Mark used the idea of linked data to explain how metadata can be used and re-used. This message is especially pertinent given RDA’s focus on relationships. It has also been discussed in several blogs, most recently by Karen Coyle, as well as on listservs such as NGC4Lib. Despite its focus on archives, this presentation is a great, simple introduction to linked data and its practical applications.

Richard Wallis recently posted on his blog, Nodalities, a presentation he gave on linked data. I’ve heard Richard speak at Code4Lib, and he and Talis are doing some extraordinary work with linked data and web 2.0 technologies. You can view the slideshow below, or go to Nodalities to listen to Richard talk about linked data in action.

The Next Generation Catalog for Libraries listserv (NGC4Lib) has been extremely busy this last month. Three discussions really stand out: FRBR’s Group 1 entities and the identifiers associated with them, in particular works, expressions, and manifestations; Tim Berners-Lee and the Semantic Web; and FRBR’s user tasks and their continued relevance.

Unlike some listservs, these threads can be read in their entirety online. William Denton’s FRBR blog, among others, has already advertised this to the community. I would like to re-advertise these discussions because of their importance in understanding FRBR and RDA, among other things. In doing so, I would like to highlight some points from these threads in a series of 3 posts: one on identifiers, a second on user tasks, and a final one on the thread about Tim Berners-Lee, which is still very active on NGC4Lib. I posted last week on FRBR and identifiers. This post will be about the thread on Tim Berners-Lee and the Semantic Web.

Don’t let the title of the thread fool you. This discussion went everywhere! There are still replies to the listserv over the past week that continue the discussion. Remember to look at the archives for October and November. Basically, the thread began with the posting of a recent talk by Sir Tim Berners-Lee at: http://fora.tv/2009/10/08/Next_Decade_Technologies_Changing_the_World-Tim-Berners-Lee. This 38-minute interview was the beginning of a very rich discussion about data, how to identify data, and how to get data out on the web to be used and re-used.

Here are some points that I found of particular interest:

How do we get data out on the web? Will RDA help get the data out on the web?

This is an excellent question that comes up several times in this thread. It also came up in the recent OCLC webinar on what they are doing about RDA. For the most part, library data is stored away in catalogs that are not being mined or searched by search engines. The wealth of information is there, but it is not in a web-friendly format. As Jim Weinheimer and others pointed out, it is essential to get that information out there. However, it was mentioned that the Library of Congress’s library data is out there, and the Internet Archive’s data is on the web as well. So why aren’t people looking at it? Aren’t RDA and its relation to the Semantic Web supposed to help not only get library data out there but also get people looking at it? These questions never really received formal answers, yet it was interesting to follow the trail of people’s thoughts: yes, we need to get library data out there; RDA, because of its relationship to the Semantic Web, should theoretically help us do this; but will this incite users to come and look at the data?

RDA and user tasks

The discussion about RDA and the web led to the question of the relevance of the user tasks that RDA brought over from FRBR. These user tasks are: find, identify, select, obtain. FRBR was published in 1998, more than a decade ago. Do these tasks still represent what users do when searching for information? Remember that FRAD, FRBR’s sibling for authority work, has slightly different user tasks. What does this say about user tasks?

The idea of a domain model (RDFS, OWL ontology, RDF)

One of the reasons that RDA will be useful for the Semantic Web, according to some on the thread, is that it is based on a domain model which can be expressed as an RDFS/OWL ontology. From the thread, this domain model is important since it provides a framework to work from. Even if this framework contains flaws, it can evolve as the web evolves, since it is tied to the languages used on the web. This means that RDA, too, will evolve with the web.
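A domain model expressed as an RDFS/OWL ontology boils down to a set of triples declaring classes and the properties that relate them. Here is a rough, self-contained sketch in that spirit; the frbr:-style class and property names are illustrative stand-ins, not the actual registered RDA/FRBR vocabulary:

```python
# Illustrative sketch: a FRBR-style domain model as RDF-like triples.
# The names below are hypothetical stand-ins, not the real RDA elements.

RDF_TYPE = "rdf:type"

# A tiny triple store: (subject, predicate, object)
ontology = [
    ("frbr:Work", RDF_TYPE, "owl:Class"),
    ("frbr:Expression", RDF_TYPE, "owl:Class"),
    ("frbr:Manifestation", RDF_TYPE, "owl:Class"),
    ("frbr:Item", RDF_TYPE, "owl:Class"),
    # Properties linking the Group 1 entities
    ("frbr:realizedThrough", "rdfs:domain", "frbr:Work"),
    ("frbr:realizedThrough", "rdfs:range", "frbr:Expression"),
    ("frbr:embodiedIn", "rdfs:domain", "frbr:Expression"),
    ("frbr:embodiedIn", "rdfs:range", "frbr:Manifestation"),
    ("frbr:exemplifiedBy", "rdfs:domain", "frbr:Manifestation"),
    ("frbr:exemplifiedBy", "rdfs:range", "frbr:Item"),
]

def classes(triples):
    """Return every subject declared as an owl:Class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == "owl:Class"}

def range_of(triples, prop):
    """Return the rdfs:range of a property, if declared."""
    for s, p, o in triples:
        if s == prop and p == "rdfs:range":
            return o
    return None
```

Because the model is just data, it can grow new classes and properties without breaking existing consumers, which is the sense in which such a framework "evolves with the web."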

Multiple meanings of FRBR and RDA: FRBR, RDA, RDAonline

This was a very interesting post by Karen Coyle. I think she highlighted a huge problem. With all the discussions about RDA, FRBR, and the RDA product that is going to be published sometime in the future, a multitude of interpretations of these concepts has arisen. Karen was right to point out that we will have 3 things: FRBR, RDA, and RDAonline. These are 3 different things that serve 3 different purposes. In addition, we have to remember that RDA has inherited many legacy issues from AACR2. This is one of the reasons why RDA is criticised by some as not going far enough. To make matters more confusing, there is also the Metadata Registry, which is related to RDA, RDAonline, and FRBR but is its own enterprise with its own mission.

OLAC and WEMI

It has been known that the audio-visual community has had trouble with the notions of work, expression, manifestation, and item for quite some time now. Until this thread, I really hadn’t found a good explanation of the reasons why, or of what OLAC planned to do about it. Kelly C. McGrath wrote about OLAC’s position on Wed. 21, 2009. She was responding to the importance of the WEMI (work, expression, manifestation, and item) model as a good starting point. Kelly writes: “We are trying to take a practical approach. At a theoretical level, the four levels as defined by FRBR make a lot of sense (although if you, for example, include expressions of expressions, it would seem you could have even more levels). However, when we came to recording things on different records, it quickly became apparent that the split between Work and Expression (e.g., things like color, aspect ratio, and costume designer only at the Expression level) was not very workable for us. We therefore settled on a model that uses primarily a Work/Primary Expression (usually the original public release if applicable) record and a Manifestation record. Information like the color and aspect ratio of a DVD in hand is meaningless unless you know the original, intended values. So a 1.33:1 (full screen) DVD Manifestation of a TV program that was originally broadcast in 1.33:1 and of a film that was in 2.66:1 don’t mean the same thing to the purist. The purist would be happy with the former and not with the latter, modified version. So we want to record the original, intended value at the Work level so that it can be compared with particular Expressions. We thought it was most practical to have the information that we intended to re-use for all instances in a single record.
It is also in line with the way film reference sources and online databases like IMDb display information. We also think, from a practical perspective, that most Expression information can be coded in machine-interpretable form in the Manifestation record, and a display of Expressions could be generated automatically. Every time a cataloger gets a new Manifestation, this information has to be reevaluated. Moving image expressions tend to be multi-faceted, so looking for an Expression record for the exact combination in hand could be time-consuming, and finding expression records for each individual aspect is no better than just encoding the characteristics in the manifestation record.

We don’t think a colorized version of a film is a new Work. Rather we would call it a new Expression and record it in the Manifestation record in such a way that it will be obvious to the user that the color of this version has been modified.

It is also not clear to me that the hierarchical approach of choosing a work, then an expression, then a manifestation is always the order that users need. For moving images, for example, users might want to limit to those works available on DVD or usable in English up front.

One way this might be displayed to users can be seen in Figure 8 (near the bottom) at http://kmcgrath.iweb.bsu.edu/MIWgrant.htm. The top facets are the WPE facets, and the left facets come primarily from Manifestation records. So the original color or aspect ratio might be at the top and the ones for the available manifestations on the left. These comparisons might be more useful in the WPE record view in Figure 9 (very bottom), where the original aspect ratio is given in the body of the WPE record and the available aspect ratios are given on the left. It might also be useful to label the non-original aspect ratio(s) as “modified.”

FRBR, does it work best with an already large database of bibliographic data? Does it require that catalogers search for information they might not know or have access to? — Linking data, sharing data, …

Identity management

Are libraries outdated? Why aren’t people going to libraries?

I could list so many more topics from this discussion. Even though this thread is long and can be found in the archives for both October and November 2009, the discussions are well worth the detour.

I recently had a chance to return to the Virtual International Authority File, or VIAF. Actually, I was looking for a different name, not Eddie Santiago. As it turned out, I didn’t find the name I was hoping to find. I did, however, see some changes to VIAF in terms of linked data. Out of curiosity, I searched again for Santiago, Eddie to see if the results were any different from last week. Unfortunately, Eddie Santiago still had 3 distinct VIAF records. However, I noticed that the first search result had grouped different headings from different national agencies into one record.

When I clicked on this first link, the result was very much what one might expect of a linked data experiment.

Unfortunately, the bottom of the diagram was cut off. Even so, the diagram still illustrates the hope of linking information from one source to another.

But, I was again led to the same question. How is this helpful for libraries and in particular catalogers?

I came across this post to AUTOCAT by Allen Mullen.

Commentary on this by Jennifer Eustis:
https://celeripedean.wordpress.com/2009/10/01/viaf-and-linked-data/
Excerpt:

"How will this help libraries? Essentially, instead of an authority record being tied to one language, headings can be accessed according to a unique identifier. In that way, the information associated with that identifier can be displayed according to language. More importantly, this allows those searching for records to have a larger pool within which to search. Especially for digital libraries, the VIAF, given the recent addition of the Getty Union List of Artist Names, could be extremely useful since it is almost like a federated search."

"It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change." (Charles Darwin)

Is VIAF the strongest of such authority file species? Is it the most intelligent? Is it the most responsive to change? When I first wrote my post on VIAF and linked data, I did not think of VIAF in terms of strength or intelligence. I think Allen brings up an important point by quoting Darwin. In my post, I mentioned that VIAF is a pilot project from OCLC, one of the few major players in bibliographic data. The spread of their influence seems to be increasing, and VIAF is an example. However, does this make VIAF strong? Furthermore, I mentioned that the service provided by VIAF could be helpful to libraries and librarians. But does this make VIAF an intelligent service?

Perhaps a better way to phrase these questions is: What are the strengths and/or weaknesses of VIAF? Is the way in which VIAF links information between national agencies done in an intelligible manner? What’s more, is the way in which VIAF links information between national agencies an intelligent thing to even undertake?

As to the last point of the Darwin quote, I have to say that VIAF is already making changes to how various forms of authorized headings are linked, in terms of which national agencies use which authorized form of a heading.

But again, are these changes adding to the strength and/or overall intelligence of this project?

Tom Hickey recently posted on a new development with the VIAF. The VIAF, or Virtual International Authority File, is an international project to bring together in one place authorized headings from several libraries and institutions around the world. Interestingly enough, VIAF is hosted and implemented by OCLC. This is where the linked data part comes into play, because Tom Hickey is the co-lead for what is essentially an OCLC project using OCLC software.

According to Tom Hickey:

To us linked data means:

URIs for everything

HTTP 303 redirects for URIs representing the personae our metadata is about

HTTP content negotiation for different data formats

An RDF view of the data

A rich set of internal and external links in our data
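In practice, those URIs and 303 redirects mean that when a client asks for a URI that names a person (rather than a document), the server answers 303 See Other, pointing at a document about the person in whichever format the Accept header asks for. Here is a minimal sketch of that server-side decision; the URL layout is invented for illustration, not VIAF's real scheme:

```python
# Sketch of HTTP content negotiation plus a 303 redirect for a linked
# data URI. The document paths below are hypothetical, not VIAF's.

FORMATS = {
    "application/rdf+xml": "rdf.xml",  # machine-readable RDF view
    "text/html": "about.html",         # human-readable page
}

def negotiate(persona_uri, accept_header):
    """Pick a concrete document for a persona URI.

    Returns (status, location): a 303 redirect from the URI naming the
    person to a document *about* the person in the requested format.
    """
    for media_type in accept_header.split(","):
        media_type = media_type.split(";")[0].strip()  # drop q-values
        if media_type in FORMATS:
            return 303, persona_uri.rstrip("/") + "/" + FORMATS[media_type]
    # Fall back to the HTML page when nothing matches
    return 303, persona_uri.rstrip("/") + "/about.html"
```

For example, a semantic web crawler sending `Accept: application/rdf+xml` would be redirected to the RDF document, while a browser sending `Accept: text/html` would land on the human-readable page for the same identifier.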

This is what the OCLC website on the VIAF explains as well. This means that there will be one giant authority file that any institution can use, independent of language. OCLC’s website explains the process as follows:

OCLC has proven software for matching and linking authority records for personal names.

That software will be used to match the authority records from The Deutsche Nationalbibliothek and the Bibliothèque nationale de France to the corresponding authority records from the Library of Congress.

Once the existing authority records are linked, shared OAI servers will be established to maintain the authority files and to provide user access to the files.

Users then will be able to see names displayed in the most appropriate language.

For example, German users will be able to see a name displayed in the form established by the dnb, while

French users will see the same name as established by the BnF, and

American users will view the name as established by LC.

Users in either country will be able to view name records as established by the other nation, thus making the authorities truly international and facilitating research across languages anywhere in the world.
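The matching-and-linking step described above can be thought of as a join on a normalized form of the heading. Here is a toy sketch of that idea; OCLC's actual matching software uses far richer evidence (dates, associated titles, co-occurrence in bibliographic records), and the sample data below is invented:

```python
# Toy sketch: linking authority records across national files by a
# normalized match key. Not OCLC's real algorithm, just the core idea.
import re

def normalize(heading):
    """Crude match key: lowercase, strip life dates and punctuation."""
    key = heading.lower()
    key = re.sub(r"\d{4}-?(\d{4})?", "", key)  # drop life dates
    key = re.sub(r"[^a-z ]", "", key)          # drop punctuation
    return " ".join(key.split())

def link_files(files):
    """files: {agency: [headings]} -> {match_key: {agency: heading}}."""
    clusters = {}
    for agency, headings in files.items():
        for heading in headings:
            clusters.setdefault(normalize(heading), {})[agency] = heading
    return clusters

# Invented sample data in the spirit of the example later in this post
files = {
    "LC":  ["Santiago, Eddie"],
    "DNB": ["Santiago, Eddie"],
    "BnF": ["Santiago, Eddie, 1961-"],
}
clusters = link_files(files)
```

The payoff is the cluster itself: one match key now groups each agency's preferred form of the same name, which is exactly what the shared OAI servers would then maintain and serve.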

How will this help libraries? Essentially, instead of an authority record being tied to one language, headings can be accessed according to a unique identifier. In that way, the information associated with that identifier can be displayed according to language. More importantly, this allows those searching for records to have a larger pool within which to search. Especially for digital libraries, the VIAF, given the recent addition of the Getty Union List of Artist Names, could be extremely useful since it is almost like a federated search.
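That display-by-language idea can be sketched as a simple lookup: one identifier, several agency forms, and a per-audience preference order. The cluster data and preference table below are invented for illustration:

```python
# Sketch: display a heading according to language preference, falling
# back through agencies. The cluster data is illustrative, not real
# VIAF output.
cluster = {
    "id": "example:0001",
    "headings": {
        "LC":  "Santiago, Eddie",
        "DNB": "Santiago, Eddie",
        "BnF": "Santiago, Eddie, 1961-",
    },
}

PREFERENCES = {  # which agency's form each audience sees first
    "en": ["LC", "BnF", "DNB"],
    "de": ["DNB", "LC", "BnF"],
    "fr": ["BnF", "LC", "DNB"],
}

def display(cluster, lang):
    """Return the preferred heading for one identifier and language."""
    for agency in PREFERENCES.get(lang, ["LC"]):
        if agency in cluster["headings"]:
            return cluster["headings"][agency]
    # Last resort: any heading attached to the identifier
    return next(iter(cluster["headings"].values()))
```

The point is that the identifier, not any one heading, is what the record is tied to; the heading shown is just a view of the identifier chosen at display time.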

In my own research for an image collection, I used the VIAF to find the name of a Puerto Rican musician named Eddie Santiago. I was able to find 2 different forms of the authorized heading that referred to the same person. LCNAF and the National Library of Germany both used the heading Santiago, Eddie, while the national libraries of Spain and France used Santiago, Eddie, 1961-. This was useful in that I was able to see results from not only my own country but also 3 others. The problem is that for this one person, there were 2 authorized forms. Furthermore, there were 3 distinct VIAF authority records for this one name: VIAF IDs 13972441, 90646721, and 71683685. It didn’t look like these authority records were linked to one another in any way. It was my own research that led me to see that these 3 VIAF records were for the same individual. This is a sign that the VIAF has some way to go.

I like the idea. However, I am not completely happy to see that it is hosted and implemented by OCLC. Right now this is a pilot project. Most likely, it will pick up speed and the problem that I encountered will be taken care of. This raises the question of how OCLC will profit from VIAF. Will libraries have to pay in order to access this giant authority file? Because nothing like the VIAF yet exists, will libraries have to pay premium prices for data that was created by them in the first place?

Over at the Nodalities blog, Leigh Dodds posted a highlight of his presentation on linked data.

I opened by speaking about the fundamental idea behind Linked Data: that data be put online, in a very fine-grained way. This takes us beyond having stable links for datasets or just articles, and yields web identifiers for the Who, Why, What, Where and When of the content: every person; place; category; and event can each be identified, annotated and ultimately linked together into a navigable whole. RDF, as the core technology for Linked Data, is very simple to get to grips with, with the notion of resources and their connections being something that anyone can intuitively grasp in a few minutes.

He also includes a brief look at verifying sources and checking the quality of data.

The ability to identify and ignore questionable sources, or identify stories that are drawn from inaccurate data or analyses, is something that has previously been very hard to do.

It will be interesting to follow the progress of linked data to see if it can live up to this promise of data quality control.

This is a good detour. And Leigh has included links to his PowerPoint presentation and some other resources for linked data.