Note: There are several alternative FRBR vocabularies available. We are using the version by Ian Davis et al. because of a naming problem in the current IFLA version: predicate names in that version have numbers as local name parts, which makes it impossible to serialize the data as RDF/XML.

Mapping of fields

We have mapped the fields from the record-centric RDF/ISO2709 format to a resource-centric BIBO description as follows. Note that the original field names used below may contain wildcards for single characters (. as used in regular expressions).
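As a small illustration of the wildcard convention, the following Python sketch checks whether a concrete field name is covered by a pattern from this mapping. The function name and the concrete field names in the usage note (such as 540a_a) are hypothetical and only serve as examples.

```python
import re


def field_matches(pattern: str, field_name: str) -> bool:
    """Return True if field_name is covered by a wildcard pattern such as '540._a'."""
    # '.' stands for exactly one arbitrary character, as in regular expressions.
    # fullmatch() requires the whole field name to match, not just a prefix.
    return re.fullmatch(pattern, field_name) is not None
```

Under these assumptions, `field_matches("540._a", "540a_a")` is true, while `field_matches("540._a", "542a_a")` is false.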

Resource-URI

The URI of the resource that is to be described is derived from the identifier of the record, which is found in <rdfmab:field/001__a>.

<rdfmab:field/700b_a> contains DDC notations. In order to link to the Linked Data version of the classification, these numbers are truncated to the first three levels. If the full classification were available, we would be very happy to link to deeper levels.
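The truncation could look roughly like the following Python sketch. It assumes that "the first three levels" corresponds to the first three digits of the notation (main class, division, section); the function name is hypothetical, and building the actual link target is left out because the target URI pattern is not given here.

```python
import re
from typing import Optional


def truncate_ddc(notation: str) -> Optional[str]:
    """Reduce a DDC notation such as '943.086' to its first three digits."""
    # Assumption: a usable notation starts with three digits; anything
    # after them (decimal point, further digits) is discarded.
    match = re.match(r"\s*(\d{3})", notation)
    return match.group(1) if match else None
```

Notations that do not start with three digits yield `None`, so no link would be generated for them.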

bibo:isbn

The ISBN of the resource, found in <rdfmab:field/540._a>. The ISBN is deliberately provided as a string, not a URI, since it is the string that is the identifier, not some resource identified by <uri:ISBN:ISBN>. This conforms to the range defined in the BIBO.

bibo:issn

The ISSN of the resource, found in <rdfmab:field/542._a>. The ISSN is deliberately provided as a string, not a URI, since it is the string that is the identifier, not some resource identified by <uri:ISSN:ISSN>. This conforms to the range defined in the BIBO.

dc:extent

The extent of the resource, usually the number of pages, as found in <rdfmab:field/433__a>.

dcterms:issued

The year the resource was issued, as found in <rdfmab:field/425a_a>.

rdf:type

The type of a resource is derived from several fields, which may result in multiple types for the same resource. The current mapping is most likely over-simplified and will be subject to further analysis in future releases:

if the value of <rdfmab:field/050> contains an a at the first position, the resource is typed as dc:BibliographicResource.

if the value of <rdfmab:field/051> contains an m at the first position, the resource is typed as bibo:Book.

all resources are generally typed as frbr:Manifestation.
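The three rules above can be sketched in Python as follows. The function name and signature are hypothetical; the field values are passed in as plain strings, or None if the field is absent.

```python
from typing import List, Optional


def derive_types(field_050: Optional[str] = None,
                 field_051: Optional[str] = None) -> List[str]:
    """Derive the rdf:type values of a resource from fields 050 and 051."""
    # Every resource is typed as a manifestation.
    types = ["frbr:Manifestation"]
    # An 'a' at the first position of field 050 adds dc:BibliographicResource.
    if field_050 and field_050[0] == "a":
        types.append("dc:BibliographicResource")
    # An 'm' at the first position of field 051 adds bibo:Book.
    if field_051 and field_051[0] == "m":
        types.append("bibo:Book")
    return types
```

A resource can therefore carry up to three types at once, which matches the remark that several fields contribute to the typing.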

bibo:volume

The volume number of the resource, found in <rdfmab:field/090_a>, which holds the sortable form. If this is not available, the descriptive form in field <rdfmab:field/089_a> is used.

dc:isPartOf

Fortunately, the original data already includes many links from subordinate to superordinate records which can be used to link the corresponding resources:

<rdfmab:field/010__a> contains the record-id of a direct superordinate

<rdfmab:field/453__a> contains the record-id of the first series title

<rdfmab:field/599__a> contains the record-id of the record describing the journal that this resource is published in.

bibo:authorList

The <rdfmab:field/1..._9> fields contain the authority numbers of the authors of the resource. To preserve their order, an RDF list is used instead of simply linking all authors directly via dc:creator. The downside is that the author lists are currently blank nodes and thus not handled ideally by generic Linked Data front ends such as Pubby. Note that there are basically two types of authority numbers in the data: those maintained by the DNB, which are available as Linked Data, and local hbz numbers, which are not. In the first case, the resulting link leads to the Linked Data Service of the DNB; in the latter case, the link unfortunately leads nowhere.
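To illustrate the ordered list, here is a minimal Python sketch that writes the authors as a Turtle collection, which is the Turtle shorthand for an RDF list built from blank nodes. The function name and the example.org URIs in the usage note are hypothetical, and prefix declarations are assumed to be emitted elsewhere.

```python
from typing import Sequence


def author_list_turtle(resource_uri: str, author_uris: Sequence[str]) -> str:
    """Serialize the authors of a resource as an ordered Turtle collection."""
    # The parenthesized form ( a b c ) expands to an rdf:List, so the order
    # of the authors is preserved without generating blank node identifiers.
    members = " ".join(f"<{uri}>" for uri in author_uris)
    return f"<{resource_uri}> bibo:authorList ( {members} ) ."
```

For example, `author_list_turtle("http://example.org/resource/1", ["http://example.org/person/a", "http://example.org/person/b"])` yields a single Turtle statement in which the first author stays first.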

dc:publisher

The fields <rdfmab:field/412_a> and <rdfmab:field/410_a> contain the name and place of the publisher. To conform to the range of the dc:publisher predicate as defined in the DCMI Metadata Terms, we have introduced blank nodes for the publishers, typed as foaf:Organisation. The place of the publisher is attached as another blank node via geo:location. That blank node is typed geo:SpatialThing and has the name of the place attached by geonames:name, since we lack a mapping of the place names to geonames identifiers. We are aware that this seems overly complicated, but we are trying to identify and properly model the entities that are referenced in the original data, even if that results in blank nodes in the first run. As soon as an authority file for publishers is available, we will try to link there. We might even have a look at the resulting blank nodes and see if the information is clean enough to form the basis of such a file.
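The nested blank nodes described here could be written out as in the following Python sketch. It is an illustration only: the use of foaf:name for the publisher's name is an assumption (the property is not named in this description), the function names are hypothetical, and prefix declarations are assumed to be emitted elsewhere.

```python
def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslash, quote and newline."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def publisher_turtle(resource_uri: str, name: str, place: str) -> str:
    """Describe publisher and place as nested anonymous blank nodes."""
    return (
        f"<{resource_uri}> dc:publisher [\n"
        f"    a foaf:Organisation ;\n"
        # Assumption: foaf:name carries the publisher name from field 412.
        f"    foaf:name {turtle_literal(name)} ;\n"
        f"    geo:location [\n"
        f"        a geo:SpatialThing ;\n"
        f"        geonames:name {turtle_literal(place)}\n"
        f"    ]\n"
        f"] ."
    )
```

The square-bracket syntax is what makes Turtle convenient for this model: both the publisher and the place remain anonymous, so no blank node labels need to be invented.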

frbr:exemplar

In the current state of the raw data, holding information is only implicitly available. Since the records are segmented into packages by institution, we know that an institution is the frbr:owner of at least one frbr:Item of the described frbr:Manifestation. Since we currently do not have shelfmark information, those items are once again modelled as blank nodes.

Complete documentation exists for the fields found in the RDF/ISO2709 version of the data. Unfortunately, the RDF/ISO2709 fields are not completely in line with this official documentation, because our data passes through a MARC21-based interface before it is published, and some fields are renamed in this process. We are working on either documenting the differences or using the proper fields.

Conversion process

Although we have released the raw data in N-Triples format, native RDF tools such as rdflib for Python have proved far too slow to handle this amount of data. Regular expressions in Perl are much faster and are therefore used here. Because of the blank nodes explained above, the script outputs RDF in Turtle notation, so that blank node identifiers don’t have to be generated.
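The line-oriented idea can be illustrated with a short sketch. It is written in Python for readability and is not the Perl script used for the conversion; the predicate URI layout (a path ending in field/ followed by the field name) and the example URIs in the usage note are assumptions.

```python
import re
from typing import Optional, Tuple

# One N-Triples statement per line: subject URI, field predicate, plain literal.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+'                 # subject URI
    r'<[^>]*field/([^>]+)>\s+'       # predicate URI; capture the field name
    r'"((?:[^"\\]|\\.)*)"\s*\.\s*$'  # plain literal object
)


def parse_field_triple(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, field name, value) or None if the line does not fit."""
    # Lines with URI objects, language tags or datatypes are not matched
    # and would need their own patterns.
    match = TRIPLE.match(line)
    return match.groups() if match else None
```

A line such as `<http://example.org/record/1> <http://example.org/rdfmab/field/540a_a> "3-16-148410-0" .` is split into the record URI, the field name 540a_a and the literal value, without building an in-memory graph.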

Simple perl-regex based conversion script

Preliminary steps

We have released those parts of the union catalog that participating institutions have holdings in. For each institution, the corresponding subset of records was extracted from the union catalog and packaged independently of any other records. This results in duplicate records for resources held by several institutions. Thus, the first step was to generate a list of unique files. This list is then split up in order to process the data in parallel.
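A rough Python sketch of these two preliminary steps is given below. It assumes that the same record carries the same file name in every institution's package, which is not stated explicitly here; the function names and the paths in the usage note are hypothetical.

```python
import os
from typing import List, Sequence


def unique_files(paths: Sequence[str]) -> List[str]:
    """Keep the first path seen for each record file name."""
    seen = {}
    for path in paths:
        # Assumption: identical file names denote the same record.
        seen.setdefault(os.path.basename(path), path)
    return list(seen.values())


def split_into_chunks(items: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Distribute the items round-robin over n_chunks lists for parallel runs."""
    return [list(items[i::n_chunks]) for i in range(n_chunks)]
```

For instance, `unique_files(["instA/HT1.nt", "instB/HT1.nt", "instB/HT2.nt"])` keeps one path per record, and each chunk produced by `split_into_chunks` can then be handed to a separate conversion process.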

Preparation: Create list of unique files

Invoking the script

Generating holdings information

To generate the holding information, we simply generate the corresponding triples based on the file names in the data packaged by institution:
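A minimal Python sketch of this step, not the original script: it assumes a hypothetical layout in which each record file lies in a directory named after the holding institution, and it takes hypothetical base URIs for records and institutions as parameters.

```python
import os


def holding_turtle(path: str, record_base: str, institution_base: str) -> str:
    """Build the holding statement for one record file of one institution."""
    # Assumed layout: .../<institution>/<record-id>.<extension>
    institution, file_name = path.split("/")[-2:]
    record_id = os.path.splitext(file_name)[0]
    # The item is an anonymous blank node because no shelfmark is known.
    return (
        f"<{record_base}{record_id}> frbr:exemplar [\n"
        f"    a frbr:Item ;\n"
        f"    frbr:owner <{institution_base}{institution}>\n"
        f"] ."
    )
```

Calling this once per file in every institution's package yields one frbr:Item per institution and record, in line with the "at least one item" reading described under frbr:exemplar.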

I note that you've hung the place of publication off the publisher, and wonder if this is an issue. Clearly the 'place of publication' is linked to the publisher in some way - they'll have to have some kind of operating address I guess - but it also feels like this is a direct property of the published item as well, and having a direct link may well be beneficial. In the latest modelling from the British Library, they use the proposed isbd:hasPlaceOfPublicationProductionDistribution property. I'm not particularly keen on this - partly because it doesn't exist yet, but mainly because I'm not sure it is sensible to limit this to a 'bibliographic' type property (many things can have a place of production). Any thoughts on this?

Hi Owen, sorry for the "late" answer... we think it makes sense to make this a direct property of the manifestation itself, mainly because the place of the publisher may change but the place of publication cannot. Imagine we had URIs for publishers (which would be very nice indeed!): the place of publication could then not be derived from the place of the publisher.
Do you know a better property for this? http://iflastandards.info/ns/isbd/elements does not work. Also, do we really need an rdfs:label and thus a bnode?