Day 1 · 2013-11-25 Preconference

Tutorials and Workshops

Introduction to Linked Open Data

This introductory workshop covers the fundamentals of Linked Data technologies on the one hand, and the basic legal issues of Open Data on the other. The RDF data model will be discussed, along with the concepts of dereferenceable URIs and common vocabularies. The participants will continuously create and refine RDF documents to strengthen their knowledge of the topic. Linked Data tenets such as publishing RDF descriptions in a web environment and utilizing content negotiation will be demonstrated and applied by the participants. Aggregating data from several sources and querying this data will showcase the advantages of publishing Linked Data, and RDF Schema will be introduced as an effective means of data integration. On a side track, Open Data principles will be introduced, discussed and applied to the content created during the workshop.
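The triple model at the heart of the workshop can be sketched in a few lines of Python. The URIs and the `describe` helper below are invented for illustration, not workshop material:

```python
# RDF statements are (subject, predicate, object) triples.
# All URIs here are made-up examples.
triples = [
    ("http://example.org/book/1", "http://purl.org/dc/terms/title", "Moby Dick"),
    ("http://example.org/book/1", "http://purl.org/dc/terms/creator",
     "http://example.org/person/melville"),
    ("http://example.org/person/melville",
     "http://xmlns.com/foaf/0.1/name", "Herman Melville"),
]

def describe(subject, triples):
    """Collect all statements about one resource."""
    return {p: o for s, p, o in triples if s == subject}

desc = describe("http://example.org/book/1", triples)
```

Dereferencing the subject URI on the web would, in a Linked Data setting, return exactly such a description in an RDF serialization.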

Metadata Provenance Tutorial

When metadata is distributed, combined, and enriched as Linked Data, the tracking of its provenance becomes a hard problem. Using data encumbered with licenses that require attribution of authorship may eventually become impracticable as more and more data sets are aggregated, which is one of the main motivations for the call to open data under permissive licenses like CC0. Nonetheless, there are important scenarios where keeping track of provenance information becomes a necessity. A typical example is the enrichment of existing data with automatically obtained data, for instance as a result of automatic indexing. Ideally, the origins, conditions, rules and other means of production of every statement are known and can be used to put it into the right context.
Part 1 - Metadata Provenance in RDF: In RDF, the mere representation of provenance - i.e., statements about statements - is challenging. We explore the possibilities, from the unloved reification and other proposed alternative Linked Data practices through to named graphs and recent developments regarding the upcoming next version of RDF.
Part 2 - Interoperable Metadata Provenance: As with metadata itself, common vocabularies and data models are needed to express basic provenance information in an interoperable fashion. We investigate the PROV model that is currently developed by the W3C Provenance Working Group and compare it to Dublin Core as a representative of a flat, descriptive metadata schema.
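The named-graph approach from Part 1 combines naturally with a PROV property from Part 2. As a rough sketch (the URIs and prefixes are invented; only `prov:wasGeneratedBy` is an actual PROV term), each triple becomes a quad whose fourth element names its graph, so provenance attaches per graph rather than per statement:

```python
# Quads: (subject, predicate, object, graph).
quads = [
    ("ex:book1", "dct:title", "Moby Dick", "ex:graphA"),
    ("ex:book1", "dct:subject", "ex:whaling", "ex:graphB"),
    # Provenance statements about the graphs themselves:
    ("ex:graphB", "prov:wasGeneratedBy", "ex:autoIndexer", "ex:meta"),
]

def provenance_of(statement, quads):
    """Find what generated the graph(s) containing a statement."""
    s, p, o = statement
    graphs = {g for gs, gp, go, g in quads if (gs, gp, go) == (s, p, o)}
    return [go for gs, gp, go, g in quads
            if gs in graphs and gp == "prov:wasGeneratedBy"]

prov = provenance_of(("ex:book1", "dct:subject", "ex:whaling"), quads)
```

This is the kind of lookup the enrichment scenario above requires: given an automatically produced subject heading, trace it back to the indexing process that generated it.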
We actively encourage participants to present their own use cases and open challenges at this workshop. Please contact the organizers for details.
Prior experience: The workshop is intended for participants who have mastered the basics of linked data and want to delve into expressing provenance. Besides a basic understanding of RDF, the linked data principles, and the use of ontologies (like Dublin Core or BIBO) to express bibliographic metadata, no specialised knowledge is required.

Linked Data Publication with Drupal

Publishing Linked Open Data in a user-appealing way is still a challenge: Generic solutions to convert arbitrary RDF structures to HTML out-of-the-box are available, but leave users perplexed. Custom-built web applications that enrich web pages with semantic tags "under the hood" require considerable programming effort. Given this dilemma, content management systems (CMS) could be a natural enhancement point for data on the web. In the case of Drupal, one of the most popular CMS nowadays, Semantic Web enrichment is provided as part of the CMS core. In a simple declarative approach, classes and properties from arbitrary vocabularies can be added to Drupal content types and fields, and are turned into Linked Data on the web pages automagically. The embedded RDFa marked-up data can be easily extracted by other applications. This makes the pages part of the emerging Web of Data, and at the same time helps discoverability with the major search engines.
In the workshop, you will learn how to make use of the built-in Drupal 7 features to produce RDFa enriched pages. You will build new content types, add custom fields and enhance them with RDF markup from mixed vocabularies. The gory details of providing LOD-compatible "cool" URIs will not be skipped, and current limitations of RDF support in Drupal will be explained. Exposing the data in a REST-ful application programming interface or as a SPARQL endpoint are additional options provided by Drupal modules. The workshop will also introduce modules such as Web Taxonomy, which allows linking to thesauri or authority files on the web via simple JSON-based autocomplete lookup, or SPARQL Views as a mighty tool for reaching out to other Linked Data sources. Finally, we will touch the upcoming Drupal 8 version.
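One ingredient of "cool" URIs is content negotiation: a single generic URI redirects to an HTML or RDF representation depending on the Accept header. A minimal sketch of that decision logic (paths and behaviour are illustrative assumptions, not Drupal's actual routing; q-values are ignored for brevity):

```python
# Map an HTTP Accept header to a representation path.
def negotiate(accept_header):
    # Split "text/html,application/xml;q=0.9" into bare MIME types,
    # ignoring quality parameters for simplicity.
    preferences = [part.split(";")[0].strip()
                   for part in accept_header.split(",")]
    for mime in preferences:
        if mime in ("text/html", "application/xhtml+xml"):
            return "/node/42.html"
        if mime in ("text/turtle", "application/rdf+xml"):
            return "/node/42.rdf"
    return "/node/42.html"  # default representation
```

A production implementation would also honour q-values and emit proper 303 redirects, which is part of what the workshop's "gory details" cover.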
Prior experience: No programming experience is required; however, practical knowledge of building web sites is very welcome.
Requirements: A laptop with a VirtualBox installation is required. Organisers will prepare a VirtualBox image (Linux guest system) beforehand to be worked with during the workshop. Alternatively, participants may work with their own Drupal installation (preferably under Linux with drush installed).

CouchDB: A Database for the Web

CouchDB is a document database with an HTTP interface, versioned documents, map/reduce and replication. With this web-native approach, its ability to define additional output formats in JavaScript, and built-in content-negotiation support, it is a handy technology to use in a linked data context.
In this hands-on workshop participants will learn how to get documents into and out of CouchDB, how to analyse and list them in structured ways using map/reduce, and how to use these features in web applications. Special focus will be put on CouchDB’s support for HTTP content negotiation when serving the appropriate output format. Participants will use their knowledge to create their own database and configure it to implement a linked data service with HTML and RDF output for their documents right in CouchDB.
Time permitting, we will go on to explore data updating/synchronisation using CouchDB’s replication feature, or create list output for the data (e.g. Beacon files).
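The map/reduce view concept can be emulated in a few lines of plain Python (CouchDB itself defines views in JavaScript inside design documents; the documents and field names below are invented):

```python
# A CouchDB-style view: map emits (key, value) pairs per document,
# reduce aggregates the values per key.
docs = [
    {"_id": "a", "type": "book", "year": 1851},
    {"_id": "b", "type": "book", "year": 1851},
    {"_id": "c", "type": "article", "year": 1993},
]

def map_fn(doc):
    yield (doc["type"], 1)

def reduce_fn(values):
    return sum(values)

def query_view(docs, map_fn, reduce_fn):
    rows = {}
    for doc in docs:
        for key, value in map_fn(doc):
            rows.setdefault(key, []).append(value)
    return {key: reduce_fn(values) for key, values in rows.items()}

result = query_view(docs, map_fn, reduce_fn)
```

In CouchDB the index behind such a view is built incrementally and queried over HTTP, which is what makes the pattern practical for large document sets.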
Prior experience: Participants will need to work with HTTP headers and request methods, JavaScript and JSON. These will be introduced briefly in the workshop, but basic knowledge of these topics is recommended.
Requirements: Participants are expected to bring their own computer.

Catmandu: A Data Toolkit

When creating any data-oriented application, the main task is to import data from various sources, map the fields to a common data model, and put it all into a database or search engine. In data warehousing, these processes are called ETL (Extract, Transform, Load). Catmandu provides a suite of Perl modules to ease the import, storage, retrieval, export and transformation of metadata records.
After a short introduction to the rationale behind Catmandu and a presentation of sample applications at the Universities of Lund, Ghent and Bielefeld, participants will be guided through transforming MARC records to Linked Data.
The steps include transforming MARC into a JSON model of choice, storing/indexing the model in ElasticSearch, and exporting/mapping the model as Linked Data.
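The transform step can be sketched in plain Python (Catmandu expresses it in its Fix language instead; the MARC fields, field map and target model below are simplified assumptions):

```python
# Extract: a flattened MARC record (field/subfield -> value).
marc_record = {
    "245a": "Moby Dick",
    "100a": "Melville, Herman",
    "020a": "978-0-14-243724-7",
}

# Transform: map source fields to a common JSON model.
FIELD_MAP = {"245a": "title", "100a": "creator", "020a": "isbn"}

def transform(record, field_map):
    """Map source fields to the target model, dropping unknown fields."""
    return {target: record[source]
            for source, target in field_map.items()
            if source in record}

doc = transform(marc_record, FIELD_MAP)
# Load: `doc` could now be indexed in ElasticSearch
# and exported/mapped as Linked Data.
```

The same three-step shape (extract, transform, load) underlies every pipeline discussed in the workshop.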
Prior experience: We will be using a simplified ETL language. Any programming experience is welcome but not required. A brief tutorial on Catmandu programming can be found here.
Requirements: Laptop with VirtualBox installed. Organisers will prepare a VirtualBox image (Linux guest system) beforehand to be worked with during the workshop.

Analysis of Library Metadata with Metafacture

Metafacture is a versatile Java-based open source toolkit for all metadata related tasks. It was developed in the Culturegraph project. Since then it has become an important part of the software infrastructure at the German National Library. Core applications of Metafacture are metadata transformation, search indexing and statistical analyses of metadata. Despite originating from the library domain, Metafacture is format-agnostic and has successfully been employed in metadata related tasks in other domains.
In this workshop participants will learn how to use Metafacture in order to analyse large datasets. After a short introduction to Metafacture three types of analyses, which are often encountered in day-to-day work at a library, will be presented: counting distinct data values, quantifying relationships between metadata records and joining metadata records. Participants will have the opportunity to perform these analyses themselves and to discuss their approaches.
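The first analysis named above, counting distinct data values, can be sketched in plain Python (Metafacture itself expresses this as a Flux/Morph pipeline; the records and field name are invented):

```python
# Count the distinct values of one metadata field across records.
from collections import Counter

records = [
    {"language": "ger"}, {"language": "eng"},
    {"language": "ger"}, {"language": "fre"},
]

counts = Counter(r["language"] for r in records)
```

On real datasets the point of Metafacture is that such counts run as streaming pipelines over millions of records without loading everything into memory.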
Prior Experience: No programming experience is required. Participants should have a basic understanding of XML. A library background is not necessary as the analyses presented in the workshop are applicable to other areas as well.
Requirements: A laptop with a VirtualBox installation is required (or any other virtualisation environment that can open OVA-files).

PhD workshop

The PhD workshop at SWIB13 provides an excellent opportunity for beginning as well as senior PhD students to present their ideas and receive feedback from experienced researchers and other PhD students working in research areas related to Linked Data based infrastructures and applications in libraries.
The Linked Open Data approach aims at a framework for the generation, publishing and sharing of information by means of semantic technologies. It plays a vital role in the realization of the Semantic Web at a global scale by publishing and interlinking diverse data sources on the Web. Access to this huge amount of Linked Data presents exciting opportunities for the next generation of Web-based applications, especially with regard to data hosted and provided by libraries. Despite the use cases depicted by the Library Linked Data Incubator Group, however, there is still a need for Linked Data applications and best-practice examples.
For this PhD workshop, we would like to discuss the initial ideas about issues in Linked Open Data that have proven to be both promising and challenging in the context of library use cases and applications.

Soylent SemWeb Is People! Bringing People to Linked Data

Tim Berners-Lee originally framed the Semantic Web as a depersonalized web of data and machinery humming away hidden in server closets. How much this lifeless, impersonal image contributed to RDF's initial adoption difficulties is open to question, but one clear lesson emerged: like the horrifying foodstuff from the cult 70s film "Soylent Green", the Semantic Web is ultimately made of people. People model data, sometimes without even knowing they're doing it. People crosswalk data. People design and configure and use data-entry, data-storage, data-discovery, and data-analysis systems. Without all these people, there can be no Semantic Web. So how do we best invite people -- including skeptical people, reluctant people, less-technical people, people committed to different data structures -- to learn about, contribute to, and use Linked Data?

Coffee Break

Mappings and Mashups

Automatic Creation of Mappings between Classification Systems for Bibliographic Data

Classification systems are an important means of providing topic-based access to large collections. Most library collections, however, are only partially classified and use local or regional classification systems. Traditionally, manually created mappings between classification systems are used to improve this situation.
I propose a different approach to automatically create such mappings:
To achieve a large base for the mapping algorithm, bibliographic data from diverse sources that contain items classified by the classification systems is aggregated in a single database.
Next, a clustering algorithm is used to group individual issues and editions of the same work. The basic idea is that, for classification purposes, there is no significant difference across editions. Indexing information can thus be consolidated within the clusters, resulting in a higher proportion of dual-indexed entries.
The novel step is that instead of individual catalogue entries, the "work-level" clusters are used for instance-based matching: statistical analysis creates a co-occurrence table of pairs of classes, with high co-occurrence of a given pair indicating a match between the two classification systems. This information is aggregated into a complete mapping.
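The matching step can be sketched as follows (the class notations and the winner-takes-all `best_match` rule are invented simplifications of the statistical analysis described above):

```python
# Each work-level cluster carries consolidated classes from two
# classification systems; counting class pairs across clusters
# yields candidate mappings.
from collections import Counter

clusters = [
    {"ddc": "823", "rvk": "HL1234"},
    {"ddc": "823", "rvk": "HL1234"},
    {"ddc": "823", "rvk": "HL9999"},
    {"ddc": "510", "rvk": "SK110"},
]

cooccurrence = Counter((c["ddc"], c["rvk"]) for c in clusters)

def best_match(ddc_class, cooccurrence):
    """Pick the target class co-occurring most often with a source class."""
    pairs = {rvk: n for (ddc, rvk), n in cooccurrence.items()
             if ddc == ddc_class}
    return max(pairs, key=pairs.get) if pairs else None

match = best_match("823", cooccurrence)
```

A real mapping would of course apply significance thresholds rather than a bare maximum, since sparse co-occurrence counts are noisy.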
The approach is implemented on an open-source infrastructure which was mainly developed by the German National Library: CultureGraph.org. In ongoing projects, mappings between several classification systems are being produced.
The talk will discuss the approach, the implementation issues and the preliminary results as well as the challenges of publishing the created mappings as linked data.

Cross-Lingual Semantic Mapping of Authority Files

One essential application of the Semantic Web is Named Entity Mapping (NEM). Named Entities within texts are recognized and annotated with corresponding entities from a knowledge base. Due to ambiguity of natural language additional information about the named entities is needed as context in order to enable disambiguation between several possible entity candidates. However, to enable semantic analysis of texts in different languages, the applied knowledge base should also provide multi-lingual labels. Unfortunately, most knowledge bases extracted from existing authority files do not fulfill these requirements.
DBpedia represents an entity hub within the Linked Data cloud, where semantic information is provided by Wikipedia article texts, page links and interlanguage links. Authority files, and in particular knowledge bases based on these authority files, would clearly benefit from properly linking their entities to DBpedia. In this talk, we introduce the research issue of cross-lingual entity mapping and propose new methods to perform high-precision mappings of existing authority files to DBpedia. Building on already existing mappings of authority files for entities such as persons, locations, or organizations, this contribution concentrates on the mapping of entities that are not assigned to one of these classes.
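The disambiguation problem described above can be illustrated with a toy sketch: an ambiguous label has several candidate DBpedia entities, and the candidate whose context terms overlap most with the source record wins. The entities, context sets and scoring rule are invented for illustration, not the proposed method:

```python
# Candidate entities for the ambiguous label "Paris",
# each with a bag of context terms.
candidates = {
    "dbpedia:Paris": {"france", "capital", "seine"},
    "dbpedia:Paris,_Texas": {"texas", "usa", "town"},
}

def disambiguate(context_terms, candidates):
    """Score candidates by context overlap; return the best one."""
    scores = {entity: len(context_terms & terms)
              for entity, terms in candidates.items()}
    return max(scores, key=scores.get)

entity = disambiguate({"capital", "france", "museum"}, candidates)
```

The cross-lingual twist is that context terms may come in different languages, which is why multilingual labels in the knowledge base matter.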

Mash-up for Book Purchasing

An everyday task in the library is to identify and purchase new books. In my talk I will present a mash-up which shows information about a specific book from different sources like the number of libraries holding the book already, price estimation and classifications. Our holding information, i.e. whether we already have the book or some professor has it on his shelf, is crucial here. We made some attempts to also handle our holdings information for e-book-packages or patron driven acquisition.
The mash-up we created uses Linked Data as well as the Z39.50/SRU protocol for receiving up-to-date information from libraries, plus some web-scraping methods. In the current implementation the role of Linked Data is rather small; nevertheless, possible benefits from more linked data are discussed in the talk.
Additionally, I will show the use of Linked Data arising from some newly published titles, and how it is possible to link from almost every webpage to our holdings information by using an adaptation of a Greasemonkey script.
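The SRU side of such a mash-up amounts to building a searchRetrieve URL per book. A minimal sketch (the endpoint is a placeholder; the parameter names follow the SRU 1.1 convention):

```python
# Build an SRU searchRetrieve URL for an ISBN lookup.
from urllib.parse import urlencode

def sru_url(base, isbn):
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": f"isbn={isbn}",
        "maximumRecords": "1",
    }
    return base + "?" + urlencode(params)

url = sru_url("http://sru.example.org/catalog", "9780142437247")
```

The mash-up would fire such requests against several library catalogues and merge the responses with price estimates and holdings data.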

Lunch

13:45 - 15:30

Libraries and Beyond

AgriVIVO: A Global Ontology-Driven RDF Store Based on a Distributed Architecture

Valeria Pesce (Global Forum on Agricultural Research, GFAR) / Johannes Keizer (Food and Agriculture Organization of the United Nations, FAO) / John Fereira (Cornell University, United States of America)

The proposed talk will describe the rationale behind and the technical implementation of the AgriVIVO store and portal (www.agrivivo.net). AgriVIVO is an RDF-based and ontology-driven global search portal harvesting from distributed directories of experts, organizations and events in the field of agriculture. It is based on the VIVO open source semantic web application initially developed at Cornell University and now adopted by several institutional and cross-institutional research discovery projects in the US and beyond.
AgriVIVO builds on the VIVO ontological model - which includes standard ontologies like the geopolitical ontology for locations, the FOAF vocabulary for people and the BIBO ontology for publications, besides the specialized VIVO core ontology - and extends it to better suit the agricultural domain.
One of the objectives for this global directory is to provide authoritative data on people and organizations in the agricultural fields: the next steps will focus on disambiguation techniques and the adoption of universal identifiers, especially ORCID, to uniquely identify experts and link them to publications, thus laying the groundwork for authority data on authors.
In more detail, the talk will cover: the reasons for choosing VIVO over other solutions; the methodology used to extend the VIVO ontology; the workflows and technical solutions adopted for harvesting from different databases; the work done to harmonize data coming from systems using different semantic organizations; the architecture of the system; URIs, RDF profiles and Linked Data.

The Hellenic Academic Libraries Association (HEAL-Link) offers to its members (50 academic and research libraries in Greece) a wide range of added-value services, including a union catalog with several millions of MARC-coded records. Recent activities of HEAL-Link aim at bringing its member libraries into the digital era.
Pioneering among HEAL-Link’s activities is the development of an academic content aggregator in the framework of a newly started project aiming at the creation of 2,000 ebooks available to higher education curricula. The vision is to create a semantic knowledge base comprising several thousand autonomous learning objects, each of important educational value but, most importantly, with the ability to be linked to, combined, and reused, enabling the composition of handbooks covering multiple courses.
From the point of view of the student or scholar, it is a benefit to be able to discover and study not only one but various sources, and to compile his/her own textbook out of selected resources. In this context, fine-grained faceted navigation and search among the learning objects, as well as other linked elements on the web, is of paramount importance; it relies on thesauri and on educational attributes such as the type and degree of interactivity, the intended user audience and the degree of difficulty.
In close collaboration with the NTUA Multimedia Communications & Web Technologies (MCWT) team, the HEAL-Link is actively pursuing research in the Linked Data domain, and specifically interoperability between conventional digital library systems and educational content repositories.

Linked Data for Libraries: Great Progress, but What Is the Benefit?

SWIB can testify, probably more than any other conference series, to the steady progress over several years in the development of linked data for libraries. This progress is accelerating as we evolve from individual projects at national and other large libraries towards cooperation and global initiatives around standard approaches and vocabularies.
Richard will explore three such initiatives: BIBFRAME, the Bibliographic Framework Initiative from the Library of Congress to apply linked data principles to describing bibliographic resources, a potential replacement for MARC that takes note of other standards such as RDA and FRBR; Schema.org, an approach, with the help of the W3C Schema Bib Extend group which Richard chairs, to apply the generic web vocabulary backed by the major search engines to the task of exposing bibliographic data in an easily consumable format; and Wikidata, the emerging data source that will underpin all of Wikipedia, with the potential to represent bibliographic entities like any other.
The key question that needs to be addressed, however, is what the purpose of these efforts is, beyond being a cool thing to do. What are the core problems we are trying to fix? Could efficiencies emerge as linked data, and the principles behind it, start to enter library workflows? How could the future landscape of library linked data appear, and how will it assist the primary goal of libraries: helping the potential users of our resources find them and use them? What is the benefit of linked data for libraries?

Coffee Break

Ontology Engineering

BIBFRAME: Libraries Can Lead Linked Data

"There won't be any Semantic Web that any mortal can understand until librarians help build it! #bibframe #alamw13 #semanticweb #linkeddata"
(Uche Ogbuji of zepheira.com via twitter, 2013-01-27)
As a result of the increasing level of digital interconnection in the information world, the established formats used by libraries for exchanging data are no longer deemed fit for purpose. Therefore the Library of Congress launched the "Bibliographic Framework Transition Initiative" (BIBFRAME) in 2011, aimed at developing the existing formats into a durable format. The purpose is to assess the viability of the formats, bearing in mind the opportunities presented by semantic web technologies.
As a member of the "Early Experimenters Group", the German National Library is actively supporting the initiative as part of its efforts to internationalize the German standards. In the first phase of the BIBFRAME project, the main objective of the German National Library is to contribute to the BIBFRAME point papers and to gain experience from a prototypical conversion of German National Library data into the BIBFRAME format. High potential lies in developing interoperability with communities outside the library world, as well as with commercial providers, so that applications can re-use and benefit from the valuable data. Therefore, among other points, the compatibility of the BIBFRAME model with the Europeana Data Model (EDM) and schema.org is being examined.
The presentation will give an overview of the BIBFRAME initiative, and provide some insights of the steps taken at the German National Library and in the context of the wider community.
www.bibframe.org

Building a National Ontology Infrastructure

The National Library of Finland, in collaboration with the Ministry of Finance and the Ministry of Education and Culture, has launched a project called ONKI which aims to build a national-level ontology service.
ONKI provides a stable, centralized ontology service that allows ontologies to be published, searched, and utilized in common ways through various interfaces. The immediate aim is to enable the use of ontologies in information retrieval systems in, e.g., libraries, museums, archives, governmental bureaus and other public organizations. Through the use of common definitions and URIs for the concepts, integration between various kinds of content becomes simpler. All the code we produce is open source.
The other focus of the ONKI project is building and refining the Finnish General Upper Ontology YSO and its Swedish counterpart ALLSO, which have been in use since the early 2000s. Now we wish to evaluate the current state of YSO and whether its hierarchy serves its envisioned usage. To this end we will compare it to similar work done in other countries and also conduct interviews with various user groups.
The final vision is to make YSO a national general upper ontology providing the upper level hierarchy as well as concepts that are common to all domains. This is then complemented by various more specific domain ontologies. We have implemented this structure in the form of KOKO, a combination of YSO and fifteen different domain ontologies ranging from agriculture to health to seafaring. This KOKO is already in use in, e.g., various museums as well as in pilot use in the national broadcasting company YLE.

On the Way to a Holding Ontology

Modelling bibliographic data in RDF is widespread among libraries, thanks to common ontologies such as the Bibliographic Ontology (BIBO) and several years of practice. Holding data, however, is rarely modelled in RDF due to the lack of matching vocabularies. For this reason a new holding data group was formed in April 2013 as part of the German Kompetenzzentrum Interoperable Metadaten (KIM) within the Deutsche Initiative für Netzwerkinformation (DINI). This talk will present the activities and work results of this group, including problems raised and their respective solutions. Similar to the related KIM group 'Titeldaten (title data)', the holding data group is developing a best practice guide for holding data in RDF. The goal is to figure out which data best belongs to a holding description and how this data should be modelled as Linked Data. The result is aligned with related efforts such as DAIA, BIBFRAME, and FRBR, with a strict focus on holdings of libraries and similar institutions. The emerging holding ontology is not going to be another "über-model"; rather, it will fit into the concept of micro-ontologies, where each of the small ontologies describes an independent domain of data.
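To make the modelling question concrete, a holding description might look like the following set of triples, loosely inspired by DAIA's item/availability split. The vocabulary terms and call number here are hypothetical illustrations, not the group's actual result:

```python
# A hypothetical holding description as triples.
holding = [
    ("ex:item1", "ex:heldBy", "ex:libraryA"),
    ("ex:item1", "ex:exemplarOf", "ex:work42"),
    ("ex:item1", "ex:callNumber", "QA76.9 .D3"),
    ("ex:item1", "ex:available", "true"),
]

def is_available(item, triples):
    """Check whether an item is marked as available."""
    return ("ex:available", "true") in {
        (p, o) for s, p, o in triples if s == item
    }

avail = is_available("ex:item1", holding)
```

Which of these properties genuinely belong to a holding description, and which to the bibliographic or item level, is exactly the kind of question the group's best practice guide addresses.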

Day 3 · 2013-11-27 Conference

So, you've converted your data and published it as beautiful Linked Data (LD), now what? As early adopters of Linked Data, libraries have an opportunity to reap the benefits, namely to consume and relate to information in this big, networked graph, and to change how the library works with information. This raises the question: are your technology and organisation up to the challenge? Can you deal with changes outside your control without resorting to aggregation and shuffling large batches of records every night? Does your boss even care?
In (re)building the new LIBRIS catalogue backend and cataloguing interface, we chose Linked Data as the primary technology, making MARC, for example, something that happens outside of the system rather than being an integral part of it. This keynote will focus on the motivation, design and lessons learned while making Linked Data a first-class citizen at the National Library of Sweden.

The "OpenCat" Prototype: Linking Public Libraries to National Datasets

Libraries are already part of the web of linked data. The next step is to make this data useful for public libraries in a long-term perspective.
The BnF already displays bibliographic information according to the linked data principles, through data.bnf.fr, an open linked data project that gathers resources and information in pages about authors, themes, and works. Now the question is: how can local libraries take advantage of this project to improve their services to patrons?
To experiment with a realistic and prospective use case, the BnF started OpenCat, an R&D project financed by the French Ministry of Culture, together with the Public Library of Fresnes and a software company, Logilab. Starting from the local library’s functional requirements, a prototype has been built.
Relying on permanent identifiers and structured data, it gathers:
- information from data.bnf.fr: FRBRized data, links to digital documents, illustrations,
- information from the local library like shelfmark, availability, links to subject headings,
- snippets, timelines and links to external resources such as conferences, biographical information from the Académie Française and other resources available on the linked open data cloud.
Thus new opportunities for cooperation between libraries and for creating a new business model among libraries are arising.
This open-source prototype can be seen online with a demo.
The next step is to try this prototype in other public libraries, with their own resources and to analyse how they can benefit by sharing resources on the Web, via OpenCat.

Coffee Break

Contributing to Europeana

Semantic Web Technology in Europeana

In this presentation we will discuss some technological and organisational challenges for implementing the semantic web vision in Europeana, focusing on metadata modeling, ingestion, and dissemination.
Our first effort was a new data model (EDM), which has been developed and extensively tested with many Europeana partners over the past years.
We then invested major effort in building an infrastructure, the Unified Ingestion Manager, which supports EDM, integrates a metadata mapping editor, and is able to harvest data from Europeana providers that refer to third-party Linked Data (thesauri, gazetteers,…). We have also experimented with semantic enrichment, linking objects to external sources: Geonames, GEMET, and DBpedia. With the results we were able to tune our search engine to perform semantic search functions, e.g., query expansion using hierarchies and translated labels.
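Hierarchy-based query expansion of the kind just mentioned can be sketched as follows (the tiny thesaurus and translation table are invented for illustration, not Europeana data):

```python
# Expand a query term with its narrower concepts and
# translated labels before sending it to the search engine.
narrower = {"furniture": ["chair", "table"]}
translations = {"chair": ["stuhl", "chaise"]}

def expand(term):
    terms = [term] + narrower.get(term, [])
    for t in list(terms):
        terms += translations.get(t, [])
    return terms

expanded = expand("furniture")
```

A search for "furniture" thus also matches objects indexed with narrower or non-English labels, which is the practical payoff of linking to hierarchical, multilingual vocabularies.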
Finally, we have experimented with new ways of disseminating data. We released a pilot Linked Data service. We also publish RDFa/schema.org mark-up on our web portal. This is still a work in progress, and we are involved in the W3C Schema Bib Extend Group to stay aligned with community expectations.
Working at the Europeana scale emphasizes the importance of many aspects beyond technology: cooperation, standardization, legal concerns, and more. It also shows that applying separate bits of the semantic web vision – as opposed to implementing the full technical stack at once – already brings benefits.

Specialising the EDM for Digitised Manuscripts

The “Europeana Data Model” (EDM) serves as a generic semantic umbrella for the integration of heterogeneous metadata schemas from many different knowledge domains. In order to capture and retain semantics of specific knowledge domains, the model allows the integration of specific application profiles. The project “Digitised Manuscripts to Europeana” (DM2E) devised one of the first specialisations for the EDM: The DM2E model is an application profile for schemas about handwritten manuscripts. The model integrates standards as diverse as MARC/XML, TEI, EAD or MAB2 and captures specific semantics of manuscript descriptions in the humanities. Thereby, rich semantic statements about cultural heritage objects can be delivered to Europeana and made visible for a broad audience. By explicitly defining classes for datasets and published data resources, the model additionally adheres to Linked Data principles. Resource descriptions are first-class members of the data model and can be used for later user annotations and provenance tracking.
In this presentation, we will provide an insight into the creation of the DM2E model: How can the EDM be specialised, which resources were needed for the manuscript domain and how can existing resources be reused? This can serve as guidance for the creation of other (specialised) vocabularies.

Application of LOD to Enrich the Collection of Digitized Medieval Manuscripts at the University of Valencia

In this presentation we describe a use case of consumption and integration of LOD into a production library application at the University of Valencia (Spain). The library has an important collection of medieval manuscripts and early printed books that is currently being digitized. All digital materials are made available open access as part of the institutional repository. A book reader allows users to navigate the physical or logical structure of a work, to access its illustrations, or to connect to related documents. To go a step further, the aim of the project described here was to enrich the reader's experience by providing additional information extracted from LOD data sets. The idea was that users could get biographical information about the authors, illustrators, copyists… when accessing the digital version of the books. For that purpose an application was developed to link the book viewer with the LOD sources containing related information. The presentation has two parts. First, we analyse the available LOD sets, the criteria used to select the data sources for the project, the problems faced by the consumer when discovering or searching for LOD sets, and finally the sources selected. In the second part, we describe the algorithm of the application that integrates data extracted from the different sources into the book reader. As a test, the application was applied to a collection of 92 manuscripts. We analyse the available information and the degree of coverage of the data sources in relation to the number of authors in the repository.

Lunch

Base Technology: The Web

ResourceSync for Semantic Web Data Copying and Synchronization

The web is dynamic, with resources being created, updated, and deleted.
Many applications that reuse web resources from remote sources require local copies to meet reliability and performance constraints. Commodity web search services use web crawling to harvest data. However, crawling alone may not provide low enough latency or high enough accuracy, or may not even be practical for particular datasets. The ResourceSync Framework (http://www.openarchives.org/rs) introduces a set of capabilities, based on Sitemaps, that a source may implement to enable clients to copy and keep in sync with its resources.
Semantic Web services are often built around local copies of harvested data collected and updated by ad hoc means. ResourceSync provides a standard web-based way to enable such harvesting. It includes discovery mechanisms and supports a variety of use cases with different size, change frequency, and latency requirements. The talk will give an overview of the framework and then focus on its application to the Semantic Web.
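To make the Sitemap-based approach concrete, here is a small client-side sketch: it parses a ResourceSync Resource List and picks out the resources that changed since the client's last sync. The XML is a hand-made example (the URLs are invented); the namespaces and the `capability="resourcelist"` marker follow the ResourceSync specification.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"

# A minimal, hand-made Resource List in the Sitemap-based ResourceSync
# format (illustrative data; the URLs are made up).
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"/>
  <url>
    <loc>http://example.org/data/resource1.ttl</loc>
    <lastmod>2013-11-01T12:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.org/data/resource2.ttl</loc>
    <lastmod>2013-11-20T09:30:00Z</lastmod>
  </url>
</urlset>"""

def resources_to_fetch(xml_text: str, synced_at: str) -> list[str]:
    """Return URIs whose lastmod is later than the client's last sync time.
    ISO 8601 UTC timestamps compare correctly as plain strings here."""
    root = ET.fromstring(xml_text)
    out = []
    for url in root.findall(f"{SM}url"):
        loc = url.findtext(f"{SM}loc")
        lastmod = url.findtext(f"{SM}lastmod") or ""
        if lastmod > synced_at:
            out.append(loc)
    return out
```

A real client would first use ResourceSync's discovery mechanisms to locate the capability documents, and could switch to Change Lists for lower-latency synchronization; this sketch shows only the baseline copy-and-compare step.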

From Strings to Things: A Linked Open Data API for Library Hackers and Web Developers

"Things, not strings" is a popular slogan pushed by Google in the context of its knowledge graph. The LODLAM community advocates this idea, in particular in the context of authority data. However, if we want users to catalog and search things, not strings, we need to make it easy to go from strings to things. At hbz, we are working on a LOD API in order to allow this for title and authority data - currently for the hbz union catalog (lobid-resources), the German ISIL registry (lobid-organisations), and the German integrated authority file (GND). The API serves JSON-LD over HTTP. JSON-LD is an attempt to make LOD more accessible to web developers who are not familiar with semantic web concepts. Providing a common HTTP API instead of a SPARQL endpoint or RDF dumps serves the same purpose: to make the data and advantages of LOD available to all web developers, and not to semantic web experts only. In this talk, we will present use cases for the API (like an auto-suggest functionality for authority data), and describe our implementation and the technology stack we use: metadata transformation to N-Triples with the Culturegraph Metafacture toolkit, enrichment and conversion of N-Triples to JSON-LD with Hadoop, indexing JSON-LD with Elasticsearch, and building a web API with the Play framework.

Repositories Enhanced

Enhancing an OAI-PMH Service Using Linked Data: A Report from the Sheet Music Consortium

The Sheet Music Consortium (http://digital2.library.ucla.edu/sheetmusic/) is a collaboration of universities and other sheet music repositories to publish an online metadata catalog of sheet music in their digital collections. By applying Linked Open Data principles and standards, we aim to improve the normalization of the Consortium's metadata, expanding its impact and enabling us to share it more widely and effectively, both directly with our users and through automated systems. We outline the steps to take the Sheet Music Consortium metadata from its current incarnation as MODS XML files and publish it as linked open data.
The Consortium has developed a plan to publish trustworthy data records that are derived from both harvested and user-contributed metadata. First, we identified four metadata elements to target as sources for linked data: names of creators, titles of songs, publishers, and subjects. Second, we concluded that despite the high level of attention this metadata has received in its creation and its inclusion in the Consortium, it is still not sufficiently normalized for export to a linked data standard. We will review our strategies for normalizing the metadata values. Third, we will discuss the various methods of employing linked data in the OAI-PMH context, including schema.org, RDF, and RDFS. Lastly, we will present the results of a pilot project focusing on publisher information, taking metadata from the Consortium and publishing it as linked open data.
While this case study is focused on sheet music, the methods discussed are generally applicable in the context of harvested metadata.
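As a rough illustration of the kind of normalization such a pilot requires, the sketch below cleans harvested publisher strings before they could be reconciled against an authority source. The rules and the sample variants are hypothetical, not the Consortium's actual strategy.

```python
import re

# Hypothetical normalization pass for harvested publisher strings, of the
# kind a pilot might apply before reconciling against an authority source.
def normalize_publisher(raw: str) -> str:
    s = raw.strip()
    s = re.sub(r"\s+", " ", s)            # collapse runs of whitespace
    s = s.rstrip(" .,;:")                 # trailing punctuation from MODS
    s = re.sub(r"\s*\[.*?\]\s*", " ", s)  # bracketed cataloguer notes
    return s.strip()

# Three harvested variants of the same (real) publisher name:
variants = [
    "M. Witmark  & Sons,",
    "M. Witmark & Sons [publisher]",
    "M. Witmark & Sons.",
]
normalized = {normalize_publisher(v) for v in variants}
```

After this pass, the three string variants collapse to a single value, which can then be linked to one publisher URI instead of three spurious ones.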

Exposing Institutional Repositories as Linked Data - a Case-Study

An institutional publication data service such as Bielefeld University's PUB (http://pub.uni-bielefeld.de) is a natural starting point for a linked data network. Institutional repositories are no longer restricted to traditional publications; some have begun to include research data and their metadata, as PUB does. Since research data and publications are related, it is natural to present this as linked data. By exposing this data and its relationships through standard interfaces, we have opened up PUB to the larger LOD cloud. This helps overcome interoperability restrictions between repositories. The PUB software is implemented in the open source Perl framework LibreCat/Catmandu (http://github.com/LibreCat).
As a first step, we implemented a content negotiation mechanism, a widely used feature in the semantic web. All standard bibliographic formats are supported: OAI-DC, MODS, BibTeX, RIS, and also the JSON and YAML representations of our internal data structure. The RDF representation of the metadata uses the established vocabularies BIBO, DCTERMS and FOAF.
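The RDF output described above can be sketched as a hand-rolled N-Triples serialization. The DCTERMS, BIBO, and FOAF term URIs below are the real vocabulary namespaces; the record itself, its URIs, and the helper function are invented for illustration and do not reflect the Catmandu-based implementation.

```python
# Hand-rolled sketch: serialize one publication record as N-Triples using
# DCTERMS, BIBO, and FOAF properties. The vocabulary namespaces are real;
# the record, its URIs, and this helper are illustrative assumptions.
DCTERMS = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def ntriple(s: str, p: str, o: str, literal: bool = False) -> str:
    """Format one N-Triples statement; objects are URIs unless literal."""
    obj = f'"{o}"' if literal else f"<{o}>"
    return f"<{s}> <{p}> {obj} ."

record = {
    "uri": "http://pub.example.org/publication/1234",
    "title": "Linked Data for Repositories",
    "creator": "http://pub.example.org/person/42",
}

triples = [
    ntriple(record["uri"], RDF_TYPE, BIBO + "Article"),
    ntriple(record["uri"], DCTERMS + "title", record["title"], literal=True),
    ntriple(record["uri"], DCTERMS + "creator", record["creator"]),
    ntriple(record["creator"], RDF_TYPE, FOAF + "Person"),
]
```

A content-negotiating endpoint would return such a serialization when a client sends an RDF media type in its Accept header, and one of the other supported formats (BibTeX, MODS, ...) otherwise.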
Future work will involve linking the PUB data to external resources, e.g. authority data of the German National Library, and cross-linking to ORCID, Europe PMC or arXiv. Another line of development is the interlinking of publications with research data, whether contained in our own repository or in an external one. The contextualization of publications and research data within the University's structure will be feasible as soon as the administrative data is published as linked data using VIVO.