Publishing Databases on the Semantic Web

Abstract

D2R Server is a tool for publishing the content of relational databases on the Semantic Web. This document details how three different kinds of Web agents can access this data through simple HTTP-based interfaces: RDF browsers, traditional HTML browsers, and SPARQL query clients.

1. Introduction

The Semantic Web is a global information space consisting of interlinked data about resources. It supports two access paradigms: browsing and searching. Using a Semantic Web browser such as Tabulator, a user can follow links from data about one resource to data about other resources. Using the SPARQL query language and protocol, a client application can query data sources for information about resources.

D2R Server is a tool for publishing the content of relational databases on the Semantic Web. Database content is mapped to RDF by a customizable mapping that specifies how resources are identified and which properties are used to describe them. Based on this mapping, D2R Server allows an RDF representation of the database to be browsed and searched. The server provides two interfaces. The dereferencing interface allows instance and vocabulary URIs to be dereferenced over HTTP; it supports content negotiation and serves RDF and XHTML representations of resources. The generated representations are richly interlinked at both the RDF and the XHTML level so that browsers and crawlers can navigate the database content. The SPARQL interface enables applications to query the database using the SPARQL query language over the SPARQL protocol.

The server takes requests from the Web and rewrites them via a D2RQ mapping into SQL queries against a relational database. This on-the-fly translation allows clients to access the content of large databases without having to replicate them into RDF.
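The idea of rewriting a Web request into a SQL query can be illustrated with a toy sketch. It does not use D2RQ or its mapping language: the MAPPING dictionary, the persons table with its id and name columns, and the function names are all hypothetical and serve only to show the on-the-fly translation.

```python
import re
import sqlite3

# Hypothetical mapping: URI path pattern -> (table, key column).
# This is NOT D2RQ mapping syntax, only an illustration of the principle.
MAPPING = {
    r"^/resource/persons/(\d+)$": ("persons", "id"),
}


def path_to_sql(path):
    """Translate a resource path into (sql, params), or None if unmapped."""
    for pattern, (table, key) in MAPPING.items():
        match = re.match(pattern, path)
        if match:
            return f"SELECT * FROM {table} WHERE {key} = ?", (int(match.group(1)),)
    return None


def describe(conn, path):
    """Run the rewritten query and return the matching row as a dict, or None."""
    rewritten = path_to_sql(path)
    if rewritten is None:
        return None
    sql, params = rewritten
    cursor = conn.execute(sql, params)
    row = cursor.fetchone()
    if row is None:
        return None
    columns = [column[0] for column in cursor.description]
    return dict(zip(columns, row))


# In-memory demo database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO persons VALUES (6, 'Andy Seaborne')")

print(describe(conn, "/resource/persons/6"))
```

Because the query is produced per request, nothing is copied out of the database ahead of time; a real mapping would additionally turn each column into an RDF property.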

2.1 Dereferencing URIs Identifying Database Content

D2R Server allows database-generated URIs to be dereferenced. The HTTP request below, sent to the server http://www3.wiwiss.fu-berlin.de:2020, requests an RDF representation of the resource http://www3.wiwiss.fu-berlin.de:2020/resource/persons/6. Note that the request asks for content type application/rdf+xml.

GET /resource/persons/6 HTTP/1.0
Accept: application/rdf+xml

According to the httpRange-14 TAG finding, only information resources (i.e. documents) can
have representations served on the Web over HTTP. When
URIs that identify other kinds of resources, such as a person,
are dereferenced, then the HTTP response must be a 303
redirect to a second URI. At that location, a document describing
the real-world resource (i.e. person) is served. D2R Server implements this
behavior and will answer the request above with an HTTP response like this:

HTTP/1.1 303 See Other
Location: http://www3.wiwiss.fu-berlin.de:2020/data/persons/6
Connection: close

The client then has to perform a second HTTP GET request on the
Location URI. D2R Server responds to this request with an RDF/XML
document describing the person.
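The two-request flow can be sketched from the client side as follows. This is a minimal sketch, not part of D2R Server: the function names are hypothetical, only plain http URLs are handled, and the fetch parameter is an assumption introduced so that the logic can be exercised without a live server.

```python
import http.client
from urllib.parse import urljoin, urlsplit

RDF_XML = "application/rdf+xml"


def http_fetch(url, accept):
    """Send one GET request without following redirects.

    Returns (status, headers, body); header names are lower-cased.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
    try:
        conn.request("GET", parts.path or "/", headers={"Accept": accept})
        response = conn.getresponse()
        headers = {name.lower(): value for name, value in response.getheaders()}
        return response.status, headers, response.read()
    finally:
        conn.close()


def dereference(resource_uri, accept=RDF_XML, fetch=http_fetch):
    """Return (document_uri, body) for a URI, following at most one 303."""
    status, headers, body = fetch(resource_uri, accept)
    if status == 303:
        # The URI names a non-information resource; its description
        # is served at the URI given in the Location header.
        document_uri = urljoin(resource_uri, headers["location"])
        status, headers, body = fetch(document_uri, accept)
    else:
        document_uri = resource_uri
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status} for {document_uri}")
    return document_uri, body
```

Calling dereference("http://www3.wiwiss.fu-berlin.de:2020/resource/persons/6") would issue the two requests described above, provided the server is reachable. Keeping the redirect step explicit lets the client remember both the resource URI and the URI of the document that describes it.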

The description is generated on the fly from the content of the database.
Note that the response contains URIs of related resources such as papers
and topics. Descriptions of these can be retrieved in the same way.
Besides triples that have resource/persons/6 as subject (out-arcs), the representation also contains triples that have resource/persons/6 as object (in-arcs). In our example, this enables RDF browsers and crawlers to follow the link from Andy to his paper resource/papers/4.
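The distinction between out-arcs and in-arcs can be made concrete with a small sketch over triples held as plain tuples. The property names (foaf:name, dc:creator) and the sample data are assumptions chosen for illustration; the actual properties depend on the mapping.

```python
def split_arcs(triples, resource):
    """Split triples into out-arcs (resource is subject) and in-arcs (resource is object)."""
    out_arcs = [t for t in triples if t[0] == resource]
    in_arcs = [t for t in triples if t[2] == resource and t[0] != resource]
    return out_arcs, in_arcs


def neighbours(triples, resource, known_resources):
    """Return the other resources a browser could navigate to from this description."""
    out_arcs, in_arcs = split_arcs(triples, resource)
    found = {obj for _, _, obj in out_arcs if obj in known_resources}
    found |= {subj for subj, _, _ in in_arcs}
    return found


# Assumed sample description of resource/persons/6.
TRIPLES = [
    ("resource/persons/6", "foaf:name", "Andy Seaborne"),
    ("resource/papers/4", "dc:creator", "resource/persons/6"),
]

out_arcs, in_arcs = split_arcs(TRIPLES, "resource/persons/6")
print(out_arcs)
print(in_arcs)
```

Without the in-arc, a crawler starting at the person would have no way to discover the paper, because the link is stated on the paper's side.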

Future versions of D2R Server should also provide rdf:type and rdfs:label statements
for each referenced resource. This would leave a breadcrumb trail to help browsers decide
which links to follow.

2.2 Dereferencing External URIs

The database may also contain information about resources whose URIs are outside the server's namespace. When the server generates output that mentions such a resource, it adds an rdfs:seeAlso statement to the resource, pointing at an RDF/XML document that contains all information the database holds about the external resource. By dereferencing the external URI and following the rdfs:seeAlso link, an RDF browser can retrieve both authoritative and non-authoritative information about the resource.

2.3 Referring to Database Content from other Web Documents

You can use D2R Server's database-generated URIs to refer to database content from other Web documents. For instance, you could use the URI http://www3.wiwiss.fu-berlin.de:2020/resource/persons/6 in a foaf:knows statement within your FOAF profile to refer to Andy Seaborne. By dereferencing the URI, an RDF browser can navigate from your FOAF profile to information about Andy in the database.

3. Accessing a Database with an HTML Browser

D2R Server also offers a traditional HTML interface to the data. Each resource has an XHTML representation. Web browsers retrieve these representations by dereferencing the resource URI with an HTTP request that asks for HTML (content type text/html) or XHTML (application/xhtml+xml).

GET /resource/persons/4 HTTP/1.0
Accept: text/html

As in the application/rdf+xml case, D2R Server redirects to a document describing the resource, this time an XHTML page.

The representation contains navigation links (Home | All Persons) that allow the complete content of the database to be browsed.
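How a server might choose the redirect target from the Accept header can be sketched as below. This is a simplified illustration, not D2R Server's implementation: the /data/ prefix appears in the example response earlier in this document, while the /page/ prefix, the HTML default, and the function names are assumptions. Wildcards such as */* are not interpreted and simply fall through to the default.

```python
def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


# Media type -> path prefix of the describing document (the /page/ prefix is assumed).
REPRESENTATIONS = {
    "application/rdf+xml": "/data/",
    "text/html": "/page/",
    "application/xhtml+xml": "/page/",
}


def redirect_target(resource_path, accept_header, default_prefix="/page/"):
    """Map /resource/... to the document path that a 303 redirect would point at."""
    prefix = "/resource/"
    if not resource_path.startswith(prefix):
        raise ValueError(f"not a resource path: {resource_path}")
    suffix = resource_path[len(prefix):]
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type] + suffix
    return default_prefix + suffix


print(redirect_target("/resource/persons/4", "text/html"))
```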

The <head> section of the HTML page contains a <link rel="alternate" /> tag pointing to the resource's RDF representation. This allows tools like Piggy Bank to switch between the HTML and RDF views. The RDF icon in the corner links to the same RDF document.
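A tool can discover the RDF view by scanning the page for such a link element. The sketch below uses Python's standard html.parser module; the sample markup, including the type and href attribute values, is an assumption about what a generated page might contain.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect (type, href) pairs of <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("href"):
            self.alternates.append((attributes.get("type"), attributes["href"]))


def find_rdf_alternate(html_text):
    """Return the href of the first RDF/XML alternate link, or None."""
    finder = AlternateLinkFinder()
    finder.feed(html_text)
    for media_type, href in finder.alternates:
        if media_type == "application/rdf+xml":
            return href
    return None


# Assumed sample page; real pages are rendered from templates.
SAMPLE = """<html><head>
<title>Person #4</title>
<link rel="alternate" type="application/rdf+xml" href="/data/persons/4" />
</head><body></body></html>"""

print(find_rdf_alternate(SAMPLE))
```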

All pages are rendered from Velocity templates to allow customization. Future versions of D2R Server might employ Fresnel lenses to improve resource display.