Information about this Dagstuhl Seminar is provided by

Summary

The Semantic Web represents the next-generation World Wide Web, in which information is published and interlinked so that both humans and machines can exploit its structure and semantics (meaning). To foster the realization of the Semantic Web, the World Wide Web Consortium (W3C) developed a set of languages: a metadata model (RDF), ontology languages (RDF Schema and OWL variants), and query languages (e.g., SPARQL). Research in recent years has mostly been concerned with the definition and implementation of these languages, the development of accompanying ontology technologies, and applications in various domains. This work has been very successful, and semantic web technologies are increasingly being adopted by mainstream corporations, by governments (for example, those of the UK and the USA), and by several scientific communities (for example, the Life Sciences and Astronomy). Moreover, semantic technologies are at the core of future developments, e.g., at the UK Open Data Institute. However, compared to more traditional solutions, semantic technologies often appear immature, and current tools lag behind in handling large data sets efficiently. What is additionally needed are solid data management concepts, architectures, and tools that follow the paradigms of more traditional database (DB) and information retrieval (IR) systems.
Semantic data management refers to a range of techniques for manipulating and using data based on its meaning. The aim of this workshop was to discuss a number of crucial issues in depth, with particular emphasis on a fruitful exchange of ideas between the semantic web, database systems, and information retrieval communities. Key questions cutting across all topics covered were: (i) how can existing DB and IR solutions be adapted to manage semantic data; and (ii) do new challenges arise for the DB and IR communities (i.e., are radically new techniques required)?

For the purposes of this workshop, and of this report, we understand semantic data simply as data expressed in RDF, the lingua franca of linked open data and hence the default data model for annotating data on the Web. The workshop was organized around the following key themes:

Scalability. For semantic technologies to gain their targeted market share, technological progress must allow semantic repositories to scale to the large amount of semantic data that is already available and that keeps growing. It is essential to come close to performance parity with the best DB solutions without giving up the greater schema flexibility that semantic data offers compared to the relational model. Moreover, exploiting semantic data on the Web requires managing data at a scale that so far only the major search engine providers can handle. However, this should be possible without losing the advantage of greater query expressivity over basic key-value stores and IR solutions.
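To make the query-expressivity point concrete, here is a minimal sketch of evaluating a conjunction of triple patterns (roughly what a SPARQL basic graph pattern expresses) over a toy in-memory store. A key-value lookup answers one key at a time; what it lacks is the join on a shared variable such as `?y`. The `TripleStore` class, the `ex:` names, and the predicate-only index are hypothetical simplifications; production RDF stores use far more elaborate indexes and join algorithms.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical toy store: triples are bucketed by predicate, so a pattern
    with a fixed predicate scans only its bucket (a simplified stand-in for the
    indexes real RDF stores maintain)."""

    def __init__(self, triples):
        self.by_predicate = defaultdict(list)
        for s, p, o in triples:
            self.by_predicate[p].append((s, o))

    def bgp(self, patterns):
        """Evaluate a conjunction of (s, p, o) patterns with fixed predicates.

        Strings starting with '?' are variables. Each pattern extends the
        bindings produced so far (a simple nested-loop join)."""
        bindings = [{}]
        for s_pat, p, o_pat in patterns:
            extended = []
            for b in bindings:
                for s, o in self.by_predicate.get(p, []):
                    nb = dict(b)  # copy so failed matches do not leak bindings
                    if self._unify(s_pat, s, nb) and self._unify(o_pat, o, nb):
                        extended.append(nb)
            bindings = extended
        return bindings

    @staticmethod
    def _unify(pattern, term, binding):
        """Match a constant, or bind/check a variable against a term."""
        if not pattern.startswith("?"):
            return pattern == term
        if pattern in binding:
            return binding[pattern] == term
        binding[pattern] = term
        return True


store = TripleStore([
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:carol", "ex:name", "Carol"),
    ("ex:carol", "ex:email", "carol@example.org"),  # extra property: no schema change needed
])

# Roughly: SELECT ?x ?n WHERE { ?x ex:knows ?y . ?y ex:name ?n }
rows = store.bgp([("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")])
print(sorted((r["?x"], r["?n"]) for r in rows))
# -> [('ex:alice', 'Bob'), ('ex:bob', 'Carol')]
```

Note that the `ex:email` triple is stored alongside the others without any table being altered, which is the kind of schema flexibility the paragraph contrasts with the relational model.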

Provenance. An important aspect of integrating data from a large number of heterogeneous sources under diverse ownership is the provenance of the data or parts thereof. Provenance denotes the origin of data and can also include information on processing or reasoning operations carried out on the data. In addition, provenance makes it possible to support trust mechanisms and policies for privacy and rights management effectively.
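One common way to record where data comes from is to attach a source identifier to every triple, much as RDF named graphs do. The sketch below assumes that representation; the quad layout, the source names, and the "at least one trusted source" policy are hypothetical illustrations of how provenance can feed a trust mechanism. It only tracks origin, not the processing or reasoning steps that fuller provenance models also cover.

```python
from collections import defaultdict

# Hypothetical data: each quad is (subject, predicate, object, source),
# i.e. a triple plus the source that asserted it.
quads = [
    ("ex:sensor1", "ex:locatedIn", "ex:roomA", "src:inventory"),
    ("ex:sensor1", "ex:locatedIn", "ex:roomA", "src:crawl"),
    ("ex:sensor1", "ex:locatedIn", "ex:roomB", "src:crawl"),
    ("ex:sensor1", "ex:type", "ex:Thermometer", "src:inventory"),
]


def provenance(quads):
    """Map each triple to the set of sources that assert it."""
    prov = defaultdict(set)
    for s, p, o, src in quads:
        prov[(s, p, o)].add(src)
    return prov


def trusted_triples(quads, trusted_sources):
    """A simple trust policy: keep triples asserted by at least one trusted source."""
    return {triple for triple, sources in provenance(quads).items()
            if sources & trusted_sources}


prov = provenance(quads)
print(sorted(prov[("ex:sensor1", "ex:locatedIn", "ex:roomA")]))
# -> ['src:crawl', 'src:inventory']
print(sorted(trusted_triples(quads, {"src:inventory"})))
```

Under this policy the conflicting `ex:roomB` claim, asserted only by `src:crawl`, is filtered out, while the `ex:roomA` claim survives because a trusted source corroborates it.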

Dynamicity. Another important property of much (semantic) data is its dynamicity. While some data, such as public administration archives or collections of text documents, might not change very frequently, other data, coming from sensors, RSS feeds, user-generated content (e.g., microblogging), etc., might change on a per-millisecond basis. The effects of such changes have to be addressed through a combination of stream processing, mining, and semantics-based techniques.
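Stream processing over rapidly changing data often relies on time-based sliding windows: a query sees only the items that arrived within the last N milliseconds. The sketch below is a minimal, hypothetical version over timestamped triples; the `SlidingWindow` name, the 1000 ms width, and the in-order arrival assumption are illustrative choices. It covers only the windowing step, not the mining or semantics-based techniques mentioned above.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical time-based window keeping (timestamp_ms, triple) items
    from the last `width_ms` milliseconds."""

    def __init__(self, width_ms):
        self.width_ms = width_ms
        self.items = deque()

    def push(self, timestamp_ms, triple):
        # Assumes items arrive in timestamp order.
        self.items.append((timestamp_ms, triple))
        self._evict(timestamp_ms)

    def _evict(self, now_ms):
        # Drop items older than the window [now - width, now].
        while self.items and self.items[0][0] < now_ms - self.width_ms:
            self.items.popleft()

    def values(self, predicate):
        """Return the objects of all in-window triples with the given predicate."""
        return [o for _, (s, p, o) in self.items if p == predicate]


w = SlidingWindow(width_ms=1000)
w.push(0, ("ex:sensor1", "ex:temp", 20.0))
w.push(400, ("ex:sensor1", "ex:temp", 21.0))
w.push(1200, ("ex:sensor1", "ex:temp", 23.0))  # evicts the reading at t=0
temps = w.values("ex:temp")
print(sum(temps) / len(temps))  # average over the current window
# -> 22.0
```

Eviction happens on every insert, so the deque never holds more than one window's worth of items; a real engine would also handle out-of-order arrivals and continuous query re-evaluation.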

Search and Ranking. The large and growing amount of semantic data enables new kinds of applications. At the same time, more data ultimately means that queries may produce more results than a user can, or wants to, inspect. Data and results vary in their degree of relevance to a concrete information need. Effective ranking mechanisms that take the information need as well as contextual information into account can deliver and rank pertinent results and help users focus on the relevant part of the data.
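As a toy illustration of combining an information need with context, the sketch below scores each result by the fraction of query terms found in its description, plus a bonus when the result matches the user's region. The scoring formula, the 0.5 weight, the `region` attribute, and the data are all hypothetical; real semantic search systems use much richer relevance models.

```python
def relevance(query_terms, description):
    """Fraction of query terms that occur in the result's description."""
    words = set(description.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)


def rank(results, query, context, context_weight=0.5):
    """Sort results by term relevance plus a weighted context bonus (hypothetical)."""
    query_terms = query.lower().split()

    def score(r):
        context_bonus = 1.0 if r["region"] == context["region"] else 0.0
        return relevance(query_terms, r["description"]) + context_weight * context_bonus

    return sorted(results, key=score, reverse=True)


results = [
    {"id": "ex:museum1", "description": "Art museum with modern paintings", "region": "ex:paris"},
    {"id": "ex:museum2", "description": "Science museum", "region": "ex:berlin"},
    {"id": "ex:gallery3", "description": "Modern art gallery", "region": "ex:berlin"},
]
ranked = rank(results, "modern art museum", {"region": "ex:berlin"})
print([r["id"] for r in ranked])
# -> ['ex:gallery3', 'ex:museum1', 'ex:museum2']
```

Here the context bonus lifts `ex:gallery3` (2/3 term match) above `ex:museum1` (full term match but wrong region), showing how context can reorder results that pure term matching would rank differently.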