25.06.17 - 30.06.17, Seminar 17262

Federated Semantic Data Management

Motivation

Semantic data management refers to approaches that focus on manipulating and using data in terms of its meaning. A widely accepted foundation of such approaches is the graph-based data model defined by the Resource Description Framework (RDF). In addition to centralized access to RDF datasets, Web-based protocols such as the SPARQL protocol enable software clients to access or query RDF data made available by remote servers. By integrating such remote data sources as members of a federated system, software clients can answer cross-dataset queries without first having to load the individual datasets into a single repository. In such a federation, the problems of semantic data management become more complex because of additional parameters such as variable data transfer delays, the changing availability of federation members, the size of the federation, and the criteria used to distribute data across the datasets of the federation and to link it semantically.

The aim of this Dagstuhl Seminar is to gather experts from the Semantic Web and Database communities, together with experts from application areas, to discuss in depth the open issues that have prevented federated semantic data management approaches from being used on a large scale. Key questions cutting across all topics discussed are: (i) can traditional techniques developed for federations of relational databases provide effective and efficient solutions to problems of federated semantic data management; (ii) which problems of federated semantic data management present new research challenges that require the definition of novel techniques; and (iii) what is the role of semantics in the definition of the problems of federated semantic data management?

The seminar will focus on the following crucial topics related to federated semantic data management:

Graph data management techniques: Federations of semantic data expose graph structures on various levels, such as the graphs formed by relationships between entities in the data and the graphs formed by semantically interlinked data sources. Therefore, graph data management techniques are candidates for addressing problems of federated semantic data management. The question is how such techniques can be applied to model and manage the semantics in RDF data, taking into account that features of RDF and SPARQL (e.g., blank nodes and SPARQL operators) may affect the tractability of graph-based tasks in a federation of RDF graphs.

Federated query processing techniques: Although federated query processing has been studied extensively, a number of important problems are still open, and more challenges are likely to arise as the complexity of federations increases. In particular, federated query processing requires source selection and query decomposition techniques, as well as query execution techniques, that not only exploit the characteristics of the federation members and their datasets but also adapt to runtime conditions.
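To make source selection and query decomposition more concrete, the following is a minimal Python sketch under simplifying assumptions, not a description of any particular system. It assumes that each federation member advertises the set of predicates it holds (a hypothetical capability index; real engines may instead probe endpoints with SPARQL ASK queries or rely on dataset statistics) and that a query is a basic graph pattern given as a list of triple patterns. Patterns that only one source can answer are grouped per source, similar in spirit to the exclusive groups used by some federated engines; the remaining patterns must be sent to several sources and their results joined. The source names and predicates are illustrative only.

```python
from collections import defaultdict


def select_sources(patterns, capabilities):
    """Map each triple pattern (s, p, o) to the sources that may answer it.

    `capabilities` is a hypothetical index: source name -> set of predicates.
    Terms starting with "?" are variables; a pattern whose predicate is a
    variable cannot be pruned and is sent to every source.
    """
    selection = {}
    for tp in patterns:
        predicate = tp[1]
        if predicate.startswith("?"):
            selection[tp] = sorted(capabilities)
        else:
            selection[tp] = sorted(
                source for source, preds in capabilities.items() if predicate in preds
            )
    return selection


def decompose(selection):
    """Split the query into per-source subqueries and shared patterns.

    Patterns with exactly one relevant source are grouped into a subquery for
    that source; patterns with several relevant sources are returned separately,
    since their results must be collected from each source and joined.
    """
    exclusive = defaultdict(list)
    shared = []
    for tp, sources in selection.items():
        if len(sources) == 1:
            exclusive[sources[0]].append(tp)
        else:
            shared.append((tp, sources))
    return dict(exclusive), shared


if __name__ == "__main__":
    # Hypothetical federation of two members with overlapping vocabularies.
    capabilities = {
        "drugbank": {"db:target", "db:name"},
        "uniprot": {"up:gene", "db:name"},
    }
    query = [
        ("?d", "db:target", "?t"),
        ("?t", "up:gene", "?g"),
        ("?d", "db:name", "?n"),
    ]
    exclusive, shared = decompose(select_sources(query, capabilities))
    print(exclusive)
    print(shared)
```

This static sketch only captures the first part of the problem; it ignores the runtime conditions mentioned above (delays, unavailable members), which an adaptive engine would have to account for during query execution.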

Access control and privacy: In application domains such as personalized medicine or finance, where federated semantic data management is in common use, only authorized and privacy-respecting access is allowed. Solutions to the problem of modeling access control policies for Web resources have benefited from Semantic Web technologies. On the other hand, according to the Linked Data publishing principles, the RDF properties associated with any resource can be accessed by dereferencing the resource's URI. Thus, novel approaches are required to bridge the gap between access-control models and unrestricted access to RDF resources.

Synchronization models: As in distributed databases, federations of RDF datasets may contain datasets that are replicated across federation members to increase data availability. However, Web-accessible RDF datasets may be updated autonomously. As a consequence, traditional collaboration-based consistency models that enforce replica synchronization may not be suitable for federations of RDF datasets. An additional challenge is to support the synchronization not only of explicit RDF data but also of implicit RDF statements, i.e., statements that can be inferred from the explicit data.