Published Version

Abstract

The Life Sciences Linked Open Data (LSLOD) Cloud currently comprises multiple datasets that add high value to biomedical research. The ability to navigate through these datasets in order to derive and discover new meaningful biological correlations is considered one of the most significant resources for supporting clinical decision making. However, navigating these datasets is not easy, as most of them are fragmented across multiple SPARQL endpoints, each containing trillions of triples and represented with insufficient vocabulary reuse. To retrieve and match the data required to answer meaningful biological questions from multiple endpoints, it is first necessary to catalogue the data represented in each endpoint, in order to understand how powerful queries traversing several SPARQL endpoints can be assembled. In this report, we explore the schema used to represent data in a total of 52 meaningful Life Sciences SPARQL endpoints and present our methodology for linking related concepts and properties from the pool of available elements. We found the outcome of this exploratory work to be helpful not only in identifying redundancy and gaps in the data, but also in enabling the assembly of complex federated queries. We present three different approaches used to weave concepts and properties and discuss their applicability for creating complex links in the LSLOD cloud.

Keywords: Linked Open Data, SPARQL, Life Sciences, Query Element

Description

Conference paper

This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland. No item may be reproduced for commercial purposes. Please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.