CrEDIBLE thematic working days, October 2-4, 2013

Due to the increasing on-line availability of various biomedical data sources, the ability to federate heterogeneous and distributed data sources becomes critical to support multi-centric studies and translational research in medicine. The CrEDIBLE project organises 3 thematic working days on October 2-4 in Sophia Antipolis (near Nice, France), where experts are invited to present their latest work and discuss their approaches. The aim is to gather scientists from all disciplines involved in the setup of distributed and heterogeneous medical data sharing systems (medical data representation, data mediation, data stores federation, data semantics, workflows, … towards biomedical data integration), to provide an overview of this broad and complex area, to assess the state-of-the-art methods and technologies addressing it, and to discuss the open scientific questions it raises.

The methods for biomedical data distribution considered in the context of CrEDIBLE are:

Federation: the (virtual) fusion of geographically spread data stores which should appear to end users as a unique and coherent data source.

Mediation: the semantic alignment of heterogeneous data sources, which were often designed independently from each other.

Querying: the description of distributed data sets, defined through data retrieval queries that apply to the whole federated system.

Data flow: the use and enrichment of the federated data stores through data processing pipelines.
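As an illustration of how the first three notions fit together, here is a minimal Python sketch. Everything in it is a hypothetical assumption made for the example (the two in-memory "sites", their field names, the shared vocabulary and the helper functions); it does not describe the CrEDIBLE software. Each source gets its own mediation function that aligns local records with a shared vocabulary, and a single query is then evaluated over all sources as if they were one store.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Two hypothetical data stores designed independently of each other:
# same kind of information, different field names and units.
SITE_A: List[Record] = [
    {"patient_id": "A1", "age_years": 54, "modality": "MRI"},
    {"patient_id": "A2", "age_years": 37, "modality": "CT"},
]
SITE_B: List[Record] = [
    {"subject": "B7", "age_months": 600, "scan_type": "mri"},
]


# Mediation: one mapping per source onto a shared (hypothetical) vocabulary
# with the fields "id", "age" (in years) and "modality" (upper case).
def mediate_a(rec: Record) -> Record:
    return {"id": rec["patient_id"], "age": rec["age_years"],
            "modality": rec["modality"].upper()}


def mediate_b(rec: Record) -> Record:
    return {"id": rec["subject"], "age": rec["age_months"] // 12,
            "modality": rec["scan_type"].upper()}


SOURCES = [(SITE_A, mediate_a), (SITE_B, mediate_b)]


# Federation and querying: one query, expressed against the shared
# vocabulary, is applied to every source; the caller sees a single result.
def federated_query(predicate: Callable[[Record], bool]) -> List[Record]:
    results: List[Record] = []
    for records, mediate in SOURCES:
        for raw in records:
            rec = mediate(raw)
            if predicate(rec):
                results.append(rec)
    return results


mri_over_40 = federated_query(
    lambda r: r["modality"] == "MRI" and r["age"] > 40
)
print(mri_over_40)
```

The caller never sees the site-specific field names, which is the point of combining mediation with federation. A real system would push the query down to remote stores rather than iterate over local lists.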

Working days organisation

The idea of this workshop is to have groups of ~30-minute presentations within a given theme/session, followed by a time slot for a panel discussion to share the presenters' experience on selected scientific questions and challenges. Talks should keep a balance between introducing the field (appropriate for a broad audience of scientists involved in all the areas covered by CrEDIBLE) and technical details (appropriate for expert scientists).

Thematic sessions

Session 1: Data repositories for secondary use of clinical and research data

Session goal: To report on concrete experience in developing systems that gather or index data to be shared and reused in research projects, covering user requirements, current technology limitations and future expectations.

Scientific questions:

Data indexing: how to meet the expectations of researchers in terms of precision of the vocabulary?

Data provenance: what do we actually need: detailed models of provenance, or more “distilled” information?

Access control: how to accommodate multiple access policies specified by contributing entities? Is it manageable in practice? Should it be applied to datasets only (e.g. images, signals) or to metadata as well?

Data federation: what level of data federation is required? What are the data sources to federate? What are the data models in use?

Session 2: Biomedical ontologies

Session goal: To discuss ontologies modeling observations and measurements data (designed to facilitate the sharing and reuse of scientific data).

Session 3: Data mediation

Reference model. Taxonomies or ontologies can be used as a reference model. Is this the most appropriate reference model? What are the target models of the methods presented?

Query language. SPARQL can be used as a query language to access data in heterogeneous databases. Is it the most appropriate query language? What are the query languages applicable to the methods presented?

How to mediate various data sources (Pros and cons of each approach. Use cases. Are there hybrid approaches?):

Session 4: Data federation

Query language. Is SPARQL (v1.1) the most appropriate language? What is the trade-off between expressiveness and performance?

Performance. What is the performance impact, especially when deploying over a WAN? How does the gain from parallel execution of queries compare with the network overhead?

Scalability. How scalable are the different methods proposed? To what scale have they been tested?

Reliability. What is the impact of low reliability? Can queries be partially processed in case of communication failures with some data stores? Can end-users be notified of the kind of information that is potentially missing?
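To make the performance and reliability questions above concrete, here is a minimal Python sketch of a federated query that contacts several stores in parallel, keeps whatever partial results arrive, and reports which stores could not contribute. The three store functions, their names and the returned identifiers are hypothetical assumptions for the example; one store simulates a communication failure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Hypothetical data stores; each function stands in for a remote query.
def store_a() -> List[str]:
    return ["scan-001", "scan-002"]


def store_b() -> List[str]:
    # Simulated communication failure with this store.
    raise ConnectionError("store unreachable")


def store_c() -> List[str]:
    return ["scan-104"]


STORES: Dict[str, Callable[[], List[str]]] = {
    "store_a": store_a,
    "store_b": store_b,
    "store_c": store_c,
}


def query_all(
    stores: Dict[str, Callable[[], List[str]]],
    timeout_s: float = 5.0,
) -> Tuple[List[str], Dict[str, str]]:
    """Query every store in parallel and tolerate individual failures.

    Returns the rows that could be collected, plus a report mapping each
    failed store to the reason, so that end users know what may be missing.
    """
    rows: List[str] = []
    missing: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        # Submit all sub-queries first so that they run concurrently.
        futures = {name: pool.submit(fetch) for name, fetch in stores.items()}
        for name, future in futures.items():
            try:
                rows.extend(future.result(timeout=timeout_s))
            except Exception as exc:  # failure or timeout of one store
                missing[name] = str(exc)
    return rows, missing


rows, missing = query_all(STORES)
print("rows:", rows)
print("stores that did not answer:", missing)
```

The design choice here is to return the failure report alongside the rows instead of raising, so a partial answer remains usable. In SPARQL 1.1 Federated Query, the SILENT keyword on a SERVICE clause similarly lets a query proceed when a remote endpoint fails, but it does not by itself tell the user which source was skipped.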

Session 5: Graphs and reasoning

Scientific questions:

How to process large RDF graphs? Storage in databases, scalability of graph processing algorithms, graph indexing.

How can semantics described in ontologies be used to interpret RDF data? While the Web of data focuses on processing large data sets, the semantic Web involves costly reasoning processes. There is a trade-off to be found between the amount of data to process and the reasoning capabilities of the system.
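The trade-off mentioned above can be illustrated with a small Python sketch. It is a toy under stated assumptions: triples are plain tuples rather than a real RDF store, the "ex:" class and instance names are hypothetical, and the only inference rule applied is the RDFS subclass rule (if x has type C and C is a subclass of D, then x has type D), iterated until nothing new is derived.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ontology: a small class hierarchy.
ONTOLOGY: Set[Triple] = {
    ("ex:MRIScan", "rdfs:subClassOf", "ex:ImagingStudy"),
    ("ex:ImagingStudy", "rdfs:subClassOf", "ex:Observation"),
}

# Hypothetical instance data.
DATA: Set[Triple] = {
    ("ex:scan1", "rdf:type", "ex:MRIScan"),
    ("ex:bp1", "rdf:type", "ex:Observation"),
}


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain the subclass rule until a fixed point is reached."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in closed if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in closed if p == "rdf:type"]
        for instance, cls in typed:
            for child, parent in subclass:
                if cls == child:
                    inferred = (instance, "rdf:type", parent)
                    if inferred not in closed:
                        closed.add(inferred)
                        changed = True
    return closed


def instances_of(triples: Set[Triple], cls: str) -> Set[str]:
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


raw = ONTOLOGY | DATA
inferred = materialize(raw)

without_reasoning = instances_of(raw, "ex:Observation")
with_reasoning = instances_of(inferred, "ex:Observation")
extra_triples = len(inferred) - len(raw)

print(without_reasoning, with_reasoning, extra_triples)
```

Without reasoning, the query for observations misses the MRI scan because its type is only stated at the most specific level. With reasoning it is found, at the price of extra derived triples and repeated passes over the graph, a cost that grows with both the data size and the depth of the ontology.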

Workshop conclusions

11:30, Johan Montagnat (CNRS, France)

Venue

The workshop will be held in the conference room, located on the ground floor of the I3S laboratory, in Sophia Antipolis, France. The I3S laboratory address is: Building “Algorithms B”, 2000 route des Lucioles, BP 121, 06903 Sophia Antipolis Cedex, FRANCE

Where is it?

How to get there?

By plane: "Nice Côte d'Azur" airport (NCE). There are two terminals (T1 and T2), with a free and frequent shuttle bus running between them.

Taxis are available from both terminals. A taxi from the airport costs approximately 65 euros.

On weekdays, TAM bus number 230 (Sophia Express bus) runs directly from airport terminal T1 to Sophia Antipolis via the motorway (get off at the “INRIA” bus stop shown on the map). See the schedule.

On Saturdays, Sundays and holidays, you need to take metropolitan TAM bus number 200 (towards Cannes) and get off at the Antibes train station bridge (“passerelle SNCF” bus stop) before connecting to one of the buses from Antibes to Sophia Antipolis (see below). Bus timetables are sparse on Sundays and holidays.

The taxi or bus 230 ride takes approximately 20 minutes outside rush hours.

By train: “Antibes” train station. From the train station there are 3 options:

Envibus line number 11 runs directly from Antibes train station (“SNCF” bus stop) to the EPU building (“IUT” bus stop).

Envibus line number 1 runs from the train station bridge (“passerelle SNCF” bus stop) to Sophia Antipolis (“IUT” bus stop). Be aware that line 1 buses have 2 possible destinations: “Lycée Léonard de Vinci” (these stop before reaching Sophia Antipolis) and “Gare Routière de Valbonne - Sophia Antipolis” (take one of these).

Envibus line number 9 runs from the “Vautrin bas” bus stop (in the vicinity of the Antibes train station bridge, a short walk on your left after crossing the bridge) to the “Belugues” bus stop in Sophia Antipolis (see map). Be aware that line 9 buses have 2 possible destinations: “Lycée Léonard de Vinci” (these stop before reaching Sophia Antipolis) and “Gare Routière de Valbonne - Sophia Antipolis” (take one of these).