Abstract: Following the Linked Data principles, data providers have published
billions of RDF facts on the web. Anyone can retrieve relevant
information from Linked Data by executing SPARQL queries.
Such queries are useful in many domains, including health and data
journalism. However, executing SPARQL queries involves a trade-off
between query performance and data availability.
In this thesis, we have investigated how collaboration among data
consumers opens new opportunities in this trade-off. More
precisely, we have studied how such collaboration can improve
performance without degrading availability, or improve
availability without degrading performance.
We consider that Linked Data allows anyone to run a compact
mediator that executes SPARQL queries over data sources on the
web. The main idea is to connect these mediators to build a
federation of Linked Data consumers. In this federation, each
mediator interacts with a subset of the network. Using this
federation, we have built: (i) a decentralized cache hosted by
the mediators; this client-side cache can handle a significant
share of subqueries, and thus improves data availability with little
impact on performance; and (ii) a delegation algorithm that allows
mediators to delegate their queries to other mediators. We have
demonstrated that delegation allows collaborating mediators to run
their workloads faster. This clearly improves performance without
degrading data availability.