Abstract

The recently proposed Triple Pattern Fragment (TPF) interface aims at increasing the availability of Web-queryable RDF datasets by trading off an increased client-side query processing effort for a significant reduction of server load. However, an additional aspect of this trade-off is a very high network load. To mitigate this drawback we propose to extend the interface by allowing clients to augment TPF requests with a VALUES clause as introduced in SPARQL 1.1. In an ongoing research project we study the trade-offs of such an extended TPF interface and compare it to the pure TPF interface. With a poster at the conference we aim to present initial results of this research. In particular, we would like to present a series of experiments showing that a distributed, bind-join-based query execution using this extended interface can reduce the network load drastically (in terms of both the number of HTTP requests and the amount of data transferred).

Document

Additional Results

In addition to conducting the DBpedia/FEASIBLE experiments presented in the extended abstract, we also performed the same type of experiments based on the Waterloo SPARQL Diversity Test Suite (WatDiv). That is, using the same setup as described in Section 2 of the extended abstract, we ran a sequence of 145 WatDiv queries over the 10M-triple WatDiv dataset. From the measurements obtained in these runs, we generated the same type of charts as shown in the extended abstract for the FEASIBLE-based measurements. The following document provides these charts.

To start the server in standalone mode, just edit the config.json file to point to the HDT file of the dataset (there is an example of such a configuration file in the source code package) and execute the following command:

java -server -Xmx4g -jar ldf-server.jar config.json

Combined TPF/brTPF Client

To run a sequence of SPARQL queries using the TPF-based query execution algorithm, copy the files with these queries (one query per file) into a new directory, say mytestqueries, and execute the following command:

./eval.sh ldf-client-eval config.json mytestqueries

Similarly, to use the brTPF-based query execution algorithm, execute the following command (the parameter --maxNumberOfMappings in the eval.sh script sets the maxM/R value used for the query executions):

./eval.sh brTPF-client-eval config.json mytestqueries

During their execution, these commands write to CSV files called eval_TPF.csv and eval_brTPF.csv, respectively. These files contain one line per executed query. The columns in these CSV files have the following meaning:
i) name of the file with the query,
ii) number of triple patterns,
iii) execution time in milliseconds until the first solution of the query result was available,
iv) number of HTTP requests issued until the first solution of the query result was available,
v) overall query execution time (in ms),
vi) overall number of HTTP requests issued during the query execution,
vii) overall number of triples received during the query execution,
viii) "TIMEOUT" marker if the query execution timed out,
ix) timeout threshold in minutes (if the query execution timed out).
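The column layout listed above can be read back with a few lines of Python. The following is only a minimal sketch, not part of the brTPF tooling: it assumes that the files are comma-separated, have no header row, and that columns viii and ix may be absent or empty for queries that did not time out. The names QueryRun, parse_eval_rows, and summarize are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class QueryRun:
    """One line of eval_TPF.csv or eval_brTPF.csv (columns i to ix)."""
    query_file: str
    triple_patterns: int
    ms_to_first_solution: Optional[float]
    requests_to_first_solution: Optional[int]
    total_ms: Optional[float]
    total_requests: Optional[int]
    total_triples: Optional[int]
    timed_out: bool
    timeout_minutes: Optional[float]


def _num(cell: str, cast):
    """Convert a cell with the given cast; empty cells become None."""
    return cast(cell) if cell != "" else None


def parse_eval_rows(lines: Iterable[str]) -> List[QueryRun]:
    """Parse CSV lines; assumes comma separation and no header row."""
    runs = []
    for row in csv.reader(lines):
        cells = [c.strip() for c in row]
        if len(cells) < 7:
            continue  # skip blank or incomplete lines
        timed_out = len(cells) > 7 and cells[7] == "TIMEOUT"
        timeout = _num(cells[8], float) if timed_out and len(cells) > 8 else None
        runs.append(QueryRun(
            query_file=cells[0],
            triple_patterns=int(cells[1]),
            ms_to_first_solution=_num(cells[2], float),
            requests_to_first_solution=_num(cells[3], int),
            total_ms=_num(cells[4], float),
            total_requests=_num(cells[5], int),
            total_triples=_num(cells[6], int),
            timed_out=timed_out,
            timeout_minutes=timeout,
        ))
    return runs


def summarize(runs: List[QueryRun]) -> dict:
    """Aggregate request and triple counts over queries that did not time out."""
    done = [r for r in runs if not r.timed_out]
    return {
        "queries": len(runs),
        "timeouts": len(runs) - len(done),
        "total_requests": sum(r.total_requests or 0 for r in done),
        "total_triples": sum(r.total_triples or 0 for r in done),
    }
```

A file would be loaded with, for example, `with open("eval_brTPF.csv", newline="") as f: runs = parse_eval_rows(f)`. Running summarize on the runs from both eval_TPF.csv and eval_brTPF.csv gives a quick side-by-side view of the number of HTTP requests and received triples for the two algorithms.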
The command for the brTPF client writes an additional CSV file, eval2_brTPF.csv, which provides statistics about the number of mappings attached to the brTPF requests sent during the query executions. The first column gives the name of the file with the query. The second column gives the number of requests without a mapping (i.e., ordinary TPF requests), the third column the number of requests with exactly one mapping, the fourth column the number of requests with exactly two mappings, and so on.
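The per-query statistics in eval2_brTPF.csv can be turned into a histogram in which index k holds the number of requests with exactly k mappings. The sketch below is an illustration under assumptions, not part of the brTPF tooling: it assumes comma separation, no header row, and that empty cells stand for zero. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def parse_mapping_histograms(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each query file name to a list whose index k holds the number
    of requests that carried exactly k mappings (k = 0 is a plain TPF request)."""
    histograms = {}
    for row in csv.reader(lines):
        cells = [c.strip() for c in row]
        if not cells or not cells[0]:
            continue  # skip blank lines
        # Column 2 of the file is k = 0, column 3 is k = 1, and so on.
        histograms[cells[0]] = [int(c) if c else 0 for c in cells[1:]]
    return histograms


def mean_mappings_per_request(counts: List[int]) -> float:
    """Average number of mappings attached to a request for one query."""
    total_requests = sum(counts)
    if total_requests == 0:
        return 0.0
    return sum(k * n for k, n in enumerate(counts)) / total_requests
```

For a query executed with a given --maxNumberOfMappings value, a mean close to that value would suggest that most requests were filled to the limit, whereas a mean near zero would suggest that the execution consisted mostly of ordinary TPF requests.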

Data and Queries

DBpedia and FEASIBLE Queries

For the experiments reported in the paper we used an RDF-HDT representation of the DBpedia 3.5.1 dataset. That is, we downloaded the dataset and converted it into the following HDT file:

To obtain the queries we used the FEASIBLE tool and generated a set of 1000 BGP queries. We filtered this set by removing all queries whose BGP consisted of a single triple pattern only (such queries would be executed in exactly the same way by both the TPF-based query execution algorithm and the brTPF-based query execution algorithm). From the remaining queries we selected uniformly at random the following set of 100 queries: