When querying an Resource Description Framework (RDF) graph, a prominent feature is the possibility of extending the answer to a query with optional information. However, the definition of this feature in SPARQL—the standard RDF query language—has raised some important issues. Most notably, the use of this feature increases the... (more)

Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot... (more)

NEWS

Updates to the TODS Editorial Board

As of January 1, 2018, five Associate Editors—Walid Aref, Graham Cormode, Gautam Das, Sabrina De Capitani di Vimercati, and Dirk Van Gucht—ended their terms, each having served on the editorial board for six years. Walid, Graham, Gautam, Sabrina, and Dirk have provided very substantial, high-caliber service to the journal and the database community. Also five new Associate Editors have joined the editorial board: Angela Bonifati, Université Claude Bernard Lyon 1, Wolfgang Lehner, TU Dresden, Dan Olteanu, University of Oxford, Evaggelia Pitoura, University of Ioannina, and Bernhard Seeger, University of Marburg. All five are highly regarded scholars in database systems.

Updates to the TODS Editorial Board

As of January 1, 2017, three Associate Editors, Paolo Ciaccia, Divyakant Agrawal, and Sihem Amer-Yahia, ended their terms, each having served on the editorial board for some six years. In addition, they will stay on until they complete their current loads. We are fortunate that they have donated their time and world-class expertise during these years. Also three new Associate Editors have joined the editorial board: Feifei Li, University of Utah, Kian-Lee Tan, National University of Singapore, and Jeffrey Xu Yu, Chinese University of Hong Kong. All three are highly regarded scholars in database systems.

Forthcoming Articles

A query Q in a language L has a bounded rewriting using a set of L-definable views if there exists a query Q' in L such that given any dataset D, Q(D) can be computed by Q' that accesses only cached views and a small fraction DQ of D.We consider datasets D that satisfy a set of access constraints, which are a combination of simple cardinality constraints and associated indices, such that the size |DQ| of DQ and the time to identify DQ are independent of |D|, no matter how big D is.
This paper studies the problem for deciding whether a query has a bounded rewriting given a set V of views and a set A of access constraints. We establish the complexity of the problem for various query languages L, from £3p-complete for conjunctive queries (CQ), to undecidable for relational algebra (FO). We show that the intractability for CQ is rather robust even for acyclic CQ with fixed V and A, and characterize when the problem is in PTIME. To make practical use of bounded rewriting, we provide an effective syntax for FO queries that have a bounded rewriting.
The syntax characterizes a core subclass of such queries without sacrificing the expressive power, and can be checked in PTIME. Finally, we investigate L1-to-L2 bounded rewriting, when Q in L1 is allowed to be rewritten into a query Q' in another language L2. We show that this relaxation does not simplify the analysis of bounded query rewriting using views.

We consider a data owner that outsources its dataset to an untrusted server. The owner wishes to enable the server to answer range queries on a single attribute, without compromising the privacy of the data and the queries. There are several schemes on practical private range search (mainly in Databases venues) that attempt to strike a trade-off between efficiency and security. Nevertheless, these methods either lack provable security guarantees, or permit unacceptable privacy leakages. In this paper, we take an interdisciplinary approach, which combines the rigor of Security formulations and proofs with efficient Data Management techniques. We construct a wide set of novel schemes with realistic security/performance trade-offs, adopting the notion of Searchable Symmetric Encryption (SSE) primarily proposed for keyword search. We reduce range search to multi-keyword search using range covering techniques with tree-like indexes, and formalize the problem as Range Searchable Symmetric Encryption (RSSE). We demonstrate that, given any secure SSE scheme, the challenge boils down to (i) formulating leakages that arise from the index structure, and (ii) minimizing false positives incurred by some schemes under heavy data skew. We also explain an important concept in the recent SSE bibliography, namely locality, and design generic and specialized ways to attribute locality to our RSSE schemes. Moreover, we are the first to devise secure schemes for answering range aggregate queries, such as range sums and range min/max. We analytically detail the superiority of our proposals over prior work and experimentally confirm their practicality.

It is common practice for data scientists to acquire and integrate disparate data sources to achieve higher quality results.
But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) is the integrated data set complete and (2) what is the impact of any unknown (i.e., unobserved) data on query results? In this work, we develop and analyze techniques to estimate the impact of the unknown data (a.k.a., {\em unknown unknowns}) on simple aggregate queries. The key idea is that the overlap between different data sources enables us to estimate the number and values of the missing data items. Our main techniques are parameter-free and do not assume prior knowledge about the distribution. Through a series of experiments, we show that estimating the impact of unknown unknowns is invaluable to better assess the results of aggregate queries over integrated data sources.

Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend.
In this article, we realize this vision in the domain of analytical query processing. We present HLDB, a query engine written in the high-level programming language Scala. The key technique to regain efficiencyis to apply generative programming: HLDB performs
source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other.
We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, our architecture significantly outperforms a commercial in-memory database as well as an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of complicated low-level code that is required by existing query compilation approaches. (c) These optimizations may potentially come at the cost of using more system memory for improved performance. (d) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines.

Navigational queries over RDF data are viewed as one of the main applications of graph query languages, and yet the standard model of graph databases - essentially labeled graphs - is different from the triples-based model of RDF. While encodings of RDF databases into graph data exist, we show that even the most natural ones are bound to lose some functionality when used in conjunction with graph query languages. The solution is to work directly with triples, but then many properties taken for granted in the graph database context (e.g., reachability) lose their natural meaning.
Our goal is to introduce languages that work directly over triples and are closed, i.e., they produce sets of triples, rather than graphs. Our basic language is called TriAL, or Triple Algebra: it guarantees closure properties by replacing the product with a family of join operations. We extend TriAL with recursion, and explain why such an extension is more intricate for triples than for graphs. We present a declarative language, namely a fragment of datalog, capturing the recursive algebra. For both languages, the combined complexity of query evaluation is given by low-degree polynomials. We compare our language with previously studied graph query languages such as adaptations of XPath, regular path queries, and nested regular expressions; many of these languages are subsumed by the recursive triple algebra. We also provide an implementation of recursive TriAL on top of a relational query engine, and show its usefulness by running a wide array of navigational queries over real world RDF data, while at the same time testing how our implementation compares to existing RDF systems.