Querying graphs with data

Abstract

Graph data is becoming increasingly pervasive. Services such as social networks
or the Semantic Web can no longer rely on the traditional relational model, as its structure
is too rigid for the applications they have in mind. For this reason we have seen a
continuous shift towards less standard models. First came semi-structured data in the
1990s and XML in the 2000s, but even these models seem too restrictive for new applications
that require navigational properties naturally modelled by graphs. Social networks fit into the
graph model by their very design: users are nodes and their connections are specified by graph
edges. The W3C, likewise, describes RDF, the model underlying the
Semantic Web, using graphs. The situation is similar for crime detection networks
and workflow provenance tracking: all of them have graphs built into their definition.

With the pervasiveness of graph data, the question of how to query and maintain it has
emerged as one of the main priorities, in both a theoretical and an applied sense. Currently there
seem to be two approaches to handling such data. On the one hand, to extract the actual data,
practitioners use traditional relational languages that completely disregard various navigational
patterns connecting the data. What makes this data interesting in modern applications, however,
is precisely its ability to compactly represent intricate topological properties that envelop the
data. To overcome this issue, several languages that allow querying graph topology have been
proposed and extensively studied. The problem with these languages is that they concentrate
on navigation only, thus disregarding the data that is actually stored in the database.

What we propose in this thesis is the ability to do both. Namely, we will study how query
languages can be designed to allow specifying not only how the data is connected, but also how
data changes along paths and patterns connecting it. To this end we will develop several query
languages and show how adding different data manipulation capabilities and different navigational
features affects the complexity of the main reasoning tasks. The story here is somewhat
similar to the early success of the relational data model, where theoretical considerations led
to a better understanding of what makes certain tasks more challenging than others. Here we
aim for languages that are both efficient and capable of expressing a wide variety of queries of
interest to several groups of practitioners. To do so we will analyse how different requirements
affect the language at hand and finally provide a good base of primitives whose inclusion
in a language should be considered, based on the applications one has in mind. Namely,
we consider how adding a specific operation, mechanism, or capability to the language affects
the practical tasks that such an addition is meant to tackle. In the end we arrive at several languages,
each with its own pros and cons, giving us a good overview of how specific capabilities of
the language affect the design goals, thus providing a sound basis for practitioners to choose
from, based on their requirements.