Expressivity and Complexity of MongoDB (Extended Version)

A significant number of novel database architectures and data
models have been proposed during the last decade. While some of
these new systems have gained popularity, they lack a proper
formalization and a precise understanding of the expressivity and
the computational properties of their query languages. In this
paper, we aim to fill this gap by considering MongoDB, a widely
adopted document database that manages complex (tree-structured)
values represented in a JSON-based data model and is equipped with
a powerful query mechanism. We provide a formalization of the
MongoDB data model, and of a core fragment, called MQuery, of the
MongoDB query language. We study the expressivity of MQuery,
showing its equivalence with nested relational algebra. We further
investigate the computational complexity of significant fragments
of it, obtaining several (tight) bounds in combined complexity,
which range from LOGSPACE to alternating exponential-time with a
polynomial number of alternations. As a consequence, we also obtain
a characterization of the combined complexity of nested relational
algebra query evaluation.
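
As an informal illustration of why a correspondence with nested
relational algebra is plausible, the sketch below mimics two
MongoDB aggregation stages on plain Python dictionaries: an
unwind-like step that behaves like unnesting, and a group-like step
that behaves like nesting. The helper names unwind and group, the
sample documents, and the simplified semantics (array-valued fields
only, documents without the array are dropped) are assumptions made
for this example; they are not the formal MQuery definitions.

```python
def unwind(docs, field):
    """Unnest-like step: emit one document per element of the array at `field`.

    Simplifying assumption: the value at `field` is a list; documents
    where it is missing or empty produce no output.
    """
    out = []
    for doc in docs:
        for item in doc.get(field, []):
            flat = dict(doc)      # shallow copy of the other fields
            flat[field] = item    # replace the array by a single element
            out.append(flat)
    return out


def group(docs, key, collect):
    """Nest-like step: collect the values of `collect` per distinct `key`."""
    buckets = {}
    for doc in docs:
        buckets.setdefault(doc[key], []).append(doc[collect])
    return [{key: k, collect: values} for k, values in buckets.items()]


# Hypothetical sample collection of tree-structured values.
docs = [
    {"name": "ann", "tags": ["db", "logic"]},
    {"name": "bob", "tags": ["db"]},
]

flat = unwind(docs, "tags")              # flat, relation-like documents
regrouped = group(flat, "name", "tags")  # nested again, one document per name
```

On this input, grouping the unwound documents by name rebuilds the
original nesting; this does not hold in general, for instance when
an array is empty or when two documents share the grouping key.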