Stardog & MongoDB

By Jess Balint, 19 Jun 2018 · 4 minute read

JSON silos are a liability, too, so as a first step we’re adding support for
virtual graphs over MongoDB.

Integrating JSON Data

Data source heterogeneity continues to increase and now includes non-relational
data models. Stardog presents a unified view over all sources of data,
irrespective of their native data model, easing the pain of unifying and querying
the data. The rise of MongoDB, and the subsequent push for JSON feature parity in
relational databases, demonstrates how prevalent storing data as JSON has become.
We recognize the demand, and we’re building support for virtual graphs over
MongoDB.

If you’ve worked with virtual graphs in Stardog, then you know how easy it is to
map a graph view of a relational database using Stardog Mapping Syntax (SMS).
We’ve taken SMS a step further so that it can also create RDF graph mappings of
JSON document collections, such as those managed by MongoDB.

Mappings

To extend SMS to JSON, we merely need to convey the structure of the underlying
data and how it maps to RDF. There’s no jargon or new vocabulary full of obtuse
concepts. First, let’s look at how to specify the structure of the source data.
Since it’s JSON, we use a JSON template which clearly reflects the structure of
the document:

We use the template to bind variables based on the structure of the JSON
document. The root key (“movies” in this example) names the MongoDB collection
we’re querying. For arrays, such as the cast array of actor objects, the template
yields one set of bindings per array element.
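To make the binding rule concrete, here is a minimal Python sketch of the semantics just described. It is not Stardog’s implementation and not SMS syntax: the template encoding (strings starting with “?” are variables, a one-element list matches every array element) and the function names `match` and `bind_collection` are assumptions for illustration.

```python
from itertools import product
from typing import Any

Bindings = dict[str, Any]


def match(template: Any, doc: Any) -> list[Bindings]:
    """Return every set of variable bindings produced by matching doc against template.

    Assumed template encoding (hypothetical, for illustration only):
    - a string starting with "?" binds a variable to the document value;
    - a dict matches key by key (keys missing from the document are skipped);
    - a one-element list matches each array element, one binding set per element;
    - any other value must equal the document value and binds nothing.
    """
    if isinstance(template, str) and template.startswith("?"):
        return [{template[1:]: doc}]
    if isinstance(template, dict):
        if not isinstance(doc, dict):
            return []
        per_key = [match(sub, doc[key]) for key, sub in template.items() if key in doc]
        results = []
        # Cross product: an array with n elements multiplies the number of rows by n.
        for combo in product(*per_key):
            merged: Bindings = {}
            for bindings in combo:
                merged.update(bindings)
            results.append(merged)
        return results
    if isinstance(template, list) and len(template) == 1:
        if not isinstance(doc, list):
            return []
        # One set of bindings for each element (an empty array yields no rows here).
        return [b for element in doc for b in match(template[0], element)]
    return [{}] if template == doc else []


def bind_collection(template: dict, documents: list[dict]) -> tuple[str, list[Bindings]]:
    """The root key names the collection; its value is matched against every document."""
    ((collection, body),) = template.items()
    return collection, [b for doc in documents for b in match(body, doc)]


# Hypothetical template and document, for illustration only.
template = {"movies": {"_id": "?movieId", "name": "?title", "cast": [{"name": "?actor"}]}}
documents = [
    {"_id": 1, "name": "Ocean's Eleven",
     "cast": [{"name": "George Clooney"}, {"name": "Brad Pitt"}]},
]

collection, rows = bind_collection(template, documents)
print(collection)
for row in rows:
    print(row)
```

Running it prints the collection name followed by two binding rows, one per cast member, each repeating the movie’s `movieId` and `title`.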

To map the data to RDF, we simply state a set of triple patterns describing the
RDF structure, using the variables bound in the JSON template:
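As a rough plain-Python illustration of what applying triple patterns to bound variables means: every binding row instantiates every pattern, and the resulting triples form a set. The namespace `http://example.org/movies/`, the property names, the `{var}` IRI-template convention and the helper names are all hypothetical; real SMS mappings declare their own prefixes and IRI templates.

```python
from typing import Any
from urllib.parse import quote

# Hypothetical namespace and vocabulary, for illustration only.
EX = "http://example.org/movies/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each pattern is (subject, predicate, object). A term starting with "?" is
# replaced by the bound value as a literal; "{var}" inside an IRI template is
# filled with the percent-encoded bound value.
PATTERNS = [
    (EX + "movie/{movieId}", RDF_TYPE, EX + "Movie"),
    (EX + "movie/{movieId}", EX + "title", "?title"),
    (EX + "movie/{movieId}", EX + "starring", EX + "actor/{actorId}"),
    (EX + "actor/{actorId}", EX + "name", "?actorName"),
]


def fill(term: str, row: dict[str, Any]) -> Any:
    """Instantiate one pattern term from a row of variable bindings."""
    if term.startswith("?"):
        return row[term[1:]]
    return term.format(**{k: quote(str(v), safe="") for k, v in row.items()})


def to_triples(rows: list[dict[str, Any]], patterns=PATTERNS) -> set[tuple]:
    """Apply every pattern to every row; a set drops duplicate triples, as an RDF graph would."""
    return {tuple(fill(term, row) for term in pattern) for row in rows for pattern in patterns}


# Hypothetical binding rows, one per cast member of the same movie.
rows = [
    {"movieId": 1, "title": "Ocean's Eleven", "actorId": 10, "actorName": "George Clooney"},
    {"movieId": 1, "title": "Ocean's Eleven", "actorId": 11, "actorName": "Brad Pitt"},
]

for triple in sorted(to_triples(rows), key=str):
    print(triple)
```

The set semantics matters here: both rows produce the same type and title triples, and each appears only once in the output.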

Queries

Virtual graphs in Stardog can be used just like physical graphs and are
compatible with features such as machine learning, reasoning, named graph
security, and path queries. Once we’ve loaded our movies virtual graph, we can
query it in the normal way:

Here we’re querying the movies data source for movies starring George Clooney.
We join those results with box office data stored in Stardog to keep only the
movies with more than 10 million in box office sales.
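Conceptually, that query has two parts joined on the shared movie variable: one pattern answered from MongoDB through the virtual graph and one answered from data stored in Stardog, followed by a filter. The sketch below shows only those join-and-filter semantics, as a hash join over hypothetical in-memory rows; it says nothing about how Stardog actually plans or executes the query, and the row shapes and function name are assumptions.

```python
def movies_over_threshold(starring, box_office, actor, threshold):
    """Join (movie, actor) rows with (movie, gross) rows on movie, then filter.

    starring:   rows standing in for the MongoDB-backed virtual graph (hypothetical)
    box_office: rows standing in for data stored in Stardog (hypothetical)
    """
    # Build side of the hash join: index box office figures by the join key.
    gross_by_movie = {movie: gross for movie, gross in box_office}
    results = []
    # Probe side: keep the actor's movies that have a matching box office row.
    for movie, name in starring:
        if name != actor:
            continue
        gross = gross_by_movie.get(movie)
        # Movies with no box office row drop out, as in a SPARQL inner join.
        if gross is not None and gross > threshold:
            results.append((movie, gross))
    return results


# Hypothetical identifiers and figures, for illustration only.
starring = [
    ("movie/1", "George Clooney"),
    ("movie/2", "George Clooney"),
    ("movie/3", "Brad Pitt"),
    ("movie/4", "George Clooney"),
]
box_office = [("movie/1", 25_000_000), ("movie/2", 4_000_000), ("movie/3", 50_000_000)]

print(movies_over_threshold(starring, box_office, "George Clooney", 10_000_000))
# → [('movie/1', 25000000)]
```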

Higher Level Views of Data

Mapping a single MongoDB data source to RDF is pretty useful, since we can express
a much wider range of queries than we could in MongoDB directly. However, the real
power comes from building higher level views of the data. Using Stardog’s
reasoning capabilities, we can define abstract relationships between properties
and classes.

For instance, we might want to define a new relationship between any two actors
who starred in the same movie. With a relationship as simple as this, we can
express queries such as “Six Degrees of Kevin Bacon”. See the Stardog docs about
path queries for more.
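In Stardog, the co-star relationship would be defined with its reasoning support and explored with path queries. Purely to illustrate the idea, here is a plain-Python sketch that derives a symmetric “starred with” relation from (movie, actor) pairs and then finds the shortest co-star chain with breadth-first search. The function names and sample data are hypothetical.

```python
from collections import defaultdict, deque
from typing import Optional


def costar_graph(starring: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Derive 'starred with': two actors are linked if they appear in the same movie."""
    cast_by_movie: dict[str, set[str]] = defaultdict(set)
    for movie, actor in starring:
        cast_by_movie[movie].add(actor)
    graph: dict[str, set[str]] = defaultdict(set)
    for cast in cast_by_movie.values():
        for actor in cast:
            graph[actor] |= cast - {actor}  # symmetric, no self-links
    return graph


def degrees_of_separation(graph: dict[str, set[str]], source: str, target: str) -> Optional[int]:
    """Breadth-first search: co-star hops on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        actor, hops = queue.popleft()
        for costar in graph.get(actor, set()):
            if costar == target:
                return hops + 1
            if costar not in seen:
                seen.add(costar)
                queue.append((costar, hops + 1))
    return None


# Hypothetical (movie, actor) pairs, for illustration only.
starring = [
    ("m1", "Actor A"), ("m1", "Actor B"),
    ("m2", "Actor B"), ("m2", "Kevin Bacon"),
    ("m3", "Actor C"),
]
graph = costar_graph(starring)
print(degrees_of_separation(graph, "Actor A", "Kevin Bacon"))  # → 2
print(degrees_of_separation(graph, "Actor C", "Kevin Bacon"))  # → None
```

Because breadth-first search visits actors in order of hop count, the first time it reaches the target is along a shortest chain.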

With Stardog’s machine learning capabilities, we can use data stored in MongoDB
directly as input to model training, or combine it with another data source
first. In the example query, we combined the movies data stored in MongoDB with
box office sales data stored in Stardog. The results of that query could serve as
training data for a model that predicts box office sales given the set of actors
in a potential new movie.
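Stardog’s machine learning interface itself is not shown here. As a sketch of the data preparation step only, the code below turns hypothetical (movie, actor, gross) query result rows into one bag-of-actors feature vector per movie plus a box office target. The row shape and the encoding are illustrative choices, not Stardog’s.

```python
def bag_of_actors(rows: list[tuple[str, str, float]]):
    """Encode (movie, actor, gross) rows as one 0/1 actor vector and one target per movie."""
    actors = sorted({actor for _, actor, _ in rows})
    column = {actor: i for i, actor in enumerate(actors)}
    features: dict[str, list[int]] = {}
    targets: dict[str, float] = {}
    for movie, actor, gross in rows:
        vector = features.setdefault(movie, [0] * len(actors))
        vector[column[actor]] = 1  # actor appears in this movie's cast
        targets[movie] = gross     # one box office figure per movie
    movies = sorted(features)
    return actors, [features[m] for m in movies], [targets[m] for m in movies]


# Hypothetical query results, for illustration only.
rows = [
    ("m1", "Actor A", 30_000_000),
    ("m1", "Actor B", 30_000_000),
    ("m2", "Actor B", 12_000_000),
]
actors, X, y = bag_of_actors(rows)
print(actors)  # → ['Actor A', 'Actor B']
print(X)       # → [[1, 1], [0, 1]]
print(y)       # → [30000000, 12000000]
```

`X` and `y` can then be handed to whatever regression method is used; a new movie’s cast, encoded the same way, would yield a box office estimate from the trained model.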

Ultimately, combining data in this way gives us an unparalleled view by
connecting isolated sources. In Stardog, any number of data sources can be
combined in a single SPARQL query: any combination of virtual graphs over
relational databases, MongoDB, other Stardog instances, and so on.

Coming Soon

Ready to map your MongoDB databases into the knowledge graph? We’re putting the
finishing touches on the new feature and are busy with QA. If you’re interested in
beta access, please let us know.