The case for document-oriented databases

Published on 28 July 2013 , last updated on 8 May 2014

Last week, I started working with AngularJS for a real-world project – I had played with Angular before, mainly to see how it differs from KnockoutJS which I had been using intensively before. This post is not about AngularJS itself (I may write one later on when I have a better perspective about the framework), I’ll just say that my first impressions about it are:

it lacks good documentation – the existing documentation doesn’t exactly have the right “feel”, and is incomplete

when coming from Knockout, Angular feels rather invasive, given that it is a “full-blown” framework rather than “just” a MVVM mapping tool

my current impression is that the learning curve looks like a roller-coaster – there’s moments of intense productivity interrupted with high amounts of “not getting anywhere, why on earth am I using this sh*t”

That being said, there’s one thing that AngularJS seems to get really right: it makes it possible to map documents to the UI without imposing its own design philosophy on the said document. With KnockoutJS for example, you’d have to make sure certain fields were provided as arrays, and generally had to know really well how the different bindings worked before knowing into what shape to bring your data so that the UI would do what you wanted.

Let’s say for example that you want to make it possible to add advertisements about wine on an application (who wouldn’t). To spice things up a bit, let’s say that the content should be multilingual. We’d have a model looking like this:

So our document mainly consists of a link and to multilingual content (in reality we may want to add more things to it, such as special characteristics, the price, etc., but let’s keep things simple for the sake of the example).

It is possible to write up the user-interface for the above model like this:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

<div class="row">

<h3>Wines</h3>

<div>

<div control-group label="Language">

<select id="languageCode"name="languageCode"ng-model="$parent.currentLang"ng-options="l.value as l.label for l in languages"></select>

the control-group and validation-error-for directives are custom directives to render boilerplate Twitter Bootstrap) markup, inspired by this article

the ng-options directive fetches the languages and displays them appropriately in the language drop-down. The languages are made available in the root scope of the application and have as value [{value: "en", label: "English"}, {value: "fr", label: "French"]

What I find really interesting in comparison to KnockoutJS is the possibility to bind values against a map: using the binding expression wine.translations[$parent.currentLang], we can dynamically navigate the document hierarchy without having to add any special lookup code. Of course, when e.g. creating a new document, we need to provide a scaffold for those documents, where translation would be an empty object, but this is a small price to pay in comparison to the flexibility this provides in regards to data design.

So what does this all have to do with document-oriented databases? Well, let’s see what we need to represent the above object in a relational database, using e.g. Slick. Given that we have multi-lingual content, we have several options as to how to model it, let’s go for an approach where each entity requiring multi-lingual content gets its own translation table. Using lifted embedding, we now get the following:

I think that by now you may guess what I’m getting at: these mapping gymnastics to go from a hierarchical document to a relational database are extremely verbose. Let’s see in contrast how the same would look like by putting things into e.g. MongoDB using the Salat case class mapper.

First, the data definition:

1

2

3

4

5

6

7

8

9

10

11

12

13

caseclassWine(

id:Option[ObjectId],

url:String,

translations:Map[String,WineTranslation]

)

caseclassWineTranslation(

title:String,

description:String

)

objectWines extendsSalatDAO[Wine,ObjectId](wineCollection)

Next, the controller code:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

def wine(id:ObjectId)=Action{implicit request=>

Wines.findOneById(id).map{wine=>

Ok(Json.toJson(wine))

}

}

def update(id:ObjectId)=Action{implicit request=>

Wines.findOneById(id).map{existing=>

request.body.asJson.flatMap{body=>

Json.fromJson[Wine](body).asOpt.map{updated=>

Wines.save(updated)

Ok

}

}getOrElse{

BadRequest

}

}getOrElse{

NotFound

}

}

Much simpler, isn’t it?

I think what this boils down to is the following: AngularJS and other modern client-side frameworks are built around the fundamental concept of manipulating and working with hierarchical documents. Relational databases, on the other hand, aren’t all that great at representing hierarchies – those need to be turned into relations, which are fundamentally different things. Of course, there is a way to get from one paradigm to the other, however, as we’ve seen, it is up to the developer to do the mapping from one to the other. In the Java space, this mapping has traditionally been done using ORMs such as Hibernate, however, these tools bring another set of problems with them, as soon as the data model gets bigger.

I’m pretty sure that I’m doing some things the wrong way in regards to Slick; the query above can probably be simplified somewhat. Yet the issue remains that overall, representing a document in a relational database does require a non-trivial amount of mapping effort.

On the other hand, relational databases are (still to this day) percevied as safe and easy to operate, and the hype that came with the NoSQL movement has essentially focused on how fast and scalable those technologies are. And yet, I think that at the end of the day, there’s only a few players that really require the speed and scalability that’s been hyped so much, and a lot more that could benefit from using document stores for their “document” aspect.