Give Codeship’s CI/CD Platform a Try

Want to learn more?

An article by Sarah Mei titled “Why you should never use MongoDB” discusses the issues you’ll run into if you try to use a NoSQL database when a relational database would be far superior. An example of this is when data that was thought to be in a silo needs to cross boundaries (what relational DBs are great at). Another example is when you store a user’s name in various places for easy access, but when the user updates their name you’re forced to find all of those places to make sure their information is up to date.

My experience making websites has been in line with this sentiment: Unless your data objects live in complete silos from one another (and you’re sure they will be that way for the foreseeable future), you’ll probably be better off using a relational database like Postgres.

Up until fairly recently, you had to make that difficult decision up-front when modelling your data: document or relational database? Yes, you could use two separate databases, using each tool for what they’re best at. However, you’d be increasing the complexity of your app and also of your development and server environments.

JSON support in Postgres

Postgres has had JSON support for a while, but to be honest it wasn’t that great due to a lack of indexing and key extractor methods. With the release of version 9.2, Postgres added native JSON support. You could finally use Postgres as a “NoSQL” database. In version 9.3, Postgres improved on that by adding additional constructor and extractor methods. 9.4 added the ability to store JSON as “Binary JSON” (or JSONB), which strips out insignificant whitespace (not a big deal), adds a tiny bit of overhead when inserting data, but provides a huge benefit when querying it: indexes.

With the release of version 9.4, JSON support tried to make the question “Do I use a document or relational database?” unnecessary. Why not have both?

I’m not going to argue that Postgres handles JSON as well as MongoDB. MongoDB was, afterall, specifically made as a JSON document store and has some pretty great features like the aggregation pipeline. But the truth is that Postgres now handles JSON pretty well.

Why would I even want JSON data in my DB?

I still believe that most data is modelled very well using a relational database. The reason for this is because website data tends to be relational. A user makes purchases and leaves reviews, a movie has actors which act in various movies, etc. However, there are use cases where it makes a lot of sense to incorporate a JSON document into your model. For example, it’s perfect when you need to:

Avoid complicated joins on data that is siloed or isolated. Think of something like Trello, where they can keep all information about a single card (comments, tasks, etc…) together with the card itself. Having the data denormalized makes it possible to fetch a card and it’s data with a single query.

Maintain data that comes from an external service in the same structure and format (as JSON) that it arrived to you as. What ends up in the database is exactly what the API provided. Look at thecharge response object from Stripe as an example; it’s nested, has arrays, and so on. Instead of trying to normalize this data across five or more tables, you can store it as it is (and still query against it).

Avoid transforming data before returning it via your JSON API. Look at this nasty JSON response from the FDA API of adverse drug events. It’s deeply nested and has multiple arrays — to build this data real-time on every request would be incredibly taxing on the system.

How to use JSONB in Postgres

Now that we have gone over some of the benefits and use-cases for storing JSON data in Postgres, let’s take a look at how it’s actually done.

Defining columns

JSONB columns are just like any other data type now. Here’s an example of creating a cards table that stores its data in a JSONB column called “data.”

Filtering results

It’s very common to filter your query based on a column, and with a JSONB column we can actually step into the JSON document and filter our results based on the different properties it has. In the example below our “data” column has a property called “finished”, and we only want results where finished is true.

Checking for column existence

Here we’ll find a count of the records where the data column contains a property named “ingredients.”

SELECT count(*) FROM cards WHERE data ? 'ingredients';

count
-------
1
(1 row)

Expanding data

If you’ve worked with relational databases for a while, you’ll be quite familiar with aggregate methods: count, avg, sum, min, max, etc. Now that we’re dealing with JSON data, a single record in our database might contain an array. So, instead of shrinking the results into an aggregate, we can now expand our results.

SELECT
jsonb_array_elements_text(data->'tags') as tag
FROM cards
WHERE id = 1;

tag
--------------
Improvements
Office
(2 rows)

There are three things I’d like to point out about the example above:

Two rows were returned even though we queried a single row from our database. This is equal to the number of tags that this row contained.

I used the jsonb form of the method instead of the json form. Use the one that matches how you defined the column.

I accessed the tags field using -> instead of ->> like before. -> will return the attribute as a JSON object, whereas ->> will return the property as integer or text (the parsed form of the attribute).

The real benefit of JSONB: Indexes

We want our application to be fast. Without indexes, the database is forced to go from record to record (a table scan), checking to see if a condition is true. It’s no different with JSON data. In fact, it’s most likely worse since Postgres has to step in to each JSON document as well.

I’ve increased our test data from five records to 10,000. This way we can begin to see some performance implications when dealing with JSON data in Postgres, as well as how to solve them.

As of Postgres 9.4, along with the JSONB data type came GIN (Generalized Inverted Index) indexes. With GIN indexes, we can quickly query data using the JSON operators @>, ?, ?&, and ?|. For details about the operators, you can visit the Postgres documentation.

How can I do this in Rails?

Let’s explore how to create tables with JSONB columns in Rails, as well as how to query those JSONB columns and update the data. For more information, you can refer to the Rails documentation on this subject.

Defining JSONB columns

First things first we need to create a migration which will create a table that has a column specified as JSONB.

Querying JSON data from within Rails

Let’s define a scope to help us find “finished” cards. It should be noted that even though the finished column is a JSON true value, when querying for it we will need to use the String true. If we look at the finished column in Rails we will see a TrueClass value, and it is also a JSON true value when viewing the data in psql, but despite that it will need to be queried using a String.

card.data["finished"].class
# TrueClass

Here is the code to add a :finished scope to our Card class. We won’t be able to use the regular where clause syntax that we’re used to, but will have to rely on a more Postgres specific syntax. It should be noted that the finished column needs to be wrapped in a String too, which is how you refer to JSON columns in Postgres.

You’ll notice that both Rails and Postgres can’t update just the single “finished” value in the JSON data. It actually replaces the whole old value with the whole new value.

Conclusion

We’ve seen that Postgres now contains some very powerful JSON constructs. Mixing the power of relational databases (a simple inner join is a beautiful thing, is it not?) with the flexibility of the JSONB data type offers many benefits without the complexity of having two separate databases.

You are also able to avoid making compromises that are sometimes present in document databases (if you ever have to update a reference to the user’s name in five different places, you’ll know what I’m talking about.) Give it a try! Who says you can’t teach an old dog some new tricks?

Subscribe via Email

Over 60,000 people from companies like Netflix, Apple, Spotify and O'Reilly are reading our articles. Subscribe to receive a weekly newsletter with articles around Continuous Integration, Docker, and software development best practices.

We promise that we won't spam you. You can unsubscribe any time.

Join the Discussion

Leave us some comments on what you think about this topic or if you like to add something.