I'm a web developer in Norfolk. This is my blog...

1st August 2015 6:26 pm

Exploring the Hstorefield in Django 1.8

One of the most interesting additions in Django 1.8 is the new Postgres-specific fields. I started using PostgreSQL in preference to MySQL for Django apps last year, and so I was interested in the additional functionality they offer.

By far the biggest deal out of all of these was the new HStoreField type. PostgreSQL added a JSON data type a little while back, and HStoreField allows you to use that field type. This is a really big deal because it allows you to store arbitrary data as JSON and query it. Previously, you could of course just store data as JSON in a text field, but that lacked the same ability to query it. This gives you many of the advantages of a NoSQL document database such as MongoDB in a relational database. For instance, you can store different products with different data about them, and crucially, query them by that data. Previously, the only way to add arbitrary product data and be able to query it was to have it in a separate table, and it was often cumbersome to join them when fetching multiple products.

Let’s see a working example. We might be building an online store where products can have all kinds of arbitrary data stored about them. One product might be a plastic box, and you’d need to list the capacity as an additional attribute. Another product might be a pair of shoes, which have no capacity, but do have a size. It might be difficult to model this otherwise, but HStoreField is perfect for this kind of data.

First, let’s set up our database. I’ll assume you already have PostgreSQL up and running via your package manager. First, we need to create our database:

$ createdb djangostore

Next, we need to create a new user for this database with superuser access:

$ createuser store -s -P

You’ll be prompted for a password - I’m assuming this will just be password here. Next, we need to connect to PostgreSQL using the psql utility:

$ psql djangostore -U store -W

You’ll be prompted for your new password. Next, run the following command:

Note that HStoreField is not part of the standard group of model fields, and needs to be imported from the Postgres-specific fields module. Next, let’s create and run our migrations:

$ python manage.py makemigrations
$ python manage.py migrate

We should now have a Product model where the attributes field can be any arbitrary data we want. Note that we installed ipdb earlier - if you’re not familiar with it, this is an improved Python debugger, and also pulls in ipython, an improved Python shell, which Django will use if available.

Again, it only returns the box, as that’s the only one where that key exists. Note that all of this is tightly integrated with the existing API for the Django ORM. Let’s add a third product, a food hamper:

As you can see, the HStoreField type allows for quite complex queries, while allowing you to set arbitrary values for an individual item. This overcomes one of the biggest issues with relational databases - the inability to set arbitrary data. Previously, you might have to work around it in some fashion, such as creating a table containing attributes for individual items which had to be joined on the product table. This is very cumbersome and difficult to use, especially when you wanted to work with more than one product. With this approach, it’s easy to filter products by multiple values in the HStore field, and you get back all of the attributes at once, as in this example:

Similar functionality is coming in a future version of MySQL, so it wouldn’t be entirely surprising to see HStoreField become more generally available in Django in the near future. For now, this functionality is extremely useful and makes for a good reason to ditch MySQL in favour of PostgreSQL for your future Django apps.