Handling the "N + 1 selects" problem in SQLAlchemy

The "N + 1" selects problem is one of the most common problems one might run
into when working on a database-backed web application. In this blog post, I'll
describe what this problem exactly is, how ORMs (Object Relational Mappers) come
into the picture, and how we can work around 2 variants of these problems when
using SQLAlchemy.

ORMs

Most web application frameworks these days provide some sort of an ORM to handle
database queries. ORMs can be a double-edged sword. On the one hand they take
care of a lot of common database-related issues. On the other hand, using an ORM
doesn't mean you can forget about how the underlying database actually works.

A personal rule of thumb is - if an ORM is not focusing on hiding the database
layer, it's most probably good to use.

In this blog post, we'll be using SQLAlchemy, which is an excellent database
toolkit and ORM for Python applications. It has a vast API - I've been working
with SQLAlchemy for a bit more than 5 years now and I can't recall a single
instance where it didn't solve a use case I had. On top of that, it does not
attempt to hide the database layer. Instead, it adds an abstraction layer on top
that makes all the database-interactions in Python applications much easier.

To me, SQLAlchemy feels very "Pythonic".

Description

Consider an application which serves as an online books catalog, something like
Goodreads. This most definitely involves having a books table, and an
authors table.

The two models are linked via the author relationship. This means that
accessing the author property on a Book object would return the associated
Author object. Since we've defined a backref, accessing the books property
for an Author object would return a list of Book objects that this author
wrote.

Variant 1 - loading objects

SQLAlchemy defines a bunch of strategies about how such related objects should
be loaded from the database. The default is set to lazy, which means fetching
a Book would not immediately result in fetching the Author.

To see this in action, the database engine needs to first be configured to echo
all the queries being executed.

This is what the N + 1 problem is. If related objects are lazy loaded, the ORM
has to execute extra queries to load them. And if the original resultset is
large (assume 100 objects), then this involves one additional query for each
original object.

Fortunately, SQLAlchemy lets you specify what kind of loading strategy to use,
all of which are extensively documented here.

We can ask SQLAlchemy to load related objects using a JOIN statement,
modifying the author relationship definition as follows.

Conclusion

Like I mentioned earlier, you need to be careful when using ORMs. They make life
a LOT easier, but you might still run into edge-cases like this when an innocent
property access can end up sending a lot more queries to your database.

Also, before ending the blog post, I need to mention that joined loading is
not the solution in all cases. Each and every loading strategy that
SQLAlchemy supports is good for some cases and bad for some others. Luckily, the
SQLAlchemy documentation on the different relationship loading techniques is
an excellent resource.