Writing Super Fast Queries in Rails

At work this week I had to speed up a background job that was clogging up our queue. This job aggregates data on records and posts it to our Elasticsearch index. It was suffering from all kinds of extra database calls. I had lots of fun working on this query! It’s so satisfying to make things fast.

Here’s a bit of what I learned about building SQL queries that get tough to express with typical ActiveRecord objects.

A bit about the data

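Let’s imagine we have a blog that has many users. Each user has many posts, and posts have many comments. In a conventional Rails schema, that means a users table, a posts table with a user_id column, and a comments table with a post_id column.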

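Say the summary needs, for every post, the number of comments from the last 90, 30, and 15 days. Running a separate count query for each period on each post is pretty repetitive, right? If we’re building a summary of all of the posts in our blog, we’re going to be doing a lot of unnecessary counting!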
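To make the repetition concrete, here is a minimal sketch of the naive approach, assuming each post needs comment counts for the last 90, 30, and 15 days. The names PERIOD_DAYS, naive_summary, and naive_query_count are made up for illustration, and post.comments is assumed to be a standard has_many association:

```ruby
# Hypothetical reporting periods, in days.
PERIOD_DAYS = { ninety_days: 90, thirty_days: 30, fifteen_days: 15 }.freeze

SECONDS_PER_DAY = 86_400

# Naive summary: every call to `count` is its own trip to the database,
# so each post costs one COUNT query per period.
# Assumes `post.comments` is an ActiveRecord has_many association.
def naive_summary(posts)
  posts.map do |post|
    counts = PERIOD_DAYS.transform_values do |days|
      cutoff = Time.now - (days * SECONDS_PER_DAY)
      post.comments.where("created_at > ?", cutoff).count
    end
    { id: post.id }.merge(counts)
  end
end

# Number of COUNT queries the naive approach issues for a given number of posts.
def naive_query_count(post_count)
  post_count * PERIOD_DAYS.size
end

naive_query_count(1_000) # 3 periods x 1,000 posts = 3,000 queries
```

Under these assumptions the query count grows with the number of posts, which is the part worth eliminating.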

Using select

When I first tackled this problem, I thought, “Hmmm, maybe I can use ActiveRecord’s select to select a count for each period.” This does wonders for saving database queries! I’ll leave out the rest of the class here for brevity. Here’s what that query looks like:

# Use SQL to count the comments for each post.
select = <<~SQL
  posts.id,
  COUNT(IF(comments.created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 90 DAY), comments.id, NULL)) AS ninety_days,
  COUNT(IF(comments.created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 30 DAY), comments.id, NULL)) AS thirty_days,
  COUNT(IF(comments.created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 15 DAY), comments.id, NULL)) AS fifteen_days
SQL

We need a LEFT OUTER JOIN here because we want to be sure we get posts back even if they have 0 comments.

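Also note the group here. The join produces one row per comment, so a post with many comments shows up many times. Grouping by posts.id collapses those rows into one row per post, with each COUNT computed over that post’s comments. Without the group, the COUNTs would squash every post into a single row, or MySQL would reject the query outright, depending on its SQL mode.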
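Putting the pieces together, the full relation might look like the sketch below. It assumes a Post model backed by a posts table, a comments table with a post_id column, and MySQL (for IF and DATE_SUB). The method names are hypothetical, and the last method only works inside a Rails app:

```ruby
# Build one conditional COUNT (MySQL syntax) for comments newer than `days` days.
def period_count_sql(name, days)
  cutoff = "DATE_SUB(CURRENT_TIMESTAMP, INTERVAL #{Integer(days)} DAY)"
  "COUNT(IF(comments.created_at > #{cutoff}, comments.id, NULL)) AS #{name}"
end

# The SELECT list: the post id plus one count column per period.
def comment_counts_select(periods = { ninety_days: 90, thirty_days: 30, fifteen_days: 15 })
  columns = periods.map { |name, days| period_count_sql(name, days) }
  ["posts.id", *columns].join(", ")
end

# Assumes a Rails app with a Post model; not runnable outside one.
# The LEFT OUTER JOIN keeps posts that have no comments at all, and the
# group returns one row per post instead of one row per comment.
def posts_with_comment_counts
  Post
    .select(comment_counts_select)
    .joins("LEFT OUTER JOIN comments ON comments.post_id = posts.id")
    .group("posts.id")
end
```

This is still a single query no matter how many posts there are; only the SELECT list grows with the number of periods.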

Ok, great! We’ve solved the problem of going back to the database to count the comments. Now we just do one query that returns every count for us.

Did you know that when you add SELECT … AS my_select, ActiveRecord will add a method for that attribute to each object it returns? That’s why you can call post.ninety_days on the posts this query returns. I thought that was pretty handy.

I’m still not comfortable with this end result though. We’re loading records into ActiveRecord when all we need from them is the count data and the post id.

exec_query to the rescue!

exec_query returns an ActiveRecord::Result, which you can treat as a list of hashes mapping the column names you asked for to their values. This lets you skip building ActiveRecord models entirely!
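A minimal sketch of that approach, assuming MySQL and the same posts and comments tables. COMMENT_COUNTS_SQL and comment_count_rows are hypothetical names, and the default connection only exists inside a Rails app:

```ruby
# Raw SQL version of the query: the join and grouping are written by hand.
COMMENT_COUNTS_SQL = <<~SQL
  SELECT
    posts.id,
    COUNT(IF(comments.created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 90 DAY), comments.id, NULL)) AS ninety_days,
    COUNT(IF(comments.created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 30 DAY), comments.id, NULL)) AS thirty_days,
    COUNT(IF(comments.created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 15 DAY), comments.id, NULL)) AS fifteen_days
  FROM posts
  LEFT OUTER JOIN comments ON comments.post_id = posts.id
  GROUP BY posts.id
SQL

# Returns an array of hashes keyed by column name, without building any models.
# The default argument assumes ActiveRecord is loaded (i.e. a Rails app).
def comment_count_rows(connection = ActiveRecord::Base.connection)
  connection.exec_query(COMMENT_COUNTS_SQL).to_a
end
```

Each element of the result should be a plain hash with string keys: "id", "ninety_days", "thirty_days", and "fifteen_days".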

Yay! We got the same result and look at how little code it is! If you’re querying a big dataset, this will save you all kinds of time!

Wrapping Up

If you need to retrieve data from the database, but don’t need any of the functionality of your models, use exec_query to skip ActiveRecord and speed things up a bit.

If you do need ActiveRecord, then you can add additional attributes to the object you get back by passing SQL into the select method and giving each expression a name with AS.

For queries that have complex joins, or ones that you might need to build programmatically, relying on ActiveRecord might get difficult. Take a look at Arel, a library that forms the abstract syntax tree manager behind ActiveRecord, for situations like this.