Hacking a subselect in ActiveRecord

This week, Damon and I were doing a performance optimization for some slow queries. The most performant solution involved denormalizing some data into a join table and doing a subselect to get the ids of the records we wanted. Not rocket science, but also a bit ugly to construct the SQL by hand. Our solution was to cheat a tiny bit and use an ActiveRecord internal method to generate the SQL for us.

That code uses the private method Favorite.construct_finder_sql to generate the following SQL:

SELECT * FROM posts WHERE posts.id IN (
SELECT post_id FROM favorites WHERE blog_id = 42 ORDER BY published_at DESC LIMIT 10 OFFSET 10
) ORDER BY published_at DESC

The Ruby may look like more code than the SQL, and in that form it is… but if you go the hack up a string route, once you start using string operations or interpolation to deal with the variable parts of the query it gets ugly pretty fast. Using the ActiveRecord code to put it all together keeps it nice and clean, and even makes sure things are sanitized and quoted properly too.

10 Comments

Nice technique! In general I would like to see more either in Rails core or in plugins that aide with SQL subqueries. There are many advanced situations (and some not so advanced) that really benefit from the use of subqueries. Even a lot of situations where a traditional join *could* be used are much more elegantly expressed using subqueries and since MySQL optimizes on-the-fly where necessary it usually doesn’t matter which way you specify it.

Of course if the ActiveRecord ORM had constructs that produced subqueries it wouldn’t work with all databases.

October 30, 2008 at 3:26 pm

Mark Wilden says:

If I understand correctly, this query is selecting rows from favorites according to some criteria, then selecting the matching rows from posts. I would’ve thought this would be solved by a simple join, not a subquery.

Also, why sort the subquery?

October 30, 2008 at 6:17 pm

Josh Susser says:

Mark: Using the excellent postgresql query plan explainer, we experimented with several different approaches. This is the one that worked best given the constraints of our schema and size of data (< 50K favorites, >1M posts). To be brief, doing a join has to look at a lot of rows in the posts table. The subselect means you can use an efficient index in one table to determine the posts you need, then selecting by id is very fast. Often a join is what you want, but one size doesn’t fit all.

October 30, 2008 at 6:26 pm

Mark Wilden says:

Hmm. A query of the form

SELECT * FROM favorites JOIN posts — etc.

should definitely not look at a lot of posts. It should find the favorites that meet the criteria (using whatever indexes it can), then match those few rows against the posts. Perhaps PostgreSQL was confused – maybe its statistics needed updating, I dunno. But a subquery should not be necessary for this. Of course, there’s often a difference between “should” and “is.” :)

October 30, 2008 at 8:03 pm

Josh Susser says:

Mark:

The difference between theory and practice: In theory there’s no difference, while in practice there is.

@brendan: I did something similar to allow using belongs_to associations in dynamic finders, but punted when I ran into dealing with polymorphic associations. Did you manage to work that out? I couldn’t tell from a rather quick inspection of your code.

August 18, 2009 at 1:46 pm

ravi says:

What does the construct_finder_sql() function look like? Does it simply output a SQL string?

construct_finder_sql is no longer necessary since Rails 3. All ActiveRecord/Arel query objects expose a public #to_sql method, which does the same thing (so you can do User.where(id: 1).to_sql now).

Further, in Rails 4, you don’t need to_sql to build subqueries. If you pass an ActiveRecord::Relation into a conditions hash, it becomes a subquery — so Posts.where(user_id: User.where(active: true)) works. (Of course, in that example, you’d actually want a join instead, but you get my point.)