3 ways to do eager loading (preloading) in Rails 3 & 4

You are probably already familiar with the method #includes for eager loading
data from database if you are using Rails and ActiveRecord.
But do you know why you someties get few small and nice SQL queries and sometimes
one giant query with every table and column renamed? And do you know
about #preload and #eager_load which can help you achieve the same goal?
Are you aware of what changed in Rails 4 in that matter? If not, sit down and
listen. This lesson won’t take long and will help you clarify some aspects
of eager loading that you might not be yet familiar with.

Let’s start with our Active Record class and associations definitions that we
are going to use throughout the whole post:

Rails 3

Typically, when you want to use the eager loading feature you would use the
#includes method, which Rails encouraged you to use since Rails2 or maybe even
Rails1 ;). And that works like a charm doing 2 queries:

Apparently #preload behave just like #includes. Or is it the other way around?
Keep reading to find out.

And as for the #eager_load:

User.eager_load(:addresses)# SELECT# "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7# FROM "users"# LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id"

It is a completely different story, isn’t it? The whole mystery is that Rails
has 2 ways of preloading data. One is using separate db queries to obtain the addtional
data. And one is using one query (with left join) to get them all.

If you use #preload, it means you always want separate queries. If you use
#eager_load you are doing one query. So what is #includes for? It decides
for you which way it is going to be. You let Rails handle that decision.
What is the decision based on, you might ask. It is based on query conditions.
Let’s see an example where #includes delegates to #eager_load so that there
is one big query only.

In the last example Rails detected that the condition in where clause
is using columns from preloaded (included) table names. So #includes delegates
the job to #eager_load. You can always achieve the same result by using the
#eager_load method directly.

We get an exception because we haven’t joined users table with
addresses table in any way.

Is this intention revealing?

If you look at our example again

User.includes(:addresses).where("addresses.country = ?","Poland")

you might wonder, what is the original intention of this code.
What did the author mean by that? What are we trying to achieve here
with our simple Rails code:

Give me users with polish addresses and preload only polish addresses

Give me users with polish addresses and preload all of their addresses

Give me all users and their polish addresses.

Do you know which goal we achieved? The first one. Let’s see if we can
achieve the second and the third ones.

Is #preload any good?

Our current goal: Give me users with polish addresses but preload all of their addresses. I need to know all addreeses of people whose at least one address is in Poland.

We know that we need only users with polish addresses. That itself is easy:
User.joins(:addresses).where("addresses.country = ?", "Poland") and we know
that we want to eager load the addresses so we also need includes(:addresses)
part right?

Well, that didn’t work exactly like we wanted.
We are missing the user’s second address that expected to have this time.
Rails still detected that we are using included table in where statement
and used #eager_load implementation under the hood. The only difference compared to
previous example is that is that Rails used INNER JOIN instead of LEFT JOIN,
but for that query it doesn’t even make any difference.

This is exactly what we wanted to achieve.
Thanks to using #preload we are no longer mixing which users we want to fetch
with what data we would like to preload for them. And the queries are plain
and simple again.

Preloading subset of association

The goal of the next exercise is: Give me all users and their polish addresses.

To be honest, I never like preloading only a subset of association because some
parts of your application probably assume that it is fully loaded. It might only
make sense if you are getting the data to display it.

What should we do when we only know at runtime about the association conditions
that we would like to apply? I honestly don’t know. Please tell me in the
comments if you found it out.

The ultimate question

You might ask: What is this stuff so hard? I am not sure but I think most ORMs
are build to help you construct single query and load data from one table. With
eager loading the situation gest more complicated and we want load multiple data
from multiple tables with multiple conditions. In Rails we are using chainable API
to build 2 or more queries (in case of using #preload).

Rails 4 changes

Rails now encourages you to use the new lambda syntax for defining association
conditions. This is very good because I have seen many times errors in that
area where the condition were interpreted only once when the class was loaded.

It is the same way you are encouraged to use lambda syntax or method syntax to
express scope conditions.

# Bad, Time.now would be always the time when the class was loaded# You might not even spot the bug in development because classes are# automatically reloaded for you after changes.scope:from_the_past,where("happens_at <= ?",Time.now)# OKscope:from_the_past,->{where("happens_at <= ?",Time.now)}# OKdefself.from_the_pastwhere("happens_at <= ?",Time.now)end

In our case the condition where(country: "Poland") is always the same, no matter wheter interpreted
dynamically or once at the beginning. But it is good that rails is trying to
make the syntax coherent in both cases (association and scope conditions)
and protect us from the such kind of bugs.

Now that we have the syntax changes in place, we can check for any differences
in the behavior.

User.includes(:addresses)# SELECT "users".* FROM "users"# SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2)User.preload(:addresses)# SELECT "users".* FROM "users"# SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2)User.eager_load(:addresses)# SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7# FROM "users"# LEFT OUTER JOIN "addresses"# ON "addresses"."user_id" = "users"."id"

Well, this looks pretty much the same. No surprise here.
Let’s add the condition that caused us so much trouble before:

User.includes(:addresses).where("addresses.country = ?","Poland")#DEPRECATION WARNING: It looks like you are eager loading table(s)# (one of: users, addresses) that are referenced in a string SQL# snippet. For example:## Post.includes(:comments).where("comments.title = 'foo'")## Currently, Active Record recognizes the table in the string, and knows# to JOIN the comments table to the query, rather than loading comments# in a separate query. However, doing this without writing a full-blown# SQL parser is inherently flawed. Since we don't want to write an SQL# parser, we are removing this functionality. From now on, you must explicitly# tell Active Record when you are referencing a table from a string:## Post.includes(:comments).where("comments.title = 'foo'").references(:comments)## If you don't rely on implicit join references you can disable the# feature entirely by setting `config.active_record.disable_implicit_join_references = true`. (# SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7# FROM "users"# LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id"# WHERE (addresses.country = 'Poland')

Wow, now that is quite a verbose deprection :) I recommend that you read
it all because it explains the situation quite accuratelly.

In other words, because Rails does not want to be super smart anymore and
spy on our where conditions to detect which algorithm to use, it expects
our help. We must tell it that there is condition for one of the tables.
Like that:

I was wondering what would happen if we try to preload more tables but
reference only one of them:

User.includes(:addresses,:places).where("addresses.country = ?","Poland").references(:addresses)# SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4,# "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7,# "places"."id" AS t2_r0, "places"."user_id" AS t2_r1, "places"."name" AS t2_r2, "places"."created_at" AS t2_r3, "places"."updated_at" AS t2_r4# FROM "users"# LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id"# LEFT OUTER JOIN "places" ON "places"."user_id" = "users"."id"# WHERE (addresses.country = 'Poland')

I imagined that addresses would be loaded using the #eager_load
algorithm (by doing LEFT JOIN) and places would be loaded using
the #preload algorithm (by doing separate query to get them) but
as you can see that’s not the case. Maybe they will change the
behavior in the future.

Rails 4 does not warn you to use the #references method if you
explicitely use #eager_load to get the data and the executed
query is identical:

If you try to use the other queries that I showed you, they still work
the same way in Rails 4:

# Give me users with polish addresses and preload all of their addressesUser.joins(:addresses).where("addresses.country = ?","Poland").preload(:addresses)#Give me all users and their polish addresses.User.preload(:polish_addresses)

Finally in Rails 4 there is at least some documentation for the methods,
which Rails 3 has been missing for years: