Rails 2.3, named_scope, destroy_all and callback confusion

Apr 30, 2010

Background

I was recently working on a Rails 2.3.5 application when I found myself
stumped on some perplexing behavior that arose when using a named scope,
the #destroy_all method and an after destroy callback fired
by an ActiveRecord observer. Let’s begin with the models. There are
Asset and User models. The assets table’s columns include the integer
file_size and timestamp delete_after. There
is a named scope, expired, which returns all assets for
which the value of delete_after is in the past. Assets
belong to users.

Users have many assets. The users table has a large integer column,
current_disk_usage, which holds the denormalized sum of
the file sizes of the assets belonging to each user. There is an
instance method, #recalculate_disk_usage!, which
triggers the User instance to update current_disk_usage
be recalculating the sum of the user’s assets.

There is an AssetObserver class which triggers
User#recalculate_disk_usage! any time an asset record is
created, updated or destroyed. (The implementation show below is
inefficient, and one workaround to the problem discussed in this article
was to make the observer a bit gentler on the database. That version can
be found at the bottom of the article.)

The problem

The application in question must periodically purge asset records which
have expired (where delete_after is in the past). This is
accomplished with a Rake task which sends #destroy_all to
the Asset.expired named scope.

task:purge_expired=>:environmentdoAsset.expired.destroy_allend

And this is where the confusion set in. Whenever I manually created,
updated or destroyed an asset the asset observer kicked in as expected
and the user’s current disk usage was updated accordingly. Whenever I
ran the purge_expired Rake task, the correct assets were
destroyed, but the user’s current disk usage would get set to zero. If I
manually created, updated or destroyed another asset the current disk
usage would once again be set to the correct, up-to-date value. It was
time to see what SQL was actually being run, so I opened up the log
file.

J’accuse! The DELETE statement on the first line removes
the record from the assets table, and the UPDATE on the
last line shows the user’s disk usage being set to zero. But what’s up
with the SELECT in the middle? It’s calculating the sum of
the file sizes of the user’s assets, but it’s limited the sum to those
assets which have expired – look at the first half of the
WHERE clause. For some reason the conditions imposed by the
Asset.expired named scope are filtering into the call to
user.assets.sum(:file_size). I took a look at the
implementation of ActiveRecord::Base.destroy_all for a
clue.

Not really an obvious answer, but it helped to steer my thinking.
Asset.expired returns an instance of
ActiveRecord::NamedScope::Scope. Sending
#destroy_all to that scope calls
ActiveRecord::Base.find(:all) which returns an array of
Asset instances. We then iterate over that array, sending
#destroy to each asset. For each asset, after
#destroy runs,
AssetObserver#after_destroy(asset) is called which calls
asset.user.recalculate_disk_usage! which in turn calls
asset.user.assets.sum(:file_size).

A workaround

Okay, but I still didn’t know why the expired named scope
conditions at the beginning of the process were showing up again when
calling user.assets near the end of the process. However I
had a hunch the call to ActiveRecord::Base.find had
something to do with it, so to test this hunch I modified the Rake task,
removing the call to destroy_all and instead iterating
over the scope directly.

Et voilà! I don’t know what destroy_all’s use of
ActiveRecord::Base.find does that leaks into later elements
of the destruction process. Bypassing destroy_all by
iterating over the named scope directly seems to fix the problem, and to
be honest, while the original version with destroy_all
looked more like idiomatic Rails, the use of each feels
more like idiomatic Ruby, i.e., purer. I think this may be a bug, but
I’d need to get deeper into the guts of Rails’ traversal across the
association to be sure. I’ve only encountered this problem on Rails 2.3.
I haven’t tested the Rails 3 release candidate yet.

A different workaround

But wait, there’s more. As I mentioned earlier, the original
implementation of AssetObserver isn’t efficient. Here’s the
problem: each time an asset record is destroyed, the database is issued
both SELECT SUM and UPDATE statements. The
latter is unavoidable, but is the former strictly necessary?
ActiveRecord provides models with #increment! and
#decrement! instance methods. They’d normally be used to
increment or decrement a counter by one, but they both take optional
second arguments which are the amount by which the counter
(current_disk_usage) should be changed. When the
observer’s callback is triggered, the observed object (in this case, an
asset) is passed as an argument to the callback method. Since the
callback is acting on a single object at the time, instead of telling
the user to calculate a sum of all remaining asset’s file sizes, we
could just increment or decrement the current disk usage by the file
size of the asset being changed (or in the case of updates, by the
difference between the new and old file sizes). Here’s a simple version
of how that would look:

By removing the SELECT SUM call we certainly won’t hit the
database as hard for each asset being removed, so perhaps this is a more
correct way of doing things.

As it happens, the application this problem comes from has very light
database use as it is. With only a couple hundred users, each with a few
dozen assets, usually not more than a couple dozen assets total expiring
each week, and reliably long window of inactivity each night (asset
expiration happens once per day), peppering the database with a few
dozen SELECT SUM statements isn’t a big deal. The use of
User#recalculate_disk_usage! also has the benefit of
being self-healing: if, somehow, the current disk usage value for a user
were to become corrupted, it would be automatically fixed the next time
#recalculate_disk_usage! was called. If instead of
simply triggering the user’s recalculating of disk usage,
AssetObserver becomes responsible for the actual values
being added or subtracted from current_disk_usage there
may be implications for possible data corruption (although that should
be pretty well mitigated by the asset activities being wrapped in a
single database transaction).