Homo-Adminus Blog

Yet Another Admin’s Blog

Last few days one of our customers (one of the largest Ruby on Rails sites on the Net) was struggling to solve some really strange problem – once upon a time they were getting an error from ActiveRecord on their site:

They have innodb_lock_wait_timeout set to 20 seconds. After a few hours of looking for strange transactions we were decided to create s script to dump SHOW INNODB STATUS and SHOW FULL PROCESSLIST commands output to a file every 10 seconds to catch one of those moments when this error occurred.

Today we’ve got next error and started digging in our logs…

After a couple of hours of looking in logs, parsing binary logs from MySQL and thinking we’ve found a following image of the situation:

This transaction was locking a record and caused all those errors. We’ve took a look at project sources and didn’t found any transactions used there at all. So, it was obvious this problem transaction (with one query in it) was generated by ActiveRecord.

We’ve discussed this problem with our customer and started searching for a reason, why would AR wait for 37 seconds between a query and COMMIT command.

At the end we’ve found a reason. Customer uses acts_as_ferret for search on their site. Site has a huge documents database so ferret index updates became pretty slow. In acts_as_ferret index updates are called from a following AR hooks:

So, maybe we were reading AR docs in a wrong way, or there is no such info, but :after_update callback is called BEFORE the actual data COMMITed to the DB! In our customer's case, some user saved the document, ferret stated updating its indexes, record was locked for 30+ seconds and every updates to the same record (next tries to click to a Save button, for example) were failing with timeout errors.

So far so good, we're going to move to Sphinx and hope there will be no such crazy errors in future.