Multi-threaded use of Rails ActiveRecord 3.0-3.1

An old post of mine on threading in Rails continues to get Google traffic, even though it was written about Rails 2.1 and is now terribly outdated.

I’ve recently been updating the application that uses threading with ActiveRecord to 3.0 or 3.1, so it’s time for an update.

The good news is that multi-threaded use of ActiveRecord is in many ways better supported and more clearly documented; however, writing for multi-threaded concurrency is still tricky, and there are still some gotchas. There isn’t a lot of general overview material on this on Google, so I hope this post can be of value; I’ve tried to provide a general overview of the various issues and gotchas.

When I say “old rails” I mean Rails/Active Record 2.1, before the introduction of the ConnectionPool. When I say “new” or “current” Rails, I mean Rails 3.0 or 3.1. In fact, the ConnectionPool was introduced in Rails 2.2, so the general principles may apply starting then; however, there have been continual tweaks to ConnectionPool for bugs, robustness, and performance, so using Rails 3.0 or 3.1 is recommended. Rails 3.2 does not yet exist at the time of this writing, and may completely change the game yet again, if Aaron tenderlove goes forward with his ideas for a major refactor.

My use case

I have a Rails app. I do not want/need concurrent request handling. You can enable that with config.threadsafe!, which triggers config.allow_concurrency along with some other config values meant to make concurrency safer. But I’m not doing that, that’s not my interest, the usual Rails multi-process scaling model is fine for me.

What I do need, though, is to kick off several threads which will use ActiveRecord. I need to do this within a request-response loop, waiting on each of these threads with Thread#join before returning the response. I also want to kick off some threads that will run in the ‘background’: kick off threads in a controller action method, and not wait on them with Thread#join before rendering the response. Those ‘background’ threads will store results in ActiveRecord, and my web pages poll with AJAX to get updated results from the ‘background’.

These threads will be doing a bunch of HTTP calls to third-party servers, and also some ActiveRecord. Each thread is unique code, each one is a different ‘plugin’ for contacting external services via HTTP apis and storing the results in the database. The reason I want threads here is so that when a thread is waiting on I/O (database, or more likely, a slow HTTP api), it can be scheduled out and another thread can do its thing. The ruby Global Interpreter Lock isn’t, in theory, a problem for my case: I’m fine with a process sticking to a single CPU core, I just need a thread waiting on an external HTTP api (or database) to not lock up that CPU core while waiting.

I do need relatively low ‘latency’, I need all these ‘foreground’ async-ish tasks to start immediately and finish as fast as they can (sharing a CPU), and I need the ‘background’ tasks to also start pretty darn quick. (That is one thing that makes various ‘background job’ gems seem unsuitable. Another is that this is an app (soon to be Rails3 engine gem) that is used by third parties, and I’d really like to avoid requiring extra daemons to be started up other than typical rails app deployment).

So if your use case is at all like mine, this may be helpful to you. If your use case is very different than mine, a non-threaded solution may be better. If you want actual concurrent request handling, then that’s actually even a bit more complicated than what I’m describing here, although Rails theoretically supports it, I haven’t tried it.

My app is working. In fact, it was working even in Rails 2.1, but it works somewhat better in Rails 3. Here’s what I’ve discovered:

Use MRI 1.9.x, not MRI 1.8.x

MRI 1.8.x uses “green threads”, simulated threading entirely within the VM. While it sounds like an okay idea, experience (mine and others’) shows that ruby 1.8.x green threads behave poorly. Not only is thread-switching kind of expensive, but the threads do not share CPU very well: even with multiple threads waiting for CPU time, one thread could end up holding onto it for a couple of seconds without switching out, in mysteriously unpredictable ways.

Ruby 1.9.x uses OS native threads, and the threads should behave more like you expect threads to.

Yes, even with MRI ruby 1.9.x, there is still the ruby “global interpreter lock”, which prevents a single ruby process from taking advantage of multiple CPU cores simultaneously, even with threads. However, I don’t think this means “concurrency is a myth”; there are still many use cases that threading will work for. If a thread is waiting on I/O (an http request, a database call), despite the GIL, ruby should still be able to switch it out and another thread in, returning the first thread to the CPU when its I/O is ready. This is sufficient for my use case, and probably many use cases. (Note “should”, see below on mysql vs mysql2).

jruby may be even better for threading, with possibly less overhead to thread creation and context switching and no GIL. For reasons not yet determined, I can’t get my app to run under jruby, but it’s definitely worth considering.

(The GIL does significantly lessen the attraction of using actual concurrent request handling in rails (config.threadsafe!, config.allow_concurrency), which is one of the reasons I don’t. The multi-process request-handling strategy typically used by rails applications is also quite sufficient for me, I have no problem with it. )

Use ‘mysql2’ adapter, never ‘mysql’

It turns out the original ‘mysql’ adapter would hold on to the ruby global interpreter lock (GIL) even when waiting for a database response. Even with the ruby GIL, in theory when one thread is waiting on a response from the database, another thread ought to be switched in. However, the implementation of the ‘mysql’ adapter prevented this, making multi-threaded use of ActiveRecord a lot less useful with mysql.

Fortunately, the ‘mysql2’ adapter was written, fixing this problem. If planning on using concurrency with ActiveRecord and mysql, always use the ‘mysql2’ adapter, never use the ‘mysql’ adapter. ‘mysql2’ is now the recommended ‘standard’ in Rails, although it’s still called ‘mysql2’, and is a separate adapter from ‘mysql’.

If your database is not mysql, you might want to figure out if the ruby/rails database adapter is written in such a way as to be concurrency friendly. It took a while for the community to catch on that this was a problem with the mysql adapter code; could it still be a problem with other common adapters? I don’t really know. (It should not be a problem if you are using jruby with JDBC adapters).

Here is a little test script, cribbed from slide 4 in this presentation by Ilya Grigorik. It reveals that the ‘mysql’ adapter indeed exhibits the behavior Grigorik notes, while the ‘mysql2’ adapter (based on code from Grigorik, I believe) does not. Even using the ‘mysql2’ adapter, however, we get a total elapsed real time of 2.1 seconds, which is quite a bit more than the 1-second-and-change one would expect. Apparently there is still quite a bit of overhead here, perhaps from thread context switching, or the ActiveRecord::ConnectionPool checkout mutex, or the implementation of ‘mysql2’ itself, or something else I am not guessing. So multi-threaded ActiveRecord database access still may not perform as well as you’d like. But it’s good enough for my purposes, and way better than the pathological ‘mysql’ adapter.
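For readers without the linked slides, here is a minimal sketch of this kind of timing test. It is not the original script: `time_concurrent` and `slow_call` are names made up for illustration, and a plain `sleep` stands in for the slow query so the snippet runs without a database. To test a real adapter, one would presumably swap the stand-in for a query such as `SELECT SLEEP(1)` executed inside a with_connection block.

```ruby
# Run `count` threads, each executing the given block once, and return the
# wall-clock seconds it took for all of them to finish.
def time_concurrent(count)
  started = Time.now
  threads = (1..count).map { Thread.new { yield } }
  threads.each(&:join)
  Time.now - started
end

# ASSUMPTION: stand-in for a slow database call. With a real adapter this
# would be something like conn.execute("SELECT SLEEP(1)") on a connection
# checked out with ActiveRecord::Base.connection_pool.with_connection.
slow_call = lambda { sleep 0.2 }

elapsed = time_concurrent(5) { slow_call.call }
puts "5 concurrent calls took %.2f seconds" % elapsed
```

If the blocking call lets other threads run while it waits, the total should land near the duration of a single call rather than five of them; an adapter that holds the interpreter lock would show roughly the sum instead.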

sqlite3? Probably not. As sqlite3 doesn’t support SQL “sleep” function, this brain-dead script couldn’t be used to test it, you’d need an actual slow SQL query I haven’t had time to set up an environment for. However, googling leaves me very confused about sqlite3’s stance toward concurrency even without taking the ruby/rails adapter into account. It may depend on what options sqlite3 was compiled with (no idea what options ‘gem install sqlite3’ uses), as well as exactly how the ruby/rails adapters are written. Trying to use multi-threaded ActiveRecord concurrency with sqlite3 is probably not a great idea.

Pay attention to the ConnectionPool contract

In all versions of Rails, threads cannot share ActiveRecord Connection objects, or share the underlying network connection encapsulated by the Connection object. Each thread needs its own connection to the database.

In Rails 2.1, each thread would end up opening up a new network connection to the database (and you had to take some extreme measures to make sure that connection ended up getting closed when the thread was finished with it). That’s obviously not great for performance. If you created lots of threads, you could also end up with a pretty huge number of open network connections to your database.

In ordinary Rails code, ActionPack kind of takes care of this all behind the scenes for you. But if you’re manually creating threads, you’ve got to think about when/how connections get checked out and checked back in to the ConnectionPool. Seriously, go read the documentation and consider the three methods/strategies it provides.

For manually creating threads in a use case resembling mine, the with_connection method seems just right. Make sure all ActiveRecord database access is wrapped in a with_connection block; this will make sure a database connection gets checked out from the pool, then checked back in at end of with_connection block.

Note, the documentation is a bit confusing, but in fact you don’t really need to deal with the yielded connection directly, you can just do ordinary ActiveRecord actions inside the block:

ActiveRecord::Base.connection_pool.with_connection do
  m = SomeModel.find(something)
  m.foo = "bar"
  m.save!
  # ... etc ...
  # and at end of block, connection will be checked
  # back into pool.
end

Great, not that hard to work with, this is a reasonable API.

Note also that, while the documentation doesn’t mention it, you can do nested with_connection calls, and still only one connection will be used by the thread, without much extra overhead. This means that if you’re writing code in a context where you’re not sure if a parent scope is already wrapped in with_connection or used some other means of checking out a connection, you can still simply use #with_connection, and it’ll work fine either way.

Trick #1: Avoiding accidentally using first strategy

But what happens if you accidentally execute some database-communicating ActiveRecord logic in a thread outside of a with_connection block? Maybe you just missed a place, but this is also very easy to do because of the way ActiveRecord lazily loads association content and does other kinds of database access “on demand.”

If you accidentally trigger some database access outside of a with_connection block, now you’re using method #1 instead. And a connection will be automatically checked out of the ConnectionPool for you — and it’ll never be checked in, unless you do something about it. You’ll quickly run out of connections.

It would be nice if you could tell ActiveRecord, perhaps on a per-thread basis, not to allow this kind of implicit connection checkout (strategy 1). Here’s a really hacky monkey patch to ActiveRecord to support that. With this monkey patch applied, call ActiveRecord::Base.forbid_implicit_checkout_for_thread! from a thread, and if that thread later tries to access an active record connection without explicit checkout (#with_connection, or #checkout), an ImplicitConnectionForbiddenError will be raised.

I don’t trust this hack enough to use it in production (I wrote it), but I use it in development/test just to try and find any accidental implicit checkouts. Apply monkey patch, call #forbid_implicit_checkout_for_thread! from each manually created thread, run your tests, you should get an ImplicitConnectionForbiddenError if you’re accidentally implicitly checking out a connection.

Trick #2: Clean up any accidental checkouts just in case

Just in case and out of paranoia, I periodically call:

ActiveRecord::Base.connection_pool.clear_stale_cached_connections!

This will look through the open database connection list, and clear any that were associated with Threads that are no longer alive. So if a thread accidentally does check out a connection without checking it in, this will clean it up. clear_stale_cached_connections! is in theory a bit expensive, as it needs to go through the complete list of threads in the VM to see which threads are still alive. However, it’s better than leaking connections, and I haven’t found it to be a problem, there are probably much bigger performance bottlenecks to multi-threaded access to ActiveRecord that make this one a non-issue.

Note that in Rails 2.1, there was advice to call a method #verify_active_connections! to do that same thing. Do not, not, not ever call verify_active_connections! in current Rails. verify_active_connections! is not thread-safe in current rails. I can tell you from experience that if you do call it while you have multi-threaded ActiveRecord use going on, you will get weird errors, including (for the sake of people Googling, let’s list em):

Mysql2::Error: MySQL server has gone away

Mysql2::Error: Lost connection to MySQL server during query

ActiveRecord::StatementInvalid (Mysql2::Error: Malformed packet

threads and processes hanging in apparent deadlocks of some kind.

Occasionally an actual segmentation fault.

I can’t think of any time you’d both want to call verify_active_connections! and it would actually be safe to call, so I’m not sure why this method even exists. Perhaps there are use cases I am not thinking of, but they won’t arise when any kind of multi-threaded ActiveRecord use is going on.

If you are seeing errors like this, but you aren’t using verify_active_connections!, then you’re probably doing something else that violates ActiveRecord::ConnectionPool’s contract in a way that two threads end up sharing the same connection (which again, AR is not designed to support).

If I understand things right, you do not want to call clear_active_connections! in an app that makes multi-threaded use of AR.

Note on ConnectionPool size

The default ConnectionPool size is 5. You can, however, set it to whatever you’d like with the “pool:” key in your connection dictionary in your database.yml.

You’ll almost certainly need a larger pool size than that, and may very well need a larger ConnectionPool than you expect. I did.

It seems like if we wrap all our ActiveRecord access in with_connection, we ought to be able to have a lot more active threads than we have connections in the pool. Each thread will check out a connection just to do its work, and then check it back in. If a thread needs to work and no connection is available, it’ll wait for one, and since each thread is just checking out a connection temporarily for a relatively quick database call, it should get a connection long before the 5 second timeout. If you don’t have quite enough connections, things may slow down with threads spending time waiting for connections, but I wouldn’t expect a ConnectionTimeoutError, unless you have like a couple orders of magnitude more active threads than connections in the pool.

However, that wasn’t my experience. With 10 connections in the pool and only 20 active threads using ActiveRecord via with_connection, I was still getting ConnectionTimeoutErrors. Why? I am not sure. Even in ruby 1.9.x, is ruby still terrible at thread context switching, or are the ruby mutexes used by ConnectionPool#checkout still crazy expensive? Even following ConnectionPool’s contract, was I somehow creating a deadlock, which ought not to be possible? Or is my code accidentally not following the contract correctly somehow? I really have no idea. Using 30 or 40 connections in my connection pool seems to be sufficient for 100 or more threads, while 10 connections is not enough for 20. Take these exact numbers with a grain of salt, though; all I can say is you may need more connections than you expect.
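The checkout-and-wait behaviour described in this section can be modelled with a toy pool. This is only a sketch to make the mechanism concrete, not ActiveRecord’s implementation: `ToyPool`, its `TimeoutError`, and the string “connections” are all made up for illustration, and it says nothing about why the real pool timed out for me.

```ruby
require "thread"

# A toy connection pool (hypothetical, NOT ActiveRecord's code): a fixed set of
# "connections", a mutex, and a condition variable that waiting threads sleep on.
class ToyPool
  class TimeoutError < StandardError; end

  def initialize(size, timeout)
    @available = (1..size).map { |i| "conn-#{i}" }
    @timeout = timeout
    @mutex = Mutex.new
    @returned = ConditionVariable.new
  end

  # Check a connection out for the duration of the block, then check it back in.
  def with_connection
    conn = checkout
    begin
      yield conn
    ensure
      checkin(conn)
    end
  end

  private

  def checkout
    deadline = Time.now + @timeout
    @mutex.synchronize do
      while @available.empty?
        remaining = deadline - Time.now
        raise TimeoutError, "no connection within #{@timeout} seconds" if remaining <= 0
        # Sleep until a checkin signals us or the remaining time runs out.
        @returned.wait(@mutex, remaining)
      end
      @available.pop
    end
  end

  def checkin(conn)
    @mutex.synchronize do
      @available.push(conn)
      @returned.signal
    end
  end
end

# Eight threads share two connections; each holds one only briefly.
pool = ToyPool.new(2, 5)
completed = Queue.new
threads = (1..8).map do
  Thread.new do
    pool.with_connection { |conn| sleep 0.05; completed << conn }
  end
end
threads.each(&:join)
puts "#{completed.size} threads finished using 2 connections"
```

In this model, threads that hold a connection only briefly all get served well inside the timeout, which is the behaviour I expected from the real pool too.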

Note on errors raised in a thread

This is not really ActiveRecord specific, but about ruby threads in general.

If a thread raises an exception, that thread will stop executing, and the exception will be ‘lost’ until/unless another thread waits on that aborted thread with Thread#join. When Thread#join is called on a thread that raised an uncaught exception, the Thread#join call will itself raise. The exception it raises will have the same class as the original exception, but its backtrace will be from the Thread#join call.

This is a bit annoying, you want that backtrace to debug the problem. With my hacky monkey patch for ImplicitConnectionForbiddenError you really need the backtrace to fix the problem; and I usually want to see the backtrace for a ConnectionTimeoutError too.

To deal with that, I wrap all thread logic in a rescue, and use Thread-storage to store the exception, retrieving it in the ‘master’ thread, like so:

threads = []
thread_count = 5 # however many threads you need
1.upto(thread_count) do
  threads << Thread.new do
    begin
      # whatever stuff
    rescue Exception => e
      # stash the exception, original backtrace and all, on this thread
      Thread.current[:exception] = e
    end
  end
end
threads.each do |thread|
  thread.join
  if thread[:exception]
    # log it somehow, or even re-raise it if you
    # really want; it's got its original backtrace.
  end
end

config.cache_classes

If you start threads that will keep going after a response is returned from a request-response loop, then you need to set “config.cache_classes = true”, even in development. Rails development-mode class reloading will unload/reload all your reloadable classes at the end/beginning of every request-response loop. But if you have code still working across request-response loop boundaries, that’s obviously a problem, a class will get unloaded while a thread is in the middle of trying to do stuff with instances of that class, bad.

So you need to give up on development-mode auto-reloading, and set `config.cache_classes = true` even in development.

If you are starting threads in an action, but waiting on them all with #join before returning the response, then it makes sense that you might be able to get away with `config.cache_classes = false` still, but I haven’t tried it.

None of the other concurrency-related config options that go along with config.threadsafe! seem to be required. Although I wouldn’t be surprised if there is an occasional edge case race condition involving autoloading with the default values for config.dependency_loading and config.preload_frameworks. But changing these to be ‘thread safe’ means more significant changes to your app, so I’ve decided just to risk it. So far so good.

thread.priority

In MRI 1.8.7, I found that I needed to set the priority (Thread#priority=) of threads that were not the main Rails request-response thread to less than the default, or the actual request-response thread would never actually get scheduled for the CPU. Theoretically, ruby 1.9.x (or jruby) with native threads should do a better job of thread scheduling, but I’ve left in my priority setters anyway, out of fear. :)
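A minimal sketch of what lowering a thread’s priority looks like. Thread#priority= is core ruby; the choice of “one below the main thread” is just an illustrative value, not a tuned setting from my app, and how much effect it has depends on the ruby implementation.

```ruby
background = Thread.new do
  # Ask the scheduler to favour the main (request-response) thread over this one.
  Thread.current.priority = Thread.main.priority - 1

  # ... slow HTTP calls and ActiveRecord work would go here ...

  Thread.current.priority # returned so the caller can inspect it via Thread#value
end

observed = background.value
puts "background thread ran at priority #{observed}"
```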

Consider NOT using multi-threaded access to ActiveRecord

So, you know, multi-threaded ActiveRecord use is a bit better and cleaner in current Rails than it was back in the Rails 2.1 days.

But it’s still kind of a pain; there are tricks and gotchas (not even counting the usual gotchas with any kind of concurrent programming, regardless of ActiveRecord), performance is somewhat unpredictable.

You probably want to consider other solutions than multi-threading.

The state of “async background job” processing options has gotten better than in the old days when BackgrounDRb was the best option. delayed_job seems to be a current popular choice, and is reportedly relatively easy to use and robust. rabbitmq was recommended in comments on another thread-related blog post, but I’m not sure what the best easy to use ruby glue for it is.

For my use case, I’m not sure a background task approach would work out so well: I need a large number of concurrent workers, such that either I’d need a huge number of worker processes, or I’d need my worker processes to deal with multi-threading in ActiveRecord anyway. Plus I have an app (now an engine gem) that’s distributed to other end-users, and I’d like to avoid dependency on extra external daemons if possible. But it’s quite possible things would work much better and I could save myself trouble by using a background task solution instead of threads. If it’ll work for your use case, it’ll probably be a lot less headache than threading.

And the new hot thing is of course EventMachine. I haven’t quite been able to wrap my head around how to use EventMachine for concurrency without totally re-writing my fairly large legacy app, which I don’t really want to do. But em-synchrony looks like perhaps it can be used to incorporate some EventMachine-based async operations in a Rails app that is otherwise fairly standard, without having to go crazy with callback-style code, although I still can’t quite wrap my head around how I’d use it in my architecture (in which third-party plugins need to be runnable in a concurrent context too). I suspect that an em-synchrony/fiber-based solution would perform a lot better than multi-threaded AR use. But it might be just as tricky to work with: you’ve still got to deal with the AR ConnectionPool, which em-synchrony’s activerecord support actually patches to be fiber-aware (so one connection per fiber, instead of per thread). It possibly could end up even more confusing, what with the extra abstraction to deal with. But if you’re interested in being cutting edge and can figure out how to make EM or em-synchrony work for you, definitely consider it.

The Future?

Current ConnectionPool is kind of weird. It’s got some weird hacks in it, and it’s kind of tricky to understand how to use it in a multi-threaded environment (although so much better than pre-ConnectionPool days). It’s hard to use with fibers instead of threads (although em-synchrony seems to think it’s solved that, I definitely haven’t tried it, and am unclear what assumptions it may make about your use patterns). And its performance seems unpredictable (if that’s the fault of my code, then we could say it makes it hard to understand how to write performant multi-threaded AR code).

So in theory I’m pleased about Aaron’s interest in re-working it. But I’ve spent so much time at this point figuring out how to get things to work barely-good-enough with the current architecture, if there’s a new architecture I really hope it’s not that different, or if it is, that it has enough real significant improvement to justify the cost of reworking my app again.

Acknowledge

Thanks very much to the author of coderr.wordpress.com, who is almost the only person one can find on the net apparently interested in multi-threaded ActiveRecord access, for his useful blog posts and surgically targeted rails patches. He has a couple blog posts, somewhat difficult to find on google actually, which were incredibly helpful to me, and some of the only things I can find explaining what’s going on and how to deal with it.

And of course, the ActiveRecord source. Thanks to api.rubyonrails.org for providing links directly to github for all documented methods. Without that, in the old days, it could be hard to even figure out what file a given method was implemented in; now with a click on a hyperlink, you’re looking at it (and can even use GitHub features to see how it’s changed over multiple Rails versions). And thanks to GitHub for providing an html interface that makes source code a first class citizen on the web, so web pages can link to such a powerful and useable interface for looking at source code. The ecology we’ve got based around github makes dealing with someone else’s open source code so much easier than it used to be.

If you are looking for a good, threaded background worker solution check out my gem, girl_friday. My rule of thumb is that your application code should never do Thread.new. Let libraries manage that complexity for you.

I have a feature coming up that will require running a bunch of actions within a single database transaction. I would like to be running those actions in parallel if possible. But if threads can’t share a database connection, and database connections can’t share threads (http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html), I don’t see how this is possible. Any ideas for a workaround?

@Rafe Just some random ideas: I’m not sure what DataMapper’s concurrency semantics are with regard to connection sharing, they are not very well documented, but DataMapper might do better.

Otherwise, you might have to have each thread post its data back to a ‘main’ thread that actually does the db work?

Or, possibly better, stage the output in a temporary database location (or some storage that isn’t your rdbms at all, mongo, the file system, whatever), and then, once everything is complete, have another worker take the info from the temporary database location, and actually commit it to the ‘real’ database location in one transaction.

I think there’s going to be no way to have multiple threads actually _talking to the db_ in parallel in the same transaction, just as a result of how rdbms architecture works. Even if you had an ORM that shared the same underlying db connection with multiple threads, it would have to lock on actual db-communication, there’d be no way for the db communication to be happening in parallel in the same transaction/connection.

I had a similar problem (needing to query third-party services). I ended up writing a substantial portion of my app in node.js and proxying calls through to my existing Rails stack when needed. Frankly, a number of things about the entire RoR ecosystem (MRI GIL, assignment to class-shared state like ActiveResource.site, and even Ruby as a language compared to js) make it unattractive for situations where a more asynchronous event-oriented control flow would be better suited. Curious to hear more discussion on this topic.

For multithreaded ActiveRecord use, I have found two bugs in 3.0 and 3.2. One is a thread safety issue that at least on JRuby leads to errors. The other is a connection pool fairness issue that leads to connection timeouts even under light contention for database connections by multiple threads.

I think as written it still suffers from an error, but I have been unable to produce the error in my own tests: Imagine thread1 waiting on the queue. Thread0 adds two elements to the queue, waking up thread1, which has not yet re-acquired the lock. Now thread2 jumps in, sees thread1 is waiting, and does not steal but begins waiting. Thread1 re-acquires the lock and removes one element from the queue. At this point, thread2 is still waiting even though there is an available element in the queue. I guess it will timeout…

A possible solution is to allow thread2 to steal (breaking strict fairness, which I don’t really care about) iff there are enough available elements in the queue for all waiting threads plus one. I’d love to hear your thoughts on that.

Actually, I didn’t fix ‘fairness’, I’ve been struggling with that for a while! I reduced the race condition slightly in rails 3-2-stable, but tenderlove didn’t want my fix in master at that time, and it wasn’t a complete solution anyway.

I _am_, I believe, having ‘fairness’ problems in my rails 3.2.x app. Under load, in a race condition, a thread waits for a connection, and even though connections get checked back in, that waiting thread NEVER gets one; other threads jump in front in line and steal it, and eventually the original thread times out in its wait, even though everyone could have been served if it had been first-in/first-out.

I really am having this problem. I couldn’t figure out any way to fix it!

It looks like maybe @tenderlove _did_ commit your fix to master. If you think it’s not good enough anymore, I guess you might want to tell him to revert it or something, I dunno.

I don’t completely understand how your original fix was intended to work, can you help me understand? I’d like to try to help you refine it to something that really does work.

Although I think the requirements are:

1. As _close_ to ‘fair’ as we can get. If it’s not 100% always fair but still a lot more fair than current, that’s an improvement.
2. But threads should never wait _forever_; if they timeout without getting a connection when strict fairness would have given them one, that’s unfortunate but acceptable. But they should never wait _longer_ than their timeout. I believe master before your patch would sometimes have threads waiting _indefinitely_ with no upper bound, which is absolutely unacceptable.

Definitely interested in talking to you more about this, are you on freenode IRC ever?

I don’t understand how/if you’ve really fixed ‘fairness’ in either of your implementations.

Here’s the ‘fairness’ problem I’ve been running into:

If multiple threads are waiting on a Mutex or ConditionVariable, and #signal is sent — it’s unpredictable which thread will get in _first_. It may NOT be the one that’s been waiting the longest.

So you can have
thread1: waits
thread2: waits
thread3: waits
threadX checks in connection, signal
thread3 wakes up

Etc. And thread1 can be waiting forever. As you’re still using signal/wait, I don’t understand how your code solves this. Your code seems to make it such that _connections_ will be used such that the next connection used will be the OLDEST checked in connection. But that’s a kind of fairness I don’t think anyone cares about.
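To make the kind of fairness I mean concrete, here is a rough sketch of a first-in/first-out handoff. It is not the patch under discussion and not Rails code: `FairPool`, `Waiter` and `TimeoutError` are hypothetical names. The idea is that each waiting thread gets its own ConditionVariable and joins a queue, and a checkin hands the connection directly to the oldest waiter instead of signalling whoever happens to wake first.

```ruby
require "thread"

class FairPool
  class TimeoutError < StandardError; end

  # One entry per waiting thread: its private condition variable and the
  # connection handed to it (nil until a checkin assigns one).
  Waiter = Struct.new(:cond, :conn)

  def initialize(connections, timeout)
    @available = connections.dup
    @timeout = timeout
    @mutex = Mutex.new
    @waiters = [] # oldest waiter first
  end

  def checkout
    @mutex.synchronize do
      # Take a free connection only if nobody is already queued ahead of us.
      return @available.shift if @waiters.empty? && !@available.empty?

      waiter = Waiter.new(ConditionVariable.new, nil)
      @waiters << waiter
      deadline = Time.now + @timeout
      while waiter.conn.nil?
        remaining = deadline - Time.now
        if remaining <= 0
          @waiters.delete_if { |w| w.equal?(waiter) }
          raise TimeoutError, "no connection within #{@timeout} seconds"
        end
        waiter.cond.wait(@mutex, remaining)
      end
      waiter.conn
    end
  end

  def checkin(conn)
    @mutex.synchronize do
      waiter = @waiters.shift
      if waiter
        waiter.conn = conn # hand off directly, so no other thread can steal it
        waiter.cond.signal
      else
        @available << conn
      end
    end
  end
end

pool = FairPool.new(["conn"], 2)
held = pool.checkout
served = Queue.new
threads = [1, 2, 3].map do |id|
  thread = Thread.new do
    conn = pool.checkout
    served << id
    pool.checkin(conn)
  end
  sleep 0.05 # let this thread start waiting before the next one is created
  thread
end
pool.checkin(held)
threads.each(&:join)
order = Array.new(3) { served.pop }
puts "served in order: #{order.inspect}"
```

Because the connection is assigned to a specific waiter while the lock is held, a late-arriving thread cannot jump the queue, and a waiter still never waits longer than its timeout.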