Running Background Jobs in Ruby on Rails

UPDATE: rails_cron is no longer available, and daemon_generator has moved. BackgrounDRb has gone through a major rewrite, and I’ve got a chapter on Background Processing in The Rails Way by Obie Fernandez. Thanks to Chris Johnson and Douglas F Shearer for the updated information.

Without a way to run long-running tasks, Heartbeat, our 2006 Rails Day Entry, wouldn’t have had a pulse.

Like Heartbeat, most web applications need to run regularly scheduled or long-running tasks at some point in their life cycle. These tasks are often not initiated by a web request. How can you check the validity of a URL every 15 minutes? How can an eCommerce store calculate its most popular items every 5 hours? How can you re-index your site for searching every day?

If you’ve ever had to do this, chances are you’ve used cron (the *nix tool used to schedule recurring tasks) coupled with script/runner. However, wouldn’t it be great if you could maintain your tasks and background “jobs” inside the Ruby language, or even better, as part of your Rails application?

Let’s explore two ways to do this: the excellent BackgrounDRb plugin by Ezra Zygmuntowicz, which was used to power Heartbeat, and the fabulous rails_cron by Kyle Maxwell.

h3. Why not just use cron jobs?

Any time you introduce a new piece to a system, there’s room for bugs. Pieces that must be set up separately on each platform and deployment environment, and that are written in a “different” language, compound that complexity.

Also, Kyle Maxwell notes that cron, when used with RoR, has the following shortcomings:

* Significant startup resources required
* Lots of RAM to run simultaneous processes
* Hard to start/stop/affect the background processes from within Rails

h3. rails_cron

rails_cron is “a way to execute background tasks using your Ruby on Rails environment.” Perhaps the closest replacement yet to external cron scripts, rails_cron provides an easy-to-use language for creating and maintaining tasks. Just look at this example link processor:

If you’ve ever had to play with crontab and its myriad of 0 1 0 0 0 columns, then this is a breath of fresh air.
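For comparison, a crontab entry packs its schedule into five positional fields: minute, hour, day of month, month, and day of week. The small Ruby sketch below only labels those fields to show why they are easy to misread; label_cron_fields is a hypothetical helper written for this illustration and is not part of cron or rails_cron.

```ruby
# Names of the five schedule fields in a standard crontab entry, in order.
CRON_FIELDS = %w[minute hour day_of_month month day_of_week].freeze

# Pair each positional field of a schedule string with its name.
# (Hypothetical helper, for illustration only.)
def label_cron_fields(schedule)
  values = schedule.split
  unless values.size == CRON_FIELDS.size
    raise ArgumentError, "expected #{CRON_FIELDS.size} fields, got #{values.size}"
  end
  CRON_FIELDS.zip(values).to_h
end

# "*/15 * * * *" is a common way to say "every 15 minutes".
p label_cron_fields("*/15 * * * *")
```

Nothing in the entry itself says which column is which, which is the readability problem a Ruby-level task definition avoids.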

rails_cron stores its tasks in the database. Once it is started (rake tasks handle starting and stopping), it polls the database every minute (the interval can be overridden) for new tasks to execute. This is especially nice for changing a daily task to a weekly one on the fly, without having to edit a crontab.
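The polling idea can be sketched in plain Ruby. This is not rails_cron's code or API: ScheduledTask and PollingScheduler are hypothetical names, and an in-memory array stands in for the database table, purely to show how a poller picks up due tasks and how changing a stored interval takes effect on the next pass.

```ruby
# A hypothetical in-memory stand-in for a database table of scheduled tasks.
ScheduledTask = Struct.new(:name, :interval, :next_run_at, :action)

class PollingScheduler
  def initialize(tasks)
    @tasks = tasks
  end

  # Run every task that is due at `now`, then push its next run time forward.
  # Returns the names of the tasks that ran.
  def poll(now = Time.now)
    due = @tasks.select { |task| task.next_run_at <= now }
    due.each do |task|
      task.action.call
      task.next_run_at = now + task.interval
    end
    due.map(&:name)
  end

  # Poll forever, sleeping between passes (one minute by default).
  def run(poll_every = 60)
    loop do
      poll
      sleep poll_every
    end
  end
end

start = Time.now
checked = []
tasks = [ScheduledTask.new("check_links", 15 * 60, start, -> { checked << :links })]
scheduler = PollingScheduler.new(tasks)

scheduler.poll(start)       # the task is due, so it runs
scheduler.poll(start + 60)  # not due again yet, so nothing runs

# Switch the task to weekly "on the fly"; the next pass uses the new interval.
tasks.first.interval = 7 * 24 * 60 * 60
```

Because the schedule lives in data rather than in a crontab file, changing it is an ordinary update to a record.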

h3. BackgrounDRb

BackgrounDRb is a plugin – or rather, as Ezra puts it: “a small framework for divorcing long running tasks from Rails request/response cycle.”

The best example is a file upload or other process that you wouldn’t want a user’s HTTP request to have to sit and wait for. When a user initiates a task that will take a while, you simply create a worker, and BackgrounDRb handles spawning a thread and keeping track of its job “key” or id for you. You can even maintain your own variables on the worker, such as a “progress” meter, and poll the worker for its current status.
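The key-plus-progress pattern described above can be sketched with nothing but Ruby threads. This is an illustration of the idea, not BackgrounDRb’s implementation or API: JobRegistry, its methods, and the report callable are all hypothetical names.

```ruby
require "securerandom"

# A hypothetical registry that runs jobs in threads and tracks them by key.
class JobRegistry
  Job = Struct.new(:thread, :progress)

  def initialize
    @jobs = {}
    @lock = Mutex.new
  end

  # Start the given block in a background thread and return a key for polling.
  # The block receives a callable it can use to report progress (0..100).
  def start(&work)
    key = SecureRandom.hex(8)
    job = Job.new(nil, 0)
    @lock.synchronize { @jobs[key] = job }
    report = ->(percent) { @lock.synchronize { job.progress = percent } }
    job.thread = Thread.new { work.call(report) }
    key
  end

  # Current progress of the job with this key (raises KeyError if unknown).
  def progress(key)
    @lock.synchronize { @jobs.fetch(key).progress }
  end

  # Block until the job with this key has finished.
  def wait(key)
    @lock.synchronize { @jobs.fetch(key) }.thread.join
  end
end

registry = JobRegistry.new
key = registry.start do |report|
  4.times do |step|
    sleep 0.01                    # stand-in for slow work, such as an upload
    report.call((step + 1) * 25)
  end
end

registry.wait(key)
puts registry.progress(key)
```

In a web application the request that starts the job would return the key immediately, and later requests would call progress with that key instead of waiting.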

Once installed, the latest version even provides a generator for creating a worker. Let’s generate a worker that will check a link every 30 seconds:

script/generate worker LinkProcessor

Now edit the newly created lib/workers/link_processor_worker.rb and add:
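As a rough idea of the logic such a worker might wrap, here is a plain-Ruby sketch of a repeated link check. It deliberately does not use the BackgrounDRb worker API; link_alive?, watch_link, and the injectable fetcher argument are hypothetical names chosen for this illustration, and treating any 2xx or 3xx response as “valid” is an assumption.

```ruby
require "net/http"
require "uri"

# Default fetcher: performs a real HTTP GET for the given URI.
DEFAULT_FETCHER = ->(uri) { Net::HTTP.get_response(uri) }

# Hypothetical helper: true when the URL answers with a 2xx or 3xx status.
# `fetcher` is injectable so the check can be exercised without a network.
def link_alive?(url, fetcher = DEFAULT_FETCHER)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host
  response = fetcher.call(uri)
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue URI::InvalidURIError, SocketError, SystemCallError, Timeout::Error
  false
end

# Check the link `times` times, pausing `every` seconds between checks.
# The loop is bounded so the sketch terminates; a real worker would keep going.
def watch_link(url, fetcher = DEFAULT_FETCHER, every: 30, times: 3)
  results = []
  times.times do |i|
    results << link_alive?(url, fetcher)
    sleep every unless i == times - 1
  end
  results
end

# Stubbed fetcher so the example runs offline; omit it to make real requests.
stub = ->(_uri) { Net::HTTPOK.new("1.1", "200", "OK") }
p watch_link("http://example.com/", stub, every: 0, times: 2)
```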

Now, starting up the background server and letting your workers run totally independent of a web request is a simple matter of running a small script/runner task (fired off by a rake task) to kick off your workers:

h3. Gotchas

BackgrounDRb’s latest version has code to load the Rails environment, controlled by a configuration file. However, be careful when including classes or libraries that BackgrounDRb might not know about. In order to load your entire environment, you’ll need to load up all of Rails by changing a few lines of code in the script/backgroundrb/start task: