Background Jobs in Ruby on Rails

Short response times are critical for every web application. Time-consuming operations and long-running tasks that require intensive computation often cannot be processed during the normal HTTP request/response cycle without making the application unresponsive. The solution is background job processing: to keep your app fast and responsive, move those long-running tasks into background processes. Once a job has been placed in a background queue, the application can return a response immediately. Examples include:

pushing some data to a slow external service

pulling data from a slow external service

accessing a remote API (like posting something to Twitter)

number crunching tasks

managing large uploads or downloads

processing huge multi-media files

generating large PDFs or PDF reports

generating image thumbnails

sending bulk email, newsletters or SMS

Many big sites use background jobs and job queues to process time-consuming operations, for example Amazon, GitHub and Twitter (at least in their early days). Amazon now offers a queue service named Amazon SQS. Chris Wanstrath has written a short history of background jobs at GitHub. Twitter has tried various ways of using background processes, too.

In general, one can distinguish between different forms of task storage and task execution. Tasks can be stored in the database, which provides persistence and durability, or in a message queue (Amazon SQS, WebSphere MQ, RabbitMQ, ...), which provides high performance if the queue operates in memory. Tasks can be executed immediately, for example by an always-running background daemon, or periodically, for instance by a cron job. For recurring jobs that have to run at a certain time of day, a rake task or a script/runner command controlled by a cron job is a good solution; this is perfect for jobs that should run once a day. If a task must be processed as soon as possible, some form of storage usually comes into play. The jobs can be stored in a message queue (MQ), in the database (DB) or not at all.
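To make the storage/execution distinction concrete, here is a minimal sketch in plain Ruby. It is not the API of any of the libraries discussed in this post: the class name InMemoryJobQueue and its methods are made up for illustration. Storage is Ruby's standard in-memory Queue (so jobs are lost if the process dies), and execution is immediate, by an always-running worker thread that plays the role of the background daemon.

```ruby
require "thread"

# Hypothetical illustration: in-memory task storage plus an always-running worker.
class InMemoryJobQueue
  def initialize
    @queue = Queue.new     # task storage (in memory, not durable)
    @results = Queue.new
    @worker = Thread.new { work_loop }   # task execution (immediate)
  end

  # Called from the request cycle: store the job and return right away.
  def enqueue(&job)
    @queue << job
    :queued
  end

  # Let the worker finish all queued jobs, then return their results in order.
  def shutdown
    @queue << :stop
    @worker.join
    results = []
    results << @results.pop until @results.empty?
    results
  end

  private

  def work_loop
    loop do
      job = @queue.pop          # blocks until a job is available
      break if job == :stop
      @results << job.call
    end
  end
end

queue = InMemoryJobQueue.new
status = queue.enqueue { sleep 0.05; "thumbnail generated" }  # returns before the job runs to completion
queue.enqueue { "newsletter sent" }
results = queue.shutdown
```

Swapping the Queue for a database table would trade speed for durability; swapping the worker thread for a cron-triggered script would trade immediacy for simplicity.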

Now let us take a closer look at the different solutions to the same problem. Users prefer options that are simple to set up and simple to use. The simplest solutions are database-driven job queues: Background Job (BJ) and DelayedJob (DJ). DelayedJob is so popular that it also has two similar clones, JobFu and Background-Fu. Even simpler is to use no database or queue at all.

No queues or task storage

Spawn

– small plugin for Rails to easily fork or thread long-running code blocks
– executes a task in the background by creating a new child process (forking) or a new thread (threading)
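The two execution strategies can be sketched with Ruby's built-in primitives. This does not show the Spawn plugin's own API; the helper names run_in_thread and run_in_fork are hypothetical, and the fork variant only works on Unix-like systems.

```ruby
# Run a block in a background thread; the caller continues immediately.
def run_in_thread(&block)
  Thread.new(&block)
end

# Run a block in a forked child process (Unix-like systems only).
# The child works on a copy of the parent's memory, so it cannot change
# the parent's variables; here it reports its result back through a pipe.
def run_in_fork
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    writer.write(yield.to_s)
    writer.close
    exit!(0)   # leave the child without running the parent's at_exit handlers
  end
  writer.close
  [pid, reader]
end

thread = run_in_thread { (1..1000).reduce(:+) }
thread_result = thread.value     # waits for the thread, returns the block's value

pid, reader = run_in_fork { 6 * 7 }
fork_result = reader.read        # reads until the child closes its end of the pipe
reader.close
Process.wait(pid)                # reap the child so it does not linger as a zombie
```

Threads share memory with the request that started them, which is cheap but requires thread-safe code; forked processes are isolated, at the cost of copying the process.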

Background Job (BJ)

– stores jobs in persistent job queue table and processes the jobs of the table in the background
– simple and robust solution, but how you structure the jobs is largely up to you. Jobs have no direct connection to Rails models or their methods: Bj runs jobs as command-line applications, that is, it just runs the Ruby or bash scripts you specify. Good “general purpose solution”.
– you need to use script/runner or rake task commands to access Rails models, which loads the entire Rails environment for each job
– no daemon process for processing the jobs; workers are not persistent, and only one background process is started or signalled for each stored job
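The idea of "jobs are just command lines stored in a table" can be sketched as follows. This is not Bj's code or API: CommandJobTable is a hypothetical in-memory stand-in for the persistent job table, and each pending command is run as a separate process using Ruby's standard Open3 library.

```ruby
require "open3"
require "rbconfig"

# Hypothetical stand-in for a persistent job table: each row is a shell command.
class CommandJobTable
  Job = Struct.new(:command, :state, :stdout, :exit_status)

  attr_reader :jobs

  def initialize
    @jobs = []
  end

  # Store the command; nothing is executed yet.
  def submit(command)
    job = Job.new(command, "pending")
    @jobs << job
    job
  end

  # Run every pending command in its own process and record the outcome.
  def run_pending
    @jobs.select { |job| job.state == "pending" }.each do |job|
      output, status = Open3.capture2(job.command)
      job.stdout = output
      job.exit_status = status.exitstatus
      job.state = status.success? ? "finished" : "failed"
    end
  end
end

table = CommandJobTable.new
table.submit("echo hello from the shell")
table.submit(%(#{RbConfig.ruby} -e "puts 6 * 7"))
table.submit(%(#{RbConfig.ruby} -e "exit 3"))   # a job that fails
table.run_pending
states = table.jobs.map(&:state)
```

Because every job starts a fresh process, a job that needs Rails models pays the full startup cost of the Rails environment each time, which is the trade-off noted above.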

DelayedJob (DJ)

– Written by Tobias Lütke (Tobi), used by GitHub in the past
– needs the daemons gem to create a background daemon process
– stores jobs in persistent job queue table (“delayed_jobs”)
– you can turn any method call into a job to be processed later: a job is queued by calling send_later(method, params) on any object. You can also use custom job classes.
– the queue is processed by a rake task (rake jobs:work) or a “delayed_job” script which starts and stops a daemon process to process the queue
– the daemon will check for queued background jobs every 5 seconds
– good documentation and many tutorials
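How "turning a method call into a job" can work is sketched below in plain Ruby. This is not DelayedJob's implementation: JOB_TABLE is a hypothetical in-memory stand-in for the “delayed_jobs” table, the SendLater module and the Newsletter class are made up for illustration, and Marshal is used only to keep the sketch short. A real daemon would repeat the work_off step on every polling interval.

```ruby
# Hypothetical in-memory stand-in for the "delayed_jobs" table.
JOB_TABLE = []

module SendLater
  # Instead of calling the method now, store the receiver, the method name
  # and the arguments so that a worker can perform the call later.
  def send_later(method_name, *args)
    JOB_TABLE << Marshal.dump([self, method_name, args])
    true
  end
end

class Newsletter
  include SendLater
  attr_reader :subject

  def initialize(subject)
    @subject = subject
  end

  def deliver(recipient)
    "Sent '#{subject}' to #{recipient}"
  end
end

# One pass of a worker: perform every stored job and remove it from the table.
def work_off(table)
  results = []
  until table.empty?
    receiver, method_name, args = Marshal.load(table.shift)
    results << receiver.public_send(method_name, *args)
  end
  results
end

Newsletter.new("Weekly digest").send_later(:deliver, "alice@example.com")
pending_before = JOB_TABLE.size     # the job is stored, nothing has been sent yet
results = work_off(JOB_TABLE)       # the worker performs the call
```

Serializing the receiver means the worker operates on a copy of the object as it was when the job was queued, so jobs should not rely on state that changes afterwards.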

JobFu

– written by Jon Stenqvist
– needs the daemons gem to create a background daemon process
– similar to delayed_job (a delayed_job clone)
– stores jobs in persistent job queue table (“jobs”)
– you can turn any method call into a job to be processed later (using the “Backgrounded handler” syntax)

Message Queue Servers

Message queue servers are available in various languages: Erlang (RabbitMQ), C (beanstalkd), Ruby (Starling or Sparrow), Scala (Kestrel) or Java (ActiveMQ).

Sparrow

– written by Alex MacCaw
– Sparrow is a lightweight queue written in Ruby that “speaks memcache”

Starling

– written by Blaine Cook at Twitter
– Starling is a message queue server that speaks the memcached protocol
– written in Ruby
– stores jobs in memory (message queue)
– Ruby client: for instance Workling
– documentation: some good tutorials, for example the Railscast about Starling and Workling or this blog post about Starling

Kestrel

– written by Robey Pointer
– Starling clone written in Scala (a port of Starling from Ruby to Scala)
– Queues are stored in memory, but logged on disk
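The "in memory, but logged on disk" idea can be sketched in a few lines of Ruby. This is an illustration of the general journaling technique, not Kestrel's code or its journal format: the JournaledQueue class and the JSON-lines journal are assumptions made for this example.

```ruby
require "json"
require "tmpdir"

# Hypothetical journaled queue: items live in memory, every operation is
# appended to a log file, and the log is replayed on startup.
class JournaledQueue
  def initialize(journal_path)
    @journal_path = journal_path
    @items = []
    replay if File.exist?(journal_path)
    @journal = File.open(journal_path, "a")
  end

  def push(item)
    log("op" => "push", "item" => item)
    @items.push(item)
  end

  def pop
    return nil if @items.empty?
    log("op" => "pop")
    @items.shift
  end

  def size
    @items.size
  end

  def close
    @journal.close
  end

  private

  # Append one operation to the journal and flush it to the file.
  def log(entry)
    @journal.puts(JSON.generate(entry))
    @journal.flush
  end

  # Rebuild the in-memory queue by re-applying every logged operation.
  def replay
    File.foreach(@journal_path) do |line|
      entry = JSON.parse(line)
      entry["op"] == "push" ? @items.push(entry["item"]) : @items.shift
    end
  end
end

path = File.join(Dir.mktmpdir, "queue.journal")
queue = JournaledQueue.new(path)
%w[job-1 job-2 job-3].each { |job| queue.push(job) }
first = queue.pop
queue.close                          # simulate a server restart

recovered = JournaledQueue.new(path) # replays the journal
remaining = [recovered.pop, recovered.pop]
recovered.close
```

Reads and writes stay as fast as an in-memory queue, while the journal lets the queue be rebuilt after a restart; a production server would also need to compact the journal so it does not grow without bound.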

@Mirko I wanted to focus on simple solutions including DB job queues and message queues. Resque is based on Redis, an advanced key-value data store. Scalable key-value data stores and distributed hash tables (DHTs) are certainly a hot topic, which is also related to the NoSQL discussion. Maybe a good topic for another post.
