Getting Started Scheduling Tasks with Celery

June 23, 2014

Many Django applications benefit from being able to schedule work, either to run periodically or simply to run outside the request thread so that it doesn't block the response.

There are multiple ways to schedule tasks in your Django app, but there are some advantages to using Celery. It’s supported, scales well, and works well with Django. Given its wide use, there are lots of resources to help learn and use it. And once learned, that knowledge is likely to be useful on other projects.

Celery versions 3.0.x

This documentation applies to Celery 3.0.x. Earlier or later versions of Celery
might behave differently.

Introduction to Celery

The purpose of Celery is to allow you to run some code later, or regularly
according to a schedule.

Why might this be useful? Here are a couple of common cases.

First, suppose a web request has come in from a user, who is waiting
for the request to complete so a new page can load in their browser.
Based on their request, you have some code to run that's going to take
a while (longer than the person might want to wait for a web page), but
you don't really need to run that code before responding to the web
request. You can use Celery to have your long-running code
called later, and go ahead and respond immediately to the web request.

This is common if you need to access a remote server to handle the request.
Your app has no control over how long the remote server will take to respond,
or the remote server might be down.

Another common situation is wanting to run some code regularly. For
example, maybe every hour you want to look up the latest weather
report and store the data. You can write a task to do that work, then
ask Celery to run it every hour. The task runs and puts the data
in the database, and then your Web application has access to the
latest weather report.

A task is just a Python function. You can think of scheduling a task as
a time-delayed call to the function. For example, you might ask Celery
to call your function task1 with arguments (1, 3, 3) after five
minutes. Or you could have your function batchjob called every
night at midnight.

We'll set up Celery so that your tasks run in pretty much the same
environment as the rest of your application's code, so they can access
the same database and Django settings. There are a few differences to keep
in mind, but we'll cover those later.

When a task is ready to be run, Celery puts it on a queue, a list of
tasks that are ready to be run. You can have many queues, but we'll assume
a single queue here for simplicity.

Putting a task on a queue just adds it to a to-do list, so to speak.
In order for the task to be executed, some other process, called a worker,
has to be watching that queue for tasks. When it sees tasks on the queue,
it'll pull off the first and execute it, then go back to wait for more.
You can have many workers, possibly on many different servers, but we'll
assume a single worker for now.
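
The queue-and-worker idea can be illustrated with a small sketch that uses only Python's standard library. This is not how Celery is implemented; the names task_queue, results, and worker are made up for the illustration, and it assumes Python 3 (where the module is called queue).

```python
import queue
import threading

# A toy stand-in for Celery's queue: a thread-safe to-do list.
task_queue = queue.Queue()
results = []


def add(x, y):
    return x + y


def worker():
    """Pull tasks off the queue one at a time and execute them."""
    while True:
        item = task_queue.get()   # blocks until something is on the queue
        if item is None:          # sentinel value: tells the worker to stop
            break
        func, args = item
        results.append(func(*args))


# "Scheduling" a task only adds it to the to-do list; nothing runs yet.
task_queue.put((add, (2, 2)))
task_queue.put((add, (10, 5)))

# The tasks execute only once a worker is watching the queue.
thread = threading.Thread(target=worker)
thread.start()
task_queue.put(None)
thread.join()

print(results)
```

The single worker handles the tasks in the order they were queued, which matches the one-queue, one-worker picture described above.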

We'll talk more later about the queue, the workers, and another important
process that we haven't mentioned yet. That's enough background for now,
so let's do some work.

Installing Celery locally

Configuring Django for Celery

To get started, we'll just get Celery configured to use with runserver.
For the Celery broker, which we will explain more about later, we'll use a
Django database broker implementation. For now, you just need to know that
Celery needs a broker and we can get by using Django itself during development
(but you must use something more robust and better performing in production).

In your Django settings.py file:

Add these lines:

import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'

The first two lines are always needed. Line 3 configures Celery to use its
Django broker.

Important: Never use the Django broker in production. We are only using it
here to save time in this tutorial. In production you'll want to use RabbitMQ, or
maybe Redis.

Add djcelery and kombu.transport.django to INSTALLED_APPS:

INSTALLED_APPS = (
    ...
    'djcelery',
    'kombu.transport.django',
    ...
)

djcelery is always needed. kombu.transport.django is the Django-based
broker, for use mainly during development.

Create Celery's database tables. If using South for schema migrations:

$ python manage.py migrate

Otherwise:

$ python manage.py syncdb

Writing a task

As mentioned before, a task can just be a Python function. However, Celery
does need to know about it. That's pretty easy when using Celery with Django.
Just add a tasks.py file to your application, put your tasks in that file,
and decorate them. Here's a trivial tasks.py:

from celery import task

@task()
def add(x, y):
    return x + y

When djcelery.setup_loader() runs from your settings file, Celery will
look through your INSTALLED_APPS for tasks.py modules, find the
functions marked as tasks, and register them for use as tasks.

Marking a function as a task doesn't prevent calling it normally. You
can still call it: z = add(1, 2) and it will work exactly as before. Marking
it as a task just gives you additional ways to call it.
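
To see how a decorator can leave a function callable while adding another way to call it, here is a toy sketch. It is not Celery's implementation: this task decorator and the pending list are hypothetical stand-ins, and unlike Celery's @task() the toy version is applied without parentheses.

```python
import functools

pending = []  # hypothetical stand-in for the queue


def task(func):
    """Toy decorator: keeps func callable and adds a .delay method."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    def delay(*args, **kwargs):
        # Record the call for later instead of running it now.
        pending.append((func.__name__, args, kwargs))

    wrapper.delay = delay
    return wrapper


@task
def add(x, y):
    return x + y


z = add(1, 2)    # a normal call: runs immediately and returns the sum
add.delay(2, 2)  # recorded for later; nothing is computed here

print(z)
print(pending)
```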

Scheduling it

Let's start with the simple case we mentioned above. We want to run our task
soon, but we don't want it to hold up our current thread. We can do that by
just adding .delay to the name of our task:

from myapp.tasks import add

add.delay(2, 2)

Celery will add the task to its queue ("worker, please call myapp.tasks.add(2, 2)") and return
immediately. As soon as an idle worker sees it at the head of the queue, the
worker will remove it from the queue, then execute it:

import myapp.tasks
myapp.tasks.add(2, 2)

A warning about import names

It's important that your task is always imported and referred to using the
same package name.
For example, depending on how your Python path is set up,
it might be possible to refer to it as either
myproject.myapp.tasks.add or myapp.tasks.add. Or from
myapp.views, you might import it as .tasks.add. But Celery has no
way of knowing those are all the same task.

djcelery.setup_loader() will register your task using the package name
of your app in INSTALLED_APPS, plus .tasks.functionname. Be sure
when you schedule your task, you also import it using that same name, or
very confusing bugs can occur.
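
A toy registry shows why the name matters. This sketch assumes, as a simplification, that tasks are stored in a dictionary keyed by their full name and that a worker looks them up by an exact string match. The registry, register, and run_by_name helpers are hypothetical and are not part of Celery.

```python
registry = {}  # hypothetical task registry: full task name -> function


def register(name, func):
    registry[name] = func


def add(x, y):
    return x + y


# The task is registered once, under the name built from INSTALLED_APPS.
register("myapp.tasks.add", add)


def run_by_name(name, *args):
    """Look a task up by name, the way a worker would from a message."""
    try:
        return registry[name](*args)
    except KeyError:
        return "unregistered task: %s" % name


ok = run_by_name("myapp.tasks.add", 2, 2)
missing = run_by_name("myproject.myapp.tasks.add", 2, 2)

print(ok)
print(missing)
```

Both names point at the same function as far as Python imports are concerned, but a lookup by string treats them as two unrelated tasks.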

Testing it

Start a worker

As we've already mentioned, a separate process, the worker, has to be running
to actually execute your Celery tasks. Here's how we can start a worker for
our development needs.

First, open a new shell or window. In that shell, set up the same Django
development environment - activate your virtual environment, or add
things to your Python path, whatever you do so that you could use
runserver to run your project.

Now you can go to /admin/django/message/ to see if there are items on the
queue. Each message is a request from Celery for a worker to run a task.
The contents of the message are rather inscrutable, but just knowing if your
task got queued can sometimes be useful. The messages tend to stay in the
database, so seeing a lot of messages there doesn't mean your tasks aren't
getting executed.

Check the results

Anytime you schedule a task, Celery returns an AsyncResult object. You can
save that object, and then use it later to see if the task
has been executed, whether it was successful, and what the result was.

result = add.delay(2, 2)
...
if result.ready():
    print("Task has run")
    if result.successful():
        print("Result was: %s" % result.result)
    else:
        if isinstance(result.result, Exception):
            print("Task failed due to raising an exception")
            raise result.result
        else:
            print("Task failed without raising exception")
else:
    print("Task has not yet run")

Periodic Scheduling

Another common case is running a task on a regular schedule. Celery implements
this using another process, celerybeat. Celerybeat runs continually, and
whenever it's time for a scheduled task to run, celerybeat queues it for
execution.

Only one celerybeat process should be running at a time (unlike workers,
where you can run as many as you want and need). If two were running, each
could queue the same scheduled task, and the task would run twice.
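
The effect of a second celerybeat can be sketched with a toy scheduler. This is a simplified model, not celerybeat's actual logic: ToyBeat, its tick method, and the task name myapp.tasks.fetch_weather are all invented for the illustration.

```python
queued = []  # stand-in for the queue that workers read from


class ToyBeat(object):
    """Toy scheduler: queues each task whose interval has elapsed."""

    def __init__(self, schedule):
        self.schedule = schedule  # {task name: interval in seconds}
        self.last_run = {}        # each instance keeps its own bookkeeping

    def tick(self, now):
        for name, interval in sorted(self.schedule.items()):
            if now - self.last_run.get(name, 0) >= interval:
                queued.append(name)
                self.last_run[name] = now


schedule = {"myapp.tasks.fetch_weather": 3600}  # hypothetical hourly task
beat_a = ToyBeat(schedule)
beat_b = ToyBeat(schedule)  # a second scheduler, started by mistake

# Both schedulers notice that the hour is up and queue the task.
for beat in (beat_a, beat_b):
    beat.tick(now=3600)

print(queued)
```

With two schedulers the hourly task lands on the queue twice, so a worker would run it twice.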

Starting celerybeat is similar to starting a worker. Start another window,
set up your Django environment, then:

You can now add schedules by opening the Django admin and going to
/admin/djcelery/periodictask/.
See the image above for what
adding a new periodic task looks like, and here's how the fields
are used:

Name — Any name that will help you identify this scheduled task later.

Task (registered) — This should give a choice of any of your defined tasks, as long as you've started Django at least once after adding them to your code. If you don't see the task you want here, it's better to figure out why and fix it than use the next field.

Task (custom) — You can enter the full name of a task here (e.g. myapp.tasks.add), but it's better to use the registered tasks field just above this.

Enabled — You can uncheck this if you don't want your task to actually run for some reason, for example to disable it temporarily.

Interval — Use this if you want your task to run repeatedly with a certain delay in between. You'll probably need to use the green "+" to define a new schedule. This is pretty simple, e.g. to run every 5 minutes, set "Every" to 5 and "Period" to minutes.

Crontab — Use crontab, instead of Interval, if you want your task to run at specific times. Use the green "+" and fill in the minute, hour, day of week, day of month, and month of year. You can use "*" in any field in place of a specific value, but be careful: if you use "*" in the Minute field, your task will run every minute of the hour(s) selected by the other fields. Example: to run every morning at 7:30 am, set Minute to "30", Hour to "7", and the remaining fields to "*".

Arguments — If you need to pass arguments to your task, you can open this section and set *args and **kwargs.

Execution Options — Advanced settings that we won't go into here.
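
The warning about "*" in the Minute field can be checked with a small sketch. It covers only the Minute and Hour fields and only plain numbers or "*", which is an assumption made to keep it short. The helper names field_matches and crontab_matches are hypothetical and are not part of Celery.

```python
from datetime import datetime


def field_matches(spec, value):
    """'*' matches any value; otherwise the number must match exactly."""
    return spec == "*" or int(spec) == value


def crontab_matches(minute, hour, when):
    """Check the Minute and Hour fields against a datetime."""
    return field_matches(minute, when.minute) and field_matches(hour, when.hour)


# Minute "30", Hour "7": matches 7:30 only.
at_730 = crontab_matches("30", "7", datetime(2014, 6, 23, 7, 30))
at_731 = crontab_matches("30", "7", datetime(2014, 6, 23, 7, 31))

# Minute "*", Hour "7": matches every minute from 7:00 to 7:59.
runs_in_hour = sum(
    crontab_matches("*", "7", datetime(2014, 6, 23, 7, m)) for m in range(60)
)

print(at_730, at_731, runs_in_hour)
```

Leaving Minute as "*" turns one run per day into sixty, which is rarely what was intended.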

Default schedules

If you want some of your tasks to have default schedules, and not have
to rely on someone setting them up in the database after installing
your app, you can use Django fixtures to provide your schedules as
initial data for your app.

If you never want to edit the schedules again, you can copy your JSON file
to initial_data.json in your fixtures directory. Django will load it
every time syncdb is run, and you'll either get errors or lose your
changes if you've edited the schedules in your database. (You can
still add new schedules, you just don't want to change the ones that
came from your initial data fixture.)

If you just want to use these as the initial schedules, name your file
something else, and load it when setting up a site to use your app:

Hints and Tips

Don't pass model objects to tasks

Since tasks don't run immediately, by the time a task runs and looks at
a model object that was passed to it, the corresponding record in the
database might have changed. If the task then does something to the model
object and saves it, those changes in the database are overwritten by
older data.

It's almost always safer to save the object, pass the record's key, and look
up the object again in the task:
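
The difference between the two approaches can be seen in a toy model that uses a dictionary in place of the database. This is only a sketch: db, get, save, and the two bump_visits functions are hypothetical and stand in for a Django model and a pair of tasks.

```python
import copy

db = {1: {"name": "Alice", "visits": 0}}  # toy "table", keyed by primary key


def get(pk):
    return copy.deepcopy(db[pk])  # like fetching a fresh model instance


def save(pk, obj):
    db[pk] = copy.deepcopy(obj)   # writes back every field


def bump_visits_with_object(obj, pk):
    """Risky: works on whatever copy of the record it was handed."""
    obj["visits"] += 1
    save(pk, obj)


def bump_visits_with_key(pk):
    """Safer: looks the record up again when the task actually runs."""
    obj = get(pk)
    obj["visits"] += 1
    save(pk, obj)


def rename(pk, new_name):
    obj = get(pk)
    obj["name"] = new_name
    save(pk, obj)


stale = get(1)       # object captured when the task was scheduled
rename(1, "Alicia")  # the record changes before the task runs

bump_visits_with_object(stale, 1)
name_after_object = db[1]["name"]  # the rename has been overwritten

rename(1, "Alicia")
bump_visits_with_key(1)
name_after_key = db[1]["name"]     # the rename survives

print(name_after_object, name_after_key, db[1]["visits"])
```

Passing the key costs one extra lookup inside the task, but the task then always works from current data.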