Retrying failed Celery tasks

I assume you have a basic understanding of Celery. If you want to learn the basics of Celery, you can check our previous blog post.

Use case

In one of my projects, I work with the Twitter API and need to fetch a user’s tweets. Twitter provides an API endpoint for this. Fetching tweets involves network calls, so it should happen in the background, and we do it in a Celery task. Each run of the task makes one API call to Twitter. If the call returns tweets, I consider the Celery task successful.

But Twitter rate limits this API endpoint, so I can’t hit it an unlimited number of times. Twitter allows a maximum of 180 calls to this endpoint in a 15-minute window. Any call beyond the 180th fails, and Twitter raises an exception instead of returning tweets. When that happens, I consider the Celery task to have failed.

Suppose a very active Twitter user signs up and we need 200 API calls to fetch all of their tweets. My Celery task then runs 200 times. The first 180 runs succeed, but the next 20 fail because of rate limiting.
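To make the arithmetic concrete, here is a tiny simulation of one rate-limit window. It makes no real Twitter calls, and the function name simulate_window is hypothetical, introduced only for illustration.

```python
def simulate_window(total_calls, limit=180):
    """Return (succeeded, failed) counts for calls made within one window."""
    # Calls up to the limit succeed; everything beyond it is rejected.
    succeeded = min(total_calls, limit)
    failed = total_calls - succeeded
    return succeeded, failed


succeeded, failed = simulate_window(200)
print(succeeded, failed)  # 180 20
```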

I do not want to miss any of a user’s tweets, so any failed task must be retried after 15 minutes. This is where Celery’s retry functionality comes into the picture.

Pseudocode for my use case

@app.task(bind=True)
def fetch_tweets(self, token_details):
    # bind=True makes Celery pass the task instance as `self`
    # token_details is the user-specific token that needs to be passed to Twitter
    try:
        resp = make_twitter_call(token_details)
        # For the first 180 calls in a window, we will get `resp`
        # process result
    except TwitterException:
        # For the first 180 calls, this part will not be executed
        # From the 181st call onwards, Twitter raises an error and this part is executed
        # Retry fetch_tweets after 15 minutes
        pass

Before fixing this use case, let’s play around with some basic examples.

Basic example

add.py

Make sure the task defined in this file works by starting the worker and calling the task from an IPython shell.

Worker terminal

celery -A add worker -l info

IPython

from add import add
add.delay(3, 4)

After you do this, the addition should happen and “7”, i.e. 3 + 4, should be printed on the worker terminal.

Explicitly raising an error

Setting up Twitter access tokens and the like takes time and effort, so instead of setting up Twitter, let’s replicate a similar scenario where some calls to our task succeed and some fail.

Let’s first replicate a failed task. A failed task is one in which an error occurs. Raise an error in your task and see how it behaves.

Explanation

In our task, we generate a random number between 1 and 10. If the number is even then the task succeeds and returns the sum of numbers. If the generated random number is odd then the task fails, in which case we retry the task after 2 seconds.

In one run, the generated random number was 7, so the task raised an exception and was retried after 2 seconds. In the retried run, the generated number was 8, which is even, so no exception was raised and the sum was calculated and returned.

In another run, the generated random number was 7, so the task raised an exception and was retried after 2 seconds. In the retried run, the generated number was 3, which is again odd, so the retried task also raised an exception. Since we have set max_retries to 1, no further retry was attempted.

Setting a datetime instead of countdown

So far we have been using countdown, which specifies the number of seconds after which a failed task should be retried. We can also pass eta, a datetime at which the task should be retried.

Let’s retry the failed task after 2 seconds, but using a datetime instead of countdown.