Continuous Deployment – Long Running Batch Tasks

Continuous Deployment is an incredibly stable and versatile way of designing and releasing software. The beauty of this approach is that any task, at any time, can be interrupted by an update, yet despite these interruptions the system maintains its integrity and stability.

This is because Continuous Deployment software has been designed from the ground up to manage interruptions automatically. Building your software using Continuous Deployment methods will ensure it is robust. Thus, whether the interruption is planned or is an unexpected system failure (e.g. a hard drive problem), the software will maintain its integrity and stability.

A key part of Continuous Deployment is the appropriate design of long running batch tasks. A long running batch task is a process that can take anything from minutes to hours to complete (compared to a short running task, which takes seconds).

Processing Long Running Batch Tasks.

Long running batch tasks have three main attributes:

Each task stops quickly on request. The quicker the better, so the system can finish the update. Ideally this should take under 10 seconds. Tasks that take longer to stop will need to be terminated to allow the deployment to continue.

Each task’s output is a single database transaction. This means that if the processing is stopped, the database is left in a consistent state.

If you are calling third party services, then the results from those services should be logged, so that if a task needs to restart it can check the status of each call and avoid making it twice. For example, you only want to charge a credit card once!
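The third attribute can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `service_calls` table, the `charge_once` function and the `fake_charge` stand-in for the real payment service are all hypothetical names, and SQLite is used only so the example is self-contained.

```python
import sqlite3

def charge_once(conn, order_id, amount_cents, charge_card):
    """Charge an order at most once by logging the service result.

    `charge_card` is a hypothetical callable wrapping the third party service.
    """
    row = conn.execute(
        "SELECT result FROM service_calls WHERE order_id = ?", (order_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # logged on an earlier run, so do not call the service again
    result = charge_card(order_id, amount_cents)
    with conn:  # commit the log entry in its own transaction
        conn.execute(
            "INSERT INTO service_calls (order_id, result) VALUES (?, ?)",
            (order_id, result),
        )
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_calls (order_id TEXT PRIMARY KEY, result TEXT)")

calls = []  # records every real call made to the (fake) service

def fake_charge(order_id, amount_cents):
    calls.append(order_id)
    return "approved"

charge_once(conn, "order-1", 1999, fake_charge)
charge_once(conn, "order-1", 1999, fake_charge)  # simulated restart of the task
print(len(calls))  # 1
```

A crash between the service call and the log insert would still leave the call unrecorded, which is why a restarted task should also check the status with the service itself where that is possible.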

For example, if you have to process 500,000 price updates, the first task would be to split these updates into smaller chunks that can each be processed quickly, e.g. in under 10 seconds. I have often found the best batch size to be 1. With larger batch sizes, if the processing is stopped you need to work out where you left off within the batch.
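A batch size of 1 combined with a stop check between items might look like the sketch below. The `stop_requested` event, `run_batch` and `fake_process` are assumptions made for illustration; in a real system the event would typically be set by whatever signals the deployment, for example a shutdown signal handler.

```python
import threading

# Assumed to be set by the deployment tooling when an update is about to start.
stop_requested = threading.Event()

def run_batch(pending_ids, process_item):
    """Process items one at a time, checking for a stop request between items."""
    done = []
    for item_id in pending_ids:
        if stop_requested.is_set():
            break  # stop quickly; the remaining items stay pending for the next run
        process_item(item_id)
        done.append(item_id)
    return done

def fake_process(item_id):
    if item_id == 2:
        stop_requested.set()  # simulate a deployment arriving mid-batch

all_ids = [1, 2, 3, 4]
first_run = run_batch(all_ids, fake_process)

# After the update the task restarts and only sees what is still pending.
stop_requested.clear()
second_run = run_batch([i for i in all_ids if i not in first_run], fake_process)
print(first_run, second_run)  # [1, 2] [3, 4]
```

Because each item is a complete unit of work, the task never waits longer than one item before it can stop, and there is no position inside a chunk to recover.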

The basic processing loop

All the batch processing steps, from initial setup to processing to clean up, need to follow the basic processing loop below.

Since your code can be restarted at any time, it needs to be able to do three things when it starts up again:

Work out where it left off

Do any clean up

Start the next processing step

The important part is the ability for the code to work out (1) where to restart the process and (2) whether any clean up needs to be done.
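The three steps above can be sketched as a small resume function. Everything here is hypothetical: the step names, the `state` dictionary (which stands in for durable storage such as a database row) and the `undo_partial` callback are assumptions chosen to keep the example short.

```python
STEPS = ["setup", "process", "cleanup"]

def resume(state, handlers, undo_partial):
    """Run the remaining steps, starting after the last one marked completed."""
    # 1. Work out where it left off.
    start = len(state.get("completed", []))
    # 2. Do any clean up left behind by an interrupted step.
    if state.get("in_progress"):
        undo_partial(state["in_progress"])
        state["in_progress"] = None
    # 3. Start the next processing step.
    for step in STEPS[start:]:
        state["in_progress"] = step
        handlers[step]()
        state.setdefault("completed", []).append(step)
        state["in_progress"] = None

log = []
handlers = {name: (lambda name=name: log.append(name)) for name in STEPS}

# Simulate a restart: "setup" finished, "process" was interrupted part way through.
state = {"completed": ["setup"], "in_progress": "process"}
resume(state, handlers, lambda step: log.append("undo " + step))
print(log)  # ['undo process', 'process', 'cleanup']
```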

The clean up stage refers to removing any partial processing from the system, so that when the process resumes no double entries occur. However, this isn’t much of a problem if all the work is done in a database. It can be covered by a single database transaction, because if the processing is stopped all the changes will be automatically rolled back.
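The rollback behaviour can be seen in a small SQLite sketch. The `prices` table and the `apply_updates` function are made up for the example, and the interruption is simulated with an exception; a killed process gets the same outcome because the uncommitted transaction is never applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (sku TEXT PRIMARY KEY, price_cents INTEGER)")
conn.execute("INSERT INTO prices VALUES ('A', 100), ('B', 200)")
conn.commit()

def apply_updates(conn, updates, fail_after=None):
    """Apply all price updates in one transaction (commit on success, rollback on error)."""
    with conn:
        for n, (sku, price) in enumerate(updates):
            if fail_after is not None and n == fail_after:
                raise RuntimeError("simulated interruption")
            conn.execute(
                "UPDATE prices SET price_cents = ? WHERE sku = ?", (price, sku)
            )

try:
    apply_updates(conn, [("A", 110), ("B", 220)], fail_after=1)
except RuntimeError:
    pass  # the update to 'A' was rolled back together with the rest

rows = conn.execute("SELECT sku, price_cents FROM prices ORDER BY sku").fetchall()
print(rows)  # [('A', 100), ('B', 200)]
```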

When you start using queues and files for storage, some thought needs to go into how your code will pause and resume processing. This effort will result in your code being super robust.
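For files, one common technique (not one described in this article) is to write to a temporary file and rename it into place only when it is complete, so a restart never finds a half-written output. The sketch below assumes a POSIX-style filesystem where a rename within one directory is atomic; `write_atomically` is a hypothetical helper name.

```python
import os
import tempfile

def write_atomically(path, text):
    """Write `text` to `path` so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up the partial temporary file
        raise

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "output.txt")
    write_atomically(target, "first version\n")
    write_atomically(target, "second version\n")
    with open(target, encoding="utf-8") as handle:
        content = handle.read()
    leftovers = [name for name in os.listdir(folder) if name.endswith(".tmp")]

print(content.strip())  # second version
```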

Batch Processing Example

Let’s look at an example. You have a customer who FTPs a file to you nightly for processing. The following is an overview of the process. The first two steps are Batch Creation and Batch Item Processing.

Batch Creation

Below is a simple process for initializing a batch, to prepare it for processing.

This is the processing loop to prepare a batch for processing. The process can be paused at any time, and when it restarts the system will pick up where it left off.

Since the system only deletes the FTP file once all the batch steps have been created in the database, we can be sure everything is ready to be processed once the file has been deleted. Then the system can move on to processing the batch items.
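A minimal sketch of that ordering, store everything first and delete the file last, could look like this. The `batch_items` table, its columns, the file name and the line-per-item file format are all assumptions made for the example.

```python
import os
import sqlite3
import tempfile

def create_batch(conn, file_path):
    """Store every line of the uploaded file as a pending item, then delete the file."""
    if not os.path.exists(file_path):
        return False  # file already deleted, so batch creation finished earlier
    with open(file_path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    with conn:  # one transaction: either every item is stored or none is
        # Clean up: drop items from a run that committed but never deleted the file.
        conn.execute("DELETE FROM batch_items WHERE source = ?", (file_path,))
        conn.executemany(
            "INSERT INTO batch_items (source, payload, status) VALUES (?, ?, 'pending')",
            [(file_path, line) for line in lines],
        )
    os.remove(file_path)  # only after the commit has succeeded
    return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch_items "
    "(id INTEGER PRIMARY KEY, source TEXT, payload TEXT, status TEXT)"
)

with tempfile.TemporaryDirectory() as folder:
    upload = os.path.join(folder, "nightly.csv")
    with open(upload, "w", encoding="utf-8") as handle:
        handle.write("sku-1,100\nsku-2,200\nsku-3,300\n")
    created = create_batch(conn, upload)
    created_again = create_batch(conn, upload)  # simulated restart: nothing to do
    file_still_there = os.path.exists(upload)

item_count = conn.execute("SELECT COUNT(*) FROM batch_items").fetchone()[0]
print(created, created_again, item_count)  # True False 3
```

If the process dies before the commit, the file is still there and the whole step simply runs again. If it dies between the commit and the delete, the rerun replaces the stored items rather than duplicating them.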

Batch Item Processing

Below is a simple process for processing a single item in the batch.

If you are using database transactions then you will probably never need to worry about clean up and rollback steps, since if the processing is interrupted the database will take care of that for you automatically.
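Processing a single item inside one transaction might be sketched as below. The table layout, `process_next_item` and the `flaky_upper` handler (which fails once to simulate an interruption) are hypothetical.

```python
import sqlite3

def process_next_item(conn, handle_item):
    """Process one pending item in a single transaction; return False when none are left."""
    row = conn.execute(
        "SELECT id, payload FROM batch_items "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    item_id, payload = row
    with conn:  # the result and the status change commit together or not at all
        output = handle_item(payload)
        conn.execute(
            "INSERT INTO results (item_id, output) VALUES (?, ?)", (item_id, output)
        )
        conn.execute("UPDATE batch_items SET status = 'done' WHERE id = ?", (item_id,))
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_items (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")
conn.execute("CREATE TABLE results (item_id INTEGER PRIMARY KEY, output TEXT)")
conn.executemany(
    "INSERT INTO batch_items (payload, status) VALUES (?, 'pending')",
    [("a",), ("b",), ("c",)],
)
conn.commit()

attempts = {"b": 0}

def flaky_upper(payload):
    if payload == "b" and attempts["b"] == 0:
        attempts["b"] += 1
        raise RuntimeError("simulated interruption")
    return payload.upper()

try:
    while process_next_item(conn, flaky_upper):
        pass
except RuntimeError:
    pass  # item "b" is still pending and nothing partial was committed

while process_next_item(conn, flaky_upper):  # restart picks up at "b"
    pass

outputs = [r[0] for r in conn.execute("SELECT output FROM results ORDER BY item_id")]
print(outputs)  # ['A', 'B', 'C']
```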

If you follow these methods you will create robust software that can be used with continuous deployment.