SQL Server Integration Services Package Restartability

Problem

I've got a long-running SSIS package that occasionally fails. However, the package always starts from the beginning after a failure, even if several of the tasks completed successfully on the failed execution. I've tried using
SSIS checkpoints, but have found them to be difficult to configure and, on occasion, unpredictable. Is there a better way to build in restartability in my SSIS packages?
Check out this tip to learn more.

Solution

The default behavior for
SQL Server Integration Services packages is to start the execution from the first task(s) in the data flow, irrespective of the success or failure of the last operation. In most cases this is the intended operation - when practical, it's easier and simpler to build packages with a "start from the top" design pattern even if the last package execution failed. However, there are many cases where it makes more sense to resume execution at the task that failed, if the previous execution was not successful. Consider for a moment a case where a package is executed multiple times per day, typically running for 4 hours before a successful completion. What happens if a task fails late in the load process after, say, 3 hours of execution? The default behavior would force the package, upon the next execution, to start from the beginning, re-running every task including those that had previously completed successfully. Even if the ETL developer (or other person designated to address package failures) is able to immediately find and remedy the issue that caused the failure, the package must now start again from the beginning. The 4 hour ETL cycle has now nearly doubled, and any business processes waiting for the refresh of the data will be working on stale information. The mind can easily wander to more extreme scenarios where the ETL runs for the better part of a day, meaning that a failed execution will force today's ETL into tomorrow's schedule, and so forth.

In cases such as this, it makes more sense to consider a design pattern that allows packages to restart from the point of failure rather than re-running successful operations. In this tip, I'll provide a brief demonstration of how to build this smart restart logic into your packages.

Author's note: As mentioned earlier, Integration Services does have a mechanism intended to be used for package restartability.
SSIS checkpoints, a file-based solution for tracking package execution status to allow package restartability at the point of failure, will - at least in theory - allow the SSIS developer to accomplish the same task described in this tip. However, after working with
SSIS checkpoints on a number of occasions, I've found them to be difficult and unintuitive to configure, and even when configured properly, have a well-documented tendency to not work as intended. Ordinarily I wouldn't recommend writing a custom solution to perform the same function as native SSIS elements, but in this case the native solution introduces an unacceptable level of difficulty and uncertainty.

To address the need for package restartability, we're going to implement the following high-level strategy:

Create a database to store package restartability metadata

Create and populate a table to store the names of the tasks to be included in the restartability scheme

Create a table to store the restartability marker

Build our package(s) to check this table for restart markers. If such a marker exists for the given package, start at that point rather than at the beginning of the package.

Build our package(s) to write to the restart marker table in the event of a failure

Create the metadata

First, create your metadata database. Note that if you already have a user database in your environment for this purpose, you can just create these objects in that database. In my environment, I'm going to call the new database PackageController.

CREATE DATABASE PackageController
GO

Next up, we need to create a table that will store the names of the tasks that are allowed to participate in restartability operations. Specifically, any task that may be used as a restart point should be included in this table.

As shown below, the table to store the name of the packages and tasks to be controlled is relatively simple - in this example, I use a name-based column to track the packages and tasks to be included. For our sample operation, there are only two possible restart points - one for each task which can be used as a start point in the event of a failure - and we need to insert a row into this new PackageTask table for each possible restart point.

Note that I'm not storing any information as to the reason for the failure in these tables. While I do strongly recommend that you log such information in some form, I've purposefully kept the scope of these objects limited to only the restartability functionality for the sake of simplicity. It is assumed in this sample that package logging (either native or custom) is taking place peripheral to what is described here.

Apply the Restart Pattern in the SSIS Package

With the restartability metadata in place, we can now start using this pattern on new or existing packages. For demonstration purposes, I've created a trivial package with three tasks performing the following functions:

Truncate staging tables

Load a large flat file to a staging table

Load a smaller flat file to a staging table

In the package, the first thing I need to do is to query the PackageRestartPoint table to see if a restart marker exists for a previous failed execution. In this design pattern, this table will have exactly zero or one rows of data; the presence of a row of data indicates that the last package execution failed. As shown below, I'm using an instance of the Execute SQL Task to retrieve the name of the restart marker for this package, if one exists.

In that query, I've mapped a couple of parameters to SSIS variables. The SSIS system parameter [PackageName] is passed in as an input value in the WHERE clause, and the user variable [StartTaskName] is mapped as an output value to store the name of the task to be used as the starting point. It's worth noting that, if we have multiple packages to which this design pattern will be applied, we can use the same metadata table as we're identifying both the package name and the task name.

Next up, I'm going to add expressions to my precedence constraints to control which path should be taken. As shown below, each of the precedence constraint leading away from the metadata SQL task have an expression applied.

For each of these precedence constraints, I'm using an expression similar to the following to compare the variable containing the start point task with the name of the task to which the working end of the constraint is attached. Each expression is configured with the name of the next connected task, and will only start with that task in the event that it is the task identified as the failure point on the last execution.

@[User::StartTaskName] == "DFT Load consumer list"

For the task at the beginning of the chain (the one that should be executed if no restart marker exists), I only need to check to see if the [StartTaskName] variable is empty. I didn't add the first task to the list of possible restart points, since this would be the natural starting point anyway. Therefore, a successful execution on the last run OR a failure of the first task would not cause the insertion of a restart marker, so I don't need to check the name of this first task in the expression - I only need to confirm that there is no task name in that variable.

LEN(@[User::StartTaskName]) == 0

Finally, I would add a task to write out a new restart marker (using a package-level OnError event handler) to capture the name of the task in error in the event of a package failure. As shown below, I'll use the name of the failed task as a parameter to the INSERT query in the event handler, and I'll combine it with a SELECT from the list of allowable restart points for this package to ensure that we only use predefined restart points.

The net result of this configuration is that package execution will start at the first task inline if no restart marker exists, or if there is such a marker indicating a previous execution failure, execution would begin at the point where the previous execution failed.

This pattern allows me a great deal of flexibility, as I can specify specific tasks that are to be engaged in the restartability scheme. This can be useful for packages that require restartability after failure, but also have a "point of no return" after which the entire package should be re-executed in the event of a failure.

Next Steps

In this brief tip, I've demonstrated how to bypass the sometimes unpredictable behavior of SSIS checkpoints by creating your own task-level package restartability mechanism. As for next steps, I recommend the following:

Create the restartability structures (database and tables) on an existing dev or test machine.

Apply the restartability design pattern shown here to the package(s) that require restartability, and modify the restart points as necessary.

In addition, if you have a large number of packages that require this pattern, I'd recommend looking into implementing an ETL Framework structure, many examples of which can be found online (and some are completely free). SSIS ETL framework structures often have package-level restartability built right in.

Last Updated: 2013-05-28

About the author

Tim Mitchell is a business intelligence architect, author, speaker, and SQL Server MVP. He specializes in SSIS.

Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name
*Email
Email me updates

Signup for our newsletter
I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.

For my purposes, I've added another layer that I call the application layer. This allows me to differentiate between the packages and tasks associated with one set of packages from another set of packages, e.g. DW load versus Data Mart loads. Consequently, I can have more than one entry in the PackageRestartPoint table in certain instances. Might seem like overkill but I'm trying to build a framework that can be applied regardless of the size of the enterprise where job streams can be running around the clock.

See also the MS White Paper SSIS Operational and Tuning Guide as they show a nice way of designing restart/retry logic without losing pipeline progess within a data flow task. It is covered in the second half of the paper; whereas, the first half of the paper deals with the driver of this design--potential latency/connectivity issues when using SSIS in the cloud.

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.