In an ideal scenario, before you do any SSIS development, your team has an experienced SSIS consultant/trainer come in. They provide you with a framework that works for your environment and establishes a set of best practices that your team sticks to for all future package development, until the end of time.

This rarely happens.

It’s Too Easy

SSIS has a learning curve, but it is fairly easy to get started with. You can throw together a simple ETL workflow running on your local machine in a short amount of time. Things get a bit more complicated when you need to deploy that solution to another machine, and even more so when you add multiple developers to the mix. This is where best practices start to be required.

The downside to SSIS being easy enough for people to learn on their own is that many people come up with their own way of doing things. Given that there are usually multiple ways to accomplish common ETL tasks, your company can end up with a mix of “best practices”, “frameworks” and ways of using SSIS.

Did you know there are at least 5 different SSIS frameworks used internally at Microsoft? Each one has its own take on how to do configuration, dynamic package execution/generation, and logging. I wouldn’t be surprised if there are a number of others that I’m not aware of.

We improved this in SQL Server 2012 with the addition of the SSIS Catalog (SSISDB). In 2005/2008, the biggest differences between SSIS projects were typically the way they handled deployment & configuration. The Catalog was meant to provide a “default” best practice and remove the need for teams to rely on their own custom frameworks. There’s still more we could do here, but given the adoption rates we’re seeing, it was definitely a step in the right direction.

So the question becomes, how do I make sure that the SSIS developers in my company are following best practices and using the framework we have in place?

Training plays a big role here. If you show your developers that there is a better way of doing things, they should be willing to adopt it. That said, you’ll often find that consistency is more important than doing things in a slightly better way. I know this sounds ironic coming from someone who worked on the product (*COUGH*three separate data type systems*COUGH*), but I give this advice to other people, speaking from experience.

Tools to Help Apply Best Practices

Tools can really help here, as well. There are two third party tools that come to mind:

If you’re looking to apply design patterns and/or you require dynamic package generation, then Biml from Varigence is a great solution. It’s a powerful markup language/compiler that makes it really easy to create your SSIS packages in a consistent way. There is a free version of Biml integrated into BIDS Helper, and a more advanced version (and separate designer – Mist) available from Varigence. There is also a Biml repository site that is great for examples.

The slides from my 3rd presentation at the TechDays Hong Kong event are now available. As I mentioned during the talk, design patterns are one of my favorite topics, and something I really enjoyed writing about. This presentation included a subset of the SSIS Performance Design Patterns content from the full-day training sessions I’ve done in the past. See this post for a link to the full deck (over 100 slides).

Clicking the Add button on the Parameter bindings page will automatically populate the binding list with the next un-bound parameter for the child package you have selected to run. Note, however, that you can manually enter a parameter name in the Child package parameter field, even if it doesn’t actually exist on the child package.

At runtime, parameters that don’t exist on the child package won’t cause any errors or warnings. There is a reason for this…

Dynamic Package Execution

You can change which child package the Execute Package Task runs by using an expression on the PackageName property.

Since not all packages will have the same set of parameters, the Execute Package Task allows you to define a superset of parameter bindings at design time. Parameters that don’t exist for the current child package will simply be ignored.

For example:

ChildPackage1 requires ParameterA and ParameterB

ChildPackage2 requires ParameterC

ParentPackage has logic that will dynamically call either ChildPackage1 or ChildPackage2 at runtime

To do this, you would add parameter bindings for all three parameters (ParameterA, ParameterB, ParameterC) in the Execute Package Task in your parent package.
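The binding behavior can be illustrated with a small sketch. This is plain Python, not SSIS, and the function name is hypothetical; it only mirrors the idea that the parent defines a superset of bindings, and each child consumes just the parameters it declares.

```python
def run_child(child_name, declared_parameters, bindings):
    """Mimic the Execute Package Task: apply only the bindings the
    child package actually declares, and silently ignore the rest."""
    applied = {name: value for name, value in bindings.items()
               if name in declared_parameters}
    return child_name, applied

# Superset of bindings defined once on the parent's Execute Package Task
parent_bindings = {"ParameterA": 1, "ParameterB": 2, "ParameterC": 3}

# ChildPackage1 declares ParameterA and ParameterB; ParameterC is ignored
result1 = run_child("ChildPackage1", {"ParameterA", "ParameterB"}, parent_bindings)

# ChildPackage2 declares only ParameterC; A and B are ignored
result2 = run_child("ChildPackage2", {"ParameterC"}, parent_bindings)
```

No matter which child is chosen at runtime, the superset of bindings stays the same; only the applied subset differs.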

Here are the slides (all 177 of them) for the SSIS Performance Design Patterns talk that I’ve delivered at three different pre-conference events over the past month. Each presentation was slightly different, but the core content remained the same. These events included:

I’ll also be covering SSIS Performance Design Patterns in a full day training session the day before the event. The content will be similar to the pre-con session I’m doing the following week at SQL Bits X, with a couple of changes – I’ll be presenting solo, and will spend the last portion of the day covering the enhancements we made to SSIS in SQL Server 2012.

SSIS has built-in support for transactions, but their use isn’t ideal for all scenarios. This post covers how SSIS transactions work at a high level, some of the challenges associated with them, and provides some design alternatives.

Disclaimer: Many SSIS packages use transactions without issue. Using the built-in functionality is the easiest approach, and (typically) doesn’t require a redesign of your package logic to implement. If your packages are currently using transactions, please do not feel the need to go in and change anything! If, however, you are looking to improve the performance of your packages, or are encountering the issues I describe below, consider using the design patterns presented here.

How SSIS Transactions Work

SSIS transactions are based on the Microsoft Distributed Transaction Coordinator (MSDTC). They are enabled at the Task/Container level by setting the TransactionOption property to Required. For a transaction to work at runtime, MSDTC must be enabled (and configured) on all machines involved in the transaction (i.e. the source machine, the destination, and the machine running the SSIS package… and of course, in some environments, this might all be the same machine).

Challenges

Configuration

Configuring DTC to communicate across machine boundaries can be tricky. Timeouts and unexpected termination of connections on long-running queries are another common source of problems. Finally, the server(s) you are accessing must support DTC – this isn’t always possible when your database runs on a non-Windows platform.

Performance

Performance is probably the biggest challenge that customers face with SSIS transactions. Using any sort of transaction will impact the performance of your packages, but DTC transactions can be especially “heavy-weight”, as they can require coordination between multiple hosts.

DB Provider

To enlist in a transaction, the underlying data provider used by your connection manager must also support DTC transactions. Microsoft providers (like SQL Native Client and SqlClient) typically support DTC, but the same may not be true for third party providers.

Transaction Support

My biggest complaint (and perhaps it’s something we can address in the future) is that there is no way to tell which Tasks actually support transactions. Take the following control flow:

We’re missing some information from the UI:

We can’t tell if transactions are enabled

We can’t tell where the transaction starts

We can’t tell if any of the tasks have opted out of the transaction (TransactionOption = NotSupported)

To get the information we need, we have to rely on good documentation (i.e. annotations), or take a look at the TransactionOption property for each of the tasks/containers in the package.

Even if we can assume that transactions are enabled at the package level, and that none of the tasks have opted out, not every Task is able to roll back its work when a transaction fails. Using the example above:

Supports Transactions (rollback)

Execute SQL Task

Analysis Services Execute DDL Task

Data Flow Task

Does Not Support Transactions (rollback)

XML Task

File System Task

The Script Task may or may not support transactions, depending on how it was written.

Design Alternatives

Manual Transactions

A common alternative to using the built-in transactions is to create them yourself. This is done by setting the Connection Manager’s RetainSameConnection property to True, and creating/committing/rolling back the transaction using Execute SQL Tasks. Jamie Thomson blogged about this back in 2005, and it remains a very common design pattern amongst SSIS developers today. When RetainSameConnection is set to True, the connection manager will attempt to reuse an existing connection when a task asks for one (instead of creating a new connection each time).

The advantage here is that your transactions become “light-weight” and perform better, as they no longer need to be coordinated across multiple machines using DTC. However, this approach is not a silver bullet, and there are some limitations to what RetainSameConnection can do.

The behavior of RetainSameConnection is determined by the underlying provider – there is very little logic in the Connection Manager itself to handle connection reuse. Not all providers will be able to support this functionality (although it should work fine with the SQL Server providers). Also note that not all Connection Managers support RetainSameConnection. File Connection Managers, for example, simply return a string pointing to the file or directory – they don’t actually open a “connection object”.
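As a rough illustration of the pattern (using Python’s stdlib sqlite3 module rather than SSIS, so the SQL and tables here are placeholders), the key idea is that every “Execute SQL Task” runs against the same connection, so the explicit BEGIN/COMMIT/ROLLBACK statements all apply to one session:

```python
import sqlite3

# Single shared connection, standing in for RetainSameConnection = True.
# isolation_level=None disables autocommit so we control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE staging (id INTEGER)")

try:
    conn.execute("BEGIN")                           # Execute SQL Task: start transaction
    conn.execute("INSERT INTO staging VALUES (1)")  # tasks reuse the same connection...
    conn.execute("INSERT INTO staging VALUES (2)")  # ...so they join the same transaction
    conn.execute("COMMIT")                          # Execute SQL Task: commit
except sqlite3.Error:
    conn.execute("ROLLBACK")                        # Execute SQL Task: roll back on failure

rows = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```

If the tasks each opened their own connection (the default behavior), the BEGIN and COMMIT would land on different sessions and the pattern would silently fall apart – which is exactly why the SPID check mentioned below matters.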

RetainSameConnection might not work if a connection manager is being used multiple times in parallel. You will usually encounter this issue when trying to SELECT and INSERT at the same time. Some common examples:

Running multiple Execute SQL Tasks in parallel using the same connection manager

Solution: Use separate connection managers, or run the tasks serially

Both the Source and Destination component in your Data Flow are using the same Connection Manager

Solution: Use a separate connection manager for the source component

The Destination and another transform (usually Lookup, or Script) are using the same Connection Manager

Solution: Use a Full Cache Lookup or staging tables

The last thing to note is that if your data flow is writing to multiple destinations, you will need to use multiple connection managers. Be sure to start a separate transaction for each one.

Note: If you are using this pattern and encountering issues, try using SQL Profiler to detect if your queries are in fact using the same SPID.

Design for Restartability

Instead of using transactions, consider designing your packages for restartability. This doesn’t necessarily mean the use of Checkpoints (although you can use them if you are so inclined) – it means that your packages take any corrective actions they need to ensure that they can be safely run multiple times. The type of actions you take will depend on your package’s logic. For example, a package that loads multiple flat files from a directory could move/rename a file once it has been successfully processed to avoid loading it a second time if the package needs to be restarted. This approach requires you to put more effort into the design of your package, but will give you a much more robust ETL solution.
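The flat file example above can be sketched as follows (plain Python; `load_file` is a hypothetical stand-in for the real ETL step). The corrective action is simply archiving each file once it has been loaded, which makes the whole run safe to repeat:

```python
import shutil
from pathlib import Path

def load_file(path: Path) -> None:
    # Placeholder for the real ETL step (e.g. a Data Flow load).
    path.read_text()

def process_pending_files(inbox: Path, archive: Path) -> list:
    """Load each flat file, then move it to an archive folder so a
    restarted run will not pick up the same file twice."""
    archive.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(inbox.glob("*.csv")):
        load_file(path)                                   # do the work first...
        shutil.move(str(path), str(archive / path.name))  # ...then mark it done
        processed.append(path.name)
    return processed
```

Running the function a second time after a failure simply picks up whatever files remain in the inbox; files that were already archived are never loaded twice.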

There will be scenarios where the use of transactions is required, but it is usually possible to limit their scope. This typically involves splitting up your ETL logic across multiple tasks (or packages), staging your data in between each step, and enabling transactions only when absolutely necessary. For example, you may create a Data Flow that pulls data from a source, performs the required transformations, and then loads the data into a staging table. You can then use another Data Flow (or an Execute SQL Task if the staging table is in the destination database) to insert/merge the data into the final destination. With this approach, you will only need to use a transaction for the final insertion.
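A minimal sketch of this staged approach, with Python’s stdlib sqlite3 module standing in for the real source and destination: the heavy load goes into the staging table with no transaction, and only the short final insert runs inside one.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE staging (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE destination (id INTEGER, val TEXT)")

# Step 1: the "Data Flow" loads transformed rows into staging, no transaction.
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# Step 2: the "Execute SQL Task" moves staged rows in one short transaction,
# so the transaction's scope is limited to the final insert/merge.
try:
    conn.execute("BEGIN")
    conn.execute("INSERT INTO destination SELECT * FROM staging")
    conn.execute("DELETE FROM staging")
    conn.execute("COMMIT")
except sqlite3.Error:
    conn.execute("ROLLBACK")

loaded = conn.execute("SELECT COUNT(*) FROM destination").fetchone()[0]
```

If the final step fails, the staging table still holds the transformed data, so the package can be restarted from the cheap last step instead of re-running the whole extract.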

Handling Failed Inserts when using Fast Load

A useful pattern when designing for restartability is to isolate the specific rows that fail to insert into your destination. If you are using Bulk / Fast Load at your destination (which is recommended if you are at all concerned with performance) and one of the rows fails to insert, it may fail the entire batch (the logic depends on the provider and destination you are using). To isolate the failed row, you can add a second destination to the error output that does row by row insertions (i.e. does not use Bulk / Fast Load). You can then redirect the error output of the second destination to a flat file / staging table so it can be manually inspected and corrected.
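The fallback logic can be sketched as follows (Python’s stdlib sqlite3 standing in for the destination provider; real fast-load batching behavior varies by provider). The batch insert fails as a whole, and the row-by-row retry isolates exactly which rows were bad:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY)")

batch = [(1,), (2,), (2,), (3,)]   # duplicate key will fail the whole batch
failed_rows = []

try:
    with conn:                     # "fast load": one batch insert, one transaction
        conn.executemany("INSERT INTO dest VALUES (?)", batch)
except sqlite3.IntegrityError:
    for row in batch:              # fallback destination: row-by-row insertions
        try:
            with conn:
                conn.execute("INSERT INTO dest VALUES (?)", row)
        except sqlite3.IntegrityError:
            failed_rows.append(row)  # redirect to a file/table for manual inspection

ok = conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0]
```

The good rows all land in the destination, and only the genuinely bad row ends up in the error output for correction.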

Summary

SSIS has built-in support for transactions, but to get the best performance, you might want to consider an alternative design. I highly recommend designing your packages for restartability, and limiting the use and scope of transactions in your packages.

I hope you found this blog post useful. I’d love to hear about specific scenarios where you are using transactions, and alternative package designs you might have used to implement your own form of restartability.

SSIS Design Patterns is a book of recipes for SQL Server Integration Services (SSIS). The design patterns in the book show how to solve common problems encountered when developing data integration solutions. Because you do not have to build the code from scratch each time, using design patterns improves your efficiency as an SSIS developer. In SSIS Design Patterns, we take you through several of these solutions in detail, providing the technical details of each resolution. SSIS Design Patterns does not focus only on the problems to be solved; the book also delves into why particular problems should be solved in certain ways. You’ll learn more about SSIS as a result, and you’ll learn by practical example. Where appropriate, SSIS Design Patterns provides examples of alternative patterns and discusses when and where they should be used. Highlights of the book include sections on ETL Instrumentation, SSIS Frameworks, and Dependency Services.

Takes you through solutions to several common data integration challenges

Who this book is for

SSIS Design Patterns is for the data integration developer who is ready to take their SQL Server Integration Services (SSIS) skills to a more efficient level. It’s for the developer interested in locating a previously-tested solution quickly. SSIS Design Patterns is a great book for ETL (extract, transform, and load) specialists and those seeking practical uses for new features in SQL Server 2012 Integration Services. It’s an excellent choice for business intelligence and data warehouse developers.