Pages

January 10, 2012

SSIS 2012: A Peek to Data Taps

Note: The example in post is based on SSIS 2012 RCO
Data taps are one of the new features in SSIS 2012 in the data and performance troubleshooting

category. In a nutshell, they allow us, at runtime, to choose a given path from a data flow and capture a copy of the data at that specific point of the data flow in a .csv file on a given execution instance of the package.

Let's break it down some more:Run-time. Data taps are a runtime artifact, so as developer, you don't define them as you design and develop packages in SSDT. Rather, you do it after the packages are deployed to the server by way of running some stored procedures in SSMS.Data Flow Path: Data taps are defined at the data flow level, and within a data flow, at the path level. Paths are the blue or red arrows in the data flow that connect inputs and outputs of data flow components. You can read the books online definition of a path here. Execution Instance: A data tap captures data in a given data flow path and stores it in a .csv file. This happens within a single execution instance of the package. That means we need to add the data tap each time we run the package during our troubleshooting exercise.

You could think of data taps in a similar way you think about data viewers in BIDS SSDT, except you don’t have to edit the package to add them.

Setting up a Data Tap

Now, let’s see an example and some pictures to show how you can add a data tap to a data flow.
First, we create a package with a data flow in it and deploy it to the SSIS server. For this example, I have a pretty straight forward data flow that gets a list of 4 products via OLE DB Source, does some string manipulation and then loads the data into an OLE DB destination. As you can see in the picture, this data flow has 2 paths (blue arrows linking data flow components). Note the value of IdenificationString property of the data flow path as we will use it later.
.

We will need some pieces of information about the package we want to tap data from before we can setup the data tap:

Folder Name: The folder within SSIS server where the package is deployed. Project Name: The name of the SSIS project that holds the packagePackage Name: well, just the package name. Data Flow Path ID: The value of the IdentificationString property of the data flow path where we want to add the data tap (highlighted in yellow in the picture above).
Data flow task path: The path of the data flow task within the package. As opposed to getting the Dataflow path ID, getting the task path is tricky as it is not readily displayed in SSDT, an you typically have to resort to some sort of trick to get it. I will share a trick you can use to get value of the property path in another post. In this example, let’s assume we do know that the path to the data flow is\Package\Load Product Update: The path to the data flow task is readily available in the packagePath property of the task. Thanks to David Joubert for pointing that out!

Once we have this information, it is time to add the data tap(s) and execute the package. We do all this in SQL Server Management Studio by running 3 stored procedures that are built-in the SSIS catalog:

Create a execution instance for the package by running [catalog].[create_execution] stored procedure

If everything goes well, you will have the .csv files in the :\Program Files\Microsoft SQL Server\110\DTS\DataDumps folder of the machine you are running the stored procedures. You can always use the built-in package execution reports to validate the execution of the package was successful or to look for error messages generated during the execution of the package.

Folder with data tap files

Is That it?

Almost. The example in the post is quite simplistic and the bare minimum I could come up with to get a data tap working. There is additional information in books online about them that you may want to review:

Conclusion

Troubleshooting data issues can get hairy at times and data taps are an extra tool in our belt that can help in cases where logging (which has also improve a lot in SSIS 2012) and dumps may not give us the required level of information. In the other hand, I would have appreciated a more seamless user experience, specially when getting the parameters needed by the stored procedures, and perhaps having more control over the file format and its content. As it stands, we may always need to open a copy of the package in SSDT to get the required metadata and the jump back to SSMS to complete the work. It would be great to have a point and click interface within SSMS that allows to navigate the structure of the package/dataflow and let us add a data fow with few clicks. I would like to see an interface like the 'package explorer' in BIDS for this, but unfortunately that suggestion did not get too much traction last time around and it was closed as 'won't fix'. may be in SSIS 2014?

6 comments:

Sorry for my poor english but i'm french and i'm angry with english language.I just want to say you that i think the Data flow task path can be easily found in the data flow properties on the line "PackagePath".