Menu

I had to whip up a solution to a data migration requirement and had no choice but to use SQL Server Integration Services (SSIS). It is marketed as fast and flexible tool for data extraction, no idea about the “fast”, it’s user interface and error/warning messages make using it far from flexible. A lot of the time I found myself in a battle to achieve the simplest task and not being supported by the tool. I admit that this is because I have no prior experience with any of the data control objects. What made matters was the interface wasn’t very helpful in the names of controls or descriptions of tool-tips. Note this is my experience with Visual Studio 2005 and SSIS, it may have improved in VS 2008 or the upcoming VS 2010.

I had 2 objectives to achieve: join data from 2 tables, and the use of the last generated ID for a subsequent query. It appears the latter was not even considered in the design of the tool. You would think that a data writing control would have more outputs than just exceptions.

Having “successfully” met the basic requirements of “migrating data” I thought I’d share the approach I took, it may not be the optimal approach, but it works, and in this scenario performance isn’t a concern.

The data being merged from one location (example: a legacy system) to a new system with a different data representation model. I’ve put in the context of the “Medical System” which is the theme of my posts. In this post I introduce a concept of related patients. Simply put a patient can be related to another patient in the system, examples of relationship types are ‘single’, ‘couple’, ‘sibling’, etc. There are other representation complexities here, but are not relevant to the post or SSIS discussion. The ER model is as follows:

Basic structure Many-Many table structure

As a requirement at the point of the data merge, every patient must be created with a default ‘single’ relationship entry. This is where SSIS doesn’t support it easily. Based on a requirement of maintaining existing patient ID’s as part of the merge and identity insert is performed by SSIS into the patient table. Then a new ‘single’ relationship type record must be created in the relationship table. Next the non-SSIS-supported task to create a new entry in the linking table (PatientRelationship) using the newly created ID of the single relationship record. This leads to the need for the use of the inbuilt database function SCOPE_IDENTITY() or it’s alternatives such as @@IDENTITY. I could not find a supported approach in SSIS to obtain this value via the output without the use of a stored procedure.

At this point all the material I found online was to make use of a SQL Stored Procedure with an OUTPUT parameter to obtain the value directly from an insert statement. This is fine if you need to make use of it back in SSIS. But in this case all that was required was a follow up insert statement. So I embedded the initial insert and the subsequent statement in 1 stored procedure, taking the PatientID of the record currently being processed by the SSIS package as the input:

As a quick side note – I asked a question on Stack Overflow about mapping hard coded values inside SSIS the answer was to use a “Derived Columns” column, here is the stack overflow question and answer that has the tips for data formatting. Another option was to create default values on the database schema that housed the source data for the migration.

Once the stored procedure was created making use of it in SSIS required another “Data Flow Task” and inside that task using an “OLE DB Command” to call the procedure via

EXEC dbo.SetupRelationships ?

the question mark represents a parameter, if you had a procedure taking 3 input parameters and 1 output parameters it would look like this:

EXEC dbo.AnotherProc ?, ?, ?, ? OUTPUT

The SSIS “Data Flow Task” now looks like this:

SSIS Data Flow Task

With the Advanced Editor properties dialog looking like this (click on image to see the full sized screen shot):

OLEDB command setup

The final step is now to create the column mapping to supply the Patient ID into the stored procedure on Column Mapping tab (again click for larger image):

OLEDB Column Mapping

That was it, “Execute Package” and the data would migrate meeting our requirements.

And SSIS still sucks in 2012. Simple tasks become a big deal, and it shouldn’t in an ETL-tool.
Example: trimming some characters from at string otner then space? The TRIM function only supports space (like the the SQL Server SQL-trim, by the way). Solution: Create a .net-script in the dataflow – OMG! (Informatica is still ETL-king!)