October 30, 2006

It’s come to that point where all of the SSIS packages for the data warehouse are complete. As a quick search will show, there are multiple ways to glue everything together. Coming from a DataStage or Informatica background, however, I prefer the creation of a “master package”. This package is strictly a control flow that utilizes several ‘Execute Package’ tasks.

Step 1: Define File Connection

For this implementation, all of our packages exist as files. This requires that each package have a defined file connection within the master package. Under the Connection Manager, right-click to add a new file connection, which brings up the following dialog.

Click on Browse to add the package of choice in the following dialog.

After selecting the file, click on the File Name listbox. This will automatically highlight the filename. Copy the filename to the clipboard by pressing Ctrl-C. This will save time in the later steps.

Once the file connection is added, an expression will need to be defined for the ConnectionString property. In my installation, I defined a variable User::ExecDir_DWL that corresponds to the project folder location for all of my packages. This allows me to define the ConnectionString as follows (hint: use Ctrl-V to paste in the package name to save time and reduce errors). Once complete, this will ensure that the file connection is not tied to any specific location. In turn, this allows for the master package to be easily deployed to any folder structure.
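For example, with a hypothetical package file named LoadCustomer.dtsx, the ConnectionString expression looks something like this (note that the SSIS expression language requires the backslash in a string literal to be escaped):

```
@[User::ExecDir_DWL] + "\\LoadCustomer.dtsx"
```

If your variable already ends with a trailing backslash, drop the leading "\\" from the literal.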

Step 2: Add Execute Package Task

The final piece of the puzzle is to add the Execute Package task to the control flow. Here, I use the file name text copied to the clipboard earlier to save time.

First, rename the execute package task to ‘EXEC <package_name>’.

Second, explicitly set the property PackageName using the expression builder. I’ve noticed that the package name isn’t set when using a package template. Setting it explicitly works around that issue.

October 16, 2006

Straight from the MSDN archives, I found this little gem to count rows within a data pipeline as they are processed. The key phrase is: “as they are processed”. You’d think the Row Count transformation would work if you wanted a sequential row number. Unfortunately, that transformation only updates the assigned variable at the end of processing for the given Data Flow task.
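The approach boils down to a Script Component used as a transformation with a class-level counter. A minimal sketch (the output column name here is my own; it assumes you added an Integer output column called RowNumber to the component):

```vb
' Script Component (transformation) -- a minimal sketch, not the exact MSDN listing.
Public Class ScriptMain
    Inherits UserComponent

    Private rowCount As Integer = 0

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Increment for every row as it flows through the pipeline,
        ' so each row gets a sequential number at processing time.
        rowCount += 1
        Row.RowNumber = rowCount
    End Sub
End Class
```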

October 14, 2006

Crazy, but I must admit the convenience. This came up after loading a table. I needed to generate an incremented value in the SQL Server database. Normally, you could do this by setting the column to be of type IDENTITY. I didn’t have the luxury and didn’t want to reload it anyway.
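One way to pull this off after the fact (a sketch; the table and column names are made up) is T-SQL’s quirky combined variable-and-column assignment in an UPDATE:

```sql
-- Hypothetical: stage_customer was loaded with row_seq left NULL.
DECLARE @seq INT
SET @seq = 0

-- Each row bumps the variable and writes the new value to the column.
UPDATE stage_customer
SET @seq = row_seq = @seq + 1
```

Be warned that UPDATE doesn’t allow an ORDER BY, so the numbering follows whatever order SQL Server scans the rows in.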

October 3, 2006

Through the magic of the SQL Server system tables sysobjects and syscolumns, I’ve found a new favorite query.

Scenario

ETL packages must account for trimming the extra padding placed on data fields by the source. This is usually a problem with data sourced from flat files, especially fixed-width files. Some may argue that not addressing this results in data quality issues. I just prefer that things are perfect for future users. (and yes, data quality is on my mind too)

Solution

The following query will dynamically generate a query for every column in a table. Each generated query compares the max length of a field against its trimmed max length. This results in a huge time savings over manually typing everything out. Enjoy!
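A generator along these lines does the trick (a sketch, using ‘individual’ as the table under validation):

```sql
-- Emit one comparison query per column of the 'individual' table.
SELECT 'SELECT ''' + c.name + ''' AS column_name, '
     + 'MAX(LEN(' + c.name + ')) AS max_len, '
     + 'MAX(LEN(RTRIM(' + c.name + '))) AS trimmed_max_len '
     + 'FROM individual'
FROM sysobjects o
JOIN syscolumns c ON c.id = o.id
WHERE o.name = 'individual'
  AND o.xtype = 'U'      -- user tables only
ORDER BY c.colid
```

Run the generated statements and any row where max_len and trimmed_max_len differ points at a padded column.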

** Note: The query above refers to ‘individual’ as a specific table name. You’ll want to change it to the table you’re validating.

Update: The SQL Server function DATALENGTH() should be used instead of LEN(). LEN() automatically right-trims a column, thus giving you a false positive. This behavior may be limited to VARCHAR columns, but be warned.
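A quick demonstration of the difference:

```sql
DECLARE @s VARCHAR(10)
SET @s = 'abc   '          -- three trailing spaces

SELECT LEN(@s)        AS len_result,        -- 3: trailing spaces ignored
       DATALENGTH(@s) AS datalength_result  -- 6: bytes stored, padding included
```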