November 29, 2006

When dealing with source database systems, it’s common to want to process only the “delta” of changed or new records. These systems typically carry a created and last_updated date stamp on each record for audit purposes. This entry will detail how to do this in SSIS.

Last Run Date

In my scenario, I opted to use a transaction log table that stores the package execution history. This table is queried in a SQL Task in the control flow. The SQL Task is set to retrieve a single row and store the last_load_date into a variable. That variable is then used in the where clause of the source SQL in later data flows.

The setup of the SQL Task to return the last_load_date into a variable is done as follows:

Define a variable LastRunDate with a data type of DateTime.

Under the General tab, set the Result Set to 'Single row'. The SQLStatement is set to:

select last_load_date from load_history
where mapping_name = <mapping_name>
and end_date is null
and status = 'Pending'

Under the Result Set tab, map the Result Name 'last_load_date' to the defined variable. In my case, I named my variable LastRunDate.
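For reference, here is a minimal sketch of the load_history table that query assumes. The columns mapping_name, last_load_date, end_date, and status come straight from the query above; everything else is illustrative:

```sql
-- Sketch of the load_history table assumed by the query above.
-- Column names beyond those in the query are illustrative only.
create table load_history (
    mapping_name   varchar(100) not null,
    last_load_date datetime     not null,  -- high-water mark for the delta load
    start_date     datetime     not null,
    end_date       datetime     null,      -- null while the run is still in flight
    status         varchar(20)  not null   -- e.g. 'Pending', 'Complete'
);
```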

The Hidden Magic of Variable Properties

Next, define a variable to hold the source SQL.

Add a variable of data type String. In my case, I named the variable SQL.

With the variable selected in the variable window, hit F4 to open the variable’s properties

Set the property ‘EvaluateAsExpression’ to true

Set the property Expression using the variable defined earlier. Don’t forget that the expression must evaluate to a string, so literal text is wrapped in double quotes, and the DateTime variable must be cast to a string before it can be concatenated. For example:

"select col_1, col_2, … from table_1
where last_update_date >= '" + (DT_WSTR, 30) @[User::LastRunDate] + "'
or created_date >= '" + (DT_WSTR, 30) @[User::LastRunDate] + "';"

November 14, 2006

I thought it would be interesting to compare and contrast SSIS and IBM DataStage. It’s easy to become married to one particular technology, so a side-by-side comparison may prove useful.

In this example, we’re working with source files. After a load, I believe it’s best practice to archive the files you’ve loaded for later reference. These files can then be absorbed by the corporate archive strategy already in place. The goal of the process is as follows:

1. Open file

2. Process the data in the file and load it to the target

3. Archive the file

SSIS File Archive

In SSIS, steps 1 and 2 are handled by the Data Flow of the package. Step 3 is executed within the Control Flow. There are two methods available for archiving:

Utilize two File System Tasks in sequential order: the first moves the file to the desired directory, and the second renames it with the appropriate date stamp. The downside: each File System Task requires that source and destination connection managers be defined.

Utilize a Script Task that executes code to move and rename the file. This offers far more control and can do the move and rename in one step. The downside: reusing the code requires either building your own custom task or copying the script into each package.

Using this script as a reference, I went the scripting route as it seemed cleaner. The script was defined with the appropriate package variables.

After the Script Task is defined, the package can move and rename a source data file to the designated Archive subdirectory. This is great if you like to code, but in my opinion it’s too much logic for a simple ‘move’ command.
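For those curious, the move-and-rename logic boils down to something like the following SSIS 2005 Script Task (VB.NET). This is a sketch only — the variable names SrcDir, ArchiveDir, and FileName are assumptions, and error handling is omitted:

```vb
' Sketch of an SSIS 2005 Script Task - variable names are assumptions.
Imports System
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        Dim srcDir As String = Dts.Variables("User::SrcDir").Value.ToString()
        Dim archiveDir As String = Dts.Variables("User::ArchiveDir").Value.ToString()
        Dim fileName As String = Dts.Variables("User::FileName").Value.ToString()

        ' Build an archive name with a yyyyMMdd date stamp, then move in one step.
        Dim stamp As String = DateTime.Now.ToString("yyyyMMdd")
        Dim archiveName As String = Path.GetFileNameWithoutExtension(fileName) & _
            "_" & stamp & Path.GetExtension(fileName)

        File.Move(Path.Combine(srcDir, fileName), _
                  Path.Combine(archiveDir, archiveName))

        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class
```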

DataStage File Archive

In DataStage, archiving files can be handled through the job’s properties. (Note: a DataStage job is the equivalent of an SSIS package.) One only needs to open the job’s properties and set the ‘After-job subroutine’.

In the above screenshot, you’ll see that the ‘After-job Subroutine’ is set to ‘ExecDos’, which executes an easy DOS move command. It doesn’t get any easier than DOS. The InputValue is set to the actual command syntax. For clarification, the command is using variables, which are as follows:

#$DrvPath# – drive letter, e.g. c:\

#$SrcPath# – source data directory, e.g. data\sources\

The command also uses the %date% environment variable to retrieve the parts of the date needed to build a yyyymmdd stamp.
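Pieced together, the InputValue boils down to a single command along these lines. This is a sketch — the file name is an assumption, and the %date% substring offsets depend on the regional date format (here assumed to be ‘Day mm/dd/yyyy’, e.g. ‘Wed 11/29/2006’):

```
rem Move and date-stamp in one line; file name is illustrative.
move #$DrvPath##$SrcPath#source.txt #$DrvPath#archive\source_%date:~10,4%%date:~4,2%%date:~7,2%.txt
```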

Conclusion

If you love to code and revel in the idea of working with VB.NET, then of course SSIS is your preference. But what this post intends to point out is that one shouldn’t have to write code for what is essentially one line at the DOS command line.

November 1, 2006

A common data source for my packages is flat files. Unlike the master package, which runs on a fixed schedule, the flat files may not be staged every single day. Leaving the loading packages as-is, however, will result in a false failure whenever a file is missing. This leads to the following Script Task in the control flow. The task reads input variables, which the file connection manager should also be using, to test whether the file exists. If it does not, the subsequent data flow will not execute and the package returns success.

Defined variables:
ExecDir
FileExists
FileName

Defined Data Flow:

Script Task

In the Control Flow, add a Script Task. The task is defined with ReadOnlyVariables of:
User::ExecDir
User::FileName
The ReadWriteVariables is set to:
User::FileExists

Now you’re set. The Script Task will read in the variables User::ExecDir and User::FileName. Using those variables, it uses System.IO to determine whether the file exists. If it does, the User::FileExists variable is set to True; otherwise, it is set to False.
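As a sketch, the body of that Script Task is only a few lines (SSIS 2005 VB.NET; Path.Combine joins the directory and file name):

```vb
Imports System
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        Dim execDir As String = Dts.Variables("User::ExecDir").Value.ToString()
        Dim fileName As String = Dts.Variables("User::FileName").Value.ToString()

        ' Set FileExists so the precedence constraint can test it.
        Dts.Variables("User::FileExists").Value = _
            File.Exists(Path.Combine(execDir, fileName))

        ' A missing file is not an error - the task itself always succeeds.
        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class
```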

Conditions in Control Flow

The next step is to define a condition in the control flow. This is done by double clicking on the path line from the Script task to the Data Flow task. This leads to the following dialog:

Here is where you add the conditional logic. The Evaluation Operation is set to ‘Expression’. Using our User::FileExists variable, a check is done to see if the value is equal to True. If it is, the data flow executes; otherwise, we have our graceful exit and the package succeeds.
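Assuming FileExists is defined as a Boolean, the expression on the precedence constraint is just:

```
@[User::FileExists] == True
```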