Pump Up the Power of DTS

To leverage the power of Data Transformation Services (DTS), a data-manipulation utility service in SQL Server 7.0, you need to interact with DTS via its native COM interfaces. In "Unleash the Power of DTS" (May 1999), we introduced the DTS object model and illustrated it with a Visual Basic 6.0 application that uses the DTS COM interfaces to perform data transformation. In this article, we continue discussing the object model and show you how to implement the DataPumpTask, the DataDrivenQueryTask, Steps, and Constraints.

Adding a DataPumpTask

After you purge the target database of old data, the next step is to add a DataPumpTask to move and transform the data in the pubs database to the HRMart database. The DataPumpTask defines the information necessary to create and execute an instance of the DTS data pump. As data moves between the source and destination data stores, the data pump applies the Transformations. The DataPumpTask defines the SourceColumns and DestinationColumns, any Transformations on those Columns, and any Lookups. The TaskPubsToHRMart subroutine that Listing 1, page 32, shows creates and initializes a new DataPumpTask and then defines the Transformations to perform.

As with other Tasks, you create the DataPumpTask by calling the Tasks.New method and passing it the class name DTSDataPumpTask; then, you set its properties. The name of the task is PubsToHRMartTask. Next, define the source and destination Connection information. The SourceConnectionID identifies where the data originates. Like the SQLStatement property of ExecuteSQLTask, the DataPumpTask's SourceSQLStatement defines the query that extracts the data. The DestinationConnectionID property identifies the Connection for the destination. The DestinationObjectName identifies the data's final target location. The example sets the final target to the Employee table in the HRMart database.
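
In VB 6.0, these steps might look like the following sketch. The Package variable moPackage matches the article's listings; the Connection IDs, the query, and everything else here are illustrative, not taken from Listing 1:

```vb
' Create the DataPumpTask and set its source and destination.
' Assumes moPackage is an initialized DTS.Package and that Connections
' for pubs and HRMart were added earlier with IDs 1 and 2 (illustrative).
Dim oTask As DTS.Task
Dim oPumpTask As DTS.DataPumpTask

Set oTask = moPackage.Tasks.New("DTSDataPumpTask")
Set oPumpTask = oTask.CustomTask        ' the task-specific interface

oPumpTask.Name = "PubsToHRMartTask"
oPumpTask.SourceConnectionID = 1                          ' pubs
oPumpTask.SourceSQLStatement = "SELECT * FROM employee"   ' illustrative query
oPumpTask.DestinationConnectionID = 2                     ' HRMart
oPumpTask.DestinationObjectName = "Employee"

' After all properties and Transformations are defined, the task is
' added to the Package:  moPackage.Tasks.Add oTask
```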

The next properties in the DataPumpTask establish limits and controls on its actions. These properties are good candidates for tuning the execution environment. The ProgressRowCount property controls the number of rows between each event notification. Packages and the DataPumpTask can produce and respond to events. (A later article will cover DTS event handling.) The MaximumErrorCount property controls the number of errors allowed before execution terminates. The default value for this property is 0, which causes termination after the first error. You can change this number to meet your requirements according to your Package design and error-handling schemes.

The FetchBufferSize property denotes the number of rows that the data provider will return during each fetch operation. FetchBufferSize is a prime candidate for tuning Package performance; you can adjust this size to balance memory usage and the overhead that calls to the data store incur. The default value for this attribute is 100; however, when the fetch retrieves binary data, the data provider ignores any value greater than 1. The InsertCommitSize property sets the number of rows processed between each commit. If you're familiar with the bulk copy program's (bcp's) batch size setting, you'll recognize that DTS's InsertCommitSize property behaves in much the same way.
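
Setting these tuning properties is straightforward; the values below are illustrative, not the listing's:

```vb
' Tuning properties on the DataPumpTask (oPumpTask is the
' DTS.DataPumpTask created earlier; values are illustrative).
oPumpTask.ProgressRowCount = 1000   ' raise an event every 1,000 rows
oPumpTask.MaximumErrorCount = 0     ' terminate on the first error (default)
oPumpTask.FetchBufferSize = 100     ' rows returned per provider fetch
oPumpTask.InsertCommitSize = 1000   ' rows per commit, like bcp's batch size
```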

The DataPumpTask's UseFastLoad property controls whether to use SQL Server's IRowsetFastLoad interface, which bulk copies data into SQL Server over a direct OLE DB connection.

When you use fast load, the FastLoadOptions property identifies which options you want the task to use; the DTSFastLoadOptions enumeration constants provide the list of values. The DTSFastLoad_CheckConstraints constant, which this example uses, specifies the default option of checking constraints during the load. In addition, like bcp, DTS lets you execute the Transact-SQL command SET IDENTITY_INSERT. This command suspends identity value generation and lets you explicitly insert user-provided values instead, which can be helpful when you're moving data into tables containing the identity property. To enable this command, set the DataPumpTask's AllowIdentityInserts property to true. The example leaves the option disabled.
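
The fast-load settings reduce to three property assignments; this sketch assumes the oPumpTask variable from the earlier fragments:

```vb
' Fast-load settings (oPumpTask is the DTS.DataPumpTask created earlier).
oPumpTask.UseFastLoad = True
oPumpTask.FastLoadOptions = DTSFastLoad_CheckConstraints  ' the default option
oPumpTask.AllowIdentityInserts = False  ' leave SET IDENTITY_INSERT disabled
```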

The data pump also lets you control which rows the pump processes. The FirstRow property identifies the starting position in the source rowset; the LastRow property determines its ending position. Setting both values equal to 0 copies all rows. The default value for FirstRow is 1; for LastRow, it is 0.

Last, the DataPumpTask lets you write exception rows to a file so that you can move valid data into the destination data store and invalid data to the exception file. If you are familiar with bcp processing, which is an all-or-nothing operation, you can appreciate being able to trap and log individual rows while still moving the bulk of the data into the destination.

After the Task is complete, the trapped rows are logged to the exception file for editing and reprocessing. The ExceptionFileName property establishes the exception file's name and location. The example uses the name PubsToHRMartTask.log. Also, the data pump can control row and column delimiters. The example uses the default values of cr/lf for the ExceptionFileRowDelimiter property and | for the ExceptionFileColumnDelimiter property.
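
The row-range and exception-file properties from the last two sections might be set like this; the file path is illustrative:

```vb
' Row range and exception-file settings (oPumpTask as before).
oPumpTask.FirstRow = 0              ' 0 for both FirstRow and LastRow
oPumpTask.LastRow = 0               ' copies every row in the source rowset
oPumpTask.ExceptionFileName = "C:\Temp\PubsToHRMartTask.log"  ' path is illustrative
oPumpTask.ExceptionFileRowDelimiter = vbCrLf                  ' default cr/lf
oPumpTask.ExceptionFileColumnDelimiter = "|"                  ' default pipe
```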

After the DataPumpTask is initialized, add the Transformations. Let's add two Transformations: a CopyColumnTransformation and an ActiveXScriptTransformation. For performance reasons, minimize the number of Transformations per Task; in a real application, you would perform all of this work in one ActiveXScriptTransformation.

Adding a CopyColumnTransformation

Add a Transformation to the Transformations collection by calling the New method and passing it the class name of the type of Transformation you are creating. For this Task, create a new Transformation with the class name DTS.DataPumpTransformCopy. To initialize it, set its Name property to PubsToHRMartTaskDataPumpTransformCopy; then set the TransformFlags property to DTSPump.DTSTransformFlag_Default. The TransformFlags property specifies the flags controlling the Transformation. The example uses the default value to specify that the conversion must be exact between types. In addition, you can specify how to handle binary data. The example doesn't use any binary data types; therefore, set these properties to their default values.
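
The copy transformation's creation might look like this (oPumpTask is the DataPumpTask from the earlier fragments; the class-name and constant spellings follow the DTSPump type library):

```vb
' Create the copy transformation for the DataPumpTask.
Dim oTransform As DTS.Transformation

Set oTransform = oPumpTask.Transformations.New("DTS.DataPumpTransformCopy")
oTransform.Name = "PubsToHRMartTaskDataPumpTransformCopy"
oTransform.TransformFlags = DTSTransformFlag_Default  ' require exact type matches
```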

Adding a Column

After the Transformation, define the Columns to change. The Task can access the rowset from the data source and destination Connections, but a Transformation can access only those Columns you assign to it; so, add the SourceColumns and DestinationColumns to each Transformation object.

To create a new Column, call the New method of the SourceColumns or DestinationColumns collection and pass it the Name property (Column name) and the Ordinal property (the Column's position in the rowset). Then, define the Column's characteristics. For illustrative purposes, the example defines only the fname Column; it uses the default characteristics to add all other Columns. In a real application, you clearly define each Column. Otherwise, inaccurate data type and null information can cause potential runtime errors and data integrity problems. The DataType and Flags properties correspond to the OLE DB data type (DBTYPE) and Column flags (DBCOLUMNFLAGS) identifiers. These values map the native provider data types to their equivalent OLE DB data types. The Size property determines the Column's maximum length. The Precision and NumericScale attributes define decimal and numeric data types. Last, the Nullable property signifies whether the Column allows null values. After you add the fname Column to the Transformation, add the remaining Columns involved in the CopyColumnTransformation. Next, add the CopyColumnTransformation to the Transformations collection. To transform the remaining Columns, add an ActiveXScriptTransformation.
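
A sketch of defining the fname Column on both sides of the copy transformation follows; the DataType value 129 is the OLE DB DBTYPE_STR identifier, and the size matches pubs' employee.fname (varchar(20)):

```vb
' Define the fname Column for the copy transformation
' (oTransform and oPumpTask are the objects created earlier).
Dim oColumn As DTS.Column

Set oColumn = oTransform.SourceColumns.New("fname", 1)  ' name, ordinal
oColumn.DataType = 129        ' DBTYPE_STR (OLE DB type for varchar)
oColumn.Size = 20
oColumn.Nullable = False
oTransform.SourceColumns.Add oColumn

Set oColumn = oTransform.DestinationColumns.New("fname", 1)
oColumn.DataType = 129
oColumn.Size = 20
oColumn.Nullable = False
oTransform.DestinationColumns.Add oColumn

' When all Columns are defined, add the transformation to the task.
oPumpTask.Transformations.Add oTransform
```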

Adding an ActiveXScriptTransformation

An ActiveXScriptTransformation is similar to a CopyColumnTransformation. You create an ActiveXScriptTransformation by passing the New method the class name DTS.DataPumpTransformScript. You initialize many of the properties in the same way as you do those in the CopyColumnTransformation, and you add the SourceColumns and DestinationColumns in the same manner. The ActiveXScriptTransformation then adds the TransformServerProperties property to define the script and language to use. Assign the script to perform to the TransformServerProperties("Text") property; TransformServerProperties("Language") identifies the scripting language; and TransformServerProperties("FunctionEntry") identifies the name of the entry function in the script. You can choose to read the script from a file rather than hardcoding it into the application.

The scripting language uses the DTSDestination and DTSSource objects to access the DestinationColumns and SourceColumns collections. Within the script, you have complete use of all the functionality that the scripting language offers, including the ability to use COM objects outside the DTS environment.
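
Putting those pieces together, a minimal script transformation might look like the following sketch; the transformation name, the lname Column, and the UCase logic are illustrative, not the article's actual script:

```vb
' Create an ActiveX script transformation (oPumpTask as before).
Dim oScriptXform As DTS.Transformation
Dim sScript As String

Set oScriptXform = oPumpTask.Transformations.New("DTS.DataPumpTransformScript")
oScriptXform.Name = "PubsToHRMartTaskScriptTransform"   ' illustrative name

' The VBScript source; DTSSource and DTSDestination expose the
' SourceColumns and DestinationColumns assigned to this transformation.
sScript = "Function Transform()" & vbCrLf & _
          "    DTSDestination(""lname"") = UCase(DTSSource(""lname""))" & vbCrLf & _
          "    Transform = DTSTransformStat_OK" & vbCrLf & _
          "End Function"

oScriptXform.TransformServerProperties("Text").Value = sScript
oScriptXform.TransformServerProperties("Language").Value = "VBScript"
oScriptXform.TransformServerProperties("FunctionEntry").Value = "Transform"
```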

The creation of the PubsToHRMartTask is now complete; executing the code adds it to the Package. Now, you need to create a DataDrivenQueryTask to move the data from the Northwind database into the HRMart database.

Adding a DataDrivenQueryTask

After transforming the data from the pubs database, you move and transform the data in the Northwind database to the HRMart database. A DataDrivenQueryTask is similar to a DataPumpTask. A DataPumpTask produces only INSERT transactions, but a DataDrivenQueryTask can produce other transaction types. From an implementation perspective, the DestinationColumns in a DataPumpTask's Transformation map directly to the DestinationObjectName. In a DataDrivenQueryTask, the DestinationColumns map to specific parameters in a parameterized query. The query then applies the appropriate changes to the destination data store. To create the DataDrivenQueryTask, the example uses the subroutine TaskNorthwindToHRMart, which you see in Listing 2, page 34.

To create the DataDrivenQueryTask, pass the class name DTSDataDrivenQueryTask to the New method of the Tasks collection. Many of the properties are the same as those involved in the DataPumpTask. New properties in the DataDrivenQueryTask are the InsertQuery, UpdateQuery, DeleteQuery, and UserQuery. Each of these attributes contains a single SQL statement that you can apply to the destination data store. For each row that the Task processes, it fires one of these statements. The status/result that the ActiveXScriptTransformation returns determines which query to execute. The example application generates only INSERT transactions. In addition, although these SQL statements typically take the form of a parameterized query, any valid SQL statement, including a stored procedure call, will work.
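
The shape of the DataDrivenQueryTask code mirrors the DataPumpTask; in this sketch, the Connection ID, query text, and column names are illustrative rather than Listing 2's:

```vb
' Create the data-driven-query task (moPackage as in the article's listings).
Dim oTask As DTS.Task
Dim oDDQTask As DTS.DataDrivenQueryTask

Set oTask = moPackage.Tasks.New("DTSDataDrivenQueryTask")
Set oDDQTask = oTask.CustomTask

oDDQTask.Name = "NorthwindToHRMartTask"
oDDQTask.SourceConnectionID = 3          ' Northwind; ID is illustrative
oDDQTask.SourceSQLStatement = "SELECT * FROM Employees"   ' illustrative
oDDQTask.DestinationConnectionID = 2     ' HRMart

' One parameterized statement per transaction type; the example
' application generates only INSERTs.
oDDQTask.InsertQuery = _
    "INSERT INTO Employee (EmpID, LastName, FirstName) VALUES (?, ?, ?)"
```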

You add the SourceColumns, the DestinationColumns, and the ActiveXScriptTransformation in the example in the same manner as in the DataPumpTask. In the DataDrivenQueryTask, however, you need to carefully consider the value that the function returns. The return value triggers the execution of the appropriate parameterized query. The example sets the function to the DTSTransformStatus enumeration constant of DTSTransformStat_InsertQuery.

Also, you must define the parameters for each of the queries. These Column definitions provide the mappings between the Transformation's DestinationColumns and the Task's query parameters. The InsertQueryColumns, UpdateQueryColumns, DeleteQueryColumns, and UserQueryColumns collections contain these Column definitions. The example defines the Columns only for the InsertQueryColumns collection. You add a new Column by calling the New method of the InsertQueryColumns collection and passing it the appropriate DestinationColumn's Name and the numeric position of the corresponding parameter in the InsertQuery. The parameter order must be correct; otherwise, unexpected results can occur.
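
The parameter mappings might be coded like this; the column names continue the hypothetical InsertQuery sketched above:

```vb
' Map DestinationColumns to the InsertQuery's parameters, in order.
' Assumes oDDQTask is the DTS.DataDrivenQueryTask created earlier.
Dim oColumn As DTS.Column

Set oColumn = oDDQTask.InsertQueryColumns.New("EmpID", 1)  ' first ?
oDDQTask.InsertQueryColumns.Add oColumn

Set oColumn = oDDQTask.InsertQueryColumns.New("LastName", 2)
oDDQTask.InsertQueryColumns.Add oColumn

Set oColumn = oDDQTask.InsertQueryColumns.New("FirstName", 3)
oDDQTask.InsertQueryColumns.Add oColumn
```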

The creation of the NorthwindToHRMartTask is complete; you can add it to the Package. Before a Task can execute, a Step must reference it. You now need to add three Steps to the Package.

Adding Steps

After you've added all the Tasks, the Main subroutine's next action is to add Steps to the DTS package. A Step and its Constraints control the execution of a Task. In the example, the subroutine AddStep in Listing 3, page 35, creates, initializes, and adds Steps to the Package.

To create a Step, call the Steps collection's New method. To initialize a Step, assign it a name and tell it which Task to execute. The Name and TaskName properties handle these two assignments. In addition to these properties, others are available to control the Step's behavior. The AddGlobalVariables property determines whether the Package's GlobalVariables are visible to the Task. The CloseConnection property tells the Package whether to close the Connection after completing the Step's Task. If the Step will return a rowset, the IsPackageDSORowset attribute must be set to true; its default value is false. Also, by setting the DisableStep property you can enable or disable a Step.

In addition to Package-level transaction management, Steps offer the ability to control transactions at a more granular level through the JoinTransactionIfPresent, CommitSuccess, and RollbackFailure properties. JoinTransactionIfPresent specifies whether a Step executes within the scope of the Package's transaction; it can be enabled only if the Package's UseTransaction attribute is set to true. When the Step joins the transaction, the CommitSuccess property commits the transaction on successful Step completion, and likewise, the RollbackFailure property performs a rollback if the Step fails.

To enhance performance, you can run Steps within their own worker threads or execute them within the main thread of the Package. Furthermore, you can control the Step's Win32 execution priority. The ExecuteInMainThread and RelativePriority properties handle this functionality. Although you can change a Step's priority, the Package's priority class constrains its relative priority. Use the DTSStepRelativePriority enumeration constants when setting RelativePriority.
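
A Step's creation might be sketched as follows; the property values are illustrative defaults, not Listing 3's:

```vb
' Create a Step to run the PubsToHRMartTask (moPackage as before).
Dim oStep As DTS.Step

Set oStep = moPackage.Steps.New
oStep.Name = "PubsToHRMartStep"
oStep.TaskName = "PubsToHRMartTask"     ' the Task this Step executes
oStep.AddGlobalVariables = True         ' expose Package GlobalVariables
oStep.CloseConnection = False           ' keep the Connection open afterward
oStep.DisableStep = False
oStep.JoinTransactionIfPresent = False  ' stay outside the Package transaction
oStep.ExecuteInMainThread = False       ' run in its own worker thread
oStep.RelativePriority = DTSStepRelativePriority_Normal
moPackage.Steps.Add oStep
```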

The Step is ready to add to the Package. In the example, you add three Steps, one each to manage HRMartPurgeTask, PubsToHRMartTask, and NorthwindToHRMartTask. Now, add the Constraints to ensure the Steps' proper execution order.

Adding Constraints

The Main subroutine's last action is adding Constraints to the DTS package. A Package's PrecedenceConstraints control its workflow by defining the conditions that must exist before a Step can execute. The AddConstraint subroutine in Listing 4, page 35, creates and defines a Constraint.

To create a PrecedenceConstraint object, call the New method of the destination Step's PrecedenceConstraints collection and pass it the source Step's name. This is the Step object whose value determines whether to execute the destination Step. The Constraint object's PrecedenceBasis determines which value to examine. The PrecedenceBasis can be either the execution status of the source Step or its execution results. The Step performs parallel, serial, or conditional execution via the PrecedenceBasis settings.

In the example, the Package must successfully complete the HRMartPurgeStep before executing PubsToHRMartStep and NorthwindToHRMartStep. This requirement ensures that the target table, HRMart..Employee, will be empty and no duplicate key errors will occur. For each Constraint, set the PrecedenceBasis to be based on the execution result and set the Value property to success. Valid values for the PrecedenceBasis property come from the DTSStepPrecedenceBasis enumeration; its default is DTSStepPrecedenceBasis_ExecResult. Depending on the PrecedenceBasis selected, the DTSStepExecResult or DTSStepExecStatus enumeration provides valid values for the Value property. Then, add the PrecedenceConstraint to the Step.
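
One of the example's Constraints might be coded like this sketch, which assumes oStep is the PubsToHRMartStep object added to moPackage:

```vb
' Make PubsToHRMartStep wait for HRMartPurgeStep to succeed.
Dim oConstraint As DTS.PrecedenceConstraint

Set oConstraint = oStep.PrecedenceConstraints.New("HRMartPurgeStep")
oConstraint.PrecedenceBasis = DTSStepPrecedenceBasis_ExecResult
oConstraint.Value = DTSStepExecResult_Success
oStep.PrecedenceConstraints.Add oConstraint
```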

Putting the Pieces Together

This article and our previous article give you the code necessary to fulfill the original requirement of combining the employee data from the two subsidiary companies into one data mart. Compile and execute the application. If all goes well, a message box appears stating that the execution was successful. If something fails, a different message box appears reporting the offending Step. Don't forget to add the VB project references to the Microsoft DTSPackage Object Library. Also, be sure to customize the SQL Server logon constants to match your environment.

In addition to executing the Package, you can save it to SQL Server. This ability is part of the Package object model. To enable this feature, remove the comment indicator on the line beginning with Call moPackage.SaveToSQLServer. From the DTS Package Designer, you can then open the Package and see a graphical view similar to Screen 1.

The DTS object model is extensive and gives developers a rich interface for custom applications. Some topics we've omitted include discussions on GlobalVariables, Lookups, DTS Event handling, CustomTask development, CustomTransformation development, and error handling. We will continue exploring these and other DTS programming techniques in future articles.
