Mastering DTS

Too often, people consider extraction, transformation, and loading (ETL) only in their data-warehousing environments. They completely overlook that the same ETL concepts and practices used with data warehousing also apply to data migration from legacy systems as well as ongoing data import and export processing between online transaction processing (OLTP) systems. For database developers and DBAs, the concept of ETL processing shouldn't be limited to data warehousing. As you consider how to expand your career horizons, having a solid understanding of ETL concepts, design patterns, and implementation options will give you added leverage and expertise beyond the simple database-management skills that are essential to every organization. I can't recall a single project I've been involved with that hasn't required some type of ETL processing—and most of those systems had nothing to do with data warehousing.

So how do you get started on mastering ETL concepts and implementation options? With SQL Server, it's easy—look no further than Data Transformation Services. DTS provides a framework and tool set that can help jump-start your ETL career. Using DTS's graphical Package Designer and wizards, you can quickly create simple data-movement routines. As your experience and the complexity of your requirements grow, you can use DTS's programmatic COM interfaces to extend and expand DTS's tool set to meet most needs. Microsoft greatly improved the DTS documentation in SQL Server 2000 Books Online (BOL) from the previous version. If you require more than BOL, many DTS articles are available from the SQL Server Magazine Web site (http://www.sqlmag.com/articles/index.cfm?topicid=783). In addition, several recent books can help guide you from beginning through advanced topics. (For more information about career resources, see Morris Lewis, "Learning for Life," page 27.)

No matter where you are in your DTS and ETL learning curve, keep one important consideration in mind. Designing and developing ETL processes require many of the same skills that application designers and developers use. These skills include combining the traditional set-based approach to data access with procedural programming and error handling. You need to program and design defensively, assuming that error conditions are the rule rather than the exception.

From a DTS implementation standpoint, you'll need to become familiar with COM-based programming. The Package Designer offers much in the way of point-and-click functionality, but most production-quality ETL processing implementations require some level of programming beyond simply setting attributes at design time. This programming might be as simple as adding a couple of ActiveX transformations or as complex as implementing Custom Tasks in Visual Basic (VB) or Custom Transformations in C++. In any case, you'll need to understand how to interact in a COM environment because DTS itself is implemented as a series of COM interfaces.

Given DTS's strong ties to the application-development world, you also need to consider the effect of Microsoft's .NET tool set. Microsoft has already announced that the inclusion of .NET's Common Language Runtime (CLR) in the next release of SQL Server (code-named Yukon) will provide an alternative to T-SQL for database programming. Consequently, I believe that DTS will fully support a .NET flavor of programming as well. Developers who work with DTS will need to explore what .NET means to their existing DTS implementations, but it's certain that we'll need to adapt our methodologies to support this new environment.

Good database developers also possess a thorough understanding of the application-development tool sets and environments that use their data. Using DTS to explore the ETL world gives you a channel to expand your skills into application development while still maintaining close ties to the database.

From the Blogs

My initial goal in writing this series of posts was to outline some of the concerns surrounding Availability Groups (AGs) and SQL Server Agent Jobs – and call out how there is virtually no guidance from Microsoft on this front and then detail some of the pitfalls and options available for tackling this problem domain. I initially expected this series of posts to have between 25 and 30 posts – according to some of the early outlines I created ‘way back when’....More

Throughout this series of posts I’ve taken a somewhat pessimistic view of how SQL Server Agent jobs are managed within most organizations – meaning that most of the code and examples I’ve provided up until this point were based on assumptions about how CHANGE to jobs is managed. That pessimism, to date, has come in two forms:...More

In this series of posts I’ve called out some of the concerns related to SQL Server AlwaysOn Availability Groups and their interaction with SQL Server Agent jobs – both in the form of Batch Jobs (see post #3) and backups....More