ETL Best Practices

Most traditional ETL processes perform their loads using three distinct and serial processes: extraction, followed by transformation, and finally a load to the destination. However, for some large or complex loads, using ETL staging tables can make for better performance and less complexity. As part of my continuing series on ETL Best Practices, in this post I will share some advice…
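To make the staging idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions, not the approach from the post itself: the table names `stg_sales` and `sales`, the column layout, and the `load_via_staging` helper are all hypothetical. Raw values land in the staging table untouched, and the cleanup happens in a single set-based statement on the way to the destination.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Land raw rows in a staging table, then transform and load them in one statement.

    Table and column names here are hypothetical, chosen only for illustration.
    """
    cur = conn.cursor()
    # Staging table keeps everything as text so that extraction never fails on bad types.
    cur.execute("CREATE TABLE IF NOT EXISTS stg_sales (id TEXT, amount TEXT)")
    cur.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, amount REAL)")

    # Start each run with an empty staging table.
    cur.execute("DELETE FROM stg_sales")

    # Extract: land the raw values as-is.
    cur.executemany("INSERT INTO stg_sales (id, amount) VALUES (?, ?)", rows)

    # Transform and load: trim and cast inside the database, in one set-based step.
    cur.execute(
        "INSERT INTO sales (id, amount) "
        "SELECT CAST(id AS INTEGER), CAST(TRIM(amount) AS REAL) FROM stg_sales"
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


conn = sqlite3.connect(":memory:")
loaded = load_via_staging(conn, [("1", " 19.99 "), ("2", "5")])
```

Keeping the staging table loosely typed is a design choice in this sketch: it separates "did the data arrive?" from "is the data valid?", so a malformed value can be inspected in staging rather than aborting the extract.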

In the last post in my ongoing series about ETL best practices, I discussed the importance of error handling in ETL processes, reviewing best practices for application flow to prevent or gracefully recover from a systematic error or data anomaly. In this post, I’ll dig a bit further into that topic to explore the design patterns for managing bad data in ETL…

In my ongoing series on ETL Best Practices, I am illustrating a collection of extract-transform-load design patterns that have proven to be highly effective. In the interest of comprehensive coverage on the topic, I am adding to the list an introductory prequel to address the fundamental question: What is ETL? ETL is shorthand for the extraction, transformation,…

In designing a proper ETL architecture, there are two key questions that must be answered. The first is, “What should this process do?” Defining the data start and end points, transformations, filtering, and other steps must be done before any other work can proceed. The second question that must be answered is “What should happen when the process fails?” Too…

I still remember the first real ETL process I developed. I was working for a hospital at the time, going through a major system implementation as we replaced a 17-year-old UNIX-based system with a more modern healthcare application suite with a SQL Server back end. I was tasked with building, testing, and executing the ETL processes for this conversion. While…

Imagine for a moment that you’ve built a software thing. In fact, we’ll call it The Thing. You put a lot of work into The Thing, and it does exactly what you wanted it to. You put The Thing into play as part of a larger solution and, after a couple of revisions, its behavior is verified and it is…

Before I began my technical career over a decade and a half ago, I spent several years working in law enforcement. In that field, one of the things one must learn quickly is the concept of the chain of custody of evidence. There were numerous procedures we had to follow to ensure that evidence was not just gathered and preserved,…

It happens far too often: Once an ETL process has been tested and executes successfully, there are no further checks to ensure that the operation actually did what it was supposed to do. Sometimes it takes a day, other times it takes a year, but eventually that call comes from a client, coworker, or boss: “What’s wrong with this data?”…
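One simple form of post-load check is a row-count reconciliation between source and destination. The sketch below is a hedged illustration, not the method described in the post: the `audit_row_counts` helper and the `src_orders` and `dw_orders` table names are hypothetical, and a real audit would usually compare more than counts (sums of key measures, for example).

```python
import sqlite3


def audit_row_counts(conn, source_table, target_table):
    """Compare row counts between a source table and a target table.

    Table names are interpolated into the SQL text, so pass only trusted,
    hard-coded identifiers (hypothetical names are used below).
    """
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return {"source": src, "target": tgt, "match": src == tgt}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INTEGER)")
conn.execute("CREATE TABLE dw_orders (id INTEGER)")
conn.executemany("INSERT INTO src_orders (id) VALUES (?)", [(1,), (2,), (3,)])
# Simulate a load that silently dropped one row.
conn.executemany("INSERT INTO dw_orders (id) VALUES (?)", [(1,), (2,)])

result = audit_row_counts(conn, "src_orders", "dw_orders")
if not result["match"]:
    print(f"Row count mismatch: source={result['source']}, target={result['target']}")
```

Running a check like this at the end of every load, and recording the result, means a discrepancy surfaces on the day it happens rather than when someone downstream notices it.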

If you were to poll data professionals on which tasks they enjoy working on the most, ETL logging would probably not make the list. However, it is essential to the success of any ETL architecture to establish an appropriate logging strategy. I like to compare a good logging infrastructure to the plumbing of a house: it is not outwardly visible,…