There’s a saying in the data world: garbage in, garbage out. We use that when we can’t get good information from our database because the data we’ve stored isn’t as useful as we would like. That’s a problem, and it’s one reason why data professionals want to spend time thinking about what data we need to collect and how to store it. We want to be sure that we’ve at least made an effort to collect data that is actually useful to someone.

We sometimes have the data we need, but still struggle to use it effectively. I think this is an area where machine learning and similar technologies may help in the future, but there is a lot of work to be done before most of us can take advantage of those tools. In the meantime, many of us make do with basic T-SQL to perform data analysis, generate reports, and answer questions. When we do so, it’s important that our queries actually work correctly and answer the questions being asked.

I’ve run across quite a few stories like this from consultants who were called in to help organizations, only to find that queries which had been in use for long periods of time were incorrect. They didn’t filter appropriately, didn’t convert or aggregate data as intended, or didn’t even query the correct data.

We use databases and queries extensively in today’s world, and that usage is only going to grow. As much as I like the idea of DevOps and more frequent deployments, I also want higher quality in our software. This means we need to ensure that our queries actually work as intended against our databases. Code reviews, independent checks, and known data sets that evolve to include edge cases are all ways we can work to ensure we are writing the correct queries for our data. Above all, we need to be sure we are using tests of some sort, preferably unit tests, that verify the queries are actually the ones we want.
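The approach above can be sketched in a few lines. This is a minimal illustration, not any particular organization’s test suite: it uses Python’s built-in sqlite3 module for portability (the same idea applies to T-SQL with a framework like tSQLt), and the `orders` table, its columns, and the boundary values are hypothetical. The key point is seeding a known data set that deliberately includes edge cases, then asserting the query returns exactly what we expect.

```python
import sqlite3

# Hypothetical report query under test. A subtle bug here (> vs >=,
# or forgetting how NULLs behave) is exactly what the test should catch.
REPORT_QUERY = "SELECT COUNT(*) FROM orders WHERE amount > ?"

def run_query_test():
    # In-memory database seeded with a known data set, including
    # edge cases: a value exactly on the boundary and a NULL amount.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 50.0), (2, 100.0), (3, 150.0), (4, None)],
    )
    (count,) = conn.execute(REPORT_QUERY, (100.0,)).fetchone()
    conn.close()
    # The boundary row (100.0) and the NULL row must be excluded,
    # so only the 150.0 row should match.
    assert count == 1, f"expected 1 matching row, got {count}"
    return count

if __name__ == "__main__":
    run_query_test()
```

Because the data set is small and deliberately constructed, a wrong answer points directly at the query logic rather than at the data, which is what makes tests like this catch the long-standing, silently incorrect queries described above.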