Analyzing a Query from the Inside Out

Our organization's database has three related tables: Category, Products, and Transactions. All of the database transactions have a ProductID that's associated with the Products table. The Products table in turn has a CategoryID that's associated with the Category table. Table 1 shows the tables' relationships to each other.

I want to use a T-SQL SELECT statement to pass the variable @customerID and return a list of all category names and a value of the most recent transaction associated with each category, if one exists, sorted by the specified customer ID. I can use the TransactionID to determine which transaction is the most recent. What's the most efficient T-SQL SELECT statement I can use to return the specific information I want?

This question is complex because no direct relationship exists between the CategoryName and the CustomerID values. The values are linked only through their relationships with the Products table. So, logically, you must expand the Transactions table to include the category information. You can then select the most recent value for the subset of the data relating to the customer in question. Listing 1 shows a T-SQL SELECT statement that might return the list of category names you want. Let's break down Listing 1 into its parts so that you can understand what each section is doing.

At callout C in Listing 1, the code matches categories to products (the first join). Then, for each match, the code asks for matching transactions for that customer. The code uses an outer join to retain product information if it doesn't find any matching transactions.

After the code obtains the matching transactions, it performs the SELECT statement that callout B shows and returns the most recent transaction for every product in every category. Next, the code looks for the most recent transaction in each category, as the SELECT statement at callout A shows. The code disregards the individual products because you don't need them in the list of category names. You needed the products information only to make the connection from category to transaction and customer. Finally, the code searches the Transactions table to obtain the value of the most recent transaction per category.

From the Blogs

Many organizations today cannot use public cloud solutions because of security concerns, administrative challenges and functional limitations. However, they still need a centralized platform where end users can conduct self-service analytics in an IT-enabled environment....More

It is crucial to move away from data and analytics stored on individual desktop computers. Today’s solutions must promote holistic, collective intelligence. The strong, continued alliance between Microsoft and Pyramid Analytics helps make all this possible....More

To become a truly data-driven enterprise, many business leaders recognize that they must extend the capabilities of self-service business intelligence (BI) and analytics to more of their business users. Many BI tools tackle part of this need, but they don’t offer a complete enterprise solution....More