Here's a sticky problem. How do you build a query that gives you distinct records from one table based on multiple records from another table and order by a date found in the second table. In my real world example, I have project tracks in "Items" and comments or notes in "events". I want a distinct list of "items" that have been updated in the past 10 days. That seems easy right? Well... not as easy as it seems on the surface.

The Problem

Let's say in a project has been "commented on" ("events" table) several times during the 10 days. If I do something like:

The DB will complain that I cannot order by a column that is not in the SELECT list when using "DISTINCT". If I put the column in the select list I will get duplicates.

Real World Example

Let's a look at the problem a bit closer. Items contains the master record for a particular track. For example, Items are related to individual bugs, feature requests, work orders, or change requests. For example, the Items table might look like:

ItemIdDescription
17 Wash Your Wife
22 Take out the dry cleaning
4 Pick up the dog

What I want is to remove that third line. I already "know" that itemid 17 has been updated recently. I don't need to "know" that it was updated previously - even if it is inside my 10 day window. Sure, I can simply keep track of the itemIds in my loop and not display the ones that have already been handled. But surely there is another way? Here's the solution:

Correlated subqueries should be avoided as much as possible because they are usually very slow (especially for large data sets). An alternative would be to run an inline view to get the top 1 event id and item id, and then joining this result set to your Events table.

I know that looks heinous, with a join via the HAVING clause, but if you run it you'll see that it works.

Think of it this way: you need the Events table twice, once aggregated (to find the max date) and once unaggregated (to find the EventID that goes with that max date). The "e" alias is the aggregated version, hence why none of its columns show up in the GROUP BY clause. The "f" alias is the unaggregated version, so you can then tie back to the "e" table. You can't join in the ON clause like you normally would, because that happens before the aggregation, so you have to join in the HAVING clause, which happens after.

If your event dates really are just days and not full (unique) timestamps, you need to wrap an aggregating function like MIN or MAX around the f.EventID in the SELECT list and take it out of the GROUP BY list. This will pick just one of the Events from that same day and you won't get duplicate rows. You would still leave the f.Created in the GROUP BY, however.

Whoa... that is twisted :) I think there is a future for you in writing the US tax code. That's an excellent and smart solution - and I bet it performs better than the sub select. Thanks for the input!