cehagema's solution is the "inner join" construction that I mentioned, if the multi-column predicate is not supported.

My guess though -- and I haven't tested or researched this for SQL Express/T-SQL -- is that the inner join will be much slower than the IN construction. Here's why I think this.

For explanation purposes, I will assume there are 100 groups (100 distinct values of groupNum).

With an inner join that uses a correlated subquery ("x.grpno = j.grpno"), the subquery is executed for each row in the outer query. Since the outer query has no filtering predicates other than the subquery itself, this means that the subquery will be executed 1 million times. The subquery is an aggregation on a filtered subset of the 1 million row table, but the filtering is unknown until the outer query's row is obtained. So, potentially, each time the subquery is called, somewhere between 1 and 1 million rows will be gathered and aggregated. If we assume an even distribution, there are 10,000 rows in each group (1 million rows divided by 100 groups). That means I will process 10,000 rows (on average) 1 million times -- about 10 billion row operations.
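For concreteness, here is a sketch of one common correlated shape, borrowing the @tab table and the grpno/grpid/name/birthDate columns from the code later in this thread (the actual query under discussion may differ):

```sql
-- Correlated form: the MIN() subquery is re-evaluated for every outer row,
-- because its WHERE clause references the outer table's current row.
SELECT x.grpno, x.grpid, x.name, x.birthDate
FROM @tab AS x
WHERE x.birthDate = (SELECT MIN(j.birthDate)
                     FROM @tab AS j
                     WHERE x.grpno = j.grpno);  -- the correlation
```

Whether the optimizer actually re-executes it per row (rather than rewriting it as a join against a pre-aggregated set) depends on the engine, which is why testing on the target platform matters.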

Of course, this assumes that the table is indexed on groupNum. If not, a full table scan may be needed, and the processing becomes 1 million rows times 1 million rows -- a trillion row operations.

By comparison, the IN construction does not use a correlated subquery. Instead, it uses an independent (non-correlated) subquery. In a single pass of the 1 million rows, it prepares a reduced set of 100 rows (1 row per group). That set is then applied against each of the 1 million rows in the outer query as a filtering predicate. In the worst case, this means that I process 1 million rows for the subquery plus (1 million rows times 100 rows) -- roughly 100 million row operations, a couple of orders of magnitude smaller.
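The "multi-column predicate" mentioned at the top of the thread is the standard-SQL way to write this non-correlated form; T-SQL does not accept the row-value syntax, which is why the inner-join workaround came up at all. A sketch, again assuming the @tab/grpno/birthDate names from the code later in the thread:

```sql
-- Standard SQL row-value IN: the subquery runs once (100 rows),
-- then acts as a filter over the outer 1 million rows.
-- Note: T-SQL rejects this multi-column IN form.
SELECT x.grpno, x.grpid, x.name, x.birthDate
FROM @tab AS x
WHERE (x.grpno, x.birthDate) IN (SELECT j.grpno, MIN(j.birthDate)
                                 FROM @tab AS j
                                 GROUP BY j.grpno);
```

On platforms without row-value IN, the same one-pass behavior can usually be had by joining the outer table to the grouped subquery on both columns.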

The subset of 100 rows is small enough to fit in memory as an ordered list of values. By holding the subset in memory, the burden of matching the 1 million rows against this filtering predicate is considerably less demanding -- that is, faster. So, even if the number of matches is 100 million, the difference between retrieving the rows and filtering the rows is very small.

I hope the RANK() behavior follows the non-correlated subset model, and not the inner join correlated subquery model.

I'm working on the syntax of the RANK() solution. I have worked with subqueries in the past and understand the logic (at least it's working). I'm still new to the RANK() solution and am trying to understand it.

In the world of databases, usually, the fastest solution is the best answer. Using RANK() OVER (PARTITION BY GROUP ORDER BY DOB DESC) is fastest in that it can return the desired result with fewer reads. RANK() assigns an integer rank to each row based on the column(s) referenced in the ORDER BY clause, restarting at 1 for each partition.
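A sketch of that shape, using the column names from the code below (grpno standing in for GROUP, birthDate for DOB); the filter on the rank value has to sit in an outer query, since window functions are not allowed in a WHERE clause:

```sql
SELECT grpno, grpid, name, birthDate
FROM (SELECT *,
             RANK() OVER (PARTITION BY grpno ORDER BY birthDate DESC) AS rnk
      FROM @tab) AS ranked
WHERE rnk = 1;  -- RANK() admits ties: rows sharing the top birthDate all return
```

Note the difference from ROW_NUMBER(): RANK() returns every row tied for first place, while ROW_NUMBER() arbitrarily picks one. Use ROW_NUMBER() if exactly one row per group is wanted.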

(1) it must return the correct answer on the target platform
(2) it should use standard syntax and constructs
(3) it should be easy to understand by someone who didn't write it
(4) it should be as fast as necessary... it needn't be as fast as possible
(5) it should play well with others in a multi-user environment
(6) it must be ready-to-run on time and under budget

The "best" answer is the one that blends all of these. Notice that only two of these are "must"; the others are "should".

-- Keep the first row of each group; since ROW_NUMBER() is unique within a
-- partition, filtering on row = 1 in the outer query replaces the redundant
-- GROUP BY ... HAVING, and MIN() over a single row is just the value itself.
SELECT row, grpno, grpid, name, birthDate AS youngestbirthdate
INTO #final
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY grpno ORDER BY birthDate) AS row, *
      FROM @tab) AS A
WHERE row = 1;