Solution 2 – MAX and DENSE_RANK

SELECT
s.city,
MAX(s.start_date) KEEP (DENSE_RANK FIRST ORDER BY sales DESC) AS start_date
FROM sales_volume s
GROUP BY s.city
ORDER BY s.city;

Now, the MAX function is used to find a value from the highest record. The KEEP clause added at the end restricts the rows that the aggregate considers.

The KEEP clause, along with DENSE_RANK FIRST and ORDER BY sales DESC, tells the function to rank the rows within each city by sales in descending order and keep only the rows that rank first: those with the highest sales. MAX(start_date) is then applied to just those kept rows, so the query returns the latest start_date among the rows with the highest sales in each city.

However, the drawback of this solution is that it does not show the entire record: only the city and the start_date are returned.

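If you need more of the record, one workaround is to repeat the KEEP clause for each extra column you want. Here's a sketch of that approach, using the same sales_volume table:

SELECT
s.city,
MAX(s.start_date) KEEP (DENSE_RANK FIRST ORDER BY sales DESC) AS start_date,
-- repeat the KEEP clause for each column you need from the top-ranked rows
MAX(s.sales) KEEP (DENSE_RANK FIRST ORDER BY sales DESC) AS sales
FROM sales_volume s
GROUP BY s.city
ORDER BY s.city;
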
Solution 3 – Correlated Subquery

Another way to use SQL to select the row with the highest value is to use MAX with a correlated subquery.

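Here's a sketch of how that can look against the same sales_volume table (the alias names are my own):

SELECT
s.city,
s.start_date,
s.sales
FROM sales_volume s
WHERE s.sales = (
-- the subquery finds the highest sales for this row's city
SELECT MAX(i.sales)
FROM sales_volume i
WHERE i.city = s.city)
ORDER BY s.city;
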
This query may not be very efficient, as the correlated subquery runs once for each row of the outer query. But we'll see what happens when we do the performance test at the end of this post.

Solution 4 – MAX and Partition By

The last solution I'll be demonstrating for selecting a max row uses the MAX function with the PARTITION BY clause.

SELECT
city,
start_date,
sales
FROM (
SELECT
city,
start_date,
sales,
MAX(sales) OVER (PARTITION BY city) AS max_sales
FROM sales_volume)
WHERE sales = max_sales
ORDER BY city;

This uses the MAX function as an analytic function. It uses an inline view (an SQL query as the FROM clause). Within this view, it adds an extra column, max_sales, which is the maximum sales value partitioned by city. This column has the same value for every row in a given city: that city's maximum sales.

Then, outside this view, the WHERE clause keeps only the rows where the sales value equals that maximum.

Performance Comparison

Now that we have a few different solutions, all returning correct results, let's compare the performance of each of them.

The table that I used only had a small number of records. Let’s expand this with some random data up to 1 million records and see how it performs.

I'll insert the extra records, add an index on the city and sales columns, show the actual run times for each solution, and check the EXPLAIN PLAN for each.

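As a rough sketch, the setup can look like this (the value ranges and index name here are my own choices; adjust them for your schema):

-- generate 1 million rows of random data (value ranges are arbitrary)
INSERT INTO sales_volume (city, start_date, sales)
SELECT
'City ' || TRUNC(DBMS_RANDOM.VALUE(1, 50)),
TRUNC(SYSDATE - DBMS_RANDOM.VALUE(0, 365)),
TRUNC(DBMS_RANDOM.VALUE(1, 10000))
FROM dual
CONNECT BY LEVEL <= 1000000;

-- index the columns the queries group and filter on
CREATE INDEX idx_sales_volume_city_sales ON sales_volume (city, sales);

-- then, for each solution, check the plan with:
-- EXPLAIN PLAN FOR <query>;
-- SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
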
One thing that concerns me is the cardinality estimate on Solution 4. I’m not expecting 1 million unique records to be returned, but we’ll see what happens when we run the query.

Now, after running each of the queries, I get the following information:

Solution    Run Time (Sec)    Rows
1           793.724           2726
2           0.462             286
3           0.504             2726
4           2.01              2726

You can see here that Solutions 2 and 3 each took less than a second to run. Solution 4 took about 2 seconds. Solution 1 took 793 seconds (13 minutes and 13 seconds), which was by far the longest.

Solutions 1, 3, and 4 return 2726 records, which is correct, because several rows within a city can share the same maximum sales value, and each of those tied rows is returned.

Solution 2, however, returned far fewer rows, as it collapses each city to a single row and does not show the full record.

Conclusion

So, in conclusion:

There are many ways to write this query to get the solution you need.

Each of them performs differently.

Make sure you create the right indexes on these fields.

For the data I have, I would probably use solution 3, but you should test these solutions against your own data to see which performs better.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!