Mode Puzzle Solutions

On September 27, 2007 I provided a puzzle where you were asked to calculate the Mode of the distribution (the value that occurs most frequently). You can find the details of the puzzle here. I got solutions to the puzzle from Alejandro Mesa, Steve Kass, Umachandar Jayachandran, Michael Mac, Dieter Noeth, paPai and rlahoty. Here I’ll describe a sample of the solutions which I find most elegant. Most of the solutions would benefit from an index on the columns CustomerID, EmployeeID:

Several people solved the two variations of the puzzle by using ranking calculations and a table expression. This was also the approach I used in my first attempt at solving the puzzle:

Solution based on ranking, Variation 1

WITH C AS ( SELECT CustomerID, EmployeeID, COUNT(*) AS Cnt, RANK() OVER( PARTITION BY CustomerID ORDER BY COUNT(*) DESC) AS Rnk FROM dbo.Orders GROUP BY CustomerID, EmployeeID ) SELECT CustomerID, EmployeeID, Cnt FROM C WHERE Rnk = 1;

The code groups the orders by customer and employee calculating the count of orders for each group. The RANK function is used to rank the employees based on their order count in descending order within the customer (PARTITION BY CustomerID). Remember that the RANK function will assign the same rank value to all employees with the same order count, and the highest will be assigned with a rank of 1. The rest is just a matter of defining a table expression based on the query that calculates the ranking values and filtering only the rows with a rank value of 1.

Solution based on ranking, Variation 2

WITH C AS ( SELECT CustomerID, EmployeeID, COUNT(*) AS Cnt, ROW_NUMBER() OVER( PARTITION BY CustomerID ORDER BY COUNT(*) DESC, EmployeeID DESC) AS RowNum FROM dbo.Orders GROUP BY CustomerID, EmployeeID ) SELECT CustomerID, EmployeeID, Cnt FROM C WHERE RowNum = 1;

To solve Variation 2 of the puzzle you need to make a minor change to the previous solution—use the ROW_NUMBER function instead of RANK and add EmployeeID DESC to the ranking calculation’s ORDER BY clause. In case of ties, the employee with the highest EmployeeID will be chosen. BTW, it is not necessary to use ROW_NUMBER instead of RANK since both are logically equivalent when the ORDER BY list is unique. I just think that it is more natural to use ROW_NUMBER in case I’m after only one row per customer.

Steve Kass also provided very elegant solutions that avoid the need to use a table expression. He achieved this by using TOP (1) WITH TIES and specifying the ranking calculation in the query’s ORDER BY clause. Here are his solutions to the two variations of the puzzle:

Solution based on ranking and TOP 1 WITH TIES, Variation 1

SELECT TOP (1) WITH TIES CustomerID, EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders GROUP BY CustomerID, EmployeeID ORDER BY RANK() OVER( PARTITION BY CustomerID ORDER BY COUNT(EmployeeID) DESC);

Solution based on ranking and TOP 1 WITH TIES, Variation 2

SELECT TOP (1) WITH TIES CustomerID, EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders GROUP BY CustomerID, EmployeeID ORDER BY ROW_NUMBER() OVER( PARTITION BY CustomerID ORDER BY COUNT(EmployeeID) DESC, EmployeeID DESC);

Another approach that several people used in their solutions was to query the Customers table and use the APPLY operator to apply a table expression to each customer. The table expression uses a TOP (1) query to pull the employee that occurs most frequently. Here are the solutions to the two variations of the puzzle based on the APPLY operator:

Solution based on APPLY, Variation 1

SELECT C.CustomerID, D.EmployeeID, D.Cnt FROM dbo.Customers AS C CROSS APPLY (SELECT TOP (1) WITH TIES EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID GROUP BY EmployeeID ORDER BY Cnt DESC) AS D;

Solution based on APPLY, Variation 2

SELECT C.CustomerID, D.EmployeeID, D.Cnt FROM dbo.Customers AS C CROSS APPLY (SELECT TOP (1) EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID GROUP BY EmployeeID ORDER BY Cnt DESC, EmployeeID DESC) AS D;

As you can see, the first solution specifies only the order count in the ORDER BY clause and uses the WITH TIES option in order to return multiple employees in case of ties. The second solution specifies both the order count and employee id in the ORDER BY clause and does not use the WITH TIES option since only one row is required per customer.

The previous solutions are elegant and fairly simple in their logic. But if performance is of the highest priority, there is a solution that is a bit more complicated, but is faster than all others. This solution will only work for the second variation of the puzzle where you are after only one employee per customer. The solution first groups the data by customer and employee and calculates the order count for each such group. The solution then groups the result by customer, concatenates the order count and the employee id (call it CntEmpID), and calculates the MAX CntEmpID value. You can use various techniques to concatenate the values. The technique I used is to multiply the order count by 2147483648 (max supported four-byte integer plus 1) and adding the employee id. Once the max value is calculated (call it MX), you can extract the order count by dividing MX by 2147483648 using integer division, and the employee id with a modulo (MX % 2147483648). Here’s the complete solution:

Solution based on concatenation, Variation 2

SELECT CustomerID, MX % CAST(2147483648 AS BIGINT) AS EmployeeID, MX / CAST(2147483648 AS BIGINT) AS Cnt FROM (SELECT CustomerID, MAX(CntEmpID) AS MX FROM (SELECT CustomerID, CAST(2147483648 AS BIGINT) * COUNT(*) + EmployeeID AS CntEmpID FROM dbo.Orders GROUP BY CustomerID, EmployeeID) AS D1 GROUP BY CustomerID) AS D2;

From the Blogs

My initial goal in writing this series of posts was to outline some of the concerns surrounding Availability Groups (AGs) and SQL Server Agent Jobs – and call out how there is virtually no guidance from Microsoft on this front and then detail some of the pitfalls and options available for tackling this problem domain. I initially expected this series of posts to have between 25 and 30 posts – according to some of the early outlines I created ‘way back when’....More

Throughout this series of posts I’ve taken a somewhat pessimistic view of how SQL Server Agent jobs are managed within most organizations – meaning that most of the code and examples I’ve provided up until this point were based on assumptions about how CHANGE to jobs is managed. That pessimism, to date, has come in two forms:...More

In this series of posts I’ve called out some of the concerns related to SQL Server AlwaysOn Availability Groups and their interaction with SQL Server Agent jobs – both in the form of Batch Jobs (see post #3) and backups....More