This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

2. Using a subquery that retrieves information from the “next” row (Regan, Colleen)

3. Using a function that returns the “next” row and a recursive query (Marc)

4. Matching adjacent rows using row numbers (Jermy, Itzik 2)

Now for the details…

1. Identifying islands (Itzik 1 and Peso)

The key part of the solution is to calculate two row numbers that are partitioned by travelerid—one ordered by travelstart, and the other ordered by destination, travelstart. The difference between the two is constant and unique per traveler and destination. You can then group the rows by traveler, destination and that diff, calculate the difference between the maximum miles traveled and minimum miles traveled, and this way get the total miles traveled in each leg of the itinerary. Finally, group the result by traveler and calculate the total miles traveled in all legs.

Here’s my solution implementing this approach along with performance measures:

-- Itzik 1

-- CPU time = 9501 ms, elapsed time = 8554 ms, logical reads 4978

WITH C1 AS

(

SELECT travelerid, destination, milestraveled,

ROW_NUMBER() OVER(PARTITION BY travelerid

ORDER BY travelstart) -

ROW_NUMBER() OVER(PARTITION BY travelerid

ORDER BY destination, travelstart) AS grp

FROM dbo.Travel

),

C2 AS

(

SELECT travelerid, destination,

MAX(milestraveled) - MIN(milestraveled) AS miles

FROM C1

GROUP BY travelerid, destination, grp

)

SELECT travelerid, destination, SUM(miles) AS totalmiles

FROM C2

GROUPBY travelerid, destination;

Peso implemented a very similar idea, only in the second row number he specified the destination attribute as a partitioning (as opposed to ordering) element. The effect is the same, though. Here’s Peso’s solution:

-- Peso

-- CPU time = 10950 ms, elapsed time = 8347 ms, logical reads 4978

WITH cteTravel(TravelerID, Destination, TotalMiles)

AS

(

SELECT TravelerID,

Destination,

MAX(MilesTraveled) - MIN(MilesTraveled) AS TotalMiles

FROM (

SELECT TravelerID,

Destination,

MilesTraveled,

ROW_NUMBER() OVER (PARTITION BY TravelerID ORDER BY TravelStart) AS Yak,

ROW_NUMBER() OVER (PARTITION BY TravelerID, Destination ORDER BY TravelStart) AS Yik

FROM dbo.Travel

) AS x

GROUPBY TravelerID,

Destination,

Yak - Yik

)

SELECT TravelerID,

Destination,

SUM(TotalMiles) AS TotalMiles

FROM cteTravel

GROUPBY TravelerID,

Destination;

The benefit in these solutions is that the data is scanned only once. One of the row number calculations can benefit from an ordered scan of an index. However, there’s no avoiding of one sort operation in the plan.

2. Using a subquery that retrieves information from the “next” row (Regan, Colleen)

The solutions in this category use a subquery retrieving the interesting information (milestraveled) from the “next” row; that is, in respect to the current row, the first row with the same travelerid value, a greater travelstart value, based on travelstart ordering, provided that the destination is the same. Regan implemented this approach using APPLY and a TOP query, like so: