I disliked his solution initially, as I embrace ORM more and more, I'm of the opinion that we should write portable code, and what's more portable than writing the query with constructs that exists on both SQL and ORM? I disliked the CASE WHEN approach to the problem as it don't have any analogous construct on ORM, Linq in particular.

With that in mind, I tried to dissuade the user from using the CASE WHEN approach. But we need to wise-up though, portability is not a panacea that can magically make our applications have more customers. How can a customer be happy if bad application performance spoils the fun on using your application. With this in mind, I tried to profile both queries to see which one is faster. I'm expecting the OR to be slower(RDBMS don't do short-circuit, it uses cost-based analysis on determining how it should perform the query) than CASE WHEN, but I'm not expecting very big difference in performance, then much to my chagrin when I see a staggering difference in performance, the CASE WHEN approach can finish the job in 88 milliseconds, while the OR approach took 4.7 seconds. The performance of OR is not acceptable

CASE WHEN approach's Query Cost is 3% only, while the OR approach is 97%. So there goes our portability ideals, it's not ideal. We need to tap the strengths of a given tool in order to get our job done in an efficient manner. I will not be dead set anymore to make a given code construct be very compatible on another platform or API.

Wednesday, July 11, 2012

In this article, I will show you how to get the longest attendance streak a given student had achieved. This will also take the weekends and holidays into consideration.

Given a student has this attendance:

July 3 Tuesday
July 4 Holiday -- Independence Day
July 5 Thursday
July 6 Friday
July 9 Monday
July 12 Thursday
July 13 Friday

The query should show the student had achieved 4 consecutive attendance on these dates:

July 3 Tuesday
July 5 Thursday
July 6 Friday
July 9 Monday

July 4 is a holiday and should not affect or penalize the student's attendance performance, the query should show the student has good four consecutive attendance from July 3 to July 9; likewise, the weekends(Saturday and Sunday) should not have an effect on attendance performance.

Student gets an attendance performance of 11 from July 18 to August 1 when she attend classes on following dates:

July 18 Wednesday
July 19 Thursday
July 20 Friday
July 23 Monday
July 24 Tuesday
July 25 Wednesday
July 26 Thursday
July 27 Friday
July 30 Monday
July 31 Tuesday
August 1 Wednesday
August 3 Friday

How to determine if the given dates are consecutive? This is a solved problem, and it has a terminology to it, it's called islands and gaps algorithm.

The core thing with islands and gaps problem is we must have a mechanism to group consecutive rows so in the final query we can count them. One mechanism is to use row_number windowing function, though row numbering approach will fail if there's duplicate rows, row_number would suffice if the rows are unique, e.g. attendance.

So given this numbers, we have 5 consecutive numbers(1 to 5) and another 3 consecutive numbers(8 to 10):

1
2
3
4
5
8
9
10
12

With row_number we can determine the consecutive-ness of those numbers

N R#
1 1
2 2
3 3
4 4
5 5
8 6
9 7
10 8
12 9

Subtract row number from N, this will give us related rows

N R# GRP
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
8 6 2
9 7 2
10 8 2
12 9 3

Given the GRP column, we can do this query to count the longest attendance streak a given student has:

with a as
(
select n, n - row_number() over(order by n) as grp
from tbl
)
select
*
,dense_rank() over(order by grp) as nth_streak
,count(*) over(partition by grp) as streak
from a;

Now, for the final hurdle to tackle, if 5 is Friday, 6 and 7 are Saturday and Sunday respectively. So on the above data, the student should have an 8 consecutive attendance instead of separate 5 and 3. To do that, we need to shift all the date if it is over 7.

Notice that the value Monday's value(8) shifted to 6, likewise with Tuesday(9, this shifted to 7) and so on.

With this logic at hand, we can now easily determine the consecutive-ness of adjacent rows. We just need to subtract row_number from SHIFT_UP in order to make the current Monday consecutive to last week's Friday. This is the query:

with a as
(
select n, n - ((n/7) * 2) - row_number()over(order by n) as grp
from tbl
)
select
*
,dense_rank() over(order by grp) as nth_streak
,count(*) over(partition by grp) as streak
from a;

It now yields the date from last week's Monday to Friday(1 to 5) and this week's Monday to Friday(8 to 10) as consecutive 8 days(indicated by GRP=0). Hence yielding a 5+3 consecutive attendance, this yields 8 consecutive attendance.

Another way to visualize how the date are being shifted(to determine date consecutive-ness) two days back:

If a given date falls on 15th, this will be computed as 15/7 * 2 == 4; then subtract 4 from the original number, 15 - 4 == 11. 15 will become the 11th day. Likewise the 8th day becomes the 6th day; 8 - (8/7 * 2) == 6. The above data yields the following output when the computation is applied:

1 2 3 4 5
6 7 8 9 10
11

The above attendance yields 11 consecutive good attendance. For holidays, you need to slot them on attendance, so the consecutive-ness of dates could be easily determined, then just remove them from the final query.

To apply the above logic on determining consecutive attendance even there are holidays and weekends, use this:

There's a discrepancy(the last queries, z column) when using datepart on obtaining first day of week. On countries(e.g. USA) that start their calendar week with Sunday, the first day of week for July 4, 2012 is July 1, 2012(Sunday). For countries that start their calendar week with Monday, the first day of week for July 4, 2012 is July 2, 2012(Monday).

To avoid this discrepancy, instead of using date functions(e.g. DATEPART) that has a dependency on @@DATEFIRST settings, use date functions that doesn't rely on @@DATEFIRST settings e.g. DATEDIFF