With the chaos of the Worldmaps to Azure migration winding down, I decided to take a look at some the low-pri issues on my to-do list. Those close to the project or who follow their leaderboard ranking closely know that last year, the leaderboard was ranked by Total Hits. This worked pretty well at the time, but it made it impossible for those who joined the site weeks or months after others to compete unless their site had considerable more volume to close the gap. This year, I changed the formula to base the data off of average hits per day. This adds a bit more excitement because it’s a level playfield regardless of join date, for the most part. The one remaining problem I have with hits per day is that if a site gains popularity over time, the resulting average encourages the user to delete and recreate their map for better rankings. At some point in time, I’ll further refine this so it looks at hits per day for the last month or two (for example). In the meantime, though, I got some heat from Chris Eargle who pointed out some problems with my implementation when he looked at his numbers. The way the implementation worked was that it grabbed the earliest hit, the most recent hit, and divided by the number of days. To prevent division by zero and give a minimum of 1 day, 1 was added to the number of days like so: coalesce(sum(hits.NumHits) / (DATEDIFF(d, MIN(hits.CreateDate), MAX(hits.ModifyDate))+1) ,0) as HitsPerDay
While this worked “okay,” it wasn’t perfect. For starters, using DATEDIFF returns an integer based on the dates passed in. If you pass in “1/21/2010 23:59” and “1/22/2010 00:01,” the DATEDIFF (by days) is 1, even though only 2 minutes have passed. If you pass in “1/22/2010 00:01” and “1/22/2010 23:59,” a zero is returned even though it’s just 2 minutes short of a day. This isn’t wrong, but it’s limited for what we need because the leaderboard is generated throughout the day.
Adding 1 can be incorrect because, using the first example, it will now return 2 days (!) when only 2 minutes have expired. It’s a pretty extreme example but possible. I wanted an implementation that, using T-SQL alone, would be a bit more consistent. The criteria I had was: 1) had to support partial days and 2) had to give a minimum of 1 day. #2 is because I don’t want a map that was just created (as in the first time example) to end up getting (again, for example) 50 hits in 2 minutes and due to the math, getting averaged at say 100,000 hits per day. Using a minimum of 1 day makes the leaderboard a little more stable. The resulting query:
CAST(FLOOR(
SUM(NumHits)/
CASE
WHEN (DATEDIFF(hh, MIN(hits.CreateDate), MAX(hits.ModifyDate))/cast(24 as float)) < 1 THEN 1
ELSE (DATEDIFF(hh, MIN(hits.CreateDate), MAX(hits.ModifyDate))/cast(24 as float))
END
)
as int) as HitsPerDay
This works out really well. First, we use hours instead of days and simply divide by 24. It’s essentially the same as days, but gives a fractional value. If the value is less than 1, we’ll use just 1 as the minimum. This solves the second criteria above and also any division by zero cases. The ELSE of the CASE gives us the fractional days, so hits per day is more accurate and no more +1 tomfoolery. Even though the FLOOR returns an whole number, the data type is unchanged so a cast to an int is necessary for the application.
If we look at the results in query analyzer, you can see the results are more true:
The “days” and “days_partial” shows the difference between using hours and days in DATEDIFF, giving us more accuracy. The “HitsPerDay_New” compared to “HitsPerDay_Original” shows the more accurate result. In some cases unchanged, in some cases much more accurate as you can see by looking at the “FirstHit” and “MostRecentHit” columns.
Anyway, just some fun playing with T-SQL and if you see some adjustments in the Leaderboard, this is why.
But Chris, it won’t help you get into the Top 10. :)