Efficiently query the DATE in DATETIME

The DATETIME data type is often misunderstood and used inefficiently. This article focuses on the date component of DATETIME, how it is handled internally and how it can be used effectively for querying. The DATETIME type is internally stored as two separate 4-byte integers: one of those integers stores the date portion, and the other the time. When the date portion has a value of 0, the date is 1900-01-01. Because the date is internally stored as an INT, casting and converting directly between the types is natural:Note: Because we are focusing on the date portion, we can ignore the time components for the purposes of this article. All the times will be midnight, but I won’t reprint them below. Additionally, the techniques demonstrated within can be applied to SMALLDATETIME and DATETIME2 (SQL 2008).

You can use cast and convert for either direction, there is no restriction. Note in the second example, I stored the date in a variable first, because there is no way to natively pass a DATETIME type into the query window; SQL Server must always convert it from a string. By taking the intermediary step of storing it into a variable, we ensure that the Database Engine truly understands the value as a DATETIME and not a VARCHAR. If you had attempted to convert the VARCHAR value directly, you would receive the following error:

Manipulating Dates

DATEADD

According to MSDN, DATEADD returns a specified date with the specified number interval (signed integer) added to a specified datepart of that date. The function prototype looks something like this:

DATEADD (datepart, number, date)

Because any date can be referenced as a simple integer, adding dates becomes trivial with the DATEADD function. Let’s compare adding days with both as a DATETIME and as an INT:

SELECT DATEADD(day, 1, '1900-1-1')
--Result: 1900-01-02

And compare that with adding 1 date to 0

SELECT DATEADD(day, 1, 0)
--Result: 1900-01-02

Notice the result is the same. We can also add any other datepart. Want the start of 2009?

SELECT DATEADD(year, 109, 0)
--Result: 2009-01-01

DATEDIFF

DATEDIFF acts in an opposite capacity of DATEADD: it calculates the date or time difference between DATETIME values. The prototype looks like this:

DATEDIFF (datepart, startdate , enddate)

And an actual example:

SELECT DATEDIFF(year, '1900-01-01', '2009-01-01')
--Result: 109

As before, we can certainly specify any of the dates as a INT. Since we know that 1900 has a corresponding integer value of 0, let’s try that:

SELECT DATEDIFF(year, 0, '2009-01-01')
--Result: 109

We can take it a step further and compare two INT values (again, as long as they fall within the valid DATETIME range). Let’s find how many days are between 1900-01-01 and 9999-12-31:

SELECT DATEDIFF(year, 0, 2958463)
--Result: 8099

The important aspect to note is that DATEDIFF “returns the number of date and time boundaries crossed between two specified dates.” (MSDN) In plain language, that means SQL Server never rounds the time period, but rather calculates the time periods completed. For example, let’s check the number of years between Leap Day in 2000 and Christmas 2008:

SELECT DATEDIFF(year, '2000-02-29', '2008-12-25')
--Result: 8

Although the total time is nearly 8.9 years, only 8 actual year boundaries have been crossed. This principle is the basis for calculating beginnings of time periods. When we combine DATEDIFF with DATEADD, the combination will predictably give us the start of any time period. Let’s say we wanted to calculate the number of years since 1900-01-01:

To analyze what is actually occurring, let’s start with the DATEDIFF in the middle <DATEDIFF(year, 0, GETDATE()>. This calculates the number of year boundaries crossed between 1900 and 2009, which is 109. The outer portion <DATEADD(year, 109, 0)> adds that result of 109 to 1900 and gives us the start of the year 2009. This technique allows us to find the start of any date or time period based on a relative starting point. That starting point is commonly GETDATE(), but can be any valid DATETIME. If you want the beginning of the month, swap “year” for “month”. The same can be used for day, week, or any other datepart. This can help us greatly when querying a DATETIME field to fall in a range of values (i.e. “Select all rows created this year.”) What if we wanted the start of next year (2010)?

Notice the only change was a “+ 1″ next to the DATEDIFF calculation. Because we’re calculating the number of years since 1900 (109), we can adjust for years forward by adding, or years back by subtracting. By adding only 1, we jump forward a single year.

Query efficiency

Using DATEADD and DATEDIFF is much faster than string manipulation with CAST and CONVERT. Additionally, it allows us to check the bounds of a DATETIME column without using the YEAR, MONTH, or DAY function. Why is this good? Any modification applied to a column in the WHERE condition, negates the ability for SQL Server to utilize an index on that column. This is because the index contains only the actual values, not some calculation of them. Think of it this way: If you looked in the index of a book and wanted to know the pages of topics whose third letter is an “e” is, you would have to scan the entire article list, because the index is based on the first letter, not third. This is similar to how a YEAR function might affect an index on a DATETIME column. To get around this, rather than altering the column value in the condition, we alter our lookup condition. For example, instead of YEAR(DateColumn) = YEAR(GETDATE()), we can specify DateColumn >= @BeginningOfThisYear AND < @BeginningOfNextYear. Or, more specifically:

In English, it reads something like this: “Select everything from TableName where DateColumn is at least the start of the year and also before next year”. Or, even simpler, “Show me where DateColumn is the current year”. This has a similar effect of YEAR(DateColumn), but will allow an index on DateColumn to be used. Obviously, this will not guarantee the index will be used – if one exists at all – but, it at least leaves the option available to the query optimizer.

Conclusion

It is very common to check for records falling in a specified range such as current year or last month, but many people resort to nasty functions forcing a table scan. With the above knowledge and tools, developers can more efficiently leverage the DATETIME data type and speed up slow-running queries.

I love the datetime datatype, but alas we are banned from using it at work, we are forced to use the unix timestamp instead… Which I guess is faster… somewhat, but less human readable and needs to be converted almost every time it is used…