Correctly Calculating Datetime Differences

T-SQL programmers share a basic and surprisingly challenging problem: correctly calculating the difference between two datetime values. SQL Server doesn't provide a built-in solution for this task, so you have to create your own.

Although SQL Server's DATEDIFF() function lets you calculate the difference between two datetime values in terms of a specified unit (e.g., year, month, day), DATEDIFF() considers only the specified unit and those higher in the temporal hierarchy, not lower units. For example, if you specify a unit of day, SQL Server will consider only the year, month, and day and ignore lower units (hour, minute, second, millisecond). Notice the following query, which asks for the difference in years between two values that are just 3ms apart:
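A query of the following shape shows the behavior. The two literal values are an assumption, chosen to sit 3ms apart on either side of the 2005/2006 year boundary that the surrounding text describes.

```sql
-- Assumed example values: 3 ms apart, straddling the year boundary.
SELECT DATEDIFF(year,
                '20051231 23:59:59.997',
                '20060101 00:00:00.000') AS year_diff;
-- year_diff is 1, even though the two values are only 3 ms apart
```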

SQL Server returns an incorrect value of 1 (instead of 0) for the number of years between these values because the function doesn't check units lower than year but rather simply subtracts 2005 from 2006.

Let's look at how to get the correct datetime difference, taking all units into consideration. In other words, given two timestamps—from_ts and to_ts—let's see how to return the difference in terms of years, months, days, hours, minutes, seconds, and milliseconds.

Test Your Solution First

Set up the datetime-differences problem by running the code in Listing 1 to create the TimeStamps table and populate it with sample data. Table 1 shows the TimeStamps table's content. Each row contains an integer key (keycol) and a pair of timestamps (from_ts and to_ts). The task is to correctly calculate the differences between the timestamp pairs, considering all possible datetime units, and produce the desired result that Table 2 shows.
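If you don't have Listing 1 at hand, a minimal stand-in along these lines is enough to experiment with. The column names come from the description above; the data types, the dbo schema, and the three sample rows are assumptions for illustration and are not the contents of Table 1.

```sql
-- Stand-in for Listing 1 (an assumption, not the original listing).
CREATE TABLE dbo.TimeStamps
(
  keycol  INT      NOT NULL PRIMARY KEY,
  from_ts DATETIME NOT NULL,
  to_ts   DATETIME NOT NULL
);

-- Illustrative rows only: a multi-year span, a 3 ms span across a year
-- boundary, and a reversed pair that should produce a negative result.
INSERT INTO dbo.TimeStamps(keycol, from_ts, to_ts)
  VALUES(1, '20030321 14:27:12.233', '20060115 11:45:22.263');
INSERT INTO dbo.TimeStamps(keycol, from_ts, to_ts)
  VALUES(2, '20051231 23:59:59.997', '20060101 00:00:00.000');
INSERT INTO dbo.TimeStamps(keycol, from_ts, to_ts)
  VALUES(3, '20060115 11:45:22.263', '20030321 14:27:12.233');
```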

Before looking at my solution, take some time to try to solve the problem yourself. Your solution should handle cases in which from_ts is later than to_ts, in which case your code should generate a negative result. The sgn result column should identify whether the result is positive (1) or negative (-1). (Note that the outputs I show omit the from_ts and to_ts values to save space; you can correlate these values back to the TimeStamps table based on keycol.)

Step-by-Step Calculations

Now, let's walk through my solution, which Listing 2 shows. First, consider the innermost query at callout F in Listing 2, which generates the derived table D1, shown in Table 3. This query uses simple CASE expressions to achieve its purpose: to place the larger value of the timestamp pair (from_ts, to_ts) in the result column to_ts and to place the smaller value in from_ts. This operation allows for simpler calculations later, guaranteeing that from_ts is smaller than or equal to to_ts. The sgn result column will hold a value of 1 if from_ts is less than or equal to to_ts and -1 if from_ts is greater than to_ts.
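The ordering step can be sketched as follows. This is an illustration of the CASE technique described above, not the code from Listing 2; the inline row with a reversed pair of timestamps is an assumed input.

```sql
-- Put the smaller timestamp in from_ts, the larger in to_ts,
-- and remember the original direction in sgn.
SELECT keycol,
  CASE WHEN from_ts <= to_ts THEN from_ts ELSE to_ts   END AS from_ts,
  CASE WHEN from_ts <= to_ts THEN to_ts   ELSE from_ts END AS to_ts,
  CASE WHEN from_ts <= to_ts THEN 1       ELSE -1      END AS sgn
FROM (SELECT 1 AS keycol,
             CAST('20060115 11:45:22.263' AS DATETIME) AS from_ts,
             CAST('20030321 14:27:12.233' AS DATETIME) AS to_ts) AS T;
-- For this reversed pair the two values are swapped and sgn is -1.
```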

Callout E shows the query that accepts D1 as its input, then generates derived table D2, which Table 4 shows. This query uses the DATEDIFF() function to calculate the date-unit differences (i.e., year, month, day). Remember that the unit difference that DATEDIFF() calculates might have an offset of 1 from the correct value because the function doesn't consider units lower than the unit specified in the temporal hierarchy; the next step will take care of that offset.

The query at callout D accepts D2 as its input and generates derived table D3, which Table 5 shows. This query uses a CASE expression for each date unit, adding the corresponding unit difference you received from the previous query to the from_ts value. If the result is greater than to_ts, the difference that DATEDIFF() calculated was greater than the correct value by 1, so the code subtracts 1 from the difference. This logic, which fixes the inaccuracy in DATEDIFF()'s calculation, is a key element of the solution.
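A minimal sketch of these two steps together, assuming an input pair that is only 3ms apart across a year boundary. The derived-table names follow the article's naming, but the code is an illustration rather than Listing 2 itself.

```sql
-- Outer query (D3 step): subtract 1 wherever adding the raw
-- difference back to from_ts overshoots to_ts.
SELECT
  y - CASE WHEN DATEADD(year,  y, from_ts) > to_ts THEN 1 ELSE 0 END AS y,
  m - CASE WHEN DATEADD(month, m, from_ts) > to_ts THEN 1 ELSE 0 END AS m,
  d - CASE WHEN DATEADD(day,   d, from_ts) > to_ts THEN 1 ELSE 0 END AS d
FROM (-- Inner query (D2 step): raw DATEDIFF() values, possibly off by 1.
      SELECT from_ts, to_ts,
             DATEDIFF(year,  from_ts, to_ts) AS y,
             DATEDIFF(month, from_ts, to_ts) AS m,
             DATEDIFF(day,   from_ts, to_ts) AS d
      FROM (SELECT CAST('20051231 23:59:59.997' AS DATETIME) AS from_ts,
                   CAST('20060101 00:00:00.000' AS DATETIME) AS to_ts) AS D1
     ) AS D2;
-- Raw differences are 1, 1, 1; the corrected values are 0, 0, 0.
```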

Why not use the same logic to calculate the difference for the time units in the timestamp values? The reason is that when you get down to the time units, the difference between two timestamps might be higher than an integer can hold. In such cases, the calculation would overflow, so you need to handle the time units separately.
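To get a feel for the limit, recall that DATEDIFF() returns an int, whose maximum is 2,147,483,647. The quick arithmetic below (integer division, so the results are rounded down, and the year figure uses a 365-day approximation) shows roughly how large a span fits in each unit.

```sql
-- How much time fits in an int, by unit (integer division throughout).
SELECT
  2147483647 / 1000 / 60 / 60 / 24 AS whole_days_in_ms,       -- 24
  2147483647 / 60 / 60 / 24 / 365  AS approx_years_in_seconds; -- 68
```

A millisecond difference therefore overflows once the two values are a little more than 24 days apart, and a second difference overflows after about 68 years.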

Calculating Time Elements

Now that you've calculated the correct differences for the date units, you need to return from each unit only the portion that the higher-level unit doesn't cover. For example, say you're given the timestamps 20030321 14:27:12.233 and 20060115 11:45:22.263. The calculations in the query at callout D in Listing 2 will yield 2 years, 33 months, and 1030 days, each correct independently. But to show the result in combined units, you need to return only the portion of months after subtracting the higher-level unit, which in this case is 9 months (33 months minus 2 years). Similarly, you want to return only the portion of days after subtracting the years and months, which is 24 days in our example (1030 days minus 2 years and 9 months). Eventually, you'll return a difference of 2 years, 9 months, and 24 days (plus lower units). The next steps return the date elements and prepare the inputs you need to calculate the time elements.

The query at callout C accepts D3 as its input and generates the derived table D4, which Table 6 shows. This query simply shifts from_ts forward by each of the three date-unit differences (y, m, and d), generating the values y_ts, m_ts and d_ts, respectively. The solution will use each of these timestamps as an anchor to return only the relevant portion of a date unit.

The query at callout B accepts D4 as its input and generates the derived table D5, which Table 7 shows. This query returns only the relevant portion of each date unit, subtracting the difference between to_ts and the higher-level anchor from the independent date unit. The query also calculates the difference in seconds (s) between the day anchor and to_ts. The listing will use s in the next step to calculate all time elements except the milliseconds element.
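The anchor and remainder steps can be sketched like this, starting from the corrected differences quoted earlier (2 years, 33 months, 1030 days). The exact expressions in Listing 2 may differ; subtracting the span between from_ts and each anchor is one assumed way to isolate the remainder.

```sql
-- Outer query (D5 step): keep only the part of each unit that the
-- higher-level unit doesn't cover, plus the leftover seconds.
SELECT
  y,
  m - DATEDIFF(month, from_ts, y_ts) AS m,
  d - DATEDIFF(day,   from_ts, m_ts) AS d,
  DATEDIFF(second, d_ts, to_ts)      AS s
FROM (-- Inner query (D4 step): shift from_ts forward by each difference.
      SELECT from_ts, to_ts, y, m, d,
             DATEADD(year,  y, from_ts) AS y_ts,
             DATEADD(month, m, from_ts) AS m_ts,
             DATEADD(day,   d, from_ts) AS d_ts
      FROM (SELECT CAST('20030321 14:27:12.233' AS DATETIME) AS from_ts,
                   CAST('20060115 11:45:22.263' AS DATETIME) AS to_ts,
                   2 AS y, 33 AS m, 1030 AS d) AS D3
     ) AS D4;
-- Returns y = 2, m = 9, d = 24, s = 76690.
```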

The next query, at callout A, accepts D5 as its input and generates the final desired result, which Table 2 shows. The query uses simple integer division (/) and modulo (%) to calculate the hour (h), minute (mi), and second (s) units based on the original seconds difference value you received from derived table D5. The query also calculates the millisecond difference (ms) by adding 1000 to the milliseconds unit of to_ts, subtracting the milliseconds unit of from_ts, and taking the result modulo 1000. The reason for adding 1000 to the ms unit of to_ts, then calculating the modulo by 1000, is to accommodate a case in which the ms unit of to_ts is smaller than the ms unit of from_ts.
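As a worked illustration of that arithmetic, the sketch below assumes a seconds difference of 76690 and the pair of timestamps from the earlier example. The variable names are hypothetical stand-ins for the columns of D5.

```sql
DECLARE @s INT, @from_ts DATETIME, @to_ts DATETIME;
SET @s       = 76690;                    -- seconds between day anchor and to_ts
SET @from_ts = '20030321 14:27:12.233';
SET @to_ts   = '20060115 11:45:22.263';

SELECT
  @s / 3600      AS h,   -- whole hours
  @s % 3600 / 60 AS mi,  -- whole minutes left after the hours
  @s % 60        AS s,   -- seconds left after the minutes
  (1000 + DATEPART(ms, @to_ts) - DATEPART(ms, @from_ts)) % 1000 AS ms;
-- Returns h = 21, mi = 18, s = 10, ms = 30.
```

Note that DATEDIFF() with a unit of second counts second boundaries, so when the ms unit of to_ts is smaller than that of from_ts the seconds value also needs to give up one second to the milliseconds; this sketch doesn't show that adjustment.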

Encapsulating Logic in a Function

You now have a way to correctly calculate differences between two datetime values. And you can encapsulate this logic in a function that accepts two datetime values as inputs. You just need to determine how to return the output. You can create a table-valued function and return the different datetime elements and the sign of the result in different columns of a result row. Or, you can create a scalar function that concatenates all output elements in a single character string. Listing 3 shows an example of a scalar function that calculates datetime differences.

The only change you need to make to the original solution is to add a layer that formats the output as a scalar value, as callout A in Listing 3 shows.
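A formatting layer of this kind might look like the sketch below. The variable names and the zero-padding approach are assumptions, not the code at callout A in Listing 3; the layout (sign, then yyyy-mm-dd hh:mi:ss.mmm) follows the sample output the article quotes.

```sql
DECLARE @sgn INT, @y INT, @m INT, @d INT,
        @h INT, @mi INT, @s INT, @ms INT;
SELECT @sgn = 1, @y = 2, @m = 9, @d = 24,
       @h = 21, @mi = 18, @s = 10, @ms = 30;

-- Left-pad each element with zeros and concatenate into one string.
SELECT
  CASE @sgn WHEN 1 THEN '+' ELSE '-' END
  + RIGHT('0000' + CAST(@y  AS VARCHAR(4)), 4) + '-'
  + RIGHT('00'   + CAST(@m  AS VARCHAR(2)), 2) + '-'
  + RIGHT('00'   + CAST(@d  AS VARCHAR(2)), 2) + ' '
  + RIGHT('00'   + CAST(@h  AS VARCHAR(2)), 2) + ':'
  + RIGHT('00'   + CAST(@mi AS VARCHAR(2)), 2) + ':'
  + RIGHT('00'   + CAST(@s  AS VARCHAR(2)), 2) + '.'
  + RIGHT('000'  + CAST(@ms AS VARCHAR(3)), 3) AS diff;
-- Returns +0002-09-24 21:18:10.030
```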

You'll receive +0002-09-24 21:18:10.030 as the output, meaning that the difference between the two inputs is positive, 2 years, 9 months, 24 days, 21 hours, 18 minutes, 10 seconds, and 30ms. To verify that the result is correct, use DATEADD() to add all elements to the @from_ts input, and note that you get the following @to_ts input:
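A check along these lines works, assuming the inputs are the pair of timestamps from the earlier worked example (20030321 14:27:12.233 and 20060115 11:45:22.263). The elements are added from the largest unit to the smallest, innermost call first.

```sql
DECLARE @from_ts DATETIME;
SET @from_ts = '20030321 14:27:12.233';  -- assumed input

SELECT
  DATEADD(ms, 30,
    DATEADD(second, 10,
      DATEADD(minute, 18,
        DATEADD(hour, 21,
          DATEADD(day, 24,
            DATEADD(month, 9,
              DATEADD(year, 2, @from_ts))))))) AS to_ts;
-- Returns 2006-01-15 11:45:22.263
```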

As with many problems I cover in this column, solving the datetime-differences challenge involves a lot of logic. To stretch your logic muscles, remember to check out this month's Logical Puzzle. And don't miss the new section of this column, "Catch That Bug!"

Itzik Ben-Gan (itzik@solidqualitylearning.com), a mentor at Solid Quality Learning, teaches, lectures, and consults internationally. He manages the Israeli SQL Server Users Group, is a SQL Server MVP, and is a coauthor of Advanced Transact-SQL for SQL Server 2000 (Apress).