Description

An interval is a set of values between a certain lower value and a certain upper value, where upper >= lower. In reality there are different kinds of intervals that you may need to represent in your database like temporal ones (sessions, appointments, periods of validity), and others.
When in need to represent temporal intervals in SQL Server most use two attributes holding the lower and upper points in time. Then there are operations that you need to apply between intervals. One of the most common operations is to check which intervals overlap with an input interval. For example: “return all contracts that were active during an input period represented by the inputs @lower and @upper.”
Some of the classic querying methods that are used to provide answers to such requests suffer from fundamental optimization problems. The result is that typical interval queries tend to be very slow and inefficient.
An efficient solution that can be implemented today in SQL Server does exist. It is based on academic work that resulted in a model called Relational Interval tree (RI-tree) by Kriegel, Pötke and Seidl, plus further optimizations called static RI-tree by Laurent Martin. However, I believe that the model and its optimizations are a bit complex for people to comprehend.
The resources below describe the existing optimization problems, the solution that can be applied today and its complexities, and the potential for improvements in SQL Server.
Essentially the suggested improvement in SQL Server is to add internal engine support for the RI-tree model. Such support would include a new type of index (call it interval index) and related optimizer improvements. The user would only be required to use basic syntax for creating the interval index. Behind the scenes SQL Server would create indexing based on the RI-tree model. Also, the optimizer would detect classic interval query patterns based on the predicates used in the query filters. And when detected, the optimizer would use the internal indexes to process the queries efficiently. One of the main benefits in this approach is that existing queries don’t need to be changed. They would just run faster.
Note that this proposal is independent of adding declarative SQL support for interval related queries, which would be a good thing as well. The engine part could still be based on the RI-tree model.
Resources:
• Deck explaining problem and suggested solution: http://tsql.solidq.com/books/source_code/Intervals.pdf
• Source code accompanying deck: http://tsql.solidq.com/books/source_code/Intervals.txt
• Paper describing Relational Interval tree (RI-tree) model – [RI] “Managing Intervals Efficiently in Object-Relational Databases” (Kriegel, Pötke and Seidl of University of Munich): http://www.dbs.ifi.lmu.de/Publikationen/Papers/VLDB2000.pdf
• Articles describing optimizations to original model (static RI-tree) – [SRI1] “A Static Relational Interval Tree” (Laurent Martin): http://www.solidq.com/sqj/Pages/2011-September-Issue/A-Static-Relational-Interval-Tree.aspx and [SRI2] “Advanced interval queries with the Static Relational Interval Tree” (Laurent Martin): https://www.solidq.com/sqj/Pages/Relational/Advanced-interval-queries-with-the-Static-Relational-Interval-Tree.aspx

Assign To

It seems to me that this old feedback is even more important now that SQL 2016 will include temporal support. Temporal queries are *exactly* the kind of queries that need efficient interval intersection, yet MS has no good solution for that. I suppose querying tables that change a lot (i.e. lots of history rows for any given id) are not a target scenario for temporal tables.

Posted by Microsoft on 11/25/2013 at 8:56 PM

Thanks for submitting this feedback. We will evaluate this feature request and get back to you.

Posted by Maurice Pelchat on 7/3/2013 at 12:54 PM

This is a much needed optimization. How often we simply need to find rows that falls between two different dates which are two different attributes of the same table (e.g. start_Date and End_date). Traditional index are almost useless to solve this problem, because SQL needs to read all rows for which start_date <= searched date and all rows for which End_date >= searched date. At best two index needed to be scanned. When a table is made of millions of rows this make expensive queries.

The workarounds I am mentioning here do not mean that I would not want to have the temporal data integration available in SQL Server. Actually, I think this would be one of the most important features Microsoft should add in one of the next versions of SQL Server. I added these workarounds only to mitigate the problems database developers and administrators might currently have with temporal data in SQL Server.

1. Traditional solution with modified WHERE clause - http://blogs.solidq.com/dsarka/Post.aspx?ID=149&title=Interval+Queries+in+SQL+Server+Part+12. Using the Geometry data type - http://blogs.solidq.com/dsarka/Post.aspx?ID=150&title=Interval+Queries+in+SQL+Server+Part+23. Using the unpacked form of the intervals - http://blogs.solidq.com/dsarka/Post.aspx?ID=151&title=Interval+Queries+in+SQL+Server+Part+34. Using a CLR data type with modified WHERE clause and supporting other solutions - http://blogs.solidq.com/dsarka/Post.aspx?ID=152&title=Interval+Queries+in+SQL+Server+Part+4