SQLServerCentral.com - Article Discussions: Sins of SQL: The Time Table (content posted by Steve Gray)
http://www.sqlservercentral.com/Forums/Topic930511-2705-1.aspx

RE: Sins of SQL: The Time Table

Hi Steve,

Great article on fn_GetTimestampRange. I've been using a date table for some time, at a few people's suggestion, for a variety of issues at work. I'm not in IT, just a manager with limited resources and a bit of a SQL aficionado. Anyway, I'm in the process of setting up a reporting database/cube and I'm definitely going to try to make use of your function. I work in transit and everything is measured in seconds (i.e. revenue hours, deadhead hours, layover hours, etc. all need to be calculated to at least the minute, but are stored in seconds in the source database). So this will be a godsend if it helps improve performance.

I was wondering if you've ever had any ideas on setting up time tables (not in the SQL sense, but schedules). I've been working on pivoting times on routes using SQL PIVOT, but it's a bit cumbersome. Some of the variables that I encounter are that I have fixed routes with fixed patterns. However, routes will sometimes start service mid-trip, so the earliest time is not actually at the normal starting point. The other issue is that there are a lot of repeat stops when a route does a loop.

I can provide more background if you've worked on anything like this, just curious.

Thanks for the great info!

Craig

-- flam, Tue, 12 Jul 2011 00:06:52 GMT

RE: Sins of SQL: The Time Table

The other point not mentioned is that, for DimDate and DimTime, the real reason - over and above the performance of joining via an int (and typically not going via the DimDate either) - is the additional information required.

Almost all places have a financial calendar that is NOT the normal calendar, and having both of these in the table is essential.

Same for time: shifts are NOT (legislatively, in most countries) 24 hours long, and don't match up to the 24-hour period anyway.
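For example (a sketch only - the column names are illustrative and your fiscal rules will differ), a DimDate can carry both calendars side by side:

[code="sql"]
-- Sketch: one row per day, keyed on days since 1900-01-01,
-- carrying both the normal calendar and a fiscal calendar.
CREATE TABLE dbo.DimDate (
    DateKey       INT      NOT NULL PRIMARY KEY, -- days since 1900-01-01
    CalendarDate  DATETIME NOT NULL,
    CalendarYear  INT      NOT NULL,
    CalendarMonth INT      NOT NULL,
    FiscalYear    INT      NOT NULL, -- e.g. a fiscal year starting 1 July
    FiscalPeriod  INT      NOT NULL
);
[/code]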
-- Mark Stacey, Tue, 08 Jun 2010 19:28:50 GMT

RE: Sins of SQL: The Time Table

[quote][b]Lynn Pettis (6/1/2010)[/b][hr]Its lack of scalability is its biggest problem. A recursive CTE is still RBAR (a Modenism for Row By Agonizing Row).[/quote]

Here's a little comparison between the "quirky update" and the rCTE which supports most folks' observation that the rCTE is [i]slower[/i]. However, I'd disagree with you Lynn that the rCTE fails to scale - the "quirky update" is 5-7 times faster whether the dataset is 100k rows or 1 million rows. Sure it's slower - but it [i]does[/i] scale.

Here's the test with 1 million rows:

[code="sql"]
DROP TABLE #Numbers

SELECT TOP 1000000 --000
    n = ROW_NUMBER() OVER (ORDER BY a.name),
    CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b

CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- 'Quirky' update
DECLARE @Lastval INT = 0, @CalcValue BIGINT
UPDATE #Numbers SET
    @CalcValue = CalcValue = (@Lastval + n),
    @Lastval = n
-- (1,000,000 row(s) affected) / CPU time = 4218 ms, elapsed time = 5719 ms.
-- Table #Numbers... Scan count 1, logical reads 3113, physical reads 6, read-ahead reads 3146,
-- lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- Recursive CTE
;WITH Calculator (n, CalcValue) AS (
    SELECT n.n, CalcValue = CAST(n.n AS BIGINT)
    FROM #Numbers n
    WHERE n.n = 1
    UNION ALL
    SELECT n.n, CalcValue = n.n + c.n
    FROM #Numbers n
    INNER JOIN Calculator c ON c.n + 1 = n.n -- nice
)
SELECT n, CalcValue
FROM Calculator
OPTION (MAXRECURSION 0)
-- (1,000,000 row(s) affected) / CPU time = 32438 ms, elapsed time = 35148 ms.
-- Table 'Worktable'. Scan count 2, logical reads 6000001, physical reads 0, read-ahead reads 0,
-- lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
[/code]

And with 100k rows:

[code="sql"]
DROP TABLE #Numbers

SELECT TOP 100000
    n = ROW_NUMBER() OVER (ORDER BY a.name),
    CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b

CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- 'Quirky' update
DECLARE @Lastval INT = 0, @CalcValue BIGINT
UPDATE #Numbers SET
    @CalcValue = CalcValue = (@Lastval + n),
    @Lastval = n
-- (100000 row(s) affected) / CPU time = 454 ms, elapsed time = 526 ms.
-- Table #Numbers... Scan count 1, logical reads 314, physical reads 0, read-ahead reads 0,
-- lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- Recursive CTE
;WITH Calculator (n, CalcValue) AS (
    SELECT n.n, CalcValue = CAST(n.n AS BIGINT)
    FROM #Numbers n
    WHERE n.n = 1
    UNION ALL
    SELECT n.n, CalcValue = n.n + c.n
    FROM #Numbers n
    INNER JOIN Calculator c ON c.n + 1 = n.n -- nice
)
SELECT n, CalcValue
FROM Calculator
OPTION (MAXRECURSION 0)
-- (100000 row(s) affected) / CPU time = 3203 ms, elapsed time = 3483 ms.
-- Table 'Worktable'. Scan count 2, logical reads 600001, physical reads 0, read-ahead reads 0,
-- lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
[/code]

Whichever measure of "resource used" you choose for comparison, both methods appear to scale in a linear manner.

If you've absolutely got to do a job row by row, such as running totals, then a quirky update is virtually guaranteed to complete before a rCTE. That's not the whole picture though. The rCTE gives you output which you may have to write back; the quirky update does the opposite. The rCTE is often quicker to write with a little practice too. I use both methods, sometimes even writing both for a particular job then choosing one or the other on merit - which might be readability over performance.

Cheers

ChrisM

-- ChrisM@Work, Mon, 07 Jun 2010 08:31:16 GMT

RE: Sins of SQL: The Time Table

[quote][b]Lynn Pettis (6/6/2010)[/b][hr]Steve,

You really are missing the entire point. I am not disputing what the function does but HOW it does it. Your use of a recursive CTE is inefficient REGARDLESS of the application of the overall code in this regard. What, may I ask, is so complex about my code in comparison to yours? Is it how I have formatted the code?[/quote]

I'm not missing the point. The benefit you will extract from said code improvements over your lifetime has likely been exceeded by the energy you've exhausted debating/examining the matter. If this were code I were recommending for a high-volume transactional OLTP application, I'd likely be more enthused about a performance debate, and put as much energy into that aspect as I have with the other areas of discussion.
It's not, so I won't.

I look forward to our next discussion.

-- Steven James Gray, Sun, 06 Jun 2010 23:42:07 GMT

RE: Sins of SQL: The Time Table

[quote][b]Steven James Gray (6/6/2010)[/b][hr][quote][b]Lynn Pettis (6/6/2010)[/b][hr]Steve,

Unfortunately, you are using the same logic everyone else uses to defend the use of inefficient code: "It works for my case." Problem is, it is still inefficient. Recursion is by nature RBAR, Row By Agonizing Row. My routine eliminates both the recursion and the IO generated by the use of a worktable.[/quote]

When considering optimisation, I consider how often the code is called, the trade-off of complexity for performance, and other factors.

[quote]Also, you seem to be indicating that using OPTION (MAXRECURSION 60) makes the call to your routine more efficient than OPTION (MAXRECURSION 0), which it doesn't. I ran multiple tests using both and they were comparable; in fact the OPTION (MAXRECURSION) was actually faster in many cases (although I only ran about 10 runs of each).[/quote]

I never said that it made it faster, but used it to demonstrate that the recursion is nowhere near as deep as you and others implied,
producing a 40,000 row result set with only sixty iterative steps in the example.

[quote]Someone is going to look at your code and say, "Hey, that is exactly what I need for what I am working on!", when in actuality it isn't, because what they may be doing is against several hundred thousand rows or more with a high frequency of usage.[/quote]

This code can produce a century's worth of date-time values in half an hour, as demonstrated by your own tests. Unless they're constantly re-computing centuries' worth of data (which would lend itself to a persistent table, rather than a dynamic rowset), they're not going to get much win for the effort.[/quote]

Steve,

You really are missing the entire point. I am not disputing what the function does but HOW it does it. Your use of a recursive CTE is inefficient REGARDLESS of the application of the overall code in this regard. What, may I ask, is so complex about my code in comparison to yours? Is it how I have formatted the code?

-- Lynn Pettis, Sun, 06 Jun 2010 16:56:09 GMT

RE: Sins of SQL: The Time Table

[quote][b]Lynn Pettis (6/6/2010)[/b][hr]Steve,

Unfortunately, you are using the same logic everyone else uses to defend the use of inefficient code: "It works for my case." Problem is, it is still inefficient. Recursion is by nature RBAR, Row By Agonizing Row. My routine eliminates both the recursion and the IO generated by the use of a worktable.[/quote]

When considering optimisation, I consider how often the code is called, the trade-off of complexity for performance, and other factors.

[quote]Also, you seem to be indicating that using OPTION (MAXRECURSION 60) makes the call to your routine more efficient than OPTION (MAXRECURSION 0), which it doesn't. I ran multiple tests using both and they were comparable; in fact the OPTION (MAXRECURSION) was actually faster in many cases (although I only ran about 10 runs of each).[/quote]

I never said that it made it faster, but used it to demonstrate that the recursion is nowhere near as deep as you and others implied - producing a 40,000 row result set with only sixty iterative steps in the example.

[quote]Someone is going to look at your code and say, "Hey, that is exactly what I need for what I am working on!", when in actuality it isn't, because what they may be doing is against several hundred thousand rows or more with a high frequency of usage.[/quote]

This code can produce a century's worth of date-time values in half an hour, as demonstrated by your own tests. Unless they're constantly re-computing centuries' worth of data (which would lend itself to a persistent table, rather than a dynamic rowset), they're not going to get much win for the effort.

-- Steven James Gray, Sun, 06 Jun 2010 16:38:48 GMT

RE: Sins of SQL: The Time Table

Steve,

Unfortunately, you are using the same logic everyone else uses to defend the use of inefficient code: "It works for my case." Problem is, it is still inefficient. Recursion is by nature RBAR, Row By Agonizing Row. My routine eliminates both the recursion and the IO generated by the use of a worktable.

Also, you seem to be indicating that using OPTION (MAXRECURSION 60) makes the call to your routine more efficient than OPTION (MAXRECURSION 0), which it doesn't. I ran multiple tests using both and they were comparable; in fact the OPTION (MAXRECURSION) was actually faster in many cases (although I only ran about 10 runs of each).

Someone is going to look at your code and say, "Hey, that is exactly what I need for what I am working on!", when in actuality it isn't, because what they may be doing is against several hundred thousand rows or more with a high frequency of usage.

-- Lynn Pettis, Sun, 06 Jun 2010 15:42:14 GMT

RE: Sins of SQL: The Time Table

[quote][b]Lynn Pettis (6/6/2010)[/b][hr]Actually, inefficiency is not relative; it may just not be noticeable. Inefficient is still inefficient if there is another means to generate the same results that is better.[/quote]

I'd never said this was the most efficient method - but even your examples for processing 100 years of data show gains that would only ever be realized in an extreme scenario, and that's assuming you loaded it all at once. A few minutes per century translates to a sub-second difference per day, so you'll understand why I'm not racing to rewrite my ETL packages. I'm not trying to knock the achievement - you should be pleased with your optimisations - but they're not ones I've ever had to seek.

[quote]Care to prove your statement?[/quote]

I was demonstrating that the recursion is not at the lower levels, merely at the day level. As you know, the parameter MAXRECURSION prevents any CTE from recursing more than the specified limit. By setting a MAXRECURSION of 60 I was demonstrating that, rather than (as has been implied) this being massively recursive, the actual recursion is split into two smaller recursive elements (the generation of the Numbers CTE, with values 0-60) and the iteration over the dates. The only recursion after the first 'day' is processed is the DATEADD region.

If you review the query plan for the statement post-execution, you'll see the branch of the plan that generates the table spool for the HH:MM:SS divisions is executed once ('Number of Executions') but the spool is read many times by a Nested Loops operator. So whilst it's not as fast, it's nowhere near as inefficient as some seemed to imply.
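If you want to verify that yourself, one quick way (a sketch, using the function and arguments from the article) is to capture the per-operator statistics and read off the spool's execution count:

[code="sql"]
-- Returns one row per plan operator; the 'Executes' column shows how
-- often the table spool branch actually ran.
SET STATISTICS PROFILE ON;

SELECT *
FROM dbo.fn_GetTimestampRange('1 January 2006', '1 February 2006', 0)
OPTION (MAXRECURSION 60);

SET STATISTICS PROFILE OFF;
[/code]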
At least you've been a good sport and put it into SSMS to try it.

[quote]I'm pretty sure that my routine (with a slight modification, thanks to you allowing me to see an inefficiency in my code, which I have fixed in my blog) will consistently beat your code in head-to-head competition.[/quote]

As per above. I don't dispute yours will be faster, but your method only gives a benefit of ~20 minutes of time saved for each century of data. You'd never load that much data at once in practice, and because the degradation is non-linear it means the benefit is lower as you scale down the range (I tend to populate incrementally each day), and the difference I've found when feeding both your example and my own into SSIS for a daily population is that both take more or less a second to run.

-- Steve Gray-464702, Sun, 06 Jun 2010 14:15:06 GMT

RE: Sins of SQL: The Time Table

Special note: I found no real appreciable difference between the following:

[code="sql"]
select * into #Test2 from dbo.fn_GetTimestampRangeRCTE ('2006-01-01','2006-02-01', 0) OPTION (MAXRECURSION 0);
select * into #Test3 from dbo.fn_GetTimestampRangeRCTE ('2006-01-01','2006-02-01', 0) OPTION (MAXRECURSION 60);
[/code]

-- Lynn Pettis, Sun, 06 Jun 2010 13:04:37 GMT

RE: Sins of SQL: The Time Table

[quote][b]Steven James Gray (6/6/2010)[/b][hr]Inefficiency is relative. As slow as it is, it's possible to generate enough data to cover a century in less than half an hour when dealing at the minute resolution, and without any effort applied to optimise what's there. Relative to the time and effort to maintain and keep on top of populating a time table manually, it seems like small fry. You'd not reprocess the whole dimension every time you loaded a fact into the tables either, but rather only incrementally populate ranges as you go.[/quote]

Actually, inefficiency is not relative; it may just not be noticeable. Inefficient is still inefficient if there is another means to generate the same results that is better.

[quote]If you look at the code, the recursive steps in the example given are broken into two recursive elements - the date range and the time one - rather than a single second-level recursive process. The query plans in this case take the date/time combinations and do a table spool over them, and then as each day value is generated it is cross-joined to the time combinations. That's a sum total of 36,500 recursions per century! To prove this is the case, try running the following query using the TVF from the article:

SELECT * FROM dbo.[fn_GetTimestampRange]('1 January 2006','1 February 2006', 0) OPTION(MAXRECURSION 60)

This will produce a rowset of 44,641 rows, but [b][u]never recurses more than 60 levels as a result of the optimisations obtained by the multiple re-use of the numbers expression within the function[/u][/b]. The only part of it that actually grows in cost is the recursive step per day. Your MAXRECURSION value needs to be at least 60, and at most the number of distinct days in the range covered. As a result, this is substantially more efficient than your article would imply.

It's possible to optimise this even further by doing the same trick when resolving all the possible year/month permutations, meaning the recursive cost would shrink further, but I thought that'd impact readability and not gain much for performance.[/quote]

Care to prove your statement?
I'm pretty sure that my routine (with a slight modification, thanks to you allowing me to see an inefficiency in my code, which I have fixed in my blog) will consistently beat your code in head-to-head competition.

[quote]As it stands the current design has the advantage of being an inlinable TVF (and most performance work would likely prevent that), which means it can be used directly in a DSV, and streams the rows as fast as they're produced, rather than waiting for all rows to be computed in a temp table, as the multi-statement TVF approach would do.[/quote]

Have you taken a close look at my code (either version)? It is also written as an inline TVF and can be used in exactly the same fashion as yours. It has the benefit of being more efficient. After making a small modification to my code, I ran both functions to generate 100 years of data at one-minute intervals, writing the output to temporary tables:

[code="sql"]
set statistics io on;
set statistics time on;

select * into #Test1 from dbo.fn_GetTimestampRangeCTE ('2000-01-01','2100-01-01', 0)

set statistics time off;
set statistics io off;

set statistics io on;
set statistics time on;

select * into #Test2 from dbo.fn_GetTimestampRangeRCTE ('2000-01-01','2100-01-01', 0) OPTION (MAXRECURSION 0);

set statistics time off;
set statistics io off;
[/code]

SQL Server Execution Times: CPU time = 432079 ms, elapsed time = 453693 ms.
(52596001 row(s) affected)

Table 'Worktable'. Scan count 8, logical reads 2046345, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 1525828 ms, elapsed time = 1553694 ms.
(52596001 row(s) affected)

Once again, you'll see that your function, using a recursive CTE, is quite a bit slower. And even generating 52,596,001 rows of data, mine still isn't generating additional IO like the recursive CTE does.

-- Lynn Pettis, Sun, 06 Jun 2010 12:55:20 GMT

RE: Sins of SQL: The Time Table

[quote][b]Martin Mason (6/4/2010)[/b][hr]Hey. Why stop there! If separate date/time dimensions lead to the same number of possible cells in the resultant cube space as a single date/time dimension, that same logic should apply to everything. If the logic is sound, that principle should hold everywhere. Then why separate customer/product/store dimensions? An equally sound dimensional design would be to have a single fact table related to one dimension with a composite key of product/customer/store and the hierarchies built from that key.[/quote]

I never said it was a good idea to not have dimensions - but I think the argument for/against separate/unified date/time dimensions depends on your view of time. When I model things I tend to view date/time as a singular axis. For reporting that needs to look at time valuations irrespective of their dates (not something that tends to be valuable in the work I do), the hierarchy support in SSAS allows those needs to be met. So whilst I'd still maintain separate customer/product/store dimensions, I wouldn't ever opt for separate year/month dimensions.

[quote]I hope we both agree that that design is NOT sound. Why that design is not sound is the same reasoning that the composite date/time dimension is not sound. (Again, read chapter 2 of [i]The Data Warehouse Toolkit[/i] for lengthy justification why it's not.) Because a Time member, e.g. 12:01 AM, is not uniquely related to only one member of the date dimension.
It's related to every member of the date dimension, and the same justification to separate Product and Customer and Store is the same logic applied to separate date and time. A date is a date and a time is a time and they are not the same.[/quote]

The example I gave was purely to demonstrate the concept that the overall cardinality of the dimensional space is the same regardless of which horse you ride when it comes to date/time tables, and merging store/product/customer dimensions would never be a real scenario someone would likely do. That particular book sits on my shelf, and whilst it has its views on things, no one book is canon. Time represents subdivisions of a date, and if it was genuinely so valuable to separate date/time, why would you not maintain separate year, month, hour, minute and second dimensions for the same reasons? There's always 60 seconds per minute, 60 minutes per hour, 12 months per year, etc. - but you'd be hard pressed to find a solution out there that does things that way.

[quote][b]Lynn Pettis (6/4/2010)[/b][hr]Here is the problem: you are using an inefficient method of generating the data. Another developer is going to come around, see this code and say "Great, just what I need for what I am working on!" Problem is, it turns out this developer is trying to use it on multi-million row data sets and can't figure out why it doesn't work well.[/quote]

Inefficiency is relative. As slow as it is, it's possible to generate enough data to cover a century in less than half an hour when dealing at the minute resolution, and without any effort applied to optimise what's there. Relative to the time and effort to maintain and keep on top of populating a time table manually, it seems like small fry. You'd not reprocess the whole dimension every time you loaded a fact into the tables either, but rather only incrementally populate ranges as you go.

[quote]Also, what happens if you start using this same code on much larger datasets and start seeing performance issues?... Some very simple testing showed the inefficiencies in the recursive CTE, including additional CPU time and IO requirements of the method.[/quote]

If you look at the code, the recursive steps in the example given are broken into two recursive elements - the date range and the time one - rather than a single second-level recursive process. The query plans in this case take the date/time combinations and do a table spool over them, and then as each day value is generated it is cross-joined to the time combinations. That's a sum total of 36,500 recursions per century! To prove this is the case, try running the following query using the TVF from the article:

SELECT * FROM dbo.[fn_GetTimestampRange]('1 January 2006','1 February 2006', 0) OPTION(MAXRECURSION 60)

This will produce a rowset of 44,641 rows, but [b][u]never recurses more than 60 levels as a result of the optimisations obtained by the multiple re-use of the numbers expression within the function[/u][/b]. The only part of it that actually grows in cost is the recursive step per day. Your MAXRECURSION value needs to be at least 60, and at most the number of distinct days in the range covered. As a result, this is substantially more efficient than your article would imply.

It's possible to optimise this even further by doing the same trick when resolving all the possible year/month permutations, meaning the recursive cost would shrink further, but I thought that'd impact readability and not gain much for performance.
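To make the shape concrete, here's a minimal sketch of the two-level pattern (illustrative only - this is not the article's actual function):

[code="sql"]
-- A 60-row numbers CTE (recursion depth 59) is reused for both hours and
-- minutes, and cross-joined to a recursive day CTE (depth = days in range),
-- so recursion depth stays small however many rows come out.
WITH Numbers (n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 59
),
Days (d) AS (
    SELECT CAST('20060101' AS DATETIME)
    UNION ALL
    SELECT DATEADD(day, 1, d) FROM Days WHERE d < '20060201'
)
SELECT DATEADD(minute, m.n, DATEADD(hour, h.n, d.d)) AS [TimeStamp]
FROM Days d
CROSS JOIN Numbers h
CROSS JOIN Numbers m
WHERE h.n < 24
OPTION (MAXRECURSION 60);
[/code]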
As it stands the current design has the advantage of being an inlinable TVF (and most performance work would likely prevent that), which means it can be used directly in a DSV, and streams the rows as fast as they're produced, rather than waiting for all rows to be computed in a temp table, as the multi-statement TVF approach would do.

-- Steven James Gray, Sun, 06 Jun 2010 10:36:42 GMT

RE: Sins of SQL: The Time Table

[quote][b]Steven James Gray (6/4/2010)[/b][hr]...I've yet to see any demonstration that the CTE itself is unsuitable for synthesising the data on demand - it's not the fastest mechanism (and I've never claimed it would be), but it's also not an order of magnitude worse. Like procedural texture generation in 3D modelling systems, it trades some performance for the ability to switch to arbitrary levels of granularity by modifications to a single area....[/quote]

Here is the problem: you are using an inefficient method of generating the data. Another developer is going to come around, see this code and say "Great, just what I need for what I am working on!" Problem is, it turns out this developer is trying to use it on multi-million row data sets and can't figure out why it doesn't work well.

Also, what happens if you start using this same code on much larger datasets and start seeing performance issues?

Some very simple testing showed the inefficiencies in the recursive CTE, including additional CPU time and IO requirements of the method.

-- Lynn Pettis, Fri, 04 Jun 2010 10:15:29 GMT

RE: Sins of SQL: The Time Table

[quote]If you have separated date/time dimensions, the number of possible cells in the multidimensional space that need storing individually will be the same, since the conceptual space is the cross product of the dimensions. To give another example: say I have a table of products with 8 things in it, a table of 100 customers, and 5 stores. A cube storing the sums may keep 4,000 distinct entries. If two of the dimensions are merged to form a hierarchy - since, perhaps, each customer has a 1:1 relationship with a store - then you'd end up with a Customer (by store) dimension of cardinality 500. This gives, likewise, a dimensional space with up to 4,000 cells.[/quote]

Hey. Why stop there! If separate date/time dimensions lead to the same number of possible cells in the resultant cube space as a single date/time dimension, that same logic should apply to everything. If the logic is sound, that principle should hold everywhere. Then why separate customer/product/store dimensions? An equally sound dimensional design would be to have a single fact table related to one dimension with a composite key of product/customer/store and the hierarchies built from that key.

I hope we both agree that that design is NOT sound. Why that design is not sound is the same reasoning that the composite date/time dimension is not sound. (Again, read chapter 2 of [i]The Data Warehouse Toolkit[/i] for lengthy justification why it's not.) Because a Time member, e.g. 12:01 AM, is not uniquely related to only one member of the date dimension. It's related to every member of the date dimension, and the same justification to separate Product and Customer and Store is the same logic applied to separate date and time. A date is a date and a time is a time and they are not the same.

I'm done.
No mas.

-- Martin Mason, Fri, 04 Jun 2010 09:35:44 GMT

RE: Sins of SQL: The Time Table

[quote]But to combine date and time just increases the sparsity of the multidimensional database along a dimension that is most likely going to be used to partition a database of any size whatsoever.[/quote]

The only change this would require for partitioning is that the partitions are built off the date/time values themselves, not arbitrary key ranges in the time table.

[quote]It would just be a huge mistake to do so. Like I said earlier, a combined date and time dimension for 10 years' worth of data at the minute resolution results in a dimension with over 5 million stored members (where'd I get 13M from?), while separate dimensions result in a date dimension with 3,650 members and a time dimension with 1,440 members. I don't need benchmarks to tell me the separate dimension approach will best the consolidated dimension approach in every aspect: dimension processing, aggregation processing, ease of maintenance. Everything.[/quote]

If you have separated date/time dimensions, the number of possible cells in the multidimensional space that need storing individually will be the same, since the conceptual space is the cross product of the dimensions. To give another example: say I have a table of products with 8 things in it, a table of 100 customers, and 5 stores. A cube storing the sums may keep 4,000 distinct entries. If two of the dimensions are merged to form a hierarchy - since, perhaps, each customer has a 1:1 relationship with a store - then you'd end up with a Customer (by store) dimension of cardinality 500. This gives, likewise, a dimensional space with up to 4,000 cells.

The same principle holds here for date/time. Whether you have a single dimension or two, the maximum number of values themselves at the lowest level is absolutely the same. When it comes to aggregations etc., you'd only lose out on aggregation design if you needed to group by time values independently of days - and you can keep things as simple or complex internally as you like by using the appropriate hierarchy definitions that suit your querying needs.

The only argument I've heard that seems to be based on anything is the notion that the large number of members overall in the dimension may cause slowness as the members of the hierarchy are resolved - although I'll admit that my tests with ~1B fact rows haven't shown any demonstrable problems when linking them to a pseudo-time dimension covering a range of ten years. Each day the next few records are incrementally loaded into the date dimension and the day's data is pushed in shortly thereafter - it runs just as quickly as it used to beforehand. Part of the reason, I suspect, is the fact that SSAS does not linearly scan a dimension looking for the appropriate key when loading data, but uses a tree. The effort of searching one tree of combined values versus two smaller trees is not materially higher - any basic algorithms class will tell you that doubling the amount of data in a B+ tree/binary tree does not double search time, since the performance of lookups is always logarithmic. (Indeed, since log(a x b) = log a + log b, one lookup in a combined tree of 3,650 x 1,440 = 5,256,000 keys costs roughly the same ~22 comparisons as the two separate lookups put together.)

[quote]But wait. Converting a datetime type to an int [CAST(FactDate AS int)], excluding the time portion (assuming it's AM), gives me the number of days elapsed since 1/1/1900. Hey, why not use that as the surrogate key of your date dimension!
(Which is exactly what I and others do.) I don't even have to do a lookup for a date key at all in my ETL, and if I haven't already, I could create a calculated column in my DSV. Similarly, I could use the number of minutes (or seconds) that have elapsed since midnight as the surrogate key of my time dimension, so that I don't have to do a lookup for the Time Key either. (And I could still create a calculated column for that key in my DSV.) And because of star joins and the small size of the date and time dimensions, analysis of events between arbitrary date/time periods is lightning fast.[/quote]

That approach is valid and workable, and one I've used myself where my reporting periods cover all possible date/time combinations. If you look at how I recommended things in the article, you'll notice this is precisely the mechanism by which the milliseconds etc. are normalized out of the data.

[quote](And not nearly as "evil" as the CTE Steve proposes as the replacement of the time dimension)[/quote]

I've yet to see any demonstration that the CTE itself is unsuitable for synthesising the data on demand - it's not the fastest mechanism (and I've never claimed it would be), but it's also not an order of magnitude worse. Like procedural texture generation in 3D modelling systems, it trades some performance for the ability to switch to arbitrary levels of granularity by modifications to a single area.

[quote]So to summarize, I still would give this article zero stars. I think it subtracts, not adds, to the body of knowledge.[/quote]

To each their own.

-- Steven James Gray, Fri, 04 Jun 2010 00:10:20 GMT

RE: Sins of SQL: The Time Table

To continue on my previous post. Second point: reasoning. You ideally want your multidimensional structures to be as dense as possible - every combination of every member of every attribute hierarchy referenced somewhere in your partition stores. That's not going to happen, obviously, as for a retail system not every customer buys every product on every day (and in Steve's case, every second). But to combine date and time just increases the sparsity of the multidimensional database along a dimension that is most likely going to be used to partition a database of any size whatsoever. It would just be a huge mistake to do so. Like I said earlier, a combined date and time dimension for 10 years' worth of data at the minute resolution results in a dimension with over 5 million stored members (where'd I get 13M from?), while separate dimensions result in a date dimension with 3,650 members and a time dimension with 1,440 members. I don't need benchmarks to tell me the separate dimension approach will best the consolidated dimension approach in every aspect: dimension processing, aggregation processing, ease of maintenance. Everything.

Thirdly, say you have a referenced date in your fact table but no foreign key to your date and time dimensions. You could use Steve's approach to reference a virtual TIME (and DATE) dimension using recursive CTEs to force into SSAS. But wait. Converting a datetime type to an int [CAST(FactDate AS int)], excluding the time portion (assuming it's AM), gives me the number of days elapsed since 1/1/1900. Hey, why not use that as the surrogate key of your date dimension! (Which is exactly what I and others do.) I don't even have to do a lookup for a date key at all in my ETL, and if I haven't already, I could create a calculated column in my DSV. Similarly, I could use the number of minutes (or seconds) that have elapsed since midnight as the surrogate key of my time dimension, so that I don't have to do a lookup for the Time Key either.
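Concretely (a sketch - the fact table and its FactDate column are illustrative), both keys fall straight out of date arithmetic:

[code="sql"]
-- DateKey: whole days since 1900-01-01 (what the CAST to int approximates,
-- without the midday rounding); TimeKey: minutes elapsed since midnight.
SELECT FactDate,
       DATEDIFF(day, 0, FactDate)           AS DateKey,
       DATEDIFF(minute, 0, FactDate) % 1440 AS TimeKey
FROM dbo.FactTable;
[/code]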
(And I could still create a calculated column for that key in my DSV.) And because of star joins and the small size of the date and time dimensions, analysis of events between arbitrary date/time periods is lightning fast. (And not nearly as "evil" as the CTE Steve proposes as the replacement of the time dimension.)

So to summarize, I still would give this article zero stars. I think it subtracts, not adds, to the body of knowledge.

-- Martin Mason, Thu, 03 Jun 2010 20:28:06 GMT

RE: Sins of SQL: The Time Table

I'd like to apologize to the author because I'll confess, I read the title and the first paragraph and was immediately enraged. This approach should never replace the Date and Time dimension tables in a data warehouse as the title of the article suggests. However, it could be used to build the date and time dimension tables (though I believe approaches using SSIS to be vastly superior and much, much more flexible). This pure SQL approach is kinda cool though.

Enough of the compliments, because I still believe this article is garbage. First, stored date AND time dimensions are critical to a data warehouse project's success. If you don't need to account for holidays, workdays, other special time periods, multiple date/time hierarchies, etc. now, you will later, and without an architecture that's able to adapt to changing business and reporting requirements efficiently and effectively, you'll fail. Word of advice: for every issue about dimensional modeling described in [i]The Data Warehouse Toolkit[/i], follow it. Though it doesn't touch on everything you need to know, the foundation of everything is there. Second, about SSAS, on which this article seemed to be focused, I can't stress enough that the Date and Time dimensions should never, ever, ever, ever be combined in a single dimension in a multidimensional database. More later on why not, if someone doesn't beat me to the issue. Gotta go.

-- Martin Mason, Thu, 03 Jun 2010 17:53:58 GMT

RE: Sins of SQL: The Time Table

[quote][b]Lynn Pettis (6/3/2010)[/b][hr][quote][b]dan_public (6/3/2010)[/b][hr]Lynn,

OK. Now I'm confused too.

Dan.[/quote]

That's okay. I just went through this thread and I can't find anywhere where I criticized any of your responses. I have a feeling you have confused me with someone else in this thread, unless you can find the post where I did this.

Lynn[/quote]

Lynn,

My apologies. I got you confused with YSL Guru. The only thing I can think of is that I'm not used to the format of this forum software and simply looked in the wrong place.

Again, my apologies.

Dan.

-- dan_public, Thu, 03 Jun 2010 17:28:49 GMT

RE: Sins of SQL: The Time Table

[quote][b]dan_public (6/3/2010)[/b][hr]Lynn,

OK. Now I'm confused too.

Dan.[/quote]

That's okay. I just went through this thread and I can't find anywhere where I criticized any of your responses. I have a feeling you have confused me with someone else in this thread, unless you can find the post where I did this.

Lynn

-- Lynn Pettis, Thu, 03 Jun 2010 17:03:30 GMT

RE: Sins of SQL: The Time Table

Lynn,

OK. Now I'm confused too.

Dan.

-- dan_public, Thu, 03 Jun 2010 13:30:41 GMT

RE: Sins of SQL: The Time Table

All, please ignore the multiple posts above by me.
Issues posting from my BlackBerry. Steve, if you could delete the dups, that would be nice.

Lynn

-- Lynn Pettis, Thu, 03 Jun 2010 13:21:44 GMT

RE: Sins of SQL: The Time Table

deleted -- multiple posts due to BlackBerry issues

-- Lynn Pettis, Thu, 03 Jun 2010 13:13:21 GMT

RE: Sins of SQL: The Time Table

[quote][b]dan_public (6/3/2010)[/b][hr][quote][b]Lynn Pettis (6/3/2010)[/b][hr][quote][b]dan_public (6/3/2010)[/b][hr]Lynn,

Your point about constructive criticism is well taken. I'll keep that in mind and will scrub out the "crisp" language going forward. That said... I'm not sure why you think my initial post (copied below) was sarcastic. In fact, while the final statement was more declarative than the point you made in your article, the message is the same - while using Gray's methodology is something you [u]can[/u] do, it's probably not something that you [u]should[/u] do. Where is the sarcasm?

Regards,
Dan.

[quote]Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. Its members are ephemeral and only created on the fly. There's nothing to audit.

Auditing is less important when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

Regards,
Dan.

One star.[/quote][/quote]

Dan,

Sorry, but I'm confused. I don't recall any of the above. Could you find the post number and provide that or a link to the post?

Thank you,
Lynn[/quote]

Lynn,

Here's the post link: [url=http://www.sqlservercentral.com/Forums/FindPost930695.aspx][/url]

Regards,
Dan.

p.s. The forum software has an odd way of parsing links. You'll have to cut and paste. Sorry.[/quote]

Dan,

Still confused. I was actually looking for the post where I was supposedly critical of your post. I only remember being critical of the inappropriate use of a recursive CTE in the code provided in the article.

Thanks,
Lynn

-- Lynn Pettis, Thu, 03 Jun 2010 12:56:28 GMT

RE: Sins of SQL: The Time Table

[quote]A day is NOT a "time interval".[/quote]

That's a matter of perspective, and not a matter of fact. Under what circumstance is a day not a subdivision of some constantly distributed time axis?

[quote]But, you're obviously too wrapped up in your shiny little solution about TIME to understand that.
[/quote]

The code in question could be used to build surrogate Date+Time combined dimensions, or broken down/refactored to generate two smaller dimensions that cater for both separately, without having to re-key the source tables or do arbitrary lookups.

[quote]One of my systems deals with DATES, only DATES, and nothing but DATES. It couldn't care less about hours, minutes, and seconds....[/quote]

The article clearly indicates that the resolution I've used may not suit all applications, and you may want to tweak it to the scenario.

[quote]On this day - 6/3/2010, there are exactly 14 different definitions for 6/3/2010, all at the same time and all used by the business. 6/3/2009 has 11 different definitions and 6/3/2011 currently has 17 different definitions.[/quote]

Sounds like different definitions of dates is something you can indicate with another dimension (a DateType dimension?) - what bearing does this have on the article itself, though?

[quote]If that weren't enough, in another cube, this precise hour in time (7AM) on this day happens to have 4 different business definitions while this same hour 1 year ago only had 3 business definitions. Could you deal with all of this in code? Sure. But your code would be so ridiculously convoluted, with more exceptions than actual rules, and you'd spend just about all of your time testing and modifying the code to deal with each date, that you'd never be done.[/quote]

I'd create a table such as:

TimeDefinition
 - TimeDefinitionID (primary key)
 - Description
 - StartDate
 - EndDate

Then do inner joins to this where the facts being processed are within the range. No hard coding, no special cases involved. Any dependent data that varies based on a combination of the time-type being considered and the specific time in question can be handled through a sub-table; if it's purely a time-definition-level element that's constant across time, it can be set up as part of the TimeDefinition table.
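To sketch what I mean (hypothetical names throughout - dbo.FactTable stands in for whatever holds your facts):

[code="sql"]
-- Each business definition of a period is a row; a fact matches every
-- definition whose range covers it.
CREATE TABLE dbo.TimeDefinition (
    TimeDefinitionID INT          NOT NULL PRIMARY KEY,
    Description      VARCHAR(100) NOT NULL,
    StartDate        DATETIME     NOT NULL,
    EndDate          DATETIME     NOT NULL
);

SELECT f.FactDate, td.TimeDefinitionID, td.Description
FROM dbo.FactTable f
INNER JOIN dbo.TimeDefinition td
        ON f.FactDate >= td.StartDate
       AND f.FactDate <  td.EndDate; -- half-open range keeps boundaries unambiguous
[/code]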
[quote]I get it. You've built a system or two and now decided that you are going to write stuff. Congratulations. I applaud your choice. You're at least not scared to put something out in front of everyone and let people pick it apart. Fantastic.[/quote]

The tone of feedback from some (not all, but enough) people is sufficiently unconstructive here that it's certainly made me regret my chosen outlet. It's got very much the feeling of fighting a forum-entrenched old guard, and that's not something I care to be part of. I'm engaging on this thread primarily because, as its instigator, I feel some urge to see the debate through, even while having to hold ground in an otherwise interesting area of debate.

[quote]You went about it in absolutely the wrong way.[/quote]

Subjective - please justify with a contextual example. This solution works fine for many scenarios - I never said it'd work for all.

[quote]Publishing something that is very obviously going to generate debate, and something that you very obviously didn't do very much benchmarking...[/quote]

I'm very much aware of the performance characteristics of CTEs; however, I've explicitly stated that this is purely for use when incrementally populating time dimensions, and the article never implies that this solution is intended for any kind of performance-intensive/OLTP scenario. It's an article to show a path, not an article on writing some arbitrary bit of assembly-code-quality SQL.

[quote].... or testing of alternative solutions[/quote]

The CTE is not the crux of the article itself - the retention of date/time values within fact data, without having to resort to arbitrary lookup tables for these values, is, however.

[quote]while sensationalizing it with the title has gotten you exactly one thing.[/quote]

I've had some positive feedback, some negative. The title has certainly encouraged more debate than otherwise would have occurred, although it seems to have rubbed some up the wrong way. In my younger years I'd written a fair bit of editorial copy for public consumption, so I'm not really bothered by backlash on a forum - you can never really judge the crowd before your first solo ;)

[quote]From this point forward, every time I see your name as the author of anything - book, article, post - I'm going to skip it and go on to something else. Ever heard of "the boy who cried wolf"?[/quote]

That's a fairly reaching statement that accurately reflects neither the article nor the points therein, and is driven from what is apparently nothing more than pedantic complaining about one of the tools used to achieve a result. The great thing about tools, however, is that you're rarely required to use a single one for all problems.

-- Steven James Gray, Thu, 03 Jun 2010 12:17:45 GMT

RE: Sins of SQL: The Time Table

[quote][b]Lynn Pettis (6/3/2010)[/b][hr][quote][b]dan_public (6/3/2010)[/b][hr]Lynn,

Your point about constructive criticism is well taken. I'll keep that in mind and will scrub out the "crisp" language going forward. That said... I'm not sure why you think my initial post (copied below) was sarcastic. In fact, while the final statement was more declarative than the point you made in your article, the message is the same - while using Gray's methodology is something you [u]can[/u] do, it's probably not something that you [u]should[/u] do.
Where is the sarcasm?

Regards,
Dan.

[quote]Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. Its members are ephemeral and only created on the fly. There's nothing to audit.

Auditing is less important when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

Regards,
Dan.

One star.[/quote][/quote]

Dan,

Sorry, but I'm confused. I don't recall any of the above. Could you find the post number and provide that or a link to the post?

Thank you,
Lynn

-- Lynn Pettis, Thu, 03 Jun 2010 12:08:26 GMT

RE: Sins of SQL: The Time Table

Lynn,

Your point about constructive criticism is well taken. I'll keep that in mind and will scrub out the "crisp" language going forward. That said... I'm not sure why you think my initial post (copied below) was sarcastic. In fact, while the final statement was more declarative than the point you made in your article, the message is the same - while using Gray's methodology is something you [u]can[/u] do, it's probably not something that you [u]should[/u] do. Where is the sarcasm?

Regards,
Dan.

[quote]Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. Its members are ephemeral and only created on the fly. There's nothing to audit.

Auditing is less important when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

Regards,
Dan.

One star.[/quote]

-- dan_public, Thu, 03 Jun 2010 11:37:31 GMT

RE: Sins of SQL: The Time Table

[quote][quote][b]DeronDilger (6/2/2010)[/b][hr][quote][b]dan_public (6/2/2010)[/b][hr]Steven,

.....[G]oing forward, I will be on the lookout for your articles. And be sure not to read them.

Have a nice day.

Dan.[/quote]

Not having "29 years of experience" but being keenly interested in learning from those with good ideas and/or years of experience/expertise, I am disappointed when I see comments such as Dan's closing sentences, as such mean-spirited words make it harder for me to consider the rest of his post (which seems to raise valid points otherwise). In this field, as in most that live in shades of gray, a public square full of differing opinions is a sign of a healthy environment... one better maintained by avoiding mean-spirited, rude and ad hominem statements.

Thank you to the author and those (respectfully) offering differing opinions/reasoning. The back and forth is MUCH more valuable than any article by itself (controversial OR sacrosanct).[/quote]

Deron,

My point in mentioning my experience with analytical systems is that it doesn't make me the keeper of all wisdom. Quite the opposite, in fact. My experience has made me more open to differing opinions and methods, and taught me that there can be multiple "best" solutions to a problem. The downside of this experience is that I've run into quite a few people over the years who insist that THEY are the keepers of the wisdom.
And that their approach is "best", regardless of differing needs and environmental constraints. That raises my hackles. I enjoy a vigorous exchange of opinions as much as the next person, but I'm not interested in a religious argument disguised as a debate. The purpose of the first is personal growth; the purpose of the second is to convert the "sinners". I'm interested in growing, but have no interest in being "converted".

Regards,
Dan.[/quote]

Dan,

1) If the author has stated he is the 'Keeper of Wisdom' then I believe you are the only one who has seen that in his article. He even said "If you use this I'd love to hear about it", and those aren't the words of one who believes they are the keeper of wisdom.

2) Not sure how you get "a religious argument disguised as a debate" out of the thread on this article - again, I think you may be the only one seeing that - however, I will give you this one.

3) Lastly....

[quote]I enjoy a vigorous exchange of opinions as much as the next person..[/quote]

Then please keep the comments to those of a constructive nature and not those of sarcasm. Your original post was very clearly intended to convey sarcasm, which is far from constructive, and that is what Deron was trying to tell you and what you failed to understand. It's up to you how you handle this and how you reply to threads, but if you can't be constructive, or aren't sure if you are being constructive, then just don't reply. There's enough quality feedback that is constructive without weighing the thread down with cheap shots and sarcasm.

-- YSLGuru, Thu, 03 Jun 2010 10:35:24 GMT

RE: Sins of SQL: The Time Table

I want to thank everyone that posted on this thread. It is one of the most interesting and dynamic threads I have read in a long time. Whether you agree with the author's article or not, he deserves a great deal of credit for the discussion the article created. I expect I will study this thread for quite a while.

-- Daniel Bowlin, Thu, 03 Jun 2010 07:34:04 GMT

RE: Sins of SQL: The Time Table

[quote][b]Steven James Gray (6/3/2010)[/b][hr][quote]That is quite a strange response. Your article spent time basically explaining how your Gregorian calendar approach using a recursive CTE is the only "good" solution. (After all, you did title it the "Sins of SQL". Which, by the way, has absolutely nothing whatsoever to do with SQL.)[/quote]

I never said in the article it was the only good solution. The 'Sin of SQL' being referred to is creating an arbitrary-resolution dimension analogue for a well-defined contiguous range of data (date/time) with a new surrogate key.[/quote]

What you fail to realize is that you are dealing with two very different things and trying to generalize them. A day is NOT a "time interval". A day is NOT contiguous. But you're obviously too wrapped up in your shiny little solution about TIME to understand that. One of my systems deals with DATES, only DATES, and nothing but DATES. It couldn't care less about hours, minutes, and seconds. On this day - 6/3/2010, there are exactly 14 different definitions for 6/3/2010, all at the same time and all used by the business. 6/3/2009 has 11 different definitions and 6/3/2011 currently has 17 different definitions. If that weren't enough, in another cube, this precise hour in time (7AM) on this day happens to have 4 different business definitions while this same hour 1 year ago only had 3 business definitions.
Could you deal with all of this in code? Sure. But your code would be so ridiculously convoluted, with more exceptions than actual rules, and you'd spend just about all of your time testing and modifying the code to deal with each date, that you'd never be done.

I get it. You've built a system or two and now decided that you are going to write stuff. Congratulations. I applaud your choice. You're at least not scared to put something out in front of everyone and let people pick it apart. Fantastic.

You went about it in absolutely the wrong way. Publishing something that is very obviously going to generate debate, and something that you very obviously didn't do very much benchmarking or testing of alternative solutions on, while sensationalizing it with the title, has gotten you exactly one thing. From this point forward, every time I see your name as the author of anything - book, article, post - I'm going to skip it and go on to something else. Ever heard of "the boy who cried wolf"? Welcome to the club.

-- mhotek-836094, Thu, 03 Jun 2010 07:05:16 GMT

RE: Sins of SQL: The Time Table

[quote][b]Steven James Gray (6/3/2010)[/b][hr][quote][b]Martin Mason (6/2/2010)[/b]Whoa. Timeout. If you're talking about SSAS, your argument only considers partition processing, NOT dimension or aggregation processing SPACE, TIME, or MEMORY CONSUMPTION. With separate dimensions, the time dimension at second-level granularity is going to have to process 3600 rows, 3600 members at the key level. Your date dimension for 10 years of data at the day grain would be about the same. On the other hand, your composite design with 10 years at second-level granularity would be 13,140,000 records at the key level. Which is the more performant solution: one that requires processing of 3600 * 2 records, or the one that requires processing of 3600 squared?[/quote]

Let's take an example with an arbitrary resolution. If you have a year's worth of data (365 days) and a time metric at the second level (86,400 rows in your time table), then for a given year in SSAS the cell space/aggregations would need to potentially deal with something in the area of 31,536,000 distinct permutations - it's not just "two dimensions, so multiply by two" - each combination of time/date could house different values and thus forms distinct, separate cells. This is inescapable whether you use two separate dimensions or one hierarchical one. If this were not the case, it'd mean you could only get the aggregations on time OR date, but never an intersection of both, which is clearly not the case.

I suppose if you're hell bent on using two separate dimensions, you could still easily adapt the technique demonstrated (or use similar), so that your fact tables are still only storing the real datetime value, but then when creating your Data Source View in Business Intelligence studio, extract the date/time values and wire them up to two separate dimensions generated on the fly.[/quote]

SSAS WILL store each member of a dimension and its related attribute relationships whether referenced by fact table data or not. That storage WILL impact performance, both query and processing. With two separate dimensions, 12:01 AM is stored once and only once, no matter how many years are defined in your date dimension. With your approach, it's created for as many days as are defined in your dimension. Which do you think will be better? I'm "hell bent" on using two separate dimensions because it's a better approach.
Read Chapter 2 of The Data Warehouse Toolkit for an explanation.Thu, 03 Jun 2010 04:17:16 GMTMartin MasonRE: Sins of SQL: The Time Tablehttp://www.sqlservercentral.com/Forums/Topic930511-2705-1.aspx[quote][b]DeronDilger (6/2/2010)[/b][hr][quote][b]dan_public (6/2/2010)[/b][hr]Steven,.....[G]oing forward, I will be on the lookout for your articles. And be sure not to read them.Have a nice day.Dan.[/quote]Not having "29 years of experience" but being keenly interested in learning from those with good ideas &/or years of experience/expertise, I am disappointed when I see comments such as Dan's closing sentences, as such mean-spirited words make it harder for me to consider the rest of his post (which otherwise seems to raise valid points). In this field, as in most that live in shades of gray, a public square full of differing opinions is a sign of a healthy environment...one better maintained by avoiding mean-spirited, rude and ad hominem statements.Thank you to the author and those (respectfully) offering differing opinions/reasoning. The back and forth is MUCH more valuable than any article by itself (controversial OR sacrosanct).[/quote]Deron,My point in mentioning my experience with analytical systems is that it doesn't make me the keeper of all wisdom. Quite the opposite, in fact. My experience has made me more open to differing opinions and methods, and taught me that there can be multiple "best" solutions to a problem. The downside of this experience is that I've run into quite a few people over the years who insist that THEY are the keepers of the wisdom. And that their approach is "best", regardless of differing needs and environmental constraints. That raises my hackles. I enjoy a vigorous exchange of opinions as much as the next person, but I'm not interested in a religious argument disguised as a debate. The purpose of the first is personal growth; the purpose of the second is to convert the "sinners". I'm interested in growing, but have no interest in being "converted".Regards,Dan.Thu, 03 Jun 2010 03:08:27 GMTdan_publicRE: Sins of SQL: The Time Tablehttp://www.sqlservercentral.com/Forums/Topic930511-2705-1.aspx[quote]Not having "29 years of experience" but being keenly interested in learning from those with good ideas &/or years of experience/expertise, I am disappointed when I see comments such as Dan's closing sentences, as such mean-spirited words make it harder for me to consider the rest of his post (which otherwise seems to raise valid points).
In this field, as in most that live in shades of gray, a public square full of differing opinions is a sign of a healthy environment...one better maintained by avoiding mean-spirited, rude and ad hominem statements.[/quote]I think we've all seen Sins of SQL from people with "30 years of SQL Server 2005 experience". I try not to worry too much about the lineage someone asserts; I'm much more interested in the examples they bring to the table ;)[quote]Thank you to the author and those (respectfully) offering differing opinions/reasoning. The back and forth is MUCH more valuable than any article by itself (controversial OR sacrosanct).[/quote]Indeed. I enjoy the debate aspect, since it yields a lot more opportunity for learning than simply accepting conventional wisdom at face value.Thu, 03 Jun 2010 00:39:21 GMTSteven James GrayRE: Sins of SQL: The Time Tablehttp://www.sqlservercentral.com/Forums/Topic930511-2705-1.aspx[quote][b]Martin Mason (6/2/2010)[/b]Whoa. Timeout. If you're talking about SSAS, your argument only considers partition processing, NOT dimension or aggregation processing SPACE, TIME, or MEMORY CONSUMPTION. With separate dimensions, the time dimension at second-level granularity is going to have to process 3600 rows, 3600 members at the key level. Your date dimension for 10 years of data at the day grain would be about the same. On the other hand, your composite design with 10 years at second-level granularity would be 13,140,000 records at the key level. Which is the more performant solution, the one that requires processing of 3600 * 2 records or the one that requires processing of 3600 squared?[/quote]Let's take an example with an arbitrary resolution. Say you have a year's worth of data (365 days) and a time metric at the second level (86,400 rows in your time table). For a given year in SSAS, the cell-space/aggregations would potentially need to deal with something in the area of 31,536,000 distinct permutations - it's not just "two dimensions, so multiply by two" - each combination of date and time could house different values and thus occupies a distinct, separate cell. This is inescapable whether you use two separate dimensions or one hierarchical one. If it were otherwise, you could only get aggregations on time OR date, never an intersection of both, which is clearly not the case.I suppose if you're hell-bent on using two separate dimensions, you could still easily adapt the technique demonstrated (or use something similar) so that your fact tables store only the real datetime value, but then, when creating your Data Source View in Business Intelligence Development Studio, extract the date and time values and wire them up to two separate dimensions generated on the fly.Thu, 03 Jun 2010 00:29:23 GMTSteven James GrayRE: Sins of SQL: The Time Tablehttp://www.sqlservercentral.com/Forums/Topic930511-2705-1.aspx[quote]That is quite a strange response. Your article spent time basically explaining how your Gregorian calendar approach using a recursive CTE is the only "good" solution. (After all, you did title it the "Sins of SQL", which, by the way, has absolutely nothing whatsoever to do with SQL.) [/quote]I never said in the article it was the only good solution. The 'Sin of SQL' being referred to is creating an arbitrary-resolution dimension analogue for a well-defined contiguous range of data (date/time) with a new surrogate key. [quote] Your solution to each of the holes in the approach is to simply start bolting on additional stuff.
[/quote]Using a base technique for the core of the problem I attempted to solve, and then extending it to meet additional requirements?[quote]So, let me see... Use a recursive CTE that has to generate the set of dates each time and also can't be used to enforce any kind of integrity.[/quote]When would you have to delete a time record? Again, if you needed to exclude specific ranges of time, an exclusions table that operates at the span level would work.[quote] Then when you need holidays, bolt on a lookup table. Wait, need business-defined week-ending dates, quarter-ending dates, and year-ending dates - bolt on another lookup table or tables. Need to vary the start/end of a business week by country or even change the definition after several years while still preserving the previous definition - bolt on another lookup table. By the time you're done bolting on all of these special-purpose lookup tables just to handle all of the permutations in a single cube that I'm dealing with, you have nothing but a mess that blatantly demonstrates that your assertion about not needing a date table is completely full of holes you never considered.[/quote]Let's dissect an example: you have to exclude an arbitrary, unexpected holiday from your figures - a period of 24 hours. In the approach I've proposed, you'd create a table of exclusions with a start/end time specified, and then join against it wherever the timestamp of a row falls within an excluded span (a sketch of such a join appears at the end of this discussion). The advantage is that you do not need to redefine your holiday or other linkage tables based on whether you're dumping the data at the day, hour, minute, or even second resolution. In this approach we're treating time as a contiguous field over the fact tables, and the dimension for dates/times becomes an arbitrary ruler over it.In the alternative approach of a time table, perhaps you annotate each row with an 'IsHoliday' or 'ExcludeAccounting' flag. The trouble is that if you change the resolution of your date/time data up or down, you then need to re-key your source data to the new definitions, and you have to rebuild your time table and figure out a way of preserving the manual edits. Even when the values are fixed, you likely need to resolve the TimeID or some such on the way in.Thu, 03 Jun 2010 00:17:40 GMTSteven James GrayRE: Sins of SQL: The Time Tablehttp://www.sqlservercentral.com/Forums/Topic930511-2705-1.aspxOkay Steve, I find it admirable that you will stand by your code. Unfortunately, I took that as a challenge and wrote my own inline TVF that generates the same result sets as yours with the same input values, and then ran some tests. Now, I didn't write it up in a really formal way, but I have published a short synopsis of the results on my blog so that others may see the differences between the two approaches.Please take a little time and read this post, [url=http://www.sqlservercentral.com/blogs/lynnpettis/archive/2010/06/03/sins-of-a-recursive-cte.aspx][b]Sins of a Recursive CTE?[/b][/url]. I think you will notice that the recursive CTE really isn't the best option. Of course, you are free to copy my code and do any testing that you desire; just be sure to let us know your results.Thu, 03 Jun 2010 00:17:14 GMTLynn PettisRE: Sins of SQL: The Time Tablehttp://www.sqlservercentral.com/Forums/Topic930511-2705-1.aspx[quote][b]Steven James Gray (6/2/2010)[/b][hr]Regarding some of the other date concerns:1) Using non-Gregorian calendars: There are free and widely available mechanisms to map UTC time-spans to/from these calendars.
These can be incorporated into the functionality where needed without too much ado. 2) Designating holidays/special dates - easily achievable using lookup tables that can be joined to the rowset the CTE generates.3) Auditing of the dimension (when incorporating any of the above, or other related change scenarios) can be done readily by calling the function and dumping the data out somewhere for analysis/archiving.[/quote]That is quite a strange response. Your article spent time basically explaining how your Gregorian calendar approach using a recursive CTE is the only "good" solution. (After all, you did title it the "Sins of SQL", which, by the way, has absolutely nothing whatsoever to do with SQL.) Your solution to each of the holes in the approach is to simply start bolting on additional stuff. So, let me see... Use a recursive CTE that has to generate the set of dates each time and also can't be used to enforce any kind of integrity. Then when you need holidays, bolt on a lookup table. Wait, need business-defined week-ending dates, quarter-ending dates, and year-ending dates - bolt on another lookup table or tables. Need to vary the start/end of a business week by country or even change the definition after several years while still preserving the previous definition - bolt on another lookup table. By the time you're done bolting on all of these special-purpose lookup tables just to handle all of the permutations in a single cube that I'm dealing with, you have nothing but a mess that blatantly demonstrates that your assertion about not needing a date table is completely full of holes you never considered. There might be a tiny bit of merit on the time side, but your architecture on a set of dates wouldn't survive 5 minutes in any of the hundreds of environments where I've worked on or designed analytic systems.Wed, 02 Jun 2010 23:02:18 GMTmhotek-836094
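To make the exclusions-table approach Steven describes above concrete, here is a minimal sketch. All object and column names (Exclusions, FactSales, TransactionTime, Amount) are hypothetical stand-ins, not code from the article: fact rows keep their raw datetime, and an anti-join against a span table removes excluded periods at whatever resolution the query happens to run.
[code="sql"]-- Hypothetical schema for illustration: spans to exclude from reporting.
CREATE TABLE dbo.Exclusions
(
    ExclusionID int IDENTITY(1, 1) PRIMARY KEY,
    StartTime   datetime     NOT NULL,
    EndTime     datetime     NOT NULL,
    Reason      varchar(100) NOT NULL
);

-- Exclude an unexpected 24-hour holiday from the figures.
INSERT INTO dbo.Exclusions (StartTime, EndTime, Reason)
VALUES ('2010-06-03T00:00:00', '2010-06-04T00:00:00', 'Unscheduled holiday');

-- Aggregate only rows whose timestamp falls outside every excluded span.
-- Nothing needs re-keying if the reporting resolution later changes.
SELECT SUM(f.Amount) AS TotalAmount
FROM dbo.FactSales AS f    -- hypothetical fact table storing raw datetimes
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Exclusions AS e
    WHERE f.TransactionTime >= e.StartTime
      AND f.TransactionTime <  e.EndTime
);[/code]
The half-open interval (inclusive start, exclusive end) avoids double-counting a boundary second when two excluded spans abut.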
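Steven's suggested adaptation for readers who prefer two separate dimensions - storing only the real datetime in the fact table and deriving the keys in the Data Source View - might look something like the named query below. Again, the table and column names are illustrative assumptions only, and the date-key and second-of-day expressions are one common way to split a datetime, not code from the article.
[code="sql"]-- One possible named query behind a Data Source View: the fact table keeps
-- only the raw datetime, and separate date and time-of-day keys are derived
-- on the fly so two independent dimensions can be wired up against them.
SELECT
    f.Amount,
    -- Integer yyyymmdd key for the date dimension, e.g. 20100603.
    CAST(CONVERT(char(8), f.TransactionTime, 112) AS int) AS DateKey,
    -- Seconds since midnight (0 to 86399) for the time-of-day dimension.
    DATEDIFF(second,
             DATEADD(day, DATEDIFF(day, 0, f.TransactionTime), 0),
             f.TransactionTime) AS SecondOfDayKey
FROM dbo.FactSales AS f;    -- hypothetical fact table, as above[/code]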
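Lynn's actual inline TVF lives in the blog post linked above; purely as an illustration of the kind of set-based alternative to a recursive CTE he alludes to, a tally-style function that returns one row per second in a range might be sketched like this.
[code="sql"]-- A generic set-based sketch (not Lynn's published code): build a large
-- virtual numbers list by cross-joining a ten-row VALUES constructor,
-- then take just enough rows to cover the requested range.
CREATE FUNCTION dbo.tvf_SecondsInRange (@Start datetime, @End datetime)
RETURNS TABLE
AS
RETURN
    WITH E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)), -- 10 rows
         E4(n) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d),   -- 10,000 rows
         Tally(n) AS
         (
             SELECT TOP (DATEDIFF(second, @Start, @End) + 1)
                    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
             FROM E4 a, E4 b                                -- up to 100,000,000 rows
         )
    SELECT DATEADD(second, n, @Start) AS TimeValue
    FROM Tally;[/code]
For example, SELECT * FROM dbo.tvf_SecondsInRange('20100603 00:00', '20100603 00:59:59') returns the 3,600 seconds of that hour. No row-by-row recursion takes place: the optimizer satisfies the TOP with a row goal, so only the rows actually needed are materialized.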