Just a three hour tour…

Menu

Articles

Recently I talked to someone about a view calling another view, a practice which is known as nested views. TL;DR Do not write Nested Views. Give this person a little credit, he had heard it was a bad idea, but he had a really good reason. There is no such thing. There is also no reason to take my word for it. Here are a number of other very good posts which detail why nested views are bad from a performance perspective.

I understand that on the internet, no matter how cockamamie your idea is, you can find someone or some post which will agree with you. That is why I trust sources who I know and trust rather than the random chatter of places like Stack Overflow.

Non-Performance Reasons Not to Use Nested Views

In addition to the performance reasons, there are other reasons not to use nested views. Supporting and maintaining nested views can be a nightmare. If there is an issue with the accuracy of the data, finding the problem is just that much harder. And what about when one needs to go investigate or fix a problem? Then someone has to test all of the places the view is called. I worked with some code recently where a view was created on a set of data just to create 3 new values based up 3 case statements. That view was nested many levels below where the field was actually called. I know people from a development background look at nested views as modular development. Don’t. TSQL is not the same as object oriented code.

Good Use Cases for Creating Views

Views are not always a bad idea. If you are creating an Analysis Services or Power BI project, it is a good idea to create views of the tables you are going to be using, even if there is no difference between the view and the table. Why? That way if the table is changed, your project will still work. Exposing a single view to users who are going to use self-service BI against it can also be a good idea. None of these are reasons to nest views, ensuring that the source data is anything other than a table.

If all you have is a Hammer Everything Looks like a Nail

It takes a while to learn how to write good SQL and I have seen people rely on a view because it takes a while longer to write good SQL. Relying on a view though is like having one tool in the toolbox. If you don’t have a screwdriver, one may be tempted to use a Hammer instead. Too often nested views are written because sometimes writing SQL is hard and it is easier to just write a query and save it as a view rather than use a CTE or a derived table. Do not succumb to the temptation. There is a lot of code written where someone as in a hurry and they were planning on going back and fixing it someday. Someday appears with the same frequency as a lottery. Make it easier on the person who has to figure out or maintain the code left in your wake. Don’t write nested views. Your replacement will thank you for it.

In the world of exploding piles of data, there are times you just need less not more. If you are working on a proof of concept, just want to test out some code, or want to pass a selection of data to a machine learning experiment, you don’t need several million records. Having that much data will just keep you from getting to the answer in a reasonable period of time. There are a number of standard ways people limit the data, the most common being some kind of a date filter. Using a date range though often times does not provide the variability needed. This is a particular problem with data used in a machine learning experiment it is designed to create an algorithm based on data pattern extrapolated over time. For example if you are doing any kind of regression analysis on a retail client and you either exclude or include the Christmas shopping season, the algorithm created will not be correct. The goal is to have less data for analysis pulled from the entire set of data. Fortunately SQL Server since 2005 has several methods for selecting random data

Using TABLESAMPLE

Until recently, I hadn’t used the Transact SQL TABLESAMPLE clause, but I ran into a situation where I needed to test some things and not wait all year for a result. TABLESAMPLE to the rescue. There are a couple of things where it won’t work. If you have derived tables, tables from a linked server (bad idea), or are writing a view and you want to return random data you can’t use TABLESAMPLE. If those conditions do not apply, you can use it.

The random sample that TABLESAMPLE provides is based on the number of data pages, not the number of records. If you want the number of rows to be specifically limited you will need to use Top(n) as well. I’ve written all my samples based upon AdventureWorksDW so you can run them for yourself later. I’ve listed the variety of ways to call TABLESAMPLE and shown the number of records returned.

When looking at the number of records returned, the values are not 10 percent exactly or particularly consistent in the number of rows returned. If you only want 6039 records returned, you can try the following code, but it doesn’t really do what it says.

This code will provide the exact number of records desired. Noticed I upped the number of rows returned in order to get 6039 rows. If the sample is 6039 you cannot guarantee that you have enough rows returned.

Really Random Sample

TABLESAMPLE has some really useful functionality, but for machine learning I need a truly random sample, which TABLESAMPLE does not provide. For that I need NEWID(). The following sample returns approximately 1% (.01) of the 60398 rows.

Just for fun I ran the same code 3 times and got a variety of rows returned.

(600 row(s) affected)
(607 row(s) affected)
(622 row(s) affected)

The ProductKey is added so that the NEWID() function will calculate a sample for each row. The WHERE statement calculates a random float between 0 and 1. This will truly give me the random sample I would need for a machine learning experiment.

I am looking forward to SQL Server 2016, which includes an update to DAX which will generate dates for you, but since it hasn’t been released yet, the need still exists to create a date Table. Since I am creating a Tabular Analysis Service instance, an identified date dimension is needed for time functions. There are a number of different ways to create a date dimension, here I am going to take advantage of recu There are a number of ways you can create a date dimension, here I am going to describe a neat way of creating one using a recursive CTE, which decreases the SQL code required to generate one.

Date Dimension Fields

There are a number of fields which are pretty standard for date tables. The fields can vary if you need to add things like Fiscal years or Month End dates which have business specific rules like last Friday of the month. The most common fields needed for a Date Dimension are listed here.

DateKey – Integer key representing the date, for example 20150524

CalendarDate – Date time field

CalendarYear – Four digit year

QuarterNumber – Number from 1 to 4

QuarterName – First Quarter, Second Quarter, 1 Qtr 2015, First Quarter 2015 are some of the ways the name is listed. Pick whichever format people would like to see.

MonthNumber – 1 for January, 2 for February, 3 for March etc.

NameMonth – January, February, March etc.

MonthDayNumber – July 15 would have the value of 15 here

WeekDayNumber – Date of the Month. July 4 would have a 4 here.

CalendarWeekofMonth – Counts the weeks within each month

NameDay – Monday, Tuesday etc.

DayNumber – Whether you choose 1 for Sunday or 1 for Monday is a business rule you need to find out first. This example shows 1 for Sunday, 2 for Monday etc.

YearDay – Sometimes referred to as the Julian number this counts the days from 1- 365 and can be useful in some calculations

YearWeek – Counts the weeks from 1 -52

As a best practice, remember reserved words such as DATE or Index or any other word which shows up as blue or pink in SQL Server Management Studio, should never be the names of any columns. Always select a name other than a reserved word when creating tables. Also friendly names containing spaces are great for exposing to users, but they are annoying in SQL Server, so leave the spaces out of the column names. Evaluate what dates you are going to be needing in the table so that you don’t have to go back and redo it. I am creating a smaller table here, but that is just because it is an example. Look at the dates you will be storing in your data warehouse when determining your starting dates, and set the end dates for probably about five years longer than you think the data warehouse will still be in use.

Using a CTE to Generate a Date Dimension

CTEs, which Microsoft added in 2005, is a great way to generate a date table by harnessing the power of computer to spin through your code, decreasing the need to write a lot of code. I am using a recursive CTE method here first published by Itzik Ben-Gan to generate the number table being used in the code below.

IsDate is an often used feature used in TSQL to determine whether or not a field contains a date. I am sure there are a number of people who have used this successfully and may wonder what I could possibly be talking about. To maintain my technical bona fides I’ve included some code, which you can run inside of SQL Server Management Studio.

Select case when isdate(somefield) = 0 then --False, not a date
'1/1/1900'
else Convert(datetime, somefield, 110)
end as ConvertedDate , Somefield
from @validdatetest
where works = 1

It is obvious by looking at my example, that some of the values inserted into the temp table @validDateTest are not dates. The code below the insert statements will Convert only the values which IsDate says are dates, not all of the values as I have a where condition in the query. The values that will Convert without giving me an error have a works value of 1. If you run this query, you will see the difference in the values returned by IsDate, and the values I have provided that work without returning an error.

Now if you run the query listed above and take off the where condition

Select case when isdate(somefield) = 0 then --False, not a date
'1/1/1900'
else Convert(datetime, somefield, 110)
end as ConvertedDate , Somefield
from @validdatetest

You’ll get this errorMsg 241, Level 16, State 1, Line 21 Conversion failed when converting date and/or time from character string.

Casting Around for Bad IsDate solution

If you are fishing around for a solution, the simplicity of this will probably make you smack your desk. Now ideally, the source system should be fixed, as it really should be making sure that the dates are valid. I can’t make that happen. But I do have a solution to resolve this problem, just Cast it. Cast and Convert are on the same page in Microsoft online help and a lot of the time you might think they are synonymous, but they don’t work the same way.

Select case when isdate(somefield) = 0 then --False, not a date
'1/1/1900'
else Cast (somefield as datetime)
end as ConvertedDate , Somefield
from @validdatetest

This works with no errors whatsoever. Simple. Easy. If you ever run into this, I hope this saves you some time.