I adapted Aaron’s script to work in Azure SQL Data Warehouse and am sharing it with you below, so you don’t have to do the same. I did leave out holidays because I didn’t need them in my calendar table. To add them back in, just add the column back to the table and use Aaron’s update statements. I also changed the MonthYear field to include a space between the month and year. Otherwise, my script should produce the same results as Aaron’s in a Azure SQL DW friendly way. Notice that I chose to use Round Robin distribution for my date table. Also, tables are now created with clustered columnstore indexes by default in Azure SQL DW.

Instead of the computed columns, I inserted the dates, then updated the other columns in the temporary table. Then I used the Create Table As Select syntax to create my final RPT.Calendar table.

I’ve been working on a project that includes geographical data representing stops on a delivery route. I’ve just completed loading this data into a data mart. The source data contains longitude and latitude in millionths of a degree with 9 digits of data. We haven’t decided what tool we will use to visualize this data yet, but we know Power View and Power Map both accept latitude and longitude values. I decided to store my longitude and latitude data in decimal (9,6) fields. There is a good possibility that we may be computing distances between points in the future, so I thought it would be good to store the data as a spatial data type as well. I thought I would share a few things that I learned along the way.

There are two spatial data types in SQL Server: geometry and geography. Geometry represents the flat-earth system where units are all equally spaced apart. Geography represents the round-earth system measured in latitude and longitude. Since I had longitude and latitude in my data, I used the geography data type. The geography spatial data type is implemented as a .NET common language runtime (CLR) data type in SQL Server.

I populated my table using a query of which I’ve included a snippet below. You can see the use of the Point function to create my geography values.

The Point function accepts a a latitude, longitude, and SRID, and returns a geography value. An SRID is a unique identifier associated with a coordinate system, tolerance, and resolution. SRIDs are not specific to SQL Server. They are maintained by the International Association of Oil & Gas Producers (OGP) Surveying & Positioning Committee. Here’s a blog post that I think does a good job explaining many of the terms associated with spatial data in SQL Server.

Tip #1: You can see a list of SRIDs available in SQL Server by running the following query. SQL Server uses the default SRID of 4326, which is the WGS 84 spatial reference system.

SELECT * FROM sys.spatial_reference_systems

My source database has planned delivery stops and times and actual delivery stops and times stored in separate columns in a very wide table. I decided to pivot that data and create a table with a scenario key that refers to either plan or actual data. To do this, I wrote 2 queries and attempted to union them together to produce my final data set. That’s when I learned:

Tip #2: When SQL Server performs a UNION it must compare values to remove duplicate rows. CLR user-defined type columns like geography are not comparable. As long as there is no risk of duplicate data between the two sets, you can use UNION ALL.

As I finalized my table design I considered using a computed column to store my geography data. But I encountered an issue when I went to add a spatial index. Spatial indexes are built on top of B+ trees. They decompose space into 4 levels of grids. I think spatial indexes are interesting, but they have some restrictions of which you should be aware. They require the table to have a clustered primary key. They cannot be specified on indexed views. And…

Tip #3: You can create a computed column to store the geography point based upon the latitude and longitude. But you cannot create a spatial index on a computed column.

If you try to create a spatial index on a computed column you will get SQL Server error message 6342.

You don’t have to use spatial data types just because you have spatial data. Many data viz tools have built-in geocoding that will accept longitude and latitude or an address. But spatial data types can be useful when calculating distances between two points and planning and measuring routes.

I’ve been working on building my Biml library over the last few months. One of the first design patterns I created was a Type 1 Slowly Changing Dimension where all fields except the key fields that define the level of granularity are overwritten with updated values. It assumes I have a staging table, but it could probably be easily modified to pull directly from the source table if needed.

For simplicity, I’m going to create DimSalesReason from the AdventureWorks database as my example. Please pardon the large amount of code in this post. I wanted to provide a fairly complete picture so you can see how the pieces fit together so I’m providing all of the SQL and Biml needed to generate the package.

You’ll notice I have some audit fields in my table. UpdateDttm is the datetime when the row was last updated. InsertDttm is the datetime when the row was initially inserted. The HistoricalHashKey and ChangeHashKey are used for change detection. I may not need them in a dimension this simple, but I use them in larger/wider dimensions and I like my dimensions to be built consistently (unless they need to be tweaked for performance). The HistoricalHashKey represents the business key of the table that defines the level of uniqueness for the row in the dimension table. The ChangeHashKey represents all the other fields that are used in the dimension row. This makes it easier to determine if values have changed since I can compare just the ChangeHashKey instead of each individual field.

I also have an update table. This allows me to update in bulk rather than updating in place row by row. With a very small table, you might not notice a performance difference, but this pattern becomes much more efficient as the table grows.

Using a view to do the transformation work allows me to use the SQL Server engine to do the things it is good at (joins, case statements, conversions) and SSIS to do the things it is good at (controlling the flow of data). It also makes it easy to create a design pattern in Biml that is abstract enough that it easily fits most SCD 1 scenarios. The hashkey fields allow me to do a lookup on one field to determine if the row already exists in the dimension table and a quick comparison to determine if any of the values for that row have changed. I chose to use an MD5 hash because I think it is a good balance of speed/size and collision risk. You may feel differently (others prefer SHA1 due to lower collision risk). On a dimension this size, the difference probably isn’t noticeable.

You can also see that I add my unknown member row in my view. Some people prefer to set the surrogate key of the unknown value to -1. I prefer to set the business key to -1 and let the surrogate key be set to any value in the load process. I’m not a fan of having to turn off the identity insert to add the unknown row. When I do the dimension key lookup for fact tables, I just look for the row where the business key = -1. This also means I don’t have to check my dimension tables to see if someone remembered to insert the unknown rows after deployment to a new environment because I know the unknown rows will be inserted when the package is run.

I have audit steps in my package that write to the Audit.PackageControl table using stored procedures.

And Now for the Biml

ProjectConnections – I like to keep my connection managers in a separate file so I only have to update one place if I need to add or change a connection.

Dim1 – This contains my actual design pattern.

CreateDim1 – This is the Biml file I run to generate the package. It gets separated so I can pull values from databases and pass in variables to my design pattern using BimlScript. For this example I have hardcoded my variables into this file rather than pulling from a database.

Resulting SSIS Package

Executing the CreateDim1.Biml file generates a package called LoadDimSalesReason.dtsx. Here is the control flow:
I start the package by logging the package start in my PackageControl table. Then I truncate my update table to prepare for the new data load. Next I have a data flow task, which inserts data into either the dbo.DimSalesReason table or the Updt.UpdtSalesReason table (see below). The SQL Update task updates the DimSalesReason table with the rows that were inserted into UpdtSalesReason. And finally, I log my package completion.

Here’s my data flow:
First, I retrieve my data from my staging view. Then I count the number of rows coming from my source so I can log it in my PackageControl table. The lookup on the HistoricalHashKey field sends rows with no match to be inserted into the dimension table. If the rows have a match, they are checked to see if their ChangeHashKey values match the value of the ChangeHaskey in the existing row in the dimension table. If ChangeHashKey values match, the row is counted and nothing else is done. If the ChangeHashKey values don’t match, the row is counted and written to the update table.

I’ve had several occasions to use OPENROWSET recently in T-SQL. Although you can use it as an ad hoc alternative to a linked server to query a relational database, I’m finding it useful to get data from other sources. I’ll go into details of how to use it, but first I would like to acknowledge: OPENROWSET is not the answer to everything. Sometimes you should use a linked server or SSIS or some other method. I had some specific cases where I felt OPENROWSET made more sense for my task.

Importing Data from an Excel Spreadsheet

Technical need: I had some data that needed to be validated before importing into SQL Server from Excel as part of an upload process in an application. If it didn’t pass validation, I did not want to import the data. Rather than use SSIS to import the data into a table and then check it, I used OPENROWSET to pull it directly from the Excel file into a temp table, ran all of my validation checks, and then imported the data using SSIS all within a stored procedure that was called by an application. Field and table names have been changed to protect the innocent.

To make this work you need to install the Microsoft ACE OLE DB 12.0 provider on your server. You cannot use the Jet OLEDB Provider with 64-bit SQL Server and 64-bit Excel. Once you have this installed you need to change some settings:

Here is a blog post that explains the errors you might get if you don’t have these settings correct. It also notes that you can go check your settings in the registry keys. For SQL Server 2012 the path is HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> MSSQL11.MSSQLSERVER -> Providers -> Microsoft.ACE.OLEDB.12.0. You can then see that AllowInProcess and Dynamic parameters are set to 1.

This provider seems to be a bit fussy and there are some things that took me a while to troubleshoot once I got started. First, make sure that the location of the Excel file you are reading is accessible to the account that is running the SQL engine and doesn’t require admin access. The Linked Server errors that get returned with this really aren’t that helpful, so you have to use your Google-fu to work your way through it. I used a global temporary table so it would be accessible across connections if it needed it to be (I was having trouble when I used a local temp table). The application that calls this stored procedure queues up the calls and executes them serially so one can’t override the other.

Returning results from an Analysis Services cube in SQL

Technical need: I had some data (several facts with common dimensions) with calculations in a multidimensional cube and an application that needed that data but couldn’t/wouldn’t consume the MDX result set.

The query: For demonstration purposes I’m using a small query that returns the values for a drop-down box for selecting a member of the date dimension.

You can see that I explicitly named my fields, but you can also do a select * to get all the fields from the OPENROWSET. I had to put the fields from the cube in quotation marks to make it work because of the brackets in the field names. For Analysis Services, MSOLAP is the name of my provider. You must have the correct version of this provider listed under the linked servers on the server where this is being executed.

For SQL Server 2012 SP1, you can download the Microsoft Analysis Services OLE DB Provider for Microsoft SQL Server 2012 SP1 here. There is nothing special about the data source and initial catalog. They are exactly the same connection info you would use when connecting to SSAS through Management Studio. The last part is simply the MDX query. You should be able to copy that part out and run it in Management Studio connected to your SSAS database and get results.

Inserting Images Into a Database

Technical need: I was building a demo for Power BI and I needed to show some images in my Power View reports in Office 365. In order for the images to show in Office 365, you must import the images into SQL Server and then import them into Power Pivot. First I downloaded images of flags for each country from the CIA world factbook. I took the folder of flags and put it in my pictures folder. I already had a table in SQL Server that contained country names and codes. So I just needed add a flag for each country.

I figured out how to do this based upon this MSDN blog post. I inserted my other data first and then added the images. I’m sure you could find a way to do this all in one select statement. Since this was a one-time thing I didn’t see the need to find another way. It should be noted that the bulk insert must pull from a location on your computer; you can’t use links to the images on the internet. I downloaded the images in a folder. Each image name was the abbreviation for the country. If you are going to import this data into PowerPivot, be sure you change the table behavior properties so it will recognize the varbinary field as an image.

There may be other ways to accomplish my goals, but it’s nice to understand how to use OPENROWSET and have that in my toolbox.