Arcane-SQL–A PowerShell Module for Generating SQL Code

April 22, 2014 — arcanecode

Overview

There are many PowerShell modules available for assisting the busy DBA with managing their SQL Server environment. This isn’t one of them. This module is targeted toward SQL Developers, with special functionality for data warehouse developers. A common task for BI professionals, one that is performed on almost every project, is the creation of a staging area. This might be a set of tables in the data warehouse, perhaps in their own schema, or in an entirely separate database often called an operational data store (ODS).

The staging tables are typically similar in structure to the ones in the source database. Similar, but not identical, as there are some small modifications which are commonly made to the staging tables. First, large data types such as VARCHAR(MAX) are seldom useful in data analysis and thus could be removed. Next, even the most casual user of SSIS will quickly see SSIS prefers to work with the double byte character sets (WSTR in SSIS, which maps to NVARCHAR in T-SQL) as opposed to the single byte (STR/VARCHAR) character sets. It can be helpful to convert these in the staging area.

This Module can (optionally) do all of these things and more when it is used to generate CREATE TABLE or SELECT statements. Imagine if you will a source system with thousands of tables and the need to create a staging area for it in a new data warehouse. This quickly becomes a long, boring tedious task. Now imagine being able to write a bit of PowerShell code and generate these tables in just a few minutes time.

Before diving in, it is highly suggested you download and review the example script, Arcane-SQL-Example.ps1. This demonstrates the most commonly used functions and provides patterns for their use.

Functionality

While the module is full of functions, there are a few core ones that should be highlighted. Complete documentation can be found in the module itself, which has been fully documented using the native PowerShell Help system. In addition there is an example script file which demonstrates some of the most common tasks.

Enable-SqlAssemblies – This is the most important function, without calling it nothing else works. Be aware the SQL Server assemblies (including the SMO – SQL Management Objects – and SQL Provider) need to be on the machine where this script is run. This module has been tested on, and intended for, SQL developers with SQL Server Developer Edition installed on their workstations.

Join-InvokeInstance and Join-ProviderInstance – Most of the interaction done with the SQL Provider requires the server name and instance name, assembled in a path like syntax. The Invoke-SQLCommand likewise requires this formatting, however it has a little quirk. If the instance is "default" then the Invoke requires it to be omitted while the provider requires it to be present. These two functions reduce the confusion, simply pass in the server name and instance, and they will format things correctly.

Get-TablesCollection – When working with tables it is common to iterate over all the tables in a database. This function will generate a PowerShell array of table objects, each object being of type Microsoft.SqlServer.Management.Smo.Table. By having table objects the wide variety of properties for the table are available, such as Schema name, Table name, and Row Count.

Get-TableByName – Most commonly scripts will retieve an array of tables using the above Get-TablesCollection, then iterate over them in a foreach loop. There are times however when only a single table from the collection is desired. For those types the Get-TableByName can be used to retrieve a specific table object based on the name of the table.

Remove-SchemasFromTableCollection and Select-SchemasInTableCollection – Get-TablesCollection will return an array of all the tables in a database. Often there is a need to only work with a subset of that table collection. These two functions will filter based on the schema and return a new array. The first, Remove-SchemasFromTableCollection, removes all tables from the array of schemas that are passed in. The second, Select-SchemasInTableCollection, will retain only those tables in the schemas passed into the function.

Remove-TablesFromTableCollection and Select-TablesInTableCollection – These work as filters, similar to the functions above. Instead of the schema however, they are based on table name. All tables that begin with the text passed in are either removed, or in the latter function the only ones retained.

Get-PrimaryKeyIndex – Returns the primary key object for the passed in table object.

Get-PrimaryKeyColumnNames – returns a comma delimited list of the column names in the primary key

Decode-IsPrimaryKeyColumn – Will determine if the passed in column name is part of the primary key index

Get-TruncateStatement – Will generate a SQL Truncate Table statement based on the table object passed in.

Get-DropTableStatement – Generates a Drop Table statement, including the check to see if the table exists, for the passed in table object.

Get-CreateStatement – To simply say this function generates a create table statement would do it disservice. It will take a table object and reverse engineer it, generating a create table statement. Unlike other code generators, it has a suite of parameters which allow customization of the generated statement with an eye toward the needs of a data warehouse developer. A few are:

DataTypeAlignColumn – Set the column number to line up the data type declarations on. Passing in a value of 1, will suppress alignment and simply place the data type after the column name. The default is column 50.

OverrideSchema – It is common place staging tables in the data warehouse in their own schema, often named ‘Staging’ or ‘ETL’. Passing in a value here will include the new schema name in the create table declaration. If the table object passed in had a schema other than dbo, it is placed in front of the table name with an underscore. If it was dbo, the source schema is simply omitted.

PrependToTableName and AppendToTableName – Allows extra text to be placed before or after the table name. For example, it is common to create tables with _Delete, _Update, and _Insert in the staging area. This provides a simple way to do that.

AdditionalColumns – When creating tables in a data warehousing environment, there are often extra columns to hold metadata about the ETL process. A user of this function can create an array of additional columns using the Add-ColumnDefinition function and have them added to the create table statement.

Scrub – This is a very powerful switch. When added it will perform a cleanup to make the output suitible for data warehousing. Columns with large data types such as VARCHAR(MAX) are removed. All single byte character sets in the source are converted to double byte sets.

SuppressIdentity – Source systems will sometimes use the IDENTITY clause in the primary key column. Using this switch will suppress that identity clause from being generated in the new create table statement.

SuppressNotNull – Often staging tables will not be concerned with null versus not null values. Using this switch will create all columns as nullible, regardless of their setting in the source.

IncludeDropTable – Adding this switch will include a ‘if exists drop table’ style clause prior to the create table statement.

PrimaryKeysOnly – Will generate a create table statement that only has the primary keys found in the source system.

Finally, if a column in the source table object has a custom data type, the script will reverse engineer the data type back to its basic SQL data type.

Get-SelectStatement – Like its sister function Get-CreateStatement, under the covers this function provides a lot of power and flexibility to the statement it creates. Additional columns can be added, columns can be specified to order the output by, table aliases can be used, and most powerful of all is the ability to generate a HASHBYTES column, including the ability to remove specified columns from the hash byte calculation. Here are some of its parameters:

AsColumn – The routine will line up the AS <column alias> at the column number passed in here. The default is 50. To not use aligning, set this to 1.

PrependToColumnName – Text to include before each column name.

AppendToColumnName – Text to place after each column name.

AdditionalColumns – A collection of additional columns to be added to the SELECT statement. Useful for adding metadata columns. All items in the AdditionalColumns collection should be generated using the Add-OutputColumn function.

OrderByColumns – A list of columns to add to the ORDER BY clause. All items in the OrderByColumns collection should be generated using the Add-OutputColumn function.

TableAlias – Allows user to specify an alias to use for the table. The alias is then put in front of each column name.

HashBytes – If included the select statement will include a HASHBYTES function with all columns except the primary keys and any columns included in the OmitFromHashBytes collection. The name passed in this parameter will be used for the name of the HashBytes column.

OmitFromHashBytes – A collection of column names that should be excluded from the HashBytes calculated column. Useful for excluding metadata columns. All items in the OmitFromHashBytes collection should be generated using the Add-OutputColumn function.

Scrub – When included this will remove certain data columns from the output, such as BINARY, NVARCHAR(MAX), XML, and other large types not normally used in data warehouses. Additionally VARCHAR/CHAR are converted to NVARCHAR/NCHAR, and DATETIME converted to DATETIME2(4).

Flatten – When included will return the SELECT statement as one long string, without any Carriage Return / Line Feed characters. Additionally, any additional spacing (such as indicated with the AsColumn) is eliminated.

IncludeOrderByPK – When included the Primary Keys in the table object are included in the order by clause. If any columns are passed in the OrderByColumns parameter, the Primary Keys occur first in the Order By clause, followed by any columns in the OrderByColumns parameter.

IncludeNoLock – When included, a WITH NOLOCK clause is added to the SELECT statement.

Construction

Those PowerShell experts who review the code may note that in many places code does not follow the most "powershelly" way of doing things. In some places rather than using pipelining it was instead decided to use a foreach loop, for example. The intended audience for this module are T-SQL developers who may not be as comfortable in PowerShell as they are T-SQL. Thus using code that more closely aligned with T-SQL patterns would make it more useful and modifiable by SQL developers.

When development first started attempts were made to use advanced functions, using the pipeline for input and output. At some point however this didn’t make sense for a majority of the functions. Time constraints further impinged this effort. Some future revision may attempt to migrate selected functions back to an advanced design, but for now they will have to stand as is.

Development Environment

This module is intended to be used on a developer workstation, not on a server, and especially not on a production server. As such deployment has been made simple, just copy the Arcane-SQL folder to the developers PowerShell module library. On a standard Windows 7 machine this would be C:\Users\<<usernamehere>>Documents\WindowsPowerShell\Modules. If the Windows PowerShell folder and modules subfolder do not exist, they will need to be created first.

To keep things simple, no attempt was made to sign the script. If this is an issue the developer using this module can self sign it on their PC. Check the execution policy on the workstation where the module is installed to ensure sufficient rights to run the module.

This module was developed on machines with both SQL Server (Developer Edition) 2008R2 and 2012 installed (some machines with both) and worked without issue. One machine it was tested on had 3 versions of SQL Server, 2008R2, 2012, and 2014. On that one machine there were some errors with some of the functions passing in the SMO table objects. Those are still being investigated.

SQL Security was assumed to be handled using built in Windows Credentials. Thus the logged in user would need to have rights based on Windows credentials to the SQL Server they are targeting.

The machine being developed on was using PowerShell v4, however v3 should work as well.

This module was developed using SAPIEN PowerShell Studio 2014. To make life easy for other developers the PowerShell Studio files (Arcane-SQL.psproj, Arcane-SQL.psproj.build, and Arcane-SQL.psprojs) were included in the code. If you are using a different editor, such as the PowerShell ISE, simply discard these files.

Warranty

To put it succinctly, there is none. No guarantee is made for the code in this module, users of this module assume all risks. While I am happy to receive bug reports, I make no promises or guarantees if, or when, they will be fixed.

Contributions

No, not the money kind, code contributions. If anyone wishes to extend the functionality of this module I am happy to collaborate as long as the coding standards demonstrated in this module are adhered to, and the contributions are relevant to the goals of this module. Be aware though this is not a money making effort, so expect no monetary reimbursement for any contributions.

Not dumb, good questions which remind me I need to include a good use case story along with the technical documentation. What I get for late night blogging.

First scenerio:

I am a BI guy who needs to generate many staging tables which look similar to, but not quite like the source system tables. However the differences are common and predictable.

Source systems are setup so it is very difficult, if not impossible, to be on the BI server and execute t-sql to read structures on a source server then create the table back on the BI server. (Unfortunately I do encounter these).

Using the Generate-CreateStatement I can read structures from the source server and apply common transformations to make the newly created table suitible for a staging table. I could then use the Invoke-SqlCmd to generate the table on the BI server.

Second scenrio:

When loading dimensions a common SSIS pattern, when you lack the ability to detect row changes in the source, is to have two data sources in the data flow: the staging tables and the target dimensions. You then use a HASHBYTES to compare the two to see if any changes have been made. The Generate-SelectStatement can create both of the needed SQL statements for you, including the often tedious and error prone HASHBYTES calculation.

True, many BI people don’t know PowerShell. But I am meeting more and more who do, when I do PowerShell presentations they are always packed. Additionally Microsoft is positioning PowerShell across all their products to the point where it will be essential for all IT Pros to know it.

Finally this module will work hand in hand with another module I am creating which will generate SSIS packages. I am hoping to complete this within the next month. the partially completed version combined with the Arcane-SQL module let us generate more than 10,000 tables and over 2,500 SSIS packages in just a couple of hours.