Wednesday, 26 July 2017

Case
I want to use the users from my Active Directory in my Data Warehouse for an Employee Dimension. How can I accomplish that in SSIS?

Solution
With a little .NET scripting in a Script Component you can create an Active Directory source in your Data Flow Task. Before you start you need to figure out which fields are available in your Active Directory. If you don't know the field names, you could set a breakpoint on the foreach loop and add a watch on the result variable. Then you can browse through the properties of that variable to find all available fields.

1) Script Component Source
Create a Data Flow Task and add a Script Component, choosing "Source" as the type. After that, give your source a suitable name like "SRC_SCR - Active Directory Users".

Script Component - Source

2) Output columns
Next edit the Script Component and go to the Inputs and Outputs pane. This is where we need to specify all the new output columns and data types. For my Active Directory example I have added five string (DT_STR) columns: Fullname, Firstname, Surename, Department and Manager. You have to figure out the required length yourself. You could start with the default 50 or change it to a safer 255.

Add new output columns

3) Edit script
Go back to the first page, choose C# as your scripting language and then hit the Edit Script button to open the VSTA environment.

Edit script

4) Reference
To tell our script about Active Directory, we first need to add a new reference to the .NET assembly System.DirectoryServices. In the Solution Explorer, right-click References and choose Add Reference... Then scroll down, select System.DirectoryServices and click OK to confirm.

Add reference

5) The Script - Namespaces
Under Namespaces, add a new using for our newly referenced assembly: using System.DirectoryServices; This allows you to use its classes without fully qualifying them.

Add using

6) The Script - CreateNewOutputRows
Now replace your CreateNewOutputRows method with the one below and add the GetPropertyValue method within your class, below the existing methods. You can remove the PreExecute and PostExecute methods since we don't need them in this example.

The CreateNewOutputRows method adds new rows to the buffer by calling Output0Buffer.AddRow(); within a loop and then fills all columns with data from the Active Directory. Before the foreach loop you will find the code that connects to the Active Directory, with an optional filter.
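The original methods aren't included here, but a minimal sketch could look like the code below. The LDAP path, the filter and the attribute names (cn, givenName, sn, department and manager) are assumptions that you should adapt to your own domain:

public override void CreateNewOutputRows()
{
    // Connect to the Active Directory (example path; adjust to your own domain)
    using (DirectoryEntry searchRoot = new DirectoryEntry("LDAP://DC=mydomain,DC=com"))
    using (DirectorySearcher searcher = new DirectorySearcher(searchRoot))
    {
        // Optional filter: only return user objects
        searcher.Filter = "(&(objectCategory=person)(objectClass=user))";
        // For very large Active Directories you need to page the results
        searcher.PageSize = 500;

        foreach (SearchResult result in searcher.FindAll())
        {
            // Add a new row to the output buffer and fill the columns
            Output0Buffer.AddRow();
            Output0Buffer.Fullname = GetPropertyValue(result, "cn");
            Output0Buffer.Firstname = GetPropertyValue(result, "givenName");
            Output0Buffer.Surename = GetPropertyValue(result, "sn");
            Output0Buffer.Department = GetPropertyValue(result, "department");
            Output0Buffer.Manager = GetPropertyValue(result, "manager");
        }
    }
}

// Helper method that safely returns a property value, or an empty
// string when the property is missing for a particular user
private string GetPropertyValue(SearchResult result, string propertyName)
{
    return result.Properties.Contains(propertyName)
        ? result.Properties[propertyName][0].ToString()
        : string.Empty;
}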

7) The result
To test the result add a dummy Derived Column and a Data Viewer to see the values. If it works you can add the rest of the components to the Data Flow Task to fill your dimension table.

The result of my script

Summary
A very simple and short script to get data from your Active Directory. For more filter examples visit MSDN. For very large Active Directories you will have to play with the PageSize property.
Note that there are also scriptless options, such as an ADO.NET source or third-party components. Each solution has its own pros and cons.

Sunday, 25 June 2017

Last week I had the honor to speak for the PowerShell Virtual Group about the combination of SSIS and PowerShell. As promised, here is my PowerPoint of that session and a list of all my PowerShell scripts for SSIS:

On my other blog I have a couple of PowerShell scripts available for Azure which can be executed as runbooks in Azure Automation. And if you have an hour available, you can watch the entire session on YouTube.

Thursday, 1 June 2017

Case
I am filling an Object variable with an Execute SQL Task and I want to use it in a Foreach Loop Container (Foreach ADO Enumerator), but the Foreach Loop stays empty. So I want to check the value of my Object variable. However debugging the package does not show me the value of Object variables. How can I see the content of my Object variable?

No (readable) value for Object variables

Solution
A solution could be to use a Script Task after the Execute SQL Task to show the content of the Object variable. The script below shows the top (x) records in a MessageBox. The code doesn't need any changes; the only thing you could consider changing is the number of records to show in the MessageBox (see the C# variable maxRows).

Getting content of Object variable

1) Add a Script Task
Add a new Script Task to the surface of your Control Flow and connect it to your Execute SQL Task. Then edit the Script Task and provide one Object variable in the ReadOnlyVariables or ReadWriteVariables property. This should of course be the same Object variable as the one used in your Execute SQL Task.

Provide one Object variable

2) Edit Script
Make sure to select Microsoft Visual C# as the Script Language and then hit the Edit Script button to open the VSTA environment. First locate the Namespaces region and add a using for System.Data.OleDb.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.OleDb; // Added
#endregion

Then scroll down, locate the Main method ("public void Main()") and replace it with the code below.
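The original code isn't included here, but a minimal sketch of such a Main method could look like this (assuming the Execute SQL Task filled the variable via an OLE DB connection with a Full result set):

public void Main()
{
    // The maximum number of rows to show in the MessageBox
    int maxRows = 10;

    // Check whether exactly one Object variable was provided
    if (Dts.Variables.Count != 1 || Dts.Variables[0].DataType != TypeCode.Object)
    {
        MessageBox.Show("Please provide exactly one Object variable.", "Error");
        Dts.TaskResult = (int)ScriptResults.Failure;
        return;
    }

    try
    {
        // Copy the content of the Object variable (an ADO recordset)
        // into a DataTable
        DataTable table = new DataTable();
        OleDbDataAdapter adapter = new OleDbDataAdapter();
        adapter.Fill(table, Dts.Variables[0].Value);

        // Build a message with the column names and the top rows
        System.Text.StringBuilder message = new System.Text.StringBuilder();
        foreach (DataColumn column in table.Columns)
        {
            message.Append(column.ColumnName).Append(";");
        }
        message.AppendLine();

        for (int i = 0; i < table.Rows.Count && i < maxRows; i++)
        {
            foreach (DataColumn column in table.Columns)
            {
                message.Append(table.Rows[i][column]).Append(";");
            }
            message.AppendLine();
        }

        MessageBox.Show(message.ToString(),
                        "First " + maxRows + " rows of " + Dts.Variables[0].Name);
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception ex)
    {
        // The variable was probably not filled (correctly) by the Execute SQL Task
        MessageBox.Show("Could not read the Object variable: " + ex.Message, "Error");
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}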

Now close the VSTA environment and click OK in the Script Task editor to finish it.

3) The result
Now run the package to see the result. I tried to make it a bit monkey proof by adding some checks in the code. If you provide a good and filled variable then it will show the data. Otherwise it will show an error telling you what's wrong.

Sunday, 14 May 2017

Case
I want to import and export Environments to/from my SSIS Catalog. Doing it manually in SSMS takes ages. How can I do that more quickly?

I want to export this environment

Solution
A while ago I created a couple of PowerShell scripts to deploy environments to your SSIS Catalog with a CSV file, a database table or an array as the source. The script below is a follow-up that allows you to export one or more environments as json files, but it also has an import method to deploy those exported environments to a Catalog: Get-CatalogEnvironment and Set-CatalogEnvironment.

This is an example of how you execute the two methods. It first imports a separate script file with the various methods, after which you can execute either the Get or the Set method:
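The example code isn't included here, but the usage could look something like this sketch; only the two method names come from this post, while the file path and parameter names are just placeholders:

# Load the separate script file with the various methods (example path)
. "C:\Scripts\CatalogEnvironment.ps1"

# Export one or more environments from the Catalog as json files
# (hypothetical parameters)
Get-CatalogEnvironment -SsisServer "MyServer\SQL2016" -ExportFolder "C:\Export"

# Import an exported environment json into a Catalog
# (hypothetical parameters)
Set-CatalogEnvironment -SsisServer "MyServer\SQL2016" -ImportFile "C:\Export\MyEnvironment.json"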

Later on I will add various extra methods, for example to test the existence of an environment, or to delete, move, copy or rename an environment, or to connect an environment to a project. Please let me know if you have any suggestions for extra functionality or improvements!

Thursday, 4 May 2017

Nordic SQL Nexus 2017
Last week I had the honor to speak at SQL Nexus in Copenhagen, a marvelous three-day Microsoft Data Platform event with over 70 sessions to choose from. You can download my PowerPoint presentation here. Scripts, or links to scripts, are available in the notes of the PowerPoint. Contact me if you have any questions related to this presentation. More PowerShell scripts are available here.

Wednesday, 5 April 2017

Case
For a client I need to read hundreds of bus route matrices and they all vary in size. This makes it hard to read them dynamically with a Foreach Loop Container because the number of columns differs per file. And I don't want to create hundreds of Data Flow Tasks by hand. Even BIML won't help this time, because the routes change regularly and I don't want to generate and deploy packages every day.
I need to dynamically unpivot data within the Data Flow Task. How do I solve this within SSIS?

Dynamically unpivot data

Solution
The trick for this case is to read everything as one big column and then dynamically split and unpivot the column in a Script Component Transformation. The unpivot output will always have three columns: Start Station, End Station and Distance. And the good news is that it has only a few lines of relatively easy code.

The solution

1) Source with one big column
Change your Flat File Connection Manager so that it will read everything as one big column. Make sure the column is big enough to fit all data. For this example I called the column 'ColumnOne'.

Flat File with one column only

2) Script Component Transformation Input
Drag a Script Component on the surface and choose Transformation. Connect it to your source. Then edit the Script Component and go to the 'Input Columns' page. On that page select the column with all the matrix data as ReadOnly.

Input Columns

3) Script Component Transformation Output
On the 'Inputs and Outputs' page we need to add the new output columns. For this example I need a StartStation (string), an EndStation (string) and a Distance (int).
Another important step is setting the SynchronousInputID property (of Output 0) to 'None'. This makes the transformation asynchronous, which means the number of rows going in can differ from the number of rows coming out. It also means the input buffer with records isn't reused in this component; instead a new output buffer is created.

Inputs and Outputs

4) The script
Go to the Script page, choose C# as the scripting language and hit the Edit Script button. Now copy the contents of my Input0_ProcessInputRow method below into your own Input0_ProcessInputRow method. There are also two variables, Stations and Distances, declared above that method; copy those to your code and put them in the same place.

I also removed the unused methods PreExecute, PostExecute and CreateNewOutputRows to keep the code clean and mean.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion

/// <summary>
/// Split and unpivot data
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Define two arrays for distances and stations
    // The Stations array will be filled only once
    // The Distances array will change for each row
    string[] Stations;
    string[] Distances;

    /// <summary>
    /// This method is called once for every row that passes through the component from Input0.
    /// </summary>
    /// <param name="Row">The row that is currently passing through the component</param>
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // The first time this method executes the Stations array
        // is still empty (null). In the true clause of the if-
        // statement we will fill the Stations array.
        // Therefore the second, third, etc. time this method
        // executes we will go to the false clause of the if-
        // statement.
        if (Stations == null)
        {
            // We know that the first row contains the stations.
            // We will add those to the stations array and use
            // it to determine the end station later on.
            // Split the string from ColumnOne on ; (or your own
            // column separator). The Split returns an array.
            Stations = Row.ColumnOne.Split(';');
        }
        else
        {
            // Now the rows will contain distances (and the StartStation)
            // Split the distances on ; (or your own column separator)
            Distances = Row.ColumnOne.Split(';');

            // Now loop through distances array, but start on 1 (not on 0)
            // because 0 contains the StartStation in the distances array
            for (int counter = 1; counter < Distances.Length; counter++)
            {
                // Add new Row and then fill the columns
                Output0Buffer.AddRow();
                // Get the Distance from the Distance array and convert it to int
                Output0Buffer.Distance = Convert.ToInt32(Distances[counter]);
                // Get the Start station from the distance array (the first item)
                Output0Buffer.StartStation = Distances[0];
                // Get the End station from stations array
                Output0Buffer.EndStation = Stations[counter];
            }
        }
    }
}

5) The result
Now close the Script Component, add more transformations or a destination and see what the Script Component does with your data. I added a dummy Derived Column and a Data Viewer to see the data before and after the Script Component. For this file I had 27 rows and 27 columns as input and 676 rows as output (26 * 26).

Saturday, 25 February 2017

Case
I created dozens of packages in my project but I forgot to change the default Protection Level in the project properties from "EncryptSensitiveWithUserKey" to "DontSaveSensitive". Now I have to change all packages one by one. Is there an alternative? I tried search and replace in the XML, but I can't find the Protection Level property.

Solution
Of course the best option is to prevent this from happening by setting the default before you start. You can do this in the properties of the project. All new packages will then inherit the Protection Level from the project.

Setting Protection Level on project

First, when trying to search and replace in the XML code of the packages you will notice that you cannot find the default 'EncryptSensitiveWithUserKey' which makes it hard to replace.

Default Protection Level is not in package

Secondly, the Protection Level is also stored in the Visual Studio project file (*.dtproj). When you open a package in design mode and press the save button it also updates metadata in the project file.

Protection Level in project file as well

Solution A
Good old Command Prompt to the rescue! The dtutil utility can do the package conversion for you. If you are afraid of the Command Prompt, or have never even heard of it, then don't use this solution.

1) Command Prompt
Open a Command Prompt and use CD (Change Directory) command to navigate to your folder with packages.

Navigate to your project folder with packages

2) Foreach Loop Container in DOS
Now you can call the dtutil utility for each package in that folder with something similar to a Foreach Loop Container:

FOR %p IN (*.dtsx) DO dtutil.exe /file %p /encrypt file;%p;0 /quiet

The colors explain the command
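In this command the 0 after /encrypt stands for the DontSaveSensitive protection level. Note that when you put the command in a .bat file instead of typing it directly in the Command Prompt, you have to double the percent signs:

FOR %%p IN (*.dtsx) DO dtutil.exe /file %%p /encrypt file;%%p;0 /quiet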

3) Execute
When you execute the command, dtutil Utility will quickly change the Protection Level of all your packages.

101 packages changed within 5 seconds. Try that in Visual Studio!

4) Project Protection Level
If you haven't already done it, change the Protection Level in the Project Properties. See the second screenshot of this blog post.

5) dtproj file
Now the project and all its packages have the same Protection Level, but the project file doesn't know that yet. If you try to execute a package, it will complain about Protection Level inconsistencies.

Failed to execute the package or element. Build errors were encountered.

Error : Project consistency check failed. The following inconsistencies were detected:
MyPackage000.dtsx has a different ProtectionLevel than the project.
MyPackage001.dtsx has a different ProtectionLevel than the project.

To update the dtproj file you have to open all packages and then Rebuild the project. This will update the project file. Now you can execute the packages without the consistency error.

Open all packages and rebuild the project

Solution B
Good old PowerShell to the rescue! This PowerShell script does the same as above, but it also changes the project file, so no manual labour at all. Because the dtutil utility was so fast, I didn't edit the packages with .NET libraries; the script just executes dtutil in a hidden window.
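The script itself isn't included here, but the core of the package conversion could look like this minimal sketch; the project path is an example and the dtproj handling of the full script is left out:

# Convert all packages in a project folder to DontSaveSensitive (level 0)
# by executing dtutil in a hidden window for each package
$projectFolder = "C:\MyProjects\MySsisProject"
Get-ChildItem -Path $projectFolder -Filter "*.dtsx" | ForEach-Object {
    $arguments = "/file ""$($_.FullName)"" /encrypt file;""$($_.FullName)"";0 /quiet"
    Start-Process -FilePath "dtutil.exe" -ArgumentList $arguments -WindowStyle Hidden -Wait
}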

The script is thoroughly tested for SSIS 2012-2016, going from 'EncryptSensitiveWithUserKey' to 'DontSaveSensitive'. Other situations require more testing. Make sure to keep a copy of your project before using this script and let me know which situations need some more attention.

Wednesday, 1 February 2017

Case
I have a large XML file with Orders and Orderlines which I want to (merge) join into a new destination. To join the two outputs I need to order them, but the Sort transformation takes too much time. Is there a faster alternative?

XML Source with two joined outputs

Solution
The solution is surprisingly simple: the outputs are already sorted and you only have to tell SSIS that (similar to a source with an ORDER BY in the query).

For XML files with multiple levels (first for orders and second for orderlines) like below, SSIS will create two output ports.

XML Sample
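The sample itself is a screenshot; a simplified version of such a two-level structure could look like this (element and attribute names are just an illustration):

<Orders>
  <Order OrderNumber="1001" Customer="Customer A">
    <OrderLine Product="Product X" Quantity="2" />
    <OrderLine Product="Product Y" Quantity="1" />
  </Order>
  <Order OrderNumber="1002" Customer="Customer B">
    <OrderLine Product="Product X" Quantity="5" />
  </Order>
</Orders>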

The outputs will have an extra bigint column which allows you to connect the orderlines to the correct order.

Two outputs with additional ID column

Instead of using these ID columns in the SORT transformations, you can also use the advanced editor of the XML source to tell SSIS that these columns are already sorted. Right click the XML source and choose 'Show Advanced Editor...'.

Show Advanced Editor...

Then go to the last page, 'Input and Output Properties', and select the OrderLine output. In the properties of this output you can tell SSIS that the output is sorted.

Set IsSorted to true

Next expand OrderLine and then Output Columns and click on the additional ID column 'Order_Id'. In its properties locate the SortKeyPosition and change it from 0 to 1.

Set SortKeyPosition to 1

Repeat this for the second output called 'Order' and then close the advanced editor. If you still have the SORT transformations, you will notice the yellow triangle with the exclamation mark in it. It tells you that the data is already sorted and that you can remove the SORT transformations.
And if you edit the Data Flow Path and view the metadata you will see that the column is now sorted.

Sorted! Remove the SORT transformations!

Conclusion
The solution is very simple and perhaps this should have been the default sort key position anyway? It smells like a bug to me...

Wednesday, 4 January 2017

Case
I have a CSV file with numeric values that use a dot "." as decimal separator instead of the comma "," we use locally. When I try to import it in SSIS with a Flat File Source it gives me an error. I don't want to/can't change the regional settings on the server. How do I import this flat file without errors?

The value could not be converted because of a potential loss of data

10.5 should be 10,5 (or vice versa)

Error: 0xC02020A1 at DFT - Process Data, FF_SRC - myCsvFile [2]: Data conversion failed. The data conversion for column "myColumn" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
Error: 0xC0209029 at DFT - Process Data, FF_SRC - myCsvFile [2]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "FF_SRC - myCsvFile.Outputs[Flat File Source Output].Columns[myColumn]" failed because error code 0xC0209084 occurred, and the error row disposition on "FF_SRC - myCsvFile.Outputs[Flat File Source Output].Columns[myColumn]" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
Error: 0xC0202092 at DFT - Process Data, FF_SRC - myCsvFile [2]: An error occurred while processing file "D:\myFolder\2016-12-27.csv" on data row 2.
Error: 0xC0047038 at DFT - Process Data, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on FF_SRC - myCsvFile returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

Solution
You can change the LocaleID of the connection manager to import this file. Right-click the connection manager and choose Properties...

Go to properties of flat file connection manager

Then locate the LocaleID property and change it to English (United States), English (United Kingdom) or another country that uses a dot "." as the decimal separator. Or change it to, for example, Dutch (Netherlands) if you have the opposite problem.