Tuesday, 17 July 2012

Recently I did a post on how to loop through all sheets in an Excel file. For those who don't like scripting and for those who think the Foreach ADO.NET Schema Rowset solution is ugly, I created a custom enumerator. Same result, but with an easier interface and a possibility to filter with both wildcards and regular expressions.

Foreach Excel Worksheet Enumerator

Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

Installation
The installer registers the DLL in the GAC and copies it to the ForEachEnumerators folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\ForEachEnumerators\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup.

SSIS
The enumerator is automatically added to the Foreach Loop when you open a SSIS project.

Tuesday, 10 July 2012

Case
A while ago I did a post on how to get SSIS 2008 working with Team Foundation Server (TFS). But beware, you shouldn't work with two developers on one package at the same time! That will cause problems with for example lineage id's. Merging two versions will be nearly impossible. So how do you prevent that?

SolutionGo to the Team Explorer pane in Visual Studio (BIDS). Right click on the solution and select Team Project Settings and then Source Control. Now you can uncheck the checkbox for Enable multiple check-out.

Thursday, 5 July 2012

Case
I have an Excel file with multiple identical worksheets (one foreach month) and I want to add the data to a single database table. Is there a foreach loop solution so that I don't need to add multiple data flows or sources.?

Solution
a) You could use a union all query in the Excel Source. Not very flexible, but very easy.

b) You could loop through Excel tables by using the Foreach ADO.NET Schema Rowset enumerator. One downside: it returns both worksheets (which have the $ suffix) and named ranges. So you need an extra dummy task in your foreach loop with an expression on the precedence constraint to the next task. Something like: RIGHT(@[User::WorksheetName], 1) == "$"
c) You could use a Script Task to fill a SSIS object variable and use that to loop through. Let's elaborate that solution.

1) Excel Connection Manager and Variables
Add an Excel Connection manager with a link to an Excel File with multiple Worksheets. I named mine "MyExcelFile". If you choose an other name, then make sure to also change that in the script task of step 3.

And we also need two SSIS variables. One object variable (ExcelWorksheets) to loop through in the foreach loop and one string variable (WorksheetName) to be filled by the foreach loop.

Two variables

2) Script Task
Add a Script Task to your Control Flow and give it a suitable name. Edit the Script Task and add the SSIS Object variable "ExcelWorksheets" from step 1 as ReadWrite variable.

ReadWriteVariables

3) The script
Edit the Script Task (open VSTA editor) and copy the following script.

' VB.Net code
Imports System
Imports System.Data
Imports System.Data.OleDb ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="" > _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
Enum ScriptResults
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
End Enum
Public Sub Main()
Try
' Get connectionstring from Excel Connection Manager and use it to connect through OLE DB
Dim excelConnection As New OleDbConnection(Dts.Connections("MyExcelFile").ConnectionString)
excelConnection.Open()
' Once connected, get the table schema and close the OLE DB connection
Dim dtDatasetsInExcel As DataTable = excelConnection.GetSchema("Tables")
excelConnection.Close()
' NOTE: The datatable from GetSchema includes both worksheets (which have the $ suffix)
' and named ranges. So we need to exclude those named ranges.
' Create a dataset.
Dim dsWorksheetsInExcel As New DataSet()
' Create a new table in the dataset
Dim dtWorksheetsInExcel As DataTable = dsWorksheetsInExcel.Tables.Add()
dtWorksheetsInExcel.Columns.Add("WorksheetName", GetType(String))
' Loop through all tables and only get those ending with a $
For Each drWorksheet As DataRow In dtDatasetsInExcel.Rows
' Check for $ suffix
If drWorksheet("TABLE_NAME").ToString().EndsWith("$") Then
dtWorksheetsInExcel.Rows.Add(drWorksheet("TABLE_NAME").ToString())
End If
Next
' Fire information event with the total number of worksheets
Dim fireAgain As Boolean = True
Dts.Events.FireInformation(-1, "Foreach Worksheet", dtWorksheetsInExcel.Rows.Count.ToString() & " worksheets found.", String.Empty, 0, fireAgain)
' Fill SSIS Object variable with worksheet dataset.
Dts.Variables("User::ExcelWorksheets").Value = dsWorksheetsInExcel
' Success
Dts.TaskResult = ScriptResults.Success
Catch ex As Exception
' Something went wrong. Log error and fail Script Task
Dts.Events.FireError(-1, "Foreach Worksheet", ex.Message, String.Empty, 0)
Dts.TaskResult = ScriptResults.Failure
End Try
End Sub
End Class

Worksheets and named ranges

4) Foreach Loop
Add a Foreach Loop Container to the Control Flow and give it a suitable name. Connect it to the Script Task from step 2.

Foreach Loop

5) Foreach ADO enumerator
Edit the Foreach Loop Container and select the ADO enumerator in the collection tab. After that select the SSIS object variable from step 1 as ADO object source variable and select Rows in first table as the enumeration mode.

Foreach ADO enumerator

6) Variable mappings
In the Variable mappings tab select the string variable from step 1. Now the Foreach Loop will fill this variable with the currect Worksheet name.

Variable mappings

7) Excel Source
Add a Data Flow Task in the foreach loop and add a Excel Source to that Data Flow Task. Select the Excel Connection manager created in step 1 and select Table name or view name variable. Now you can select the variable that contains the Worksheet name.

Excel Source using variable for table name

Note 1: hidden worksheets are not returned by the OLE DB provider.
Note 2: Worksheets are always sorted ascending on their name
Note 3: If you have different data structures, but worksheets all have some common columns, then you could create a query in a variable and use that as a source query. Something like "SELECT column1, column4 FROM " + @[User::WorksheetName]