Friday, 31 December 2010

Case
A few months ago I had a job to make some adjustments to an SSIS project, but they had lost the Visual Studio project file. So I created an empty project and started adding all the packages (over 150) from SQL Server to the project one by one. After five packages I realised this could take ages. A while ago I created a package to upload all Visual Studio project files to SQL Server, and I thought: I can do that vice versa!

Solution
I created a package with a single Script Task and used the SSIS API to download all packages to the Visual Studio SSIS project folder. After that I edited the Visual Studio SSIS project file (.dtproj, an XML file) to add the packages to the project itself.
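For reference, a minimal sketch of such a script (C#). It uses the Microsoft.SqlServer.Dts.Runtime API that a Script Task already references; the server name and project folder are placeholders you have to change:

public void Main()
{
    string server = "myDevServer";                  // placeholder: your SQL Server
    string projectFolder = @"C:\MySsisProject\";    // placeholder: your VS project folder

    Microsoft.SqlServer.Dts.Runtime.Application app = new Microsoft.SqlServer.Dts.Runtime.Application();

    // Loop through all items in the root folder of the MSDB package store
    foreach (Microsoft.SqlServer.Dts.Runtime.PackageInfo packageInfo in app.GetPackageInfos("\\", server, null, null))
    {
        if (packageInfo.Flags == Microsoft.SqlServer.Dts.Runtime.DTSPackageInfoFlags.Package)
        {
            // Load the package from SQL Server and save it as a .dtsx file on disk
            Microsoft.SqlServer.Dts.Runtime.Package package = app.LoadFromSqlServer("\\" + packageInfo.Name, server, null, null, null);
            app.SaveToXml(projectFolder + packageInfo.Name + ".dtsx", package, null);
        }
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}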

Thursday, 30 December 2010

Case
Every now and then you need a variable in a Script Task, for instance to avoid a hardcoded path or connection string in your code. You can use Package Configurations to fill a variable and then use that variable in your Script Task.

Solution
There are two ways to read and write variables in a Script task. I will show you both.
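First, a minimal sketch of the simple method (C#): list the variables in the ReadOnlyVariables / ReadWriteVariables properties of the Script Task and address them via Dts.Variables. The variable names are just examples:

public void Main()
{
    // Read a variable (listed as ReadOnlyVariable)
    string filePath = Dts.Variables["User::FilePath"].Value.ToString();

    // Write a variable (listed as ReadWriteVariable)
    Dts.Variables["User::RowCount"].Value = 100;

    Dts.TaskResult = (int)ScriptResults.Success;
}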

You can even add a Try Catch Finally construction to avoid unexpected errors (a variable that doesn't exist or is already locked). And with the Finally you can release the locks even if your script fails, so you can still use the variable in, for example, an event handler.
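A sketch of that advanced method (C#) with the VariableDispenser and the Try Catch Finally construction; the variable names are again just examples:

public void Main()
{
    Variables vars = null;
    try
    {
        // Lock the variables at this stage, not for the whole duration of the task
        Dts.VariableDispenser.LockForRead("User::FilePath");
        Dts.VariableDispenser.LockForWrite("User::RowCount");
        Dts.VariableDispenser.GetVariables(ref vars);

        string filePath = vars["User::FilePath"].Value.ToString();
        vars["User::RowCount"].Value = 100;

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception ex)
    {
        Dts.Events.FireError(0, "Script Task", ex.Message, string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
    finally
    {
        // Release the locks, even if the script failed
        if (vars != null)
        {
            vars.Unlock();
        }
    }
}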

Which version is best? The result of both versions is the same. The advanced version gives you a little more control over what is happening and at what stage the variables are locked and released, but the simple method is used more often in blogs and forums. I recommend using the simple method.

Note: Variables in a Script Component will be discussed at a later time.
Update: I wrote an SSIS WIKI about this where I also show how to use parameters in a Script Task

Wednesday, 29 December 2010

Case
Occasionally you need to pass a value from the parent package to the child package. Of course you could temporarily store it in a database or file, but that could be a little too much for a single value.

Solution
The easiest solution is to store the value in a variable in the parent package and use a script task in the child package to copy it.

1) Create parent variable
Create a variable in your parent package. Right click in the Control Flow if the variables are not visible. I used a string variable named FilePath in this test case to store some filepath.

2) Execute Package Task
Add an Execute Package Task to the Control Flow of your parent package and let it execute the child package.

3) Create child variable
Go to your child package and create a variable, but be sure not to use the same name as in the parent package. Otherwise it won't work! I used LocalFilePath in this case.

4) Script task
Drag a Script Task into the Control Flow of your Child package. Choose your script language (I will show both C# and VB.net for this short script), select your Child package variable as a ReadWriteVariable, and type (you can't select it) your Parent package variable as a ReadOnlyVariable. The scope prefix (User::) is optional.

Language and Variables

5) The Script
The simple version of this script is only one line of code. I will show the more advanced version at a later time.
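A minimal sketch of the C# version, using the variable names from this example (FilePath in the parent, LocalFilePath in the child):

public void Main()
{
    // Copy the value of the parent variable into the child variable
    Dts.Variables["User::LocalFilePath"].Value = Dts.Variables["User::FilePath"].Value;

    Dts.TaskResult = (int)ScriptResults.Success;
}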

Note: the child package cannot run by itself, because it will fail to find the parent variable. A simple solution is to change the precedence constraint behind the Script Task from Success to Completion (see the blue line in the last picture).

Tuesday, 28 December 2010

Case
Constantly running all Microsoft SQL Server services on your development desktop/laptop could be a little too much, especially when you're not developing. But starting and stopping them via the Control Panel is tiresome...

Solution
You can stop/start a service via the command line (or even better, put the commands in a batch file):
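For example, a sketch with the default service names of a SQL Server 2008 default instance (your service names may differ, check them in services.msc):

REM Stop the SQL Server services
net stop MsDtsServer100
net stop SQLSERVERAGENT
net stop MSSQLSERVER

REM Start them again
net start MSSQLSERVER
net start SQLSERVERAGENT
net start MsDtsServer100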

Execute Package
ReferenceType has been added to the Execute Package Task, which you can use to select a package from your current project. If you choose External Reference, you will get the SQL Server / File system choice.

Package Reference Type

There is also a new Parameter bindings tab to support the new parameter option (see Parameters).

Dataflow
Flexible Authoring
It's now possible to edit a component even when its input path is disconnected.

Source & Destination Assistant
New in the dataflow are the Source Assistant and Destination Assistant. It feels a little bit like SSIS for Dummies... and there isn't a flat file target.

They are just an extra; the regular source and destination components are still available.

After clicking OK, you get the regular source and destination editor screens.

Grouping in dataflow
Very useful for large dataflows is the new collapsible grouping option.

Dataflow grouping

Data Correction
Very cool new data flow object, but not yet working. I'm very curious about this new item...

Pivot
I expected a better editor. Unfortunately no improvements yet...

Simplified Data Viewer
The Data Viewer can now be configured directly from the Data Flow Path editor. The data viewer only supports a grid view, the histogram and scatter plot views have been removed (but who has ever used them?).

Case
A client wants to export data to XML files, but SSIS has no XML destination. A Script Component could help him, but he doesn't want to create a new script for each file/source just because the file/source layout is different.

Solution
There is a request for an XML destination on the Microsoft site which you can support.
There is also an open source component (XML Destination) and there are some third-party components, but you can do it yourself with a Script Component.
Let's create a flexible XML destination that is the same for each file/source. This can be solved with reflection (again).

1) Source
Create a source (flat file / OLE DB); any source will do.

2) Script component
Drag a script component to your data flow and select Destination. Connect it to the Source component.

Script component destination

3) Input columns
Edit the Script Component and select all input columns (or just the ones you need).

Select all columns as input

4) Destination
Create a new connection named xmldocument on the Connection Managers tab. The connection type should be FILE, used to create a new file.

FILE

Create file

5) The Script
Now the script (C# and VB.net for this example). SSIS creates three methods which we will adjust, and we need a fourth method to replace forbidden XML characters such as < and >.
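A sketch of the C# version (the VB.net version follows the same pattern). It uses reflection to loop through all input columns, so the same script works for any source layout. The xmldocument connection from step 4 supplies the file path; the element names and the simple null handling are assumptions you may want to refine:

using System;
using System.IO;
using System.Reflection;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    private StreamWriter textWriter;

    public override void PreExecute()
    {
        base.PreExecute();
        // The FILE connection named xmldocument returns the path of the new file
        string fileName = this.Connections.xmldocument.AcquireConnection(null).ToString();
        textWriter = new StreamWriter(fileName, false);
        textWriter.WriteLine("<rows>");
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        textWriter.WriteLine("  <row>");
        // Reflection: loop through all columns of the input buffer
        foreach (PropertyInfo column in Row.GetType().GetProperties())
        {
            // Skip the generated <column>_IsNull helper properties
            if (!column.Name.EndsWith("_IsNull"))
            {
                object value = column.GetValue(Row, null);
                textWriter.WriteLine("    <" + column.Name + ">"
                    + CleanXml(value == null ? "" : value.ToString())
                    + "</" + column.Name + ">");
            }
        }
        textWriter.WriteLine("  </row>");
    }

    public override void PostExecute()
    {
        base.PostExecute();
        textWriter.WriteLine("</rows>");
        textWriter.Close();
    }

    // The fourth method: replace forbidden XML characters
    private static string CleanXml(string value)
    {
        return value.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    }
}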

Sunday, 26 December 2010

Case
Finally had some time to install Denali (on my Windows 7 machine with SQL Server 2008 already running). Unfortunately I found out that, after the installation of SQL 2012, I couldn't edit my existing SSIS 2008 projects and I couldn't create a new one. Other project types didn't have any problems, but SSIS did:

Aargh!

Solution
Max Trinidad found a solution to get SSIS 2008 working again. He had the same problem with SSIS 2008 R2.
Short version:
1) Remove SQL 2012
2) Remove BIDS 2008
3) Restart
4) Add BIDS 2008
5) Install SQL 2012 on a different PC or VPC!

Case
I want to upload all SSIS packages in my local Visual Studio Project folder to the SQL development server at once.

Solution
You can use the SSIS API for that!

1) Foreach Loop
Add a Foreach Loop Container and give it a suitable name:

Foreach Loop Container

2) File Enumerator
Use a Foreach File Enumerator to loop through the (bin) folder of your project. (Unfortunately you can't get the folder name of a package, so you will have to enter the path in the Foreach Loop manually.) Select only the *.dtsx files and use Fully qualified to retrieve the file name.

All packages from your VS project

3) Variable
Go to the Variable Mapping tab and select <New Variable...>. Create a new String variable with the name localPackageFile. After creating the new variable the Index is set to the default 0.

New String variable to store the path

4) Script Task
Drag a Script task into the Foreach loop and give it a suitable name.

Script task

5) Pass through variable
Select the newly created String variable localPackageFile as a ReadOnlyVariable on the first tab of the Script Task.

Select the variable

6) The script
We will use C# in this example. Hit the Edit Script button and edit the Main method. This script will upload (and overwrite) all packages.
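A sketch of that Main method (C#). The Foreach Loop fills the localPackageFile variable with the path of each .dtsx file; the server name is a placeholder and the package is saved in the root of the MSDB package store:

public void Main()
{
    // Path of the current .dtsx file, filled by the Foreach Loop
    string packagePath = Dts.Variables["User::localPackageFile"].Value.ToString();
    string packageName = System.IO.Path.GetFileNameWithoutExtension(packagePath);
    string server = "myDevServer";   // placeholder: your SQL development server

    Microsoft.SqlServer.Dts.Runtime.Application app = new Microsoft.SqlServer.Dts.Runtime.Application();

    // Load the package from disk and save (overwrite) it in the MSDB package store
    Microsoft.SqlServer.Dts.Runtime.Package package = app.LoadPackage(packagePath, null);
    app.SaveToSqlServerAs(package, null, "\\" + packageName, server, null, null);

    Dts.TaskResult = (int)ScriptResults.Success;
}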

Saturday, 25 December 2010

Case
Recently I explained how to create an infinite loop with a WMI Event Watcher Task in it to watch for new files throughout the day. That solution had one small disadvantage: not only does the loop run infinitely, but so does the package... and some people don't want that. They want to end the package at the end of the day and restart it the next day.

Previous solution

Solution
We will stop the package just before midnight and restart it again at midnight.

1) Add timeout to File Watcher
You can't interrupt the loop and end the package, because the WMI Event Watcher Task is still watching for new files. It will only stop if you drop a file in the folder, but you can also add a time-out to continue the control flow. Let's add a 14-minute time-out (840 seconds) and let the control flow continue without errors:

Set ActionAtTimeout to Log the time-out

Set AfterTimeout to Return with success

Set Timeout to 840 seconds

Timeout and continue

2) Change loop EvalExpression
After the time-out of the WMI Event Watcher Task, the Foreach Loop won't find any files, so the loop restarts.
To stop looping just before midnight, you have to change the EvalExpression of the loop to:
GETDATE() < DATEADD("Mi",45,DATEADD("Hh",23,(DT_DBDATE)GETDATE()))

Continue while GetDate is smaller than today at 23:45

Note: You can fine-tune the time-out and the end time of the loop to create a smaller window.

Friday, 24 December 2010

Case
As an external employee I see a lot of SSIS packages at various companies, made by a whole bunch of different people. Unfortunately some of those people have made Quick & Dirty their motto in life, resulting in hard-to-read packages. And that's a waste of time for the companies.

Solution
Companies should require both well performing and well documented packages. Here is a list of some basic development Best Practices to achieve clear and manageable packages.

1) No default names and descriptions
Rename all default component names and give them explanatory descriptions. This will help other developers who edit your packages. It is also very useful when debugging.

No default names and descriptions

2) Annotations
Use annotations. This is very useful if the Control Flow or Data Flow isn't self-describing (for others).

Use annotations

3) Group logical work
Use Sequence Containers to organize package structures into logical units of work. This makes it easier to identify what the package does. It also helps to control transactions if they are being implemented. * Update: SSIS 2012 has a grouping feature *

Use Sequence Containers

4) Flow directions
Flows should basically go top-down. This will make your packages more readable.

Design your package Top down

You can use the auto-format option from SSIS (Format > Auto Layout) to format your packages.

Auto Layout is a good start

5) Disabled Control Flow tasks
Do not use disabled Control Flow tasks in the Quality assurance or Production environment. If you want to conditionally execute a task at runtime use expressions on your precedence constraints. Do not use an expression on the “Disable” property of the task.

Disabled Control Flow Task

6) Spread large numbers of packages over several Visual Studio projects
You can add more than one project to your Visual Studio solution to spread a large number of packages. Think about a proper layout, for example a data staging project and a data warehouse project.

7) Queries in source and lookup components
Don't use overly complex queries. Use a readable layout and add comments to explain parts of the query. For example:

-- This query does something
SELECT a.field1
, a.field2
, b.field3
, b.field4
FROM table1 as a
LEFT JOIN table2 as b
on a.field5 = b.field6
WHERE a.field2 = 'x' -- Comment about x
ORDER BY a.field1

8) Script Coding Conventions
Use coding conventions when scripting a Script Task or Script Component. C# and VB.Net both have their own conventions, which are widely available on the net.

9) Use naming conventions
Give tasks and transformations a prefix. This makes it easier to read the logging.

10) Use templates
You can create templates for SSIS. Things like logging, configurations and connection managers can be added to these templates.

Let me know if you have items that should be in the list of Development Best Practices!

Thursday, 23 December 2010

Case
Microsoft finally added sorting in Management Studio 2008 (SSMS) and Visual Studio 2008, but for those poor developers who are still working with 2005 (me, a couple of weeks ago), where it isn't a standard feature, there are a couple of solutions to accomplish an alphabetically sorted list of packages.

Sorting in Visual Studio 2008

Solution
Sorting in Management Studio 2005
The sorting is done by a stored procedure named sp_dts_listpackages. You can alter this procedure to accomplish a sorted list: the bottommost row (an ORDER BY on the package name) has been added. You can find the stored procedure in the MSDB database under System Stored Procedures.

Wednesday, 22 December 2010

Case
A client of mine had some performance issues with a couple of SSIS packages, and because they lacked basic SSIS knowledge, they just upgraded their server with more memory. Finally, after 32GB of memory, they stopped upgrading and started reviewing their packages.

Solution
There are a lot of blogs about SSIS Best Practices (for instance: SSIS Junkie). Here is a top 10 of easy-to-implement but very effective ones that I showed them, to 'upgrade' their packages instead of the memory.

1) Unnecessary columns
Select only the columns that you need in the pipeline to reduce buffer size and reduce OnWarning events at execution time. SSIS even helps you by showing the unnecessary ones in the Progress/Execution Results Tab: [DTS.Pipeline] Warning: The output column "Address1" (16161) on output "Output0" (16155) and component "CRM clients" (16139) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.

Unnecessary columns from a flat file

2) Use queries instead of tables
Following on from the unnecessary columns: always use a SQL statement in an OLE DB Source component or (Fuzzy) Lookup component rather than just selecting a table. Selecting a table is akin to "SELECT * ...", which is universally recognised as bad practice.

OLE DB Source, use SQL Command instead of Table

Lookup, use SQL Command instead of Table

3) Use caching in your LOOKUP
Make sure that the result of your lookup is unique, otherwise SSIS cannot cache the query and executes it for each record passing the lookup component. SSIS will warn you for this in the Progress/Execution Results Tab: [Lookup Time Dimension [605]] Warning: The component "Lookup Time Dimension" (605) encountered duplicate reference key values when caching reference data. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.

Watch out that you are not grabbing too many resources in the lookup. A couple of million records is probably not a good idea. And new in SSIS 2008 is that you can reuse your lookup cache in another lookup.

SSIS 2008: Cache

4) Filter in source
Where possible, filter your data in the Source Adapter rather than filtering it with a Conditional Split transform component. This will make your data flow perform quicker, because the unnecessary records don't go through the pipeline.

Filter in OLE DB Source, filter data in source

5) Sort in source
A sort with SQL Server is faster than the sort in SSIS, partly because SSIS does the sort in memory. So it pays to move the sort to the source component (where possible). Note that you have to set IsSorted=TRUE on the source adapter output, but setting this value does not perform a sort operation; it only indicates that the data is sorted. After that, change the SortKeyPosition of all output columns that are sorted.

Advanced Editor for Source, sort data in source

6) Join in source
Where possible, join data in the Source Adapter rather than using the Merge Join component. SQL Server does it faster than SSIS. But watch out that you are not making overly complex queries, because that will worsen the readability.

Unnecessary Join and Sorts

7) Group in source
Where possible, aggregate your data in the Source Adapter rather than using the Aggregate component. SQL Server does it faster than SSIS.

Unnecessary Sorts, Join and Aggregate

8) Beware of Non-blocking, Semi-blocking and Fully-blocking components in general
The dataflow consists of three types of transformations: non-blocking, semi-blocking and fully-blocking. And as the names suggest, use semi-blocking and fully-blocking components rarely to optimize your packages. Jorg Klein has written an interesting article about it, with a list of which components are non-, semi- or fully blocking.

9) High Volumes of Data and indexes
Loading high volumes of data into a table with clustered and non-clustered indexes could take a lot of time.
The most important thing to verify is whether all indexes are really used. SQL Server 2005 and 2008 provide information about index usage with two views: sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats. Drop all rarely used and unused indexes first. Experience teaches that there are often a lot of unnecessary indexes. If you are absolutely sure that all remaining indexes are necessary, you can drop all indexes before loading the data and recreate them afterwards. The performance profit of that depends on the number of records: the higher the number of records, the more profit you gain.
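As a starting point, a sketch of a query on sys.dm_db_index_usage_stats that lists indexes of the current database without any reads since the last SQL Server restart (the filter is illustrative; always verify before dropping anything):

-- Indexes without reads, but with write overhead
SELECT OBJECT_NAME(s.[object_id]) as TableName
,      i.[name] as IndexName
,      s.user_seeks + s.user_scans + s.user_lookups as Reads
,      s.user_updates as Writes
FROM sys.dm_db_index_usage_stats as s
JOIN sys.indexes as i
  on i.[object_id] = s.[object_id]
 and i.index_id = s.index_id
WHERE s.database_id = DB_ID()
  and s.user_seeks + s.user_scans + s.user_lookups = 0
ORDER BY s.user_updates DESC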

Drop and recreate indexes

10) SQL Server Destination Adapter vs OLE DB Destination Adapter
If your target database is a local SQL Server database, the SQL Server Destination Adapter will perform much better than the OLE DB Destination Adapter. However, the SQL Server Destination Adapter works only on a local machine and via Windows security. You have to be absolutely sure that your database stays local in the future, otherwise your mapping will not work when the database is moved.

Note: this is not a complete list, just a top 10 of easy-to-implement but very effective ones. Tell me if you have items that should be in the top 10 of Performance Best Practices!