SSIS Logging in Denali

One final note on logging. The fairly limited information that gets captured in the current CTP is not the same as what will be in the product upon release – there is a lot more to come.

I wasn’t kidding. Those logging enhancements have hit the street in CTP3 and, in my opinion, they were worth waiting for. SSIS logging has long been an interest of mine, primarily because the logging support in SSIS has, up to now, been woeful – that’s what led me to author what has proved to be one of my more popular blog posts, Custom Logging Using Event Handlers, way back in June 2005. Most serious SSIS shops I come across these days have implemented their own custom logging solution and I’m happy to be able to tell them that with Denali they are no longer forced to do that. In this blog post I’ll walk you through all of the crucial objects in the SSIS Catalog related to logging and by the end of it you’ll hopefully have a good handle on the new capabilities. I assume at this point that you understand what is meant by the terms catalog, projects, parameters, executions, executables, environments, environment references, environment variables and shared connection managers in the context of SSIS; if not, I have plenty of past material that you can go and read to fill yourself in at http://sqlblog.com/blogs/jamie_thomson/archive/tags/denali/ssis/default.aspx.

To demonstrate some of the new abilities I have put together an SSIS project that contains two packages, Master.dtsx & ExportDataForCountry.dtsx:

Note also that we have a Shared Connection Manager.

The intention is to access the [AdventureWorks2008] database and find the 3 countries with the highest number of sales for a given year. Then, for each of those 3 countries, extract the sales data for that year and (a) sort it and output it to a file and (b) aggregate it and insert it into a table. Here are my project parameters (note the descriptions):

Here is Master.dtsx:

We use an Execute SQL Task to get the 3 top performing countries for the given year (supplied in the “Year” project parameter) and call ExportDataForCountry.dtsx for each of them. We also ensure that the folder (given in the FolderPath project parameter) into which we are going to output the csv files exists.

Note also how I was able to take this screenshot of the whole package. This was made easier thanks to the move to Visual Studio 2010 because each open package is a pane in its own right and can be dragged out of the Visual Studio shell, then onto a second screen if you so wish. A really nice new feature.

ExportDataForCountry.dtsx consists of just one task – a dataflow:

It also has a package parameter (again, note the description):

I am going to use the logging information created by executing Master.dtsx in the rest of this blog post to demonstrate the new logging capabilities in Denali. I have used verbose logging (the highest logging level) and environment variables; here is the resultant code to execute Master.dtsx:
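The screenshot of that code has not survived here, so the following is a sketch of the T-SQL involved; the folder name, project name and reference id are hypothetical stand-ins – substitute your own:

```sql
DECLARE @execution_id BIGINT;

-- Create the execution, binding it to the "2003 localhost" environment reference
EXEC [SSISDB].[catalog].[create_execution]
      @folder_name  = N'DenaliDemo'    -- hypothetical folder name
    , @project_name = N'ExportData'    -- hypothetical project name
    , @package_name = N'Master.dtsx'
    , @reference_id = 1                -- id of the environment reference to use
    , @execution_id = @execution_id OUTPUT;

-- System parameters use @object_type = 50
-- LOGGING_LEVEL: 0 = None, 1 = Basic, 2 = Performance, 3 = Verbose
EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
    , @object_type = 50, @parameter_name = N'LOGGING_LEVEL', @parameter_value = 3;
EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id
    , @object_type = 50, @parameter_name = N'DUMP_ON_ERROR', @parameter_value = 1;

EXEC [SSISDB].[catalog].[start_execution] @execution_id;
```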

You might want to go and get yourself a drink before you continue – this one is a biggie!!

[catalog].[executions]

You are going to become very very familiar indeed with [catalog].[executions]. It is a view that provides a record of all package executions on the server and, most importantly, it contains [execution_id] – the identifier for each execution and the field to which all other objects herein will be related. There are a number of other fields in here some of which are interesting and some which are not; I suspect that you are going to be most interested in:

start_time

end_time

status

[start_time] & [end_time] are self-explanatory. [status] represents the current status of an execution; its possible values (as documented for the SSIS Catalog) are:

1 = created, 2 = running, 3 = canceled, 4 = failed, 5 = pending, 6 = ended unexpectedly, 7 = succeeded, 8 = stopping, 9 = completed

Finally, we also get information about the environment that was used by the execution (if one was used at all), such as the environment name and information pertaining to whether it was an absolute or relative reference (don’t worry too much about the difference between an absolute and a relative reference for now – it’s not all that important):

There are other columns in [catalog].[executions] but they’re either self-explanatory or not worth talking about here.
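A sketch of the sort of query you might run against it, assuming the variable holds the id of the execution you are interested in:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

SELECT execution_id, status, start_time, end_time,
       DATEDIFF(SECOND, start_time, end_time) AS duration_seconds,
       environment_name
FROM   [SSISDB].[catalog].[executions]
WHERE  execution_id = @execution_id;
```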

[catalog].[executions] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[execution_parameter_values]

This view is largely self-explanatory – it tells us what parameter values were supplied to the package when it executed – however there is probably more information in here than you might first imagine. If you look at the code at the top of this blog post that executed our package you will notice that there are two calls to the stored procedure [catalog].[set_execution_parameter_value] (for LOGGING_LEVEL and DUMP_ON_ERROR), thus you might assume that there would be two rows in [catalog].[execution_parameter_values]. Not so; let’s take a look:

We know that FolderPath & Year are our project parameters, yet we didn’t define a value for them when we executed. The reason for their appearance in [catalog].[execution_parameter_values] is that (as we saw in [catalog].[executions]) this execution was executed against an environment called “2003 localhost”. The project was pre-configured to get a value for “Year” from the referenced environment (hence [value_set]=1) whereas “FolderPath” was pre-configured to use the server default ([value_set]=0); that is why we had no need to supply values for either of them using [catalog].[set_execution_parameter_value].

DUMP_ON_ERROR & LOGGING_LEVEL are system parameters that we supplied values for using [catalog].[set_execution_parameter_value] (hence [runtime_override]=1)

CALLER_INFO, DUMP_EVENT_CODE, DUMP_ON_EVENT & SYNCHRONIZED are further system parameters that we could have supplied values for.

The six parameters whose names begin with “CM.AdventureWorksDW2008.” are actually properties of the project’s Shared Connection Manager and by default they get logged in [catalog].[execution_parameter_values]. Notice that CM.AdventureWorksDW2008.ServerName has [value_set]=1; that is because our environment reference specified a value for that property too.
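You can see all of this for yourself with a query along these lines (a sketch, assuming the variable holds your execution’s id):

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

SELECT parameter_name, parameter_value, value_set, runtime_override
FROM   [SSISDB].[catalog].[execution_parameter_values]
WHERE  execution_id = @execution_id
ORDER  BY parameter_name;
```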

[catalog].[execution_parameter_values] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[executables]

[catalog].[executables] contains a record for every single executable that gets executed in an execution. What’s an executable? It is every task, container or package in both the package that you execute and any child packages that also get executed using the Execute Package Task. Here’s what we get:

We can see that all our tasks and our For Each Loop (“FEL Loop over top performing regions”) appear in here along with the containing package. Interestingly we also get package_path which provides the name of the task in the context of its location in the package.

It is worth noting that we only get a record in [catalog].[executables] if an executable actually gets executed. If that doesn’t happen (because of an earlier error or conditional control-flow) then there won’t be a record in here.

Also note that executables “ExportDataForCountry” and “DFT Export Sales for named country” have an [executable_id] that is unique across the whole execution, not just across [package_name]. In other words SSIS doesn’t care which physical .dtsx file an executable is in, it treats them all as executables in a single execution; I happen to think that is a massive step forward in Denali and you’ll see even more benefits of that in later views.
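A sketch of a query that lists every executable in an execution, regardless of which .dtsx file it lives in:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

-- One row per executable, across every package in the execution
SELECT executable_id, executable_name, package_name, package_path
FROM   [SSISDB].[catalog].[executables]
WHERE  execution_id = @execution_id
ORDER  BY executable_id;
```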

[catalog].[executables] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[executable_statistics]

This is where the new logging infrastructure in Denali really starts to show benefit. For every executable in [catalog].[executables] we get information about it being executed, such as whether it succeeded or not and how long it took. You might ask “Why not just put that information in [catalog].[executables]?” and actually the answer is simple when you think about it; an executable can, if it exists inside a For Loop or For Each Loop container, be executed more than once during a single execution. SSIS will capture information about each instance of an executable being executed and, moreover, will differentiate between each of them. Let’s take a look:

We don’t just get the name of the executable, we get its position in the container hierarchy (aka call stack). Best of all that call stack extends across different packages (remember that “Master” and “ExportDataForCountry” are the names of our two packages). We can also differentiate between different instances of an executable via the [1], [2], [3], ... suffix on our For Loop and For Each Loop iterations.

Think back to the [msdb]..[sysssislog] table prior to Denali, which is just a long flat list of intermingled records, and how difficult it is to find the information in there that you are after, and I think you will realise the benefit of having [execution_path]. This contextual information is something I have been asking for, for a long time, and I am delighted that it is now available. In fact, I’ll go on record as saying that I think [execution_path] is the best new feature in SSIS in Denali, I really do. Note that [execution_path] is different from [catalog].[executables].[package_path] that we saw earlier because [package_path] only tells us the location of an executable in a package; it does not provide information in the context of an execution, and therefore it doesn’t extend across packages like [execution_path] does.

We get an execution duration for every single executable (we also get start time and end time which I haven’t shown on this screenshot). Lastly we also get [execution_result] which tells us whether an executable succeeded or not – the value in [execution_result] comes from the DTSExecResult enumeration.
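A sketch of a query that surfaces all three of those columns, slowest executables first:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

SELECT execution_path,
       execution_duration,  -- milliseconds
       execution_result     -- DTSExecResult: 0=Success, 1=Failure, 2=Completion, 3=Canceled
FROM   [SSISDB].[catalog].[executable_statistics]
WHERE  execution_id = @execution_id
ORDER  BY execution_duration DESC;
```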

There is a lot of information wrapped up in just these three columns and I make heavy use of it in SSIS Reporting Pack, a suite of reports that I am putting together which provide value on top of the logged information. Here is a screenshot of a new report that leans heavily on [catalog].[executable_statistics]:

More to come on SSIS Reporting Pack in a later blog post.

[catalog].[executable_statistics] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[event_messages]

[catalog].[event_messages] is the real meat of the new logging infrastructure and is where I suspect folks investigating SSIS issues will spend most of their time. It performs a very similar role to [msdb]..[sysssislog] however the information herein is much richer, as we shall see. It is also the view for which your choice of LOGGING_LEVEL has the biggest effect.

The first point to make about [catalog].[event_messages] is that it does not contain [execution_id], instead it has [operation_id] which actually is an [execution_id]. The reason for the different name is that various different types of operations can occur on the SSIS Server, only one of which is package execution. Each operation will log into [catalog].[event_messages] and hence we have [operation_id] instead of [execution_id]. In this blog post the only operation type that I am covering is package execution.

The data in [msdb]..[sysssislog] was based around the notion of events: all tasks were capable of raising those events and we had the choice of capturing and logging them. From that perspective things are no different in Denali. We get all of the columns that we used to get in [msdb]..[sysssislog], the important ones being:

Another new field is [subcomponent_name] which, as far as I can discern, contains the name of a dataflow component or the internal pipeline that is throwing the event. This is really useful information, previously we only got the name of the dataflow.

Notice some of the information that we get here regarding rows cached and bytes of memory used, this is all really useful information to have access to.

Two other new columns are [message_type] and [message_source_type] which allow you to further drill down on the information that you are after. They appear in the view as numeric values and whilst there is no documentation publicly available yet on what these values mean I can fill you in on them right here:

First, [message_type]:

message_type   Description
10             OnPreValidate
20             OnPostValidate
30             OnPreExecute
40             OnPostExecute
60             OnProgress
70             OnInformation
90             Diagnostic
110            OnWarning
140            DiagnosticEx
200            Custom events
400            OnPipeline*

and [message_source_type]:

message_source_type   Description
10                    Messages logged by the entry APIs (e.g. T-SQL, CLR stored procedures)
20                    Messages logged by the external process used to run the package (ISServerExec)
30                    Messages logged by package-level objects
40                    Messages logged by tasks in the control flow
50                    Messages logged by containers (For, ForEach, Sequence) in the control flow
60                    Messages logged by the Data Flow Task

As I alluded to above your choice of LOGGING_LEVEL greatly influences the amount of information that gets logged. I ran this package against each of the four logging levels and here are the number of rows in [catalog].[event_messages]:

LOGGING_LEVEL   Rows   Time to Execute (seconds)
None               0    4
Basic            128    3
Performance        1   10
Verbose         2726   12

Given that this execution only contains 5 task executions and one For Each Loop, 2726 records for LOGGING_LEVEL=Verbose strikes me as extravagant, and indeed this is reflected in the much higher execution time. These numbers are not necessarily representative of your environment, and of course you should do your own testing, but it does seem as though LOGGING_LEVEL=Verbose can have a detrimental effect on execution duration. If you take a look at the messages that get logged under Verbose logging it is clear that it is intended for diagnostic purposes; messages such as the following aren’t the sort that you want to be wading through on a regular basis:

Interestingly [message_source_name] for all these messages is ‘AdventureWorksDW2008’ which is the name of our Connection Manager so it seems as though LOGGING_LEVEL=Verbose may be a shorthand way of telling SSIS to log all information pertaining to resources external to the package.

I have noticed some interesting new messages when LOGGING_LEVEL=Verbose too that I don’t remember ever seeing before:

Lots of useful information in there; I particularly like that we get a message for each executable telling us its execution duration. Yes, this information is in [catalog].[executable_statistics] as we have already discussed, but it’s useful to have it in our main log table too.

As you can see, [catalog].[event_messages] is the workhorse of the new logging infrastructure so it’s worth spending time getting to know it. As with [catalog].[executable_statistics], I make heavy use of this view in SSIS Reporting Pack, and the existence of [execution_path] makes it easy to browse to the messages that you want to see:
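A sketch of that kind of browsing query, filtering on [execution_path] to drill into a single executable:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

SELECT message_time, event_name, message_source_name, subcomponent_name, message
FROM   [SSISDB].[catalog].[event_messages]
WHERE  operation_id = @execution_id
AND    execution_path LIKE N'%DFT Export Sales for named country%'  -- one executable
ORDER  BY message_time;
```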

We can highlight errors and warnings much more clearly too:

[catalog].[event_messages] is logged to when LOGGING_LEVEL is set to Verbose, Performance or Basic. For LOGGING_LEVEL=Performance only OnError and OnWarning events get logged. For LOGGING_LEVEL=Basic OnError, OnInformation, OnPreExecute, OnPostExecute, OnPreValidate, OnPostValidate and OnWarning events are logged, however as I demonstrated above fewer OnInformation events are logged than for LOGGING_LEVEL=Verbose.

[catalog].[event_message_context]

Have you ever wanted to know what the properties of an executing package were after the event? That is exactly what [catalog].[event_message_context] is provided for. For a very small subset of the messages in [catalog].[event_messages] it logs properties of the package and its connection managers at the point-in-time that the message is logged. The properties are logged as name:value pairs. This is all better illustrated with a screenshot:

[catalog].[event_message_context] is logged to at the start of a package execution, at which point it tells you all the properties of the package. This is useful (I guess) but more useful than that is the logging of properties when an OnError event occurs; more accurately, [catalog].[event_message_context] will record all the properties of a failed task. To demonstrate, I shut down the server hosting the [AdventureWorksDW2008] database prior to executing our package and here is a subset of what gets logged in [catalog].[event_message_context]:

AND [property_name] IN ('Connection','StartTime','StopTime','SqlStatementSource')
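That [property_name] filter is the tail of a query along the following lines – a sketch reconstructed from the fragment above:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

SELECT emc.context_source_name, emc.property_name, emc.property_value
FROM   [SSISDB].[catalog].[event_message_context] emc
INNER  JOIN [SSISDB].[catalog].[event_messages] em
       ON  em.event_message_id = emc.event_message_id
WHERE  em.operation_id = @execution_id
AND    emc.property_name IN ('Connection','StartTime','StopTime','SqlStatementSource');
```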

As you can see we get the actual SQL statement that was being executed, which is invaluable information if the SQL statement is being built dynamically using an expression or a Script Task. Prior to Denali we simply did not have access to this sort of information, and this is a great example of how logging has been massively enhanced. Moreover we also get StartTime and StopTime, so it is easy to see, at a glance, how long the task took to fail. I find it interesting to note that StartTime and StopTime are execution-time-only properties, but they still get logged. I wonder what other execution-time-only properties exist? If you find any do let me know!

One final thing to note is that when an error occurs [catalog].[event_message_context] will contain information about every object in the call stack so if, for example, an Execute SQL Task that is inside a For Each loop fails you will get information about the For Each loop too. Very cool!

[catalog].[event_message_context] is logged to when LOGGING_LEVEL is set to Verbose, Performance or Basic.

[catalog].[execution_component_phases]

If you have ever wanted to know which parts of a dataflow are running slowly then [catalog].[execution_component_phases] is for you. It records how long each data flow component spends in each phase of execution, those phases are:

Validate

PrepareForExecute

PreExecute

ProcessInput

PrimeOutput

PostExecute

Cleanup

The most obvious use of this view is simply to discover where the bottlenecks are in your dataflows by discovering which components take the longest time to complete. Before we look at the information in [catalog].[execution_component_phases] let’s remind ourselves of the dataflow that we’re going to examine here:

There are nine components, let’s look at how long each of them is “alive” for during the execution of this dataflow:

Here we see that LKP Get Customer details has spent the most time executing. To be honest though this query does not give us the best indication of where the bottlenecks are because it includes all phases; the Validate phase will begin at roughly the same time for all components whereas the Cleanup phase for each will end at roughly the same time as well. A better indicator would be to look at the ProcessInput and PrimeOutput phases because this is generally where most of the work is done:
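A sketch of a query that isolates just those two phases, assuming the variable holds your execution’s id:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

SELECT subcomponent_name, phase,
       DATEDIFF(MILLISECOND, start_time, end_time) AS phase_ms
FROM   [SSISDB].[catalog].[execution_component_phases]
WHERE  execution_id = @execution_id
AND    phase IN (N'ProcessInput', N'PrimeOutput')
ORDER  BY phase_ms DESC;
```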

Here we get a much better indicator of where time is spent and where the bottlenecks are in our dataflow. Unsurprisingly, the destination component that inserts data into the database takes the longest time to complete its work; however, look at the next three: the Sort component, the Aggregate component and the OLE DB Source component. These three components are all asynchronous, so the information we are seeing here supports the commonly held belief that asynchronous components are generally slower.

Also notice that Multicast is the shortest to execute. Again not surprising given that Multicast components don’t actually do any work.

We have two Derived Column components (“DER Concat Full Name” & “DER Calculate TotalPrice”) yet one of them takes a lot less time than the other. Again this makes sense if we look at our dataflow; “DER Concat Full Name” occurs after the Aggregate component, thus it will almost certainly be operating on fewer rows. Fewer rows means less work, which means less time to execute.

As an aside, [catalog].[execution_component_phases] makes it easy to determine exactly which components are asynchronous because they are the ones that have a PrimeOutput phase:
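A sketch of that determination:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

-- Asynchronous components are the ones that have a PrimeOutput phase
SELECT DISTINCT subcomponent_name
FROM   [SSISDB].[catalog].[execution_component_phases]
WHERE  execution_id = @execution_id
AND    phase = N'PrimeOutput';
```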

OK, so we have seen that the asynchronous components and the destination component that inserts into a database generally take the longest in the “work” phases however there are other insights to be had here. If we take a look at the PreExecute phase:

we notice that our lookup component (“LKP Get customer details”) appears much higher in the list than it did previously. This is not surprising given that this is a fully cached lookup component and the PreExecute phase is when that cache gets populated.

Clearly there are many many insights to be gleaned from [catalog].[execution_component_phases] and it should prove to be an invaluable tool when performance tuning dataflows.

[catalog].[execution_component_phases] is logged to when LOGGING_LEVEL is set to Verbose and Performance.

[catalog].[execution_data_statistics]

Where [catalog].[execution_component_phases] tells us how long each component executed for, [catalog].[execution_data_statistics] tells us how much data they processed. I’m sure that if you have ever executed packages in BIDS and made use of the real-time rowcounts that appear in the GUI:

then at some point you will have wished you had the same outside of BIDS too; well, that is exactly what [catalog].[execution_data_statistics] provides. Every time a dataflow component passes a buffer to the next component, you get a record in [catalog].[execution_data_statistics]. Specifically you will get the following information:

execution_path of the containing dataflow (we’ve covered this quite a bit already)

Of course once you have the raw data then you can party on it as much as you want. Remember that [catalog].[execution_data_statistics] has a record for each buffer, not for each datapath and hence to get a total per datapath you’re going to need to SUM:
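The screenshot of that query hasn’t survived, so here is a sketch of the SUM in question:

```sql
DECLARE @execution_id BIGINT = 1;  -- substitute your execution's id

-- One row per datapath, totalled across all of its buffers
SELECT dataflow_path_name,
       source_component_name,
       destination_component_name,
       SUM(rows_sent) AS total_rows
FROM   [SSISDB].[catalog].[execution_data_statistics]
WHERE  execution_id = @execution_id
GROUP  BY dataflow_path_name, source_component_name, destination_component_name
ORDER  BY total_rows DESC;
```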

[catalog].[execution_data_statistics], along with [catalog].[execution_component_phases], seems set to be an integral part of your dataflow performance tuning armoury.

[catalog].[execution_data_statistics] is logged to when LOGGING_LEVEL is set to Verbose.

That concludes this run through of the new logging capabilities in SQL Server Denali CTP3. I personally am very happy indeed with these enhancements and am looking forward to building new solutions on this infrastructure as soon as possible. How about you? Are these new capabilities what you have been waiting for or are they superfluous to your needs? Let me know in the comments.
