What is a Data Collection and the MDW?

Just in case you didnt read either one of my first two
articles let me digress a little and describe what a Data Collection and the
MDW are.

A Data Collection is a process that allows a DBA to easily define some data they want to collect, how often it should be collected, the method that should be used to collect it, and how long the collected data should be retained. A Data Collection utilizes SQL Server Integrated Services and SQL Agent to perform the actual data collecting.

The data that is collected via Data Collections is stored in
the MDW. The MDW is just a database. When you set up an instance to collect
data via a Data Collection you need to identify the MDW that will be use for
storing the collected data. A given MDW database can be used to store data
for a single instance or multiple instances.

Defining a Custom Data Collection

In order to create a custom Data Collection you need to use
a T-SQL script. For some reason Microsoft didnt build a GUI interface to
create a custom Data Collection. Hopefully in future versions a GUI interface
will be developed, but for now the only way to create your custom Data Collection
will be to execute a T-SQL script.

You can create a T-SQL script by hand to define your custom
Data Collection. Alternatively, you can generate a script from one of the
three different system Data Collections, and then modify it to meet your needs.
This is the method I prefer. It is easier to generate a script of a system
Data Collation by using the script task from Object Explorer and modifying it,
then keying in a complete script by hand.

In order to demonstrate how to build a custom Data Collection
I need to determine what data I want to collect. For this demonstration, I
will be collecting index usage statistics that can be gathered using the sys.dm_db_index_usage_stats
DMV. By capturing the index usage statistics from the DMV and storing them
in the MDW, I will be able to track which indexes are used overtime. Here is
the T-SQL script I have that will list the index usage statistics for a
database. This script will be incorporated into a custom Data Collection that
will capture index usage statistics for all user databases once a day.

To build my custom Data Collection script I will first
generate a script from an existing system Data Collection and then modify it to
use the above script to collect my statistics. Microsoft has provided three
different Data Collection types that can be used to collect statistics and they
are: T-SQL Query, SQL Trace and Performance Counters. Since I will be
collecting data using a T-SQL script, the T-SQL Query collector type is the
appropriate type to use for collecting my data. Therefore, I need to generate
a script from a system Data Collection that uses this type. The Disk Usage
system Data Collection happens to use a T-SQL collection type. Once I
generate the Disk Usage Data Collection script, I will need to modify it to
meet my needs. Here is my modified script:

I have broken this script into sections to make it easier to go through each
section and describe how this script creates a custom Data Collection.

Section A is used to create a collection set. I create a
collection set by executing the sp_syscollector_create_collection_set stored
procedure. This stored procedure requires a number of different parameters, a
few of which are worth noting: @collection_mode, @days_until_expiration, and @schedule_name.
The @collection_mode Ive set to 1. This tells SQL Server that it should use a
non-cached Data Collection. There are two different caching modes, cached and
non-cached collections. A cached collection is one that saves the data in a
temporary directory and then periodically uploads the cached information into
the MDW, whereas non-cached collections tells SQL Server to collect and upload
the collected data into the MDW in one operation. The @days_until_expiration
parameter is used to identify how long you want to keep the collected
information in the MDW. In my case, I specified 180 days. SQL Server will
automatically purge my collection index statistics after they have been in the
MDW longer than 180 days. Here I have specified Daily at Midnight for the @schedule_name. Daily at Midnight is a shared SQL agent schedule that I
created. This tells the Data Collection process to create a SQL Agent job
that is to be based on the shared schedule with this name. This causes my new
collection to be run once a day at midnight to collect my missing index
statistics.

Section B is just a short section. In this section of my script,
I identify the type of Data Collection I want to use to collect my index
statistics. Since Im using T-SQL code to collect statistics I need to use the
Generic T-SQL Query Collector Type. You can use this collector type any time
you plan to use a T-SQL script to collect your statistics.

Section C pulls together some of the output columns from
Sections A and B along with my T-SQL script to identify what statistics will be
collected. If you look closely at the @parameters parameter value, you will
note it contains some XML. This XML contains my T-SQL script as well as some
other important stuff. Each element of the XML tells SQL Server something
different about this Data Collection. The <Query> element contains a
<Value> and <OutputTable> element. The <Value> element
contains my T-SQL statement that collects index usage statistics. If you look
close at the T-SQL you will see that I had to replace the greater than symbol
(>) in my T-SQL code with &gt;. This is because the greater than
(>) and the less than symbols (<) are used to begin and end the name
of XML elements. If you didnt replace these symbols then the string
associated with this parameter would not contain valid XML. I also have to
replace the single quote with two single quotes. The <OutputTable>
identifies the table within the MDW that will be created, and used to store
statistics being gathered. Here I identified IndexUsage for the name of the
table that I want my statistics to be stored in. The last element is the
<Databases> element. This element has the attribute UseUserDatabases
set to true. This tells SQL Server to only collect statistics for user
databases, and not system databases.

Note that there are a number of different ways to specify
this last element <Databases>. If you want to include both user
databases and system databases, you would specify this element like so:

<Databases UseSystemDatabases="true" UseUserDatabases="true" />

If you only wanted to gather statistics for the system databases,
you could just specify only the UseSystemDatabases parameter. If you should
want to collect statistics for only a couple of databases, you would specify
something similar to the following:

If I used the above for my <Databases> element then I
would be telling the data collection process to only collect statistics for the
databases MyDB1 and MyDB2. As you can see, there are a number of
different ways to identify the databases from which you want to collect
statistics.

Once I have run the data collection script above, my custom
Data Collection will be created, but this doesnt mean SQL Server will be start
collecting my statistics. I must first start the collection in order for it to
start collecting information. To start it I can run the following command:

Alternatively, I can start my Data Collection in SSMS by
right clicking on the Data Collection item in the Object Explorer pane and
selecting the Start Data Collection Set option.

Viewing Information in the MDW

After my Data Collection has run the first time, I can view
the information it collected in the MDW. To view the data I can run the
following simple T-SQL SELECT statement:

SELECT * FROM MDW.custom_snapshots.IndexUsage

This SELECT statement will display the statistics that my
custom Data Collection has gathered and place in my IndexUsage table within custom_snapshots
schema in the MDW database. If this table doesnt exist when my custom Data
Collection is run the first time, it will be created.

All custom Data Collections create their MDW tables in the custom_snapshots
schema. It may take a few moments the first time you run your custom Data
Collection for this object to be created in the MDW.

Collecting Information

Creating custom Data Collections in SQL Server 2008 allows
you to gather and manage information about your SQL Server Instances. The
amount of effort it takes to set up a Data Collection is minimal. Microsoft
has built in all the SSIS components to gather, populate and purge information
from the MDW for you. By using custom Data Collections, DBAs can now easily
create a method to gather the kinds of statistics they need to better manage
their SQL Server environment.