Processing Active Directory Information in SSIS

Processing Active Directory Information in SSIS

It is possible to retrieve data from Active Directory and process this data in SSIS 2005 and 2008. In this article, I will look at methods for retrieving and processing this data, and give
step by step procedures for accessing this data in SSIS. At the end, I will give an example of importing data from an external source into Active Directory using SSIS.

Retrieving

There are 3 ways we will look at to retrieve data from Active Directory for processing in SSIS. These are by using Lightweight Directory Access Protocol (LDAP), by using Windows Management Instrumentation (WMI), and by querying the information
from SQL.

Throughout this paper, I will be accessing data in a domain named “steve.homeoffice.”

Wikipedia defines LDAP as an application protocol to allow for querying and modifying data in Internet Protocol (IP) networks. LDAP queries can be submitted using the Active Directory Service Interface (ADSI). In Windows, we can use the
OLE DB provider for Directory Services to issue SQL queries against LDAP Active Directory via ADSI to retrieve Active Directory information into SSIS for processing. The connections vary somewhat depending on whether we are accessing the data in the Control
Flow, or in the Data Flow.

Control Flow

In the Control Flow, you can use an Execute SQL task to retrieve data from Directory Services. The steps required for this are:

1.Set up an OLE DB connection manager using the Native OLE DB/OLE DB Provider for Microsoft Directory Services pointing to the domain controller against which you will issue your queries.

NOTE: When you click the “Test” button to test your connection, it may indicate that the connection was successful even though Directory Services is not running on the server you are pointed to. However;
in order for your package execution to succeed, the connection manager must be pointing to a computer where directory services is running.

NOTE: Although the option exists to set the user name and password rather than using “NT Integrated Security”, connections will fail for Active directory queries if “NT Integrated Security” is not used.

NOTE: Some references instruct you to set the ADSI flag to 1 to make this work. However; I have been able to make this work with the default setting of -2147483648. If the default does not return results,
try setting the ADSI flag to 1 in the “All” settings page.

Figure 1: An OLE DB Connection Manager for Microsoft Directory Services set up to point to a domain controller named home-server

2.Create a variable of type “Object” scoped at the package level.

3.Set up an Execute SQL control flow task using this connection manager. Set the Result Set type to “Full” and set the result set to be stored in the Object variable you created in step 2.

4.For the SQL Query in the Execute SQL task, use your LDAP SQL query. An example to retrieve the names of all user accounts from Active Directory is:

SELECT cn FROM 'LDAP://my-server' WHERE objectClass='User'

Note that the single quotes are required around the object you are selecting “FROM.” In addition, you can query from a server name as is done in this example, or you can specify the DCs are are specified in directory services.
Running a “SELECT *” instead of the “SELECT cn” will return results in a single column (ADsPath) from which you can determine what DC values are valid in your domain. An example of the query using DC is:

SELECT cn FROM 'LDAP://home-server/DC=steve,DC=homeoffice' WHERE objectClass='User'

NOTE: Directory services can also use the ADSI LDAP query syntax. For example, in place of the select query above, a query to get just user names using the ADSI LDAP syntax would be:

Figure 2 An Execute SQL task using the connection manager pointing to Directory Services. The SQL Query is entered directly in this example, and the ResultSet is set to “Full.” An ADSI LDAP query can be used
in place of the SQL query.

Figure 3 When the result set is set to "Full" on the general page, you must go to the "Result Set" page and map the result sets into a variable. In this case, we have mapped the results
into a variable named "Results." The variable into which you place the result set(s) must be of type “Object.”

5.Once the results have been retrieved into a variable in the control flow, you can use the results the same way you would results from any other query type. As an example, you can use this with
a Foreach loop with an ADO Enumerator to process the results a row at a time. For more information on using the Foreach Loop with an ADO Enumerator, see
http://msdn.microsoft.com/en-us/library/ms140016.aspx

Data Flow

Data can be retrieved from Active Directory using Directory Services into the data flow, but if you try to retrieve the data using an OLE DB connection manager, SSIS will not be able to execute the query and retrieve the metadata needed
to design the data flow. To retrieve the data into the data flow, you must use an ADO.Net connection manager configured to use the OLE DB Provider for Directory Services. The following is an example data flow to illustrate retrieving the data from Active Directory
into the data flow:

1.Create an ADO.Net connection manager. Under the “Provider” drop down, expand the “.NET Providers for OleDB” folder and select the “OLE DB Provider for Microsoft Directory Services” provider. For “Server or file name” enter the name of your domain
controller, and ensure the “Use Windows NT Security” radio button is selected.

Figure 4 The ADO.NET connection manager configured to point to Directory Services on the home-server domain controller.

2.Drag a “Datareader Source” (SQL 2005) or an ADO .NET Source (SQL 2008) onto the data flow design surface. Configure the source to use the ADO.NET connection manager you just configured pointing to directory services.

Figure 5 The Component Properties tab of the Advanced Editor for a SQL 2005 SSIS Datareader Source. The SQL query is the Directory Services query to be executed against
the LDAP server. In this illustration, it is a SQL query, but an ADSI LDAP query can be used in place of the SQL query.

Figure 6 Properly configured ADO.NET Source for SQL 2008 SSIS for a SQL query against Directory Services. An ADSI LDAP query can be used in place of the SQL query.

4.Once the source is set up, you can pull the Active Directory data into SSIS to process through the data flow like from any other source.

NOTE: The data output from the query may be of type DT_NTEXT which is a streaming data type. This may not be compatible with all destinations. You may need to add a “Data Conversion” transform to your data
flow to convert this data type into DT_WSTR or other appropriate data type to be able to output to destinations such as text files.

Windows Management Instrumentation (WMI)

WMI is a set of extensions to the Windows Driver Model that provides an operating system interface through which instrumented components provide information and notification. WMI is Microsoft's
implementation of the Web-Based Enterprise Management (WBEM) and Common Information Model (CIM) standards from the Distributed Management Task Force (DMTF). (Ref:
http://en.wikipedia.org/wiki/Windows_Management_Instrumentation). WMI can be used to access Directory Services information.

WMI access is only available in the control flow if you are not using either a script component or custom component. To read data from Active Directory using WMI, you must use the WMI Reader task in the control flow. This task will store
data returned by its queries in either a comma separated text file or a variable. Configure where the output will be stored in the “Destination Type” property.

The Output Type must be either a Data Table; Property name and Value; or Property value. If all columns are single value columns, then a Data Table may work well output to a delimited file. However; if some properties have multiple values,
then outputting to a Property name and Value output may be better. It the output is a data table, and it is output to a file, then the output can be processed through a subsequent data flow task. However; if the output must be stored on multiple lines such
as a Property name and value, then processing may need to be done through a script component or a custom component.

Figure 7 A WMI Data Reader task configured with a WQL query asking for all user accounts in the "Steve" domain. The output will go to a Data table and be stored in the WMIOutput.txt
text file.

SQL

Active directory can be queried inside of SQL either in an ad hoc fashion, or through an established linked server. No modifications can be made to the data when queries are entered through SQL, but the data can be accessed by making use
of the Active Directory Server Interface (ADSI).

OpenRowSet

For ad hoc access to active directory information, you can use the OpenRowSet function. In order for an ad hoc distributed query to be executed, it must be enabled in sp_configure. To enable
ad hoc distributed querying, execute the following on the SQL Server from which the OpenRowSet query will be executed:

EXEC sp_configure 'Ad Hoc Distributed Queries', 1

GO

RECONFIGURE WITH OVERRIDE

GO

An example of the OpenRowSet using a SQL Query against directory services on a server named home-server, and not using a trusted connection is here:

SELECT * FROM OPENROWSET ('ADsDSOObject',

'home-server;steve\MyUserID;MyPassword',

'SELECTCN

FROM''LDAP://home-server/DC=steve,DC=homeoffice''

WHEREobjectClass = ''USER''')

The obvious issue with this is that a password must be stored somewhere which may be unacceptable. If you are using an account that has appropriate privileges, you can use a trusted connection like this:

To retrieve results from these types of queries into SSIS, set up a normal connection to SQL Server, and use the query as the SQL Command in your data source. So far as SSIS is concerned, this is just SQL data at this point and it is handled
accordingly.

Linked Server

You can also establish a linked server with an ADSI server to retrieve the results much as you would ad hoc. However; with a linked server, you set the security and store it in your SQL Server, and only need to know the name of the linked
server at query time.

You can also set up the linked server and login using the GUI in Management Studio. The provider you will use is ADSDOObject. Once the linked server is set up, you can query active directly using the OPENQUERY function like this:

select * from openquery(ADSI, 'SELECTCN

FROM''LDAP://home-server/DC=steve,DC=homeoffice''

WHEREobjectClass = ''USER''')

Where ADSI is the name of the linked server you configured above. As with other cases when using ADSI, you can use LDAP syntax for these queries:

Import Records into Active Directory

To import records into active directory from an external source using SSIS, you will need to use a script component in your data flow. To set this up to input new active directory records,
first set up your SSIS connections and source in your data flow. For the actual import, drag a Script Component onto your data flow design surface. When prompted to use the script component as a source, transform, or destination, choose to create a data destination.

Figure 8 A script component set up as a data destination with all columns selected as available inputs.

NOTE: It might be advisable to set up an output from your destination where you can send rows that cannot be imported. That step will not be covered in these instructions.

1.Choose to make the appropriate rows available in your script component destination.

2.Design your script.

3.Add a reference to System.DirectoryServices.

4.For Visual Basic, add an “Imports System.DirectoryServices” line in your imports section. For C#, add a “Using System.DirectoryServices to the Using section of your script.

5.Use the following snips as an
example for VB and for C# for creating ActiveDirectory accounts and configuring them. Modify the examples for your needs as this may not work exactly with your data and component configurations.