Functional Dependency

Determines the extent to which the values in one column (the dependent column) depend on the values in another column or set of columns (the determinant column) ; This profile also helps you identify problems in your data, such as values that are not valid. For example, you profile the dependency between a column that contains Country or Region Codes and a column that contains States/Provinces. The same Country or Region Codes should always have the same States/Provinces, but the profile discovers violations of this dependency.

Now let’s take a crack at writing our own. Naturally when facing a difficult problem in TSQL I turned to SQL Server Central vast library.

Armed with these two articles I now had all the techniques I need to develop a query or stored procedure to provide Column Functional Dependency.

TSQL Functional Dependency

The heart of this effort is to utilize Common Table Expressions(CTE) or the “with” clause as defined on MSDN as:

“Specifies a temporary named result set, known as a common table expression (CTE). This is derived from a simple query and defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. This clause can also be used in a CREATE VIEW statement as part of its defining SELECT statement. A common table expression can include references to itself. This is referred to as a recursive common table expression.”

Basically the “with” clause provides you with the ability to logically create CTE’s or result sets ,logically develop your code and employ recursive techniques.

The code here is basically leveraging the recursive capabilities of CTE. Our objective here is to compare current rows to previous row columns to determine when a Dependent Column value changes and the Determinant columns has NOT changes. This is a Violation. A detailed explanation of the technique of linking to previous records and be reviewed at SQL Server Central returned Linking to the Previous Row By David McKinney, 2008/03/13. One issue I had was needing to dynamically populate the From clause or Table.

Now that we have created a table with the Functional Dependency Violations identified we want to create a table that can be used for Analysis or Reporting, containing the same percentages available in the SSIS 2008n Data Profiling Task Functional Option. Again we have relied on the CTE in order to create a reusable query that can create a single table that has the required for analysis such as Totlal Determinint , Dependents categroized by Supporting Dependents and Violating Dependents.

Example Functional Summary Results

Here are the results from the SSIS 2008 Data Profiling Task for Functional Dependency.

As you can see the Support Percentages for Alabama are exactly the same.

Roll your Own Interactive Scorecard

I would also add from an analytical perspective you can quickly create a Pivot Table in Microsoft Excel 2007. Simply open Excel 2007, select the Data Tab .

Then select From Other Sources then follow the wizard and create a connection to the DependencyCompareOLAP table to create the Pivot table and Chart below. As you can see here is a quick way to determine the outliers like Alabama.

Or Manitoba in Canada

Further Enhancements

You also can add additional columns to the table and stored procedures, such as table name, date and time and create a repository that would allow you to track the Violations incrementally or over time, and provide proactive processes to respond to continuing anomalies.