Mining Structures (Analysis Services - Data Mining)

The mining structure defines the data from which mining models are built: it specifies the source data view, the number and type of columns, and an optional partition into training and testing sets. A single mining structure can support multiple mining models that share the same domain. The following diagram illustrates the relationship of the data mining structure to the data source, and to its constituent data mining models.

The mining structure in the diagram is based on a data source that contains multiple tables or views, joined on the CustomerID field. One table contains information about customers, such as the geographical region, age, income and gender, while the related nested table contains multiple rows of additional information about each customer, such as products the customer has purchased. The diagram shows that multiple models can be built on one mining structure, and that the models can use different columns from the structure.

Model 3 uses CustomerID, Age, Gender, and the nested table, with no filter.

Because the models use different columns for input, and because two of the models additionally restrict the data that is used in the model by applying a filter, the models might have very different results even though they are based on the same data. Note that the CustomerID column is required in all models because it is the only available column that can be used as the case key.
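As a rough sketch of how a filtered model might be declared in DMX, the following statement adds a model that is trained only on a subset of the cases. The structure name, model name, and column names are hypothetical and assume a structure that already defines CustomerID, Age, Region, and Income.

```dmx
-- Hypothetical names: [Customer Structure], [Young Customers Tree].
-- The filter restricts the training cases for this model only;
-- other models on the same structure are unaffected.
ALTER MINING STRUCTURE [Customer Structure]
ADD MINING MODEL [Young Customers Tree]
(
    [CustomerID],          -- case key, required in every model
    [Age],
    [Region],
    [Income] PREDICT
)
USING Microsoft_Decision_Trees
WITH FILTER([Age] < 30)
```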

Data Sources for Mining Structures

When you define a mining structure, you use columns that are available in an existing data source view. A data source view lets you combine multiple data sources and use them as a single source in the created structure or mining model. The original data sources are not visible to client applications.

If you build multiple mining models from the same mining structure, the models can use different columns from the structure, and use the columns in different ways. For example, you can create a single structure and then build separate decision tree and clustering models from it, with each model using different columns and predicting different attributes.
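The following sketch shows one way that two models with different algorithms and different columns could be added to the same structure. All object and column names are hypothetical, and each statement is assumed to be run separately because DMX does not generally execute batches of statements.

```dmx
-- Statement 1: a decision tree model that predicts Income.
ALTER MINING STRUCTURE [Customer Structure]
ADD MINING MODEL [Income Tree]
(
    [CustomerID],
    [Region],
    [Age],
    [Gender],
    [Income] PREDICT
)
USING Microsoft_Decision_Trees

-- Statement 2: a clustering model that uses fewer columns
-- and has no predictable attribute.
ALTER MINING STRUCTURE [Customer Structure]
ADD MINING MODEL [Customer Clusters]
(
    [CustomerID],
    [Age],
    [Gender]
)
USING Microsoft_Clustering
```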

The data mining structure stores only the bindings to the source data. You can also create a data mining structure without binding it to a specific data source by using the CREATE MINING STRUCTURE (DMX) statement.
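A minimal sketch of such an unbound structure follows. The structure name and columns are hypothetical; each column declares a data type and a content type, and no data source is referenced until the structure is processed.

```dmx
-- Hypothetical structure with no binding to a data source.
CREATE MINING STRUCTURE [Customer Structure]
(
    [CustomerID] LONG   KEY,         -- case key
    [Region]     TEXT   DISCRETE,
    [Age]        LONG   CONTINUOUS,
    [Income]     DOUBLE CONTINUOUS,
    [Gender]     TEXT   DISCRETE
)
```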

Mining Structure Columns

The building blocks of the mining structure are the mining structure columns, which describe the data that the data source contains. These columns contain information such as data type, content type, and how the data is distributed. The mining structure does not contain information about how columns are used for a specific mining model, or about the type of algorithm that is used to build a model; this information is defined in the mining model itself.

A mining structure can also contain nested tables. A nested table represents a one-to-many relationship between the entity of a case and its related attributes. For example, if the information that describes the customer resides in one table, and the customer's purchases reside in another table, you can use nested tables to combine the information into a single case. The customer identifier is the entity, and the purchases are the related attributes. For more information about when to use nested tables, see Nested Tables (Analysis Services - Data Mining).
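The customer and purchases example above might be declared as follows. This is only an illustration: the structure, table, and column names are hypothetical, and the nested table is assumed to need nothing more than a key column that identifies each purchased product.

```dmx
-- Hypothetical structure with one nested table.
CREATE MINING STRUCTURE [Customer Purchases]
(
    [CustomerID] LONG KEY,           -- identifies the case (the customer)
    [Age]        LONG CONTINUOUS,
    [Gender]     TEXT DISCRETE,
    [Purchases]  TABLE               -- one-to-many: products per customer
    (
        [Product] TEXT KEY           -- nested key, one row per product
    )
)
```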

To create a data mining model in Business Intelligence Development Studio, you must first create a data mining structure. The Data Mining Wizard walks you through the process of creating a mining structure, choosing data, and adding a mining model.

If you create a mining model by using Data Mining Extensions (DMX), you can specify the model and the columns in it, and DMX will automatically create the required mining structure. For more information, see CREATE MINING MODEL (DMX).
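For illustration, a CREATE MINING MODEL statement of this kind could look like the sketch below. The model and column names are hypothetical. Because no structure is named, the column data types and content types are declared inline with the model.

```dmx
-- Hypothetical model; the supporting mining structure is created implicitly.
CREATE MINING MODEL [Income Tree Standalone]
(
    [CustomerID] LONG   KEY,
    [Region]     TEXT   DISCRETE,
    [Age]        LONG   CONTINUOUS,
    [Income]     DOUBLE CONTINUOUS PREDICT
)
USING Microsoft_Decision_Trees
```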

Training and Testing Data

When you define the data for the mining structure, you can also specify that some of the data be used for training and some for testing, so you do not need to partition your data before you create the structure. You can hold out a percentage of the data for testing and use the rest for training, or you can specify the number of cases to use as the test data set. The partition information is cached with the mining structure; therefore, the same test set can be used with all models that are based on that structure.
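In DMX, the holdout partition can be requested when the structure is created. The following sketch uses hypothetical names and an arbitrary 30 percent holdout; the alternative form that specifies a number of cases is shown in a comment.

```dmx
-- Hypothetical structure that reserves 30 percent of the cases for testing.
CREATE MINING STRUCTURE [Customer Structure With Holdout]
(
    [CustomerID] LONG   KEY,
    [Age]        LONG   CONTINUOUS,
    [Income]     DOUBLE CONTINUOUS,
    [Gender]     TEXT   DISCRETE
)
WITH HOLDOUT (30 PERCENT)
-- Alternative: hold out a fixed number of cases instead.
-- WITH HOLDOUT (1000 CASES)
```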

Enabling Drillthrough

You can add columns to the mining structure even if you do not plan to use the column in a specific mining model. If you do not specify a usage for the column, the column is ignored for analysis and prediction. However, it can still be used in queries by enabling drillthrough on the mining model. For example, if you have the appropriate permissions, you can drill through from a particular result in a mining model to retrieve detailed information about the cases in the node, and even access structure columns that were not used in the model.
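The following sketch shows how drillthrough to an unused structure column might look in DMX. The names are hypothetical: the structure is assumed to contain a Region column that the model does not use, and the query assumes that the caller has drillthrough permissions on both the model and the structure. The two statements are run separately.

```dmx
-- Statement 1: add a model with drillthrough enabled.
-- Region exists in the structure but is not used by this model.
ALTER MINING STRUCTURE [Customer Structure]
ADD MINING MODEL [Income Tree Drill]
(
    [CustomerID],
    [Age],
    [Income] PREDICT
)
USING Microsoft_Decision_Trees
WITH DRILLTHROUGH

-- Statement 2: after processing, return model cases together with
-- the structure-only column.
SELECT [CustomerID], StructureColumn('Region') AS [Region]
FROM [Income Tree Drill].CASES
```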

Processing Mining Structures

A mining structure is just a metadata container until it is processed. When you process a mining structure, Analysis Services creates a cache that stores statistics about the data, information about how any continuous attributes are discretized, and other information that is later used by mining models. The mining model itself does not store any data, but references the information in the cache. Therefore, when you process a mining model, the structure cache must be available. If it is not available, the structure must be reprocessed before the model can be built.
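In DMX, a structure is processed with an INSERT INTO statement that maps structure columns to a source query. The sketch below is illustrative only: the data source name, the source table, and the column names are assumptions, not objects defined in this topic.

```dmx
-- Hypothetical data source [Customer Data Source] and table dbo.Customers.
-- Processing the structure also processes the models that belong to it.
INSERT INTO MINING STRUCTURE [Customer Structure]
(
    [CustomerID], [Region], [Age], [Income], [Gender]
)
OPENQUERY([Customer Data Source],
    'SELECT CustomerID, Region, Age, Income, Gender FROM dbo.Customers')
```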

If you do not want the data to be cached, you can change the CacheMode property of the mining structure to ClearAfterProcessing. Analysis Services then destroys the cache after the models are processed. Setting the CacheMode property to ClearAfterProcessing also disables drillthrough from the mining model.

As long as the cached data is available, the mining structure does not need to be reprocessed when you add a new mining model to the structure; you can process the model only. For more information, see Processing Data Mining Objects.
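As a sketch, processing only a newly added model could use the form of INSERT INTO that has no column list and no source query, assuming the structure has already been processed and its cache is still available. The model name is hypothetical.

```dmx
-- Hypothetical model added to an already processed structure.
-- No source query is supplied; the model is trained from the structure cache.
INSERT INTO MINING MODEL [Customer Clusters]
```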

Viewing Mining Structures

You cannot use viewers to browse the data in a mining structure. However, in Business Intelligence Development Studio, you can use the Mining Structure tab of Data Mining Designer to view the structure columns and their definitions. For more information, see Data Mining Designer.

If you want to review the data in the mining structure, you can create queries by using Data Mining Extensions (DMX). For example, the statement SELECT * FROM MINING STRUCTURE <structure>.CASES returns all the data in the mining structure. To retrieve this information, the mining structure must have been processed, and the results of processing must be cached.
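A narrower query can list specific columns and restrict the cases that are returned. The following sketch uses hypothetical structure and column names, and it assumes that the structure was created with a holdout partition so that the IsTestCase function has a test set to select from.

```dmx
-- Return ten cases from the test partition of a processed, cached structure.
SELECT TOP 10 [CustomerID], [Age], [Region]
FROM MINING STRUCTURE [Customer Structure].CASES
WHERE IsTestCase()
```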

A data mining model applies a mining model algorithm to the data that is represented by a mining structure. A mining model is an object that belongs to a particular mining structure, and the model inherits all the values of the properties that are defined by the mining structure. The model can use all the columns that the mining structure contains or a subset of the columns. You can add multiple copies of a structure column to a structure. You can also add multiple copies of a structure column to a model, and then assign a different name, or alias, to each copy in the model. For more information about aliasing structure columns, see How to: Create an Alias for a Model Column and Setting Properties on a Mining Model.
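In DMX, an alias can be assigned with the AS keyword in the model's column list. The sketch below uses hypothetical object and column names and assumes that the structure already contains the listed columns.

```dmx
-- Hypothetical model in which the structure column [Age] is exposed
-- under the model column name [Customer Age].
ALTER MINING STRUCTURE [Customer Structure]
ADD MINING MODEL [Gender Bayes]
(
    [CustomerID],
    [Age] AS [Customer Age],
    [Region],
    [Gender] PREDICT
)
USING Microsoft_Decision_Trees
```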