Exploring the Forecasting Model (Intermediate Data Mining Tutorial)

Now that you have built the forecasting mining model, you can explore the results by using the Mining Model Viewer tab of Data Mining Designer. The Microsoft Time Series Viewer contains two tabs: Charts and Model.

The forecasting mining model that you built describes sales of products in three different regions (Europe, North America, and the Pacific) for the years 2005-2010. Because sales are tracked separately for each product in each region, the Microsoft Time Series algorithm creates a time series model that contains multiple trees, one for each combination of region, product, and predictable attribute.
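As a quick check on how the number of trees follows from the data, the sketch below enumerates one label per combination. It assumes the four bicycle models that appear later in this lesson (M200, R250, R750, and T1000) and the two predictable attributes Quantity and Amount. The label format imitates the series names shown in the viewer and is illustrative only.

```python
from itertools import product

# Values from this lesson; the product list is an assumption based on the
# bicycle models named in the Charts tab steps.
REGIONS = ["Europe", "North America", "Pacific"]
PRODUCTS = ["M200", "R250", "R750", "T1000"]
ATTRIBUTES = ["Quantity", "Amount"]


def series_names(products, regions, attributes):
    """Return one 'Product Region: Attribute' label per combination."""
    return [f"{p} {r}: {a}" for p, r, a in product(products, regions, attributes)]


ALL_SERIES = series_names(PRODUCTS, REGIONS, ATTRIBUTES)
print(len(ALL_SERIES))  # 24 = 4 products x 3 regions x 2 attributes
print(ALL_SERIES[0])    # M200 Europe: Quantity
```

The count of 24 matches the number of series lines the Charts tab shows when every check box in the drop-down list is selected.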

Each of the tabs in the viewer displays a different view of the information in the complete time series model.

The Charts tab of the Microsoft Time Series Viewer graphically shows you each of the trees that the algorithm creates. A time series tree contains a unique combination of product, region, and predictable attribute.

The legend on the right side of the viewer lists the time series that are selected in the drop-down list, and includes a check box for each time series. You can select and clear the check boxes in the legend to control which time series are displayed in the viewer.

You can also change the display options, such as the colors used for each time series, or whether values are displayed at points in the chart.

To select a time series

If the Charts tab of the Mining Model Viewer tab is not already displayed, click it.

Click the drop-down list to the right of the chart view, and select all the check boxes. The chart should contain 24 different series lines.

Click OK.

In the legend to the right of the chart, clear the check boxes for all series that are based on Amount, to temporarily hide those lines.

Now, clear the check boxes related to the R750 and R250 bicycles.

The chart now contains just the following six series lines, so that you can more easily compare trends for the M200 and T1000 bicycles:

M200 Europe: Quantity

M200 North America: Quantity

M200 Pacific: Quantity

T1000 Europe: Quantity

T1000 North America: Quantity

T1000 Pacific: Quantity
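The legend check boxes act as a filter over the full list of series. The sketch below reproduces the two filtering steps (hide every Amount series, then hide R250 and R750) in plain Python. It assumes the product list M200, R250, R750, and T1000, which is consistent with the 24 series lines mentioned above; the function name is hypothetical and has nothing to do with the viewer's own code.

```python
from itertools import product

REGIONS = ["Europe", "North America", "Pacific"]
PRODUCTS = ["M200", "R250", "R750", "T1000"]  # assumed full product list
ATTRIBUTES = ["Quantity", "Amount"]
ALL_SERIES = [f"{p} {r}: {a}" for p, r, a in product(PRODUCTS, REGIONS, ATTRIBUTES)]


def visible_series(series, hidden_attributes, hidden_products):
    """Return the series left after clearing legend check boxes for the
    given attributes and products (original order is preserved)."""
    kept = []
    for name in series:
        label, attribute = name.split(": ")
        product_name = label.split(" ", 1)[0]
        if attribute in hidden_attributes or product_name in hidden_products:
            continue
        kept.append(name)
    return kept


shown = visible_series(ALL_SERIES, {"Amount"}, {"R250", "R750"})
for name in shown:
    print(name)
```

The six names printed are the same six series listed above.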

The chart displays both historical and predicted data. Predicted data is shaded to differentiate it from historical data. To make it easier to compare different series, you can also change the colors associated with each line in the graph. For more information, see How to: Change the Colors Used in the Data Mining Viewer.

The trend lines show that total sales for all regions are generally increasing, with a peak every 12 months in December. The predictions generally continue this trend. The chart also shows that the data for the T1000 bicycle starts much later than the data for the other product series.
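The two patterns described here, a yearly peak and a rising overall level, can be checked directly on monthly data. The sketch below uses a short made-up series (not the tutorial's data) and assumes the series starts in January and contains only complete years of interest.

```python
# Hypothetical monthly totals for two years (January through December).
# These numbers are invented for illustration; they are not AdventureWorks data.
MONTHLY_SALES = [
    10, 11, 12, 12, 13, 14, 14, 15, 16, 17, 18, 25,
    14, 15, 16, 16, 17, 18, 18, 19, 20, 21, 22, 30,
]
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def full_years(values, period=12):
    """Split a monthly series into complete 12-month chunks."""
    return [values[i:i + period] for i in range(0, len(values) - period + 1, period)]


def peak_months(values):
    """Month name of the largest value in each complete year."""
    return [MONTH_NAMES[year.index(max(year))] for year in full_years(values)]


def yearly_totals(values):
    """Sum of each complete year, to see whether the level is rising."""
    return [sum(year) for year in full_years(values)]


print(peak_months(MONTHLY_SALES))    # ['Dec', 'Dec']
print(yearly_totals(MONTHLY_SALES))  # [177, 226]
```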

By default, Analysis Services shows five prediction steps for each time series. You can change this value to view more or fewer prediction steps. You can also graphically view the standard deviation for the prediction by adding error bars to the chart.
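The two display options interact simply: Prediction Steps controls how many future points are drawn, and each error bar is built from that step's standard deviation. The sketch below assumes each bar spans one standard deviation on either side of the predicted value; the `Prediction` class, `first_steps` helper, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    step: int       # 1 = first period after the end of the historical data
    value: float
    stdev: float

    def error_bar(self) -> Tuple[float, float]:
        """Bounds one standard deviation below and above the prediction."""
        return (self.value - self.stdev, self.value + self.stdev)


def first_steps(predictions: List[Prediction], prediction_steps: int = 5) -> List[Prediction]:
    """Keep the first `prediction_steps` predictions (the viewer default is 5)."""
    return predictions[:prediction_steps]


# Hypothetical forecast values and standard deviations for ten steps.
FORECAST = [Prediction(step=h, value=100.0 + 4 * h, stdev=2.0 * h) for h in range(1, 11)]

for p in first_steps(FORECAST):
    low, high = p.error_bar()
    print(p.step, p.value, low, high)
```

Raising `prediction_steps` to 10 simply exposes more of the same forecast; it does not change the earlier points.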

To change prediction and display options in the Chart view

Change the value for Prediction Steps gradually from 5 to 10, and then back to 6.

Note

In trend lines with large fluctuations in the historical data, the fluctuations are amplified during prediction.

Select the Show Deviations check box.

Pause the mouse over the error bars for the M200 series.

Pause the mouse over the error bars for the T1000 Pacific series.

You will use these results to investigate further. Later, you will develop a model that is averaged across all regions and, therefore, not subject to as much fluctuation.
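The Note in the procedure above says that fluctuations in the historical data are amplified during prediction. A generic way to see why is the random-walk model, in which the forecast's standard deviation grows with the square root of the horizon and scales with how much the series moved historically. This is a textbook illustration under that assumption, not the computation that ARTXP or ARIMA performs; the two histories are invented.

```python
import math
import statistics


def random_walk_band(history, steps):
    """Half-width of a one-standard-deviation band for each forecast step,
    assuming a random walk: sd(h) = sigma * sqrt(h), where sigma is the
    sample standard deviation of the period-to-period changes in history."""
    changes = [b - a for a, b in zip(history, history[1:])]
    sigma = statistics.stdev(changes)
    return [sigma * math.sqrt(h) for h in range(1, steps + 1)]


# Two hypothetical histories: one steady, one with large swings.
STEADY = [100, 101, 103, 104, 106, 107, 109, 110]
VOLATILE = [100, 110, 95, 115, 90, 120, 92, 118]

for name, history in [("steady", STEADY), ("volatile", VOLATILE)]:
    band = random_walk_band(history, steps=6)
    print(name, [round(w, 1) for w in band])
```

The volatile history produces a much wider band at every step, and both bands widen as the horizon grows.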

The Model tab of the Microsoft Time Series Viewer in Data Mining Designer enables you to view the time series as a decision tree graph. A separate tree is computed for each series that you included in the model. In a time series model, the decision tree graph might have a single node if the time series is linear, or it might have several nodes with conditions associated with each branch, just like a regular decision tree.
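ARTXP builds autoregressive trees: internal nodes split on conditions over earlier values, and each leaf holds its own regression formula. The sketch below shows that structure with one hypothetical split on the previous value and two lagged inputs; the threshold, coefficients, and class names are invented, and it is not Microsoft's implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: a linear regression on the two previous values."""
    intercept: float
    coef_lag1: float
    coef_lag2: float

    def predict(self, lag1: float, lag2: float) -> float:
        return self.intercept + self.coef_lag1 * lag1 + self.coef_lag2 * lag2


@dataclass
class Split:
    """Internal node: routes a case on a condition over the previous value."""
    threshold: float
    below: "Node"   # branch taken when lag1 < threshold
    above: "Node"   # branch taken when lag1 >= threshold

    def predict(self, lag1: float, lag2: float) -> float:
        branch = self.below if lag1 < self.threshold else self.above
        return branch.predict(lag1, lag2)


Node = Union[Leaf, Split]

# A single-node tree: one regression formula covers the whole series.
SINGLE = Leaf(intercept=1.0, coef_lag1=1.0, coef_lag2=0.0)

# A branching tree with hypothetical coefficients in each leaf.
BRANCHING = Split(
    threshold=50.0,
    below=Leaf(intercept=2.0, coef_lag1=0.75, coef_lag2=0.25),
    above=Leaf(intercept=10.0, coef_lag1=0.5, coef_lag2=0.25),
)

print(SINGLE.predict(40.0, 32.0))     # 41.0
print(BRANCHING.predict(40.0, 32.0))  # 40.0 (below branch)
print(BRANCHING.predict(60.0, 40.0))  # 50.0 (above branch)
```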

The nodes in the decision tree graph for a time series contain some of the following information:

The concentration of cases for the state of the predictable attribute that is specified in the Background control. Both the Node Legend window and the ToolTip that appears when you pause the mouse over an object in the tree give the exact number of cases.

The regression formula for the node. The ARTXP regression formula is available only in the leaf nodes. The ARIMA equation is available in the root node of the tree.

A diamond chart that represents the range of the attribute. The diamond is located at the mean for the node, and the width of the diamond represents the variance of the attribute at that node.
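The diamond's position and size reduce to two summary statistics over the cases in a node. This sketch computes them for a hypothetical set of node values. Whether the viewer uses population or sample variance is not stated here, so the choice of `statistics.pvariance` is an assumption.

```python
import statistics


def diamond(values):
    """Center and width of a node's diamond: the mean of the attribute
    values in the node and their population variance."""
    center = statistics.mean(values)
    width = statistics.pvariance(values)
    return center, width


# Hypothetical attribute values for the cases in one node.
node_values = [10, 12, 14, 16, 18]
center, width = diamond(node_values)
print("center:", center, "width:", width)
```

A node whose cases are tightly clustered produces a narrow diamond; widely spread cases produce a wide one.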

To view the decision tree for a time series model

In the Tree list on the Model tab of the viewer, select the M200 North America: Amount series.

A single node appears in the graph.

Pause the mouse cursor over the node.

For an All node, the ToolTip that appears includes information such as the number of cases in the entire series and the time series equations derived from analysis of the data.

Click the node and view the Mining Legend.

The Mining Legend includes information similar to that in the ToolTip, but provides additional details, including a histogram of values.

In the Tree list on the Model tab of the viewer, select the M200 Pacific: Amount series.

The tree graph now contains an All node and two child nodes. The text in the child nodes describes the conditions that split the tree.

Pause the mouse cursor over one of the child nodes and review the contents of the ToolTip. Alternatively, click the node and view the Mining Legend.

For child nodes, the description includes the count of cases in each branch of the tree, and any additional conditions that caused the tree to split.
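The counts shown for child nodes follow from the split condition: every case in the parent goes down exactly one branch. The sketch below counts cases per branch for a hypothetical binary split on the previous value; the attribute label, threshold, and values are invented, and the condition text only loosely resembles what the viewer displays.

```python
def describe_split(cases, attribute, threshold):
    """Count the cases on each side of a binary split and label each
    branch with its condition, similar to a child node description."""
    above = sum(1 for value in cases if value >= threshold)
    return {
        f"{attribute} >= {threshold}": above,
        f"{attribute} < {threshold}": len(cases) - above,
    }


# Hypothetical previous-period values for the cases in a parent node.
parent_cases = [30, 45, 52, 61, 48, 70, 55]
branches = describe_split(parent_cases, "previous value", 50)
for condition, count in branches.items():
    print(f"{condition}: {count} cases")

# Each case goes to exactly one branch, so the child counts add up
# to the parent's count.
print(sum(branches.values()) == len(parent_cases))  # True
```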

In addition to the custom viewer for time series, Analysis Services provides the Microsoft Generic Content Tree Viewer. This viewer can be used for all data mining models, regardless of the algorithm that you used. The Generic Content Tree Viewer is available from the Viewer drop-down list.

In this viewer, each mining model, regardless of the data or the algorithm used for analysis, is represented as a tree that contains a series of nodes. Each node represents information about some subset of the data. The exact content of the node differs depending on the algorithm and the type of the predictable attribute, but the general schema of the content is the same.
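Because every model is exposed through the same node-based schema, a small tree of records is enough to reason about it. The sketch below models three pieces of node content mentioned in this lesson (the node caption, ATTRIBUTE_NAME, and NODE_DESCRIPTION) as a Python dataclass, walks the tree top-down the way the Node Caption pane lists it, and finds the first series node that branches. The class, captions, series, and conditions are hypothetical; this is not an Analysis Services client API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class ContentNode:
    """Simplified stand-in for one node of mining model content. Field
    names echo schema columns used in this lesson; the real schema has
    many more columns."""
    node_caption: str
    attribute_name: str = ""
    node_description: str = ""
    children: List["ContentNode"] = field(default_factory=list)


def walk(node: ContentNode, depth: int = 0) -> Iterator[Tuple[int, ContentNode]]:
    """Yield (depth, node) pairs in top-down, depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def first_with_children(root: ContentNode) -> Optional[ContentNode]:
    """First node below the root that has child nodes, or None."""
    for _, node in walk(root):
        if node is not root and node.children:
            return node
    return None


# Hypothetical content: captions, series, and conditions are invented.
MODEL = ContentNode("Forecasting", children=[
    ContentNode("(All)", attribute_name="M200 Europe"),
    ContentNode("(All)", attribute_name="M200 Pacific", children=[
        ContentNode("Branch 1", node_description="previous value < 50"),
        ContentNode("Branch 2", node_description="previous value >= 50"),
    ]),
])

for depth, node in walk(MODEL):
    print("  " * depth + node.node_caption, node.attribute_name, node.node_description)

branching = first_with_children(MODEL)
print(branching.attribute_name)  # M200 Pacific
```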

Because the data mining model that you created is a mixed model, combining both ARTXP and ARIMA algorithms, Analysis Services uses each algorithm to create separate ARTXP and ARIMA models for each time series. When you use the Microsoft Time Series Viewer for browsing the forecasting mining model, Analysis Services combines the results of the two algorithms and displays the mining model as a single tree, with each node of the tree containing some content from both algorithms.

However, when you use the Microsoft Generic Content Tree Viewer, the content generated by each algorithm is exposed as two different types of nodes within the forecasting mining model. You can drill down through either the ARTXP version of the model or the ARIMA version of the model to see increasing levels of detail.

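To view details for a particular data series in the generic content viewer

On the Mining Model Viewer tab, select Microsoft Generic Content Tree Viewer from the Viewer list.

In the Node Caption pane, click the topmost (All) node.

In the Node Details pane, view the value for ATTRIBUTE_NAME.

This value shows you which series, or combination of product and region, is contained in this node. In the AdventureWorks example, the topmost node is for the M200 Europe series.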

In the Node Caption pane, locate the first node that has child nodes.

If a series node has children, the tree view that appears on the Model tab of the Microsoft Time Series Viewer will also have a branching structure.

Expand the node and click one of the child nodes.

The NODE_DESCRIPTION column of the schema contains the condition that caused the tree to split.

In the Node Caption pane, click the topmost ARIMA node, and expand the node until all child nodes are visible.

In the Node Details pane, view the value for ATTRIBUTE_NAME.

This value tells you which time series is contained in this node. The topmost node in the ARIMA section should match the topmost node in the (All) section. In the AdventureWorks example, this node contains the ARIMA analysis for the series, M200 Europe.
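The check in the last step, that the ARIMA section starts with the same series as the (All) section, amounts to matching nodes by ATTRIBUTE_NAME. The sketch below does that over a hypothetical, flattened list of top-level nodes. The section labels and series names are illustrative, and it assumes each series has exactly one top-level node in each section.

```python
from collections import defaultdict

# Hypothetical top-level series nodes as (section, ATTRIBUTE_NAME) pairs,
# in the order they might be listed in the Node Caption pane.
TOP_LEVEL = [
    ("(All)", "M200 Europe"),
    ("(All)", "M200 North America"),
    ("ARIMA", "M200 Europe"),
    ("ARIMA", "M200 North America"),
]


def topmost(nodes, section):
    """ATTRIBUTE_NAME of the first node listed in the given section."""
    return next(name for sec, name in nodes if sec == section)


def sections_by_series(nodes):
    """Map each series to the sections that contain a node for it."""
    grouped = defaultdict(list)
    for section, name in nodes:
        grouped[name].append(section)
    return dict(grouped)


print(topmost(TOP_LEVEL, "(All)") == topmost(TOP_LEVEL, "ARIMA"))  # True
print(sections_by_series(TOP_LEVEL))
```

A series that appears in only one section would show up in `sections_by_series` with a single entry, which makes mismatches easy to spot.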