Data Mining Extensions (DMX) Reference

Data Mining Extensions (DMX) is a language that you can use to create and work with data mining models in Microsoft SQL Server Analysis Services. You can use DMX to create the structure of new data mining models, to train these models, and to browse, manage, and predict against them. DMX is composed of data definition language (DDL) statements, data manipulation language (DML) statements, and functions and operators.

The Microsoft OLE DB for Data Mining specification defines the basis of data mining as the data mining model virtual object. The data mining model object encapsulates all that is known about a particular mining model. It is structured like an SQL table, with columns, data types, and meta information that describe the model. This structure lets you use the DMX language, which is an extension of SQL, to create and work with models.
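To make the table analogy concrete, a model definition might look like the following sketch. The model name, column names, and choice of algorithm are illustrative assumptions and do not come from any particular database.

```dmx
// Hypothetical model: each column has a data type and a content type,
// much like a column definition in an SQL CREATE TABLE statement.
CREATE MINING MODEL [Bike Buyer Sketch]
(
    [Customer Key] LONG KEY,              // identifies each case
    [Age]          LONG CONTINUOUS,       // numeric input
    [Gender]       TEXT DISCRETE,         // categorical input
    [Bike Buyer]   LONG DISCRETE PREDICT  // column the model predicts
)
USING Microsoft_Decision_Trees
```

The statement defines only the shape of the model. The model holds no learned patterns until it is trained.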

You can use DMX statements to create, process, delete, copy, browse, and predict against data mining models. There are two types of statements in DMX: data definition statements and data manipulation statements. You can use each type of statement to perform different kinds of tasks.

The following sections provide more information about working with DMX statements:

Data Definition Statements

Use data definition statements in DMX to create and define new mining structures and models, to import and export mining models and mining structures, and to drop existing models and structures from a database. Data definition statements in DMX are part of the data definition language (DDL).

You can perform the following tasks with the data definition statements in DMX:

Export a mining model and associated mining structure to a file by using the EXPORT statement.

Import a mining model and associated mining structure from a file that is created by the EXPORT statement by using the IMPORT statement.

Copy the structure of an existing mining model into a new model, and train it with the same data, by using the SELECT INTO statement.

Completely remove a mining model from a database by using the DROP MINING MODEL statement.

Completely remove a mining structure and all its associated mining models from the database by using the DROP MINING STRUCTURE statement.
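The tasks above could be sketched as follows. All object names and the file path are hypothetical, and each statement is assumed to be run on its own rather than as a batch.

```dmx
// Export a (hypothetical) model to a backup file.
EXPORT MINING MODEL [Bike Buyer Sketch] TO 'C:\Backups\BikeBuyerSketch.abf'

// Import it again, for example on another server.
IMPORT FROM 'C:\Backups\BikeBuyerSketch.abf'

// Copy the structure of an existing model into a new model that uses a
// different algorithm; the new model is trained on the same data.
SELECT INTO [Bike Buyer Clusters]
USING Microsoft_Clustering (CLUSTER_COUNT = 5)
FROM [Bike Buyer Sketch]

// Remove a single model.
DROP MINING MODEL [Bike Buyer Clusters]

// Remove a structure together with every model built on it.
DROP MINING STRUCTURE [Bike Buyer Structure]
```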

Data Manipulation Statements

Use data manipulation statements in DMX to work with existing mining models, to browse the models, and to create predictions against them. Data manipulation statements in DMX are part of the data manipulation language (DML).

You can perform the following tasks with the data manipulation statements in DMX:

Train a mining model by using the INSERT INTO statement. This does not insert the actual source data into a data mining model object, but instead creates an abstraction that describes the mining model that the algorithm creates. The source query for an INSERT INTO statement is described in <source data query>.

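Extend the SELECT statement to browse the information that is calculated during model training and stored in the data mining model, such as statistics of the source data. The clauses that you can use to extend the power of the SELECT statement include SELECT DISTINCT FROM <model>, SELECT FROM <model>.CONTENT, SELECT FROM <model>.CASES, SELECT FROM <model>.SAMPLE_CASES, and SELECT FROM <model>.DIMENSION_CONTENT.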

Create predictions that are based on an existing mining model by using the PREDICTION JOIN clause of the SELECT statement. The source query for a PREDICTION JOIN clause is described in <source data query>.

Remove all the trained data from a model or a structure by using the DELETE statement.
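As a rough illustration of training, browsing, and clearing a model, consider the sketch below. The model name, the data source [Adventure Works DW], and the table and column names in the source query are assumptions made for the example. Each statement is assumed to be run separately.

```dmx
// Train the model: the source query supplies the cases, but the model
// stores the patterns the algorithm finds, not the rows themselves.
INSERT INTO MINING MODEL [Bike Buyer Sketch]
(
    [Customer Key], [Age], [Gender], [Bike Buyer]
)
OPENQUERY([Adventure Works DW],
    'SELECT CustomerKey, Age, Gender, BikeBuyer FROM dbo.CustomerCases')

// Browse what was learned: one row per node of the trained model.
SELECT NODE_CAPTION, NODE_SUPPORT
FROM [Bike Buyer Sketch].CONTENT

// Clear the trained content so that the model can be retrained.
DELETE FROM MINING MODEL [Bike Buyer Sketch]
```

The column list in INSERT INTO maps the columns returned by the source query to the model columns by position, so the two lists should be kept in the same order.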

DMX Query Fundamentals

The SELECT statement is the basis for most DMX queries. Depending on the clauses that you use with such statements, you can browse, copy, or predict against mining models. The prediction query uses a form of SELECT to create predictions based on existing mining models. Functions extend your ability to browse and query the mining models beyond the intrinsic capabilities of the data mining model.
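A minimal prediction query might look like the following sketch. It assumes a trained model named [Bike Buyer Sketch] with [Age] and [Gender] input columns and a predictable [Bike Buyer] column; all of these names are hypothetical.

```dmx
// Singleton prediction: the input case is written inline, and
// NATURAL PREDICTION JOIN matches input columns to model columns by name.
SELECT
    [Bike Buyer Sketch].[Bike Buyer]
FROM
    [Bike Buyer Sketch]
NATURAL PREDICTION JOIN
    (SELECT 35 AS [Age], 'M' AS [Gender]) AS t
```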

You can use DMX functions to obtain information that is discovered during the training of your models, and to calculate new information. You can use these functions for many purposes, including to return statistics that describe the underlying data or the accuracy of a prediction, or to return an expanded explanation of a prediction.
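The following sketch shows how prediction functions could be combined in a single query. The model, the data source, and the table and column names are assumptions for illustration only.

```dmx
// Batch prediction over a (hypothetical) table of new customers.
SELECT
    t.CustomerKey,
    Predict([Bike Buyer])            AS PredictedValue,  // most likely state
    PredictProbability([Bike Buyer]) AS Probability,     // likelihood of that state
    PredictHistogram([Bike Buyer])   AS Histogram        // nested table of all states
FROM
    [Bike Buyer Sketch]
PREDICTION JOIN
    OPENQUERY([Adventure Works DW],
        'SELECT CustomerKey, Age, Gender FROM dbo.NewCustomers') AS t
ON
    [Bike Buyer Sketch].[Age] = t.Age AND
    [Bike Buyer Sketch].[Gender] = t.Gender
```

Here PredictProbability indicates how confident the model is in each prediction, and PredictHistogram returns the fuller breakdown across all possible states of the predicted column.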