[...] (1) the x’s represent persons who have defaulted on their
loans and (2) the o’s represent persons whose
loans are in good status with the bank. Thus,
this simple artificial data set could represent a
historical data set that can contain useful
knowledge from the point of view of the
bank making the loans. Note that in actual
KDD applications, there are typically many
more dimensions (as many as several hundreds) and many more data points (many
thousands or even millions). The purpose here is to illustrate basic ideas
on a small problem in two-dimensional
space.

FALL 1996 43

[Figure 3. A Simple Linear Classification Boundary for the Loan Data Set. The shaded region denotes class no loan. Axes: Income and Debt; regions labeled Loan and No Loan.]

[Figure 4. A Simple Linear Regression for the Loan Data Set. Axes: Income and Debt; fitted line labeled Regression Line.]

44 AI MAGAZINE

Data-Mining Methods

The two high-level primary goals of data mining in practice tend to be prediction and description. As stated earlier, prediction involves using some variables or fields in the
database to predict unknown or future values
of other variables of interest, and description
focuses on finding human-interpretable patterns describing the data. Although the
boundaries between prediction and description are not sharp (some of the predictive
models can be descriptive, to the degree that
they are understandable, and vice versa), the
distinction is useful for understanding the
overall discovery goal. The relative importance of prediction and description for particular data-mining applications can vary considerably. The goals of prediction and
description can be achieved using a variety of
particular data-mining methods.
Classification is learning a function that
maps (classifies) a data item into one of several predefined classes (Weiss and Kulikowski
1991; Hand 1981). Examples of classification
methods used as part of knowledge discovery
applications include the classifying of trends
in financial markets (Apte and Hong 1996)
and the automated identification of objects of
interest in large image databases (Fayyad,
Djorgovski, and Weir 1996). Figure 3 shows a
simple partitioning of the loan data into two
class regions; note that it is not possible to
separate the classes perfectly using a linear
decision boundary. The bank might want to
use the classification regions to automatically
decide whether future loan applicants will be
given a loan or not.
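
To make the idea of a linear decision boundary concrete, the following is a minimal sketch rather than the method used to produce figure 3, which the article does not specify. It swaps in a nearest-centroid classifier, one simple technique whose boundary between two classes is a straight line. The (income, debt) values, the function names fit_centroids and classify, and the sample applicant are all hypothetical and chosen only for illustration. The data are deliberately not linearly separable, so some training points are misclassified.

```python
# Hypothetical (income, debt) records; "x" = defaulted, "o" = good standing.
# These values are invented for illustration and are not the article's data.
DATA = [
    ((8.0, 2.0), "o"), ((9.0, 5.0), "o"), ((7.0, 3.0), "o"),
    ((6.0, 1.0), "o"), ((4.0, 6.0), "o"),
    ((2.0, 5.0), "x"), ((3.0, 7.0), "x"), ((1.0, 3.0), "x"),
    ((4.0, 8.0), "x"), ((7.0, 4.0), "x"),
]


def fit_centroids(data):
    """Return {label: (mean income, mean debt)} for each class."""
    sums = {}
    for (income, debt), label in data:
        s_inc, s_debt, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (s_inc + income, s_debt + debt, n + 1)
    return {label: (si / n, sd / n) for label, (si, sd, n) in sums.items()}


def classify(centroids, point):
    """Assign the point to the class whose centroid is nearest.

    With two classes, the resulting boundary is the perpendicular
    bisector of the segment joining the centroids, i.e. a straight line.
    """
    def sq_dist(c):
        return (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


centroids = fit_centroids(DATA)
correct = sum(classify(centroids, p) == label for p, label in DATA)
print(f"training accuracy: {correct}/{len(DATA)}")

# A new applicant is offered a loan only if classified as "o".
applicant = (5.0, 2.0)
decision = "loan" if classify(centroids, applicant) == "o" else "no loan"
print(applicant, "->", decision)
```

Because one hypothetical defaulter lies among the good-standing points, no straight line can classify every training record correctly, which mirrors the observation made about figure 3.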
Regression is learning a function that maps
a data item to a real-valued prediction variable. Regression applications are many, for
example, predicting the amount of biomass
present in a forest given remotely sensed microwave measurements, estimating the probability that a patient will survive given the results of a set of diagnostic tests, predicting
consumer demand for a new product as a
function of advertising expenditure, and predicting time series where the input variables
can be time-lagged versions of the prediction
variable. Figure 4 shows the result of simple
linear regression where total debt is fitted as a
linear function of income...
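
As an illustration of fitting debt as a linear function of income, the sketch below applies ordinary least squares to a handful of points. It is an assumption-laden example, not a reproduction of figure 4: the income and debt values are hypothetical, and fit_line is a name introduced here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data in arbitrary units (not taken from the article).
income = [1.0, 2.0, 3.0, 4.0, 5.0]
debt = [2.0, 2.5, 3.5, 4.0, 5.5]

intercept, slope = fit_line(income, debt)
print(f"debt ~ {intercept:.2f} + {slope:.2f} * income")

# Real-valued prediction for an income level not in the data.
predicted = intercept + slope * 6.0
print(f"predicted debt at income 6.0: {predicted:.2f}")
```

Unlike the classifier, which returns one of a fixed set of classes, this function returns a real number for any income value.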