11.1 Clustering

Chapter 7 considered supervised learning, where the target features that must
be predicted from input features are observed in the training data. In
clustering or unsupervised learning, the target features are not given in the
training examples. The aim is to construct a natural classification that can
be used to cluster the data.

The general idea behind clustering is to partition the examples into
clusters or classes. Each class predicts feature values for the examples
in the class. Each clustering has a prediction error, which measures how far
these predictions are from the observed feature values. The best
clustering is the one that minimizes the error.
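
The idea of scoring a clustering by its prediction error can be sketched in
a few lines of Python. This is a minimal illustration, not a definitive
method: the section does not fix an error measure, so the sketch assumes
numeric features, assumes each class predicts the mean of its members, and
assumes a sum-of-squares error. The function names and the toy data are
hypothetical.

```python
from collections import defaultdict


def class_predictions(examples, assignment):
    """For each class, predict the mean of each feature over its members.

    Assumption: using the mean as the class prediction is one common choice,
    not something this section prescribes.
    """
    members = defaultdict(list)
    for example, cls in zip(examples, assignment):
        members[cls].append(example)
    return {
        cls: [sum(column) / len(rows) for column in zip(*rows)]
        for cls, rows in members.items()
    }


def clustering_error(examples, assignment):
    """Sum-of-squares prediction error of a hard clustering (assumed measure)."""
    predictions = class_predictions(examples, assignment)
    return sum(
        (value - predicted) ** 2
        for example, cls in zip(examples, assignment)
        for value, predicted in zip(example, predictions[cls])
    )


# Hypothetical toy data: four examples with two numeric features each.
examples = [[1.0, 2.0], [1.0, 4.0], [9.0, 8.0], [11.0, 8.0]]

# Two candidate partitions of the same examples into two classes.
nearby_together = [0, 0, 1, 1]
mixed = [0, 1, 0, 1]

error_nearby = clustering_error(examples, nearby_together)
error_mixed = clustering_error(examples, mixed)
```

Under these assumptions, the partition that groups nearby examples together
has the lower error, so it is the better of the two clusterings.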

Example 11.1:
A diagnostic assistant may want to cluster the different treatments
into groups that predict the desirable and undesirable effects of each
treatment. The assistant may not want to give a patient a drug because
similar drugs may have had disastrous effects on similar patients.

An intelligent tutoring system may want to cluster students' learning
behavior so that strategies that work for one member of a class may
work for other members.

In hard clustering, each example is placed definitively in a class. The class
is then used to predict the feature values of the example. The alternative to
hard clustering is soft clustering, in which each example has a probability
distribution over the classes. The predicted value for each feature of an
example is the weighted average of the predictions of the classes, weighted by
the probability of the example being in each class.
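
The difference between the two kinds of prediction can be sketched as
follows. This is an illustrative sketch only: it assumes numeric features,
assumes the per-class predictions are already known, and uses hypothetical
names (class_preds, membership) and made-up numbers.

```python
def hard_prediction(class_preds, cls):
    """Hard clustering: the example's single class gives the prediction."""
    return class_preds[cls]


def soft_prediction(class_preds, membership):
    """Soft clustering: average the class predictions, weighted by the
    probability that the example belongs to each class.

    membership maps each class to a probability; the values should sum to 1.
    """
    n_features = len(next(iter(class_preds.values())))
    return [
        sum(prob * class_preds[cls][i] for cls, prob in membership.items())
        for i in range(n_features)
    ]


# Hypothetical predictions of two classes for two numeric features.
class_preds = {"a": [1.0, 3.0], "b": [10.0, 8.0]}

# Hypothetical example that is in class "a" with probability 0.75.
membership = {"a": 0.75, "b": 0.25}

hard = hard_prediction(class_preds, "a")
soft = soft_prediction(class_preds, membership)
```

Hard clustering is the special case in which the distribution puts
probability 1 on a single class, so the weighted average reduces to that
class's prediction.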