Gaussian Mixture Models

Gaussian mixture models are formed by combining
multivariate normal density components. In Statistics and Machine Learning Toolbox™ software,
use the gmdistribution class
to fit data using an expectation maximization (EM) algorithm, which assigns
posterior probabilities to each component density with respect to
each observation. The fitting method uses an iterative algorithm that
converges to a local optimum.

Clustering using Gaussian mixture models is sometimes considered
a soft clustering method. The posterior probabilities for each point
indicate that each data point has some probability of belonging to
each cluster. For more information on clustering with Gaussian mixture
models, see Clustering Using Gaussian Mixture Models. This section
describes how to create Gaussian mixture model objects.

Creating Gaussian Mixture Models

Specifying a Model

Use the gmdistribution constructor
to create Gaussian mixture models with specified means, covariances,
and mixture proportions.
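
For example, the following sketch creates a two-component bivariate
mixture. The means, covariances, and mixing proportions used here are
illustrative values; the resulting object obj is used in the property
examples that follow.

MU = [1 2; -3 -5];                      % means, one row per component
SIGMA = cat(3,[2 0; 0 .5],[1 0; 0 1]);  % covariances, stacked along dimension 3
p = [0.5 0.5];                          % equal mixing proportions (illustrative)
obj = gmdistribution(MU,SIGMA,p);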

The gmdistribution reference
page describes the object properties. To access the value of a property,
use dot notation. For example, access the number of dimensions of the object.

dimension = obj.NDimensions

dimension =
2

Access the distribution name.

name = obj.DistName

name =
gaussian mixture distribution

Use the pdf and cdf methods to compute values of the probability density
and cumulative distribution functions, and visualize the object:

figure
ezsurf(@(x,y)pdf(obj,[x y]),[-10 10],[-10 10])  % surface plot of the mixture pdf

figure
ezsurf(@(x,y)cdf(obj,[x y]),[-10 10],[-10 10])  % surface plot of the mixture cdf

Fitting a Model to Data

You can also create Gaussian mixture models by fitting a parametric
model with a specified number of components to data. fitgmdist uses the syntax obj
= fitgmdist(X,k), where X is a data matrix
and k is the specified number of components. Choosing
a suitable number of components k is essential
for creating a useful model of the data: too few components
fail to model the data accurately; too many components lead to an
over-fit model with singular covariance matrices.

The following example illustrates this approach.

First, create some data from a mixture of two bivariate Gaussian
distributions using the mvnrnd function:
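
A sketch of this step (the component parameters, sample sizes, and
random seed below are illustrative choices):

MU1 = [1 2];            % mean of the first component
SIGMA1 = [2 0; 0 .5];   % covariance of the first component
MU2 = [-3 -5];          % mean of the second component
SIGMA2 = [1 0; 0 1];    % covariance of the second component
rng(1)                  % for reproducibility
X = [mvnrnd(MU1,SIGMA1,1000); mvnrnd(MU2,SIGMA2,1000)];

Next, fit a two-component Gaussian mixture model to the data:

obj = fitgmdist(X,2);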

Both the Akaike information criterion (AIC) and the Bayes information
criterion (BIC) are negative log-likelihoods
for the data with penalty terms for the number of estimated parameters.
You can use them to determine an appropriate number of components
for a model when the number of components is unspecified.
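
As a sketch of this idea (assuming the data matrix X from the example
above; the candidate range of one to four components is arbitrary), fit
models with different numbers of components and compare their AIC
values, which the fitted object stores in its AIC property:

AIC = zeros(1,4);
gm = cell(1,4);
for k = 1:4
    gm{k} = fitgmdist(X,k);  % fit a k-component model
    AIC(k) = gm{k}.AIC;      % penalized negative log-likelihood
end
[minAIC,numComponents] = min(AIC);  % smallest AIC suggests the best k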