If we wanted to learn the measurements that characterize each
underlying species, we would use these real-valued measurements and
make assumptions about the structure of the data.

In practice, real-valued data is commonly assumed to be normally
distributed, i.e. Gaussian.

We could assume that, conditioned on species, the measurement data
follows a multivariate normal distribution:

\[\mathbf{x}\mid species=s\sim\mathcal{N}(\mu_{s},\Sigma_{s})\]
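
As a rough illustration of this class-conditional assumption, the sketch below draws measurements from one Gaussian per species and scores a new measurement under each. The species names, parameter values, and the helper `gaussian_logpdf` are hypothetical and chosen only for the example; they do not come from any dataset or library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-species parameters (mean vector, covariance matrix).
# These values are illustrative assumptions, not estimates from real data.
params = {
    "species_a": (np.array([5.0, 3.4]), np.array([[0.12, 0.10], [0.10, 0.14]])),
    "species_b": (np.array([6.6, 3.0]), np.array([[0.40, 0.09], [0.09, 0.10]])),
}


def gaussian_logpdf(x, mu, sigma):
    """Log density of a multivariate normal N(mu, sigma) evaluated at x."""
    d = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


# Simulate measurements conditioned on species: x | species = s ~ N(mu_s, Sigma_s).
samples = {
    s: rng.multivariate_normal(mu, sigma, size=200)
    for s, (mu, sigma) in params.items()
}

# Score a new measurement under each species' Gaussian.
x_new = np.array([5.1, 3.5])
log_liks = {s: gaussian_logpdf(x_new, mu, sigma) for s, (mu, sigma) in params.items()}
best = max(log_liks, key=log_liks.get)
```

Here the parameters are known in advance; the inference problem is the reverse, where \(\mu_s\) and \(\Sigma_s\) must be learned from the samples.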

The normal inverse-Wishart distribution allows us to learn the
underlying parameters of each normal distribution: its mean
\(\mu_s\) and its covariance \(\Sigma_s\). Since the normal
inverse-Wishart is the conjugate prior of the multivariate normal with
unknown mean and covariance, the posterior distribution over
\((\mu_s, \Sigma_s)\) given the data is also a normal inverse-Wishart
distribution. This allows us to infer the distribution over values of
\(\mu_s\) and \(\Sigma_{s}\) in closed form when we define our model.
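
The conjugate update can be sketched in a few lines of NumPy. This is a minimal illustration of the standard normal inverse-Wishart posterior formulas, not this library's implementation; the function name `niw_posterior`, the hyperparameter names (`mu0`, `kappa0`, `nu0`, `psi0`), and the simulated data are all assumptions made for the example.

```python
import numpy as np


def niw_posterior(X, mu0, kappa0, nu0, psi0):
    """Posterior NIW hyperparameters given rows of X drawn from N(mu, Sigma).

    Assumed prior: Sigma ~ IW(psi0, nu0) and mu | Sigma ~ N(mu0, Sigma / kappa0).
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    xbar = X.mean(axis=0)
    centered = X - xbar
    scatter = centered.T @ centered  # sum of outer products around the sample mean

    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    diff = (xbar - mu0).reshape(-1, 1)
    psi_n = psi0 + scatter + (kappa0 * n / kappa_n) * (diff @ diff.T)
    return mu_n, kappa_n, nu_n, psi_n


# Simulated measurements for a single species (illustrative values only).
rng = np.random.default_rng(0)
true_mu = np.array([5.0, 3.4])
true_cov = np.array([[0.12, 0.10], [0.10, 0.14]])
X = rng.multivariate_normal(true_mu, true_cov, size=50)

mu_n, kappa_n, nu_n, psi_n = niw_posterior(
    X, mu0=np.zeros(2), kappa0=1.0, nu0=4.0, psi0=np.eye(2)
)

# Posterior mean of Sigma under IW(psi_n, nu_n); defined when nu_n > d + 1.
d = X.shape[1]
expected_cov = psi_n / (nu_n - d - 1)
```

Because the posterior has the same form as the prior, the returned hyperparameters can be fed back in as the prior when more data arrives.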

Note that if we have only one real-valued variable, the normal
inverse-Wishart distribution reduces to what is usually called the
normal inverse-gamma distribution. In this case, we learn the
scalar-valued mean \(\mu\) and variance \(\sigma^2\) for each inferred
cluster.
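
For the scalar case, a minimal sketch of the normal inverse-gamma update looks like this. The function name `nig_posterior`, the parameter names, and the toy data are hypothetical; the sketch assumes the common parameterization \(\sigma^2 \sim \text{Inv-Gamma}(\alpha_0, \beta_0)\), \(\mu \mid \sigma^2 \sim \mathcal{N}(\mu_0, \sigma^2/\kappa_0)\).

```python
import numpy as np


def nig_posterior(x, mu0, kappa0, alpha0, beta0):
    """Posterior normal inverse-gamma hyperparameters for scalar data x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    ss = np.sum((x - xbar) ** 2)  # sum of squared deviations from the sample mean

    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


# Toy data for one cluster (illustrative values only).
mu_n, kappa_n, alpha_n, beta_n = nig_posterior(
    [1.0, 2.0, 3.0], mu0=0.0, kappa0=1.0, alpha0=2.0, beta0=1.0
)

# Posterior mean of the variance under Inv-Gamma(alpha_n, beta_n); needs alpha_n > 1.
expected_var = beta_n / (alpha_n - 1.0)
```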

Univariate real data, however, should be modeled with our normal
inverse-chi-squared distribution, which is optimized for inferring
univariate parameters.
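
To show what the normal inverse-chi-squared parameterization looks like, here is a minimal sketch of its conjugate update. It is not this library's optimized implementation; the function name `nix_posterior`, the parameter names, and the toy data are assumptions, with the prior taken as \(\sigma^2 \sim \text{Scaled-Inv-}\chi^2(\nu_0, \sigma_0^2)\) and \(\mu \mid \sigma^2 \sim \mathcal{N}(\mu_0, \sigma^2/\kappa_0)\).

```python
import numpy as np


def nix_posterior(x, mu0, kappa0, nu0, sigma0_sq):
    """Posterior normal inverse-chi-squared hyperparameters for scalar data x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    ss = np.sum((x - xbar) ** 2)  # sum of squared deviations from the sample mean

    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    # Combine the prior sum of squares, the data sum of squares, and the
    # disagreement between the prior mean and the sample mean.
    sigma_n_sq = (
        nu0 * sigma0_sq + ss + (n * kappa0 / kappa_n) * (mu0 - xbar) ** 2
    ) / nu_n
    return mu_n, kappa_n, nu_n, sigma_n_sq


# Toy data for one cluster (illustrative values only).
mu_n, kappa_n, nu_n, sigma_n_sq = nix_posterior(
    [1.0, 2.0, 3.0], mu0=0.0, kappa0=1.0, nu0=4.0, sigma0_sq=0.5
)
```

In this parameterization, \(\nu_0\) and \(\kappa_0\) act as prior sample sizes for the variance and the mean respectively, which makes the hyperparameters easier to interpret than the shape and scale of an inverse-gamma.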