[SciPy-user] kmeans2 random initialization

On 31-Mar-08, at 11:17 PM, Robert Kern wrote:
> The relevant function is scipy/cluster/vq.py:_krandinit(). It is
> finding the covariance matrix and manually doing a multivariate normal
> sampling. Your data is most likely degenerate and not of full rank.
> It's arguable whether or not this should fail, but
> numpy.random.multivariate_normal() uses the SVD instead of a Cholesky
> decomposition to find the matrix square root, so it sort of ignores
> non-positive definiteness.
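The Cholesky-versus-SVD point can be made concrete with a minimal sketch. It assumes a current NumPy (`numpy.random.default_rng` postdates this thread) and is not the actual `_krandinit` code; `svd_sqrt` is a hypothetical helper. The covariance is singular because the second coordinate is exactly twice the first. The Cholesky factorization rejects it, while an SVD-based square root still yields valid samples, all lying on the line x2 = 2*x1.

```python
import numpy as np

# Rank-deficient covariance: the second coordinate is exactly twice the
# first, so the matrix is positive semi-definite but singular.
cov = np.array([[1.0, 2.0],
                [2.0, 4.0]])

# Cholesky requires strict positive definiteness, so NumPy raises
# LinAlgError on this matrix.
try:
    np.linalg.cholesky(cov)
    cholesky_ok = True
except np.linalg.LinAlgError:
    cholesky_ok = False

def svd_sqrt(c):
    """Hypothetical helper: return A with A @ A.T == c (up to round-off),
    built from the SVD, which tolerates zero singular values."""
    u, s, _ = np.linalg.svd(c)
    return u * np.sqrt(s)  # scales column j of u by sqrt(s[j])

rng = np.random.default_rng(0)
mean = np.zeros(2)
a = svd_sqrt(cov)

# Map standard normals through the square root: x = mean + A z has
# covariance A A^T = cov.
z = rng.standard_normal((1000, 2))
samples = mean + z @ a.T

# NumPy's own sampler also uses the SVD by default (method="svd").
lib_samples = rng.multivariate_normal(mean, cov, size=5)

print(cholesky_ok)                # False
print(np.allclose(a @ a.T, cov))  # True
```

Because the second singular value is zero (or round-off small), every sample collapses onto the one-dimensional subspace the data actually spans, instead of the factorization failing outright.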
This might not be relevant, depending on how the covariance is
computed, but one 'gotcha' I've seen with numerical algorithms that
assume positive-definiteness is that floating-point round-off
occasionally makes the input matrix very slightly non-symmetric,
and the algorithm then chokes. It's easily solved by averaging the
matrix with its transpose, (A + A.T) / 2 (though there are probably
more efficient ways).
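A minimal sketch of the symmetrization trick, assuming NumPy. `symmetrize` is a hypothetical helper, and the asymmetry is injected by hand to stand in for round-off. Which routines actually reject a slightly asymmetric matrix varies by library, so the sketch only checks exact symmetry directly. Since floating-point addition is commutative, (A + A.T) / 2 comes out exactly symmetric, not merely symmetric to within round-off. Copying one triangle over the other is another option.

```python
import numpy as np

def symmetrize(a):
    """Hypothetical helper: average a square matrix with its transpose.
    (x + y) and (y + x) round identically, so the result is exactly
    symmetric."""
    return (a + a.T) / 2.0

# Positive-definite covariance with a tiny hand-injected asymmetry,
# standing in for round-off from however the matrix was computed.
cov = np.array([[4.0, 2.0],
                [2.0, 3.0]])
cov[0, 1] += 1e-12

print(np.array_equal(cov, cov.T))  # False
sym = symmetrize(cov)
print(np.array_equal(sym, sym.T))  # True

# The symmetrized matrix is still positive definite, so it factors.
chol = np.linalg.cholesky(sym)

# Alternative: keep the upper triangle and mirror it into the lower one.
mirrored = np.triu(cov) + np.triu(cov, k=1).T
print(np.array_equal(mirrored, mirrored.T))  # True
```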
David