Abstract

The expectation maximisation (EM) algorithm is an iterative maximum likelihood procedure often used to estimate the parameters of a mixture model. In theory, the likelihood is guaranteed not to decrease as the algorithm iteratively improves upon previously derived parameter estimates. The algorithm is considered to have converged when all parameter estimates are stable and no further improvement in the likelihood value can be made. However, to reduce computational time, it is common practice to stop the algorithm before full convergence using heuristic stopping rules. In this paper, we consider various stopping criteria and evaluate their effect on fitting Gaussian mixture models (GMMs) to patient length of stay (LOS) data. Although GMMs can be fitted successfully to positively skewed data such as LOS, the fitting procedure often requires many iterations of the EM algorithm. To our knowledge, no previous study has evaluated the effect of different stopping criteria on fitting GMMs to skewed distributions. Hence, the aim of this paper is to evaluate the effect of various stopping criteria in order to select and justify their use within a patient spell classification methodology. Results illustrate that criteria based on the change in the likelihood value and in the GMM parameters may not always be good indicators for stopping the algorithm. Instead, we show that the change in the variance parameters should be used, as these parameters are the last to stabilise. We also specify threshold values for the other stopping criteria.
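To make the competing stopping rules concrete, here is a minimal sketch of EM for a univariate GMM in which the stopping rule is selectable. It is not the authors' implementation. The function name `fit_gmm_1d`, the criterion labels (`"loglik"`, `"weights_means"`, `"variance"`), the use of absolute (rather than relative) changes, the tolerances, and the synthetic lognormal data standing in for LOS are all illustrative assumptions. The abstract does not give the paper's exact criterion definitions or threshold values.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_gmm_1d(data, k, criterion="variance", tol=1e-6, max_iter=1000, seed=0):
    """Fit a k-component univariate GMM by EM (hypothetical helper).

    criterion (illustrative definitions, absolute changes between iterations):
      "loglik"        - change in the log-likelihood
      "weights_means" - largest change in any mixing weight or mean
      "variance"      - largest change in any component variance
    """
    if criterion not in ("loglik", "weights_means", "variance"):
        raise ValueError(f"unknown criterion: {criterion}")
    rng = random.Random(seed)
    n = len(data)
    mu0 = sum(data) / n
    var0 = sum((x - mu0) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = sorted(rng.sample(data, k))  # random data points as initial means
    variances = [var0] * k               # start every component at the sample variance
    prev_loglik = -math.inf
    history = []

    for it in range(1, max_iter + 1):
        # E-step: responsibilities via log-sum-exp; loglik is the log-likelihood
        # of the parameters *before* this iteration's M-step.
        resp, loglik = [], 0.0
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(l - top) for l in logs))
            loglik += lse
            resp.append([math.exp(l - lse) for l in logs])
        history.append(loglik)

        # M-step: closed-form updates of weights, means and variances.
        new_w, new_m, new_v = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mj = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, data)) / nj
            new_w.append(nj / n)
            new_m.append(mj)
            new_v.append(max(vj, 1e-9))  # floor guards against a collapsing component

        # Candidate stopping quantities for this iteration.
        changes = {
            "loglik": abs(loglik - prev_loglik),
            "weights_means": max(abs(a - b) for a, b in
                                 zip(new_w + new_m, weights + means)),
            "variance": max(abs(a - b) for a, b in zip(new_v, variances)),
        }
        weights, means, variances = new_w, new_m, new_v
        prev_loglik = loglik
        if changes[criterion] < tol:
            break

    return {"weights": weights, "means": means, "variances": variances,
            "iterations": it, "loglik_history": history}


if __name__ == "__main__":
    # Synthetic right-skewed sample (an assumption, not real LOS data).
    rng = random.Random(1)
    los = [rng.lognormvariate(1.0, 0.6) for _ in range(300)]
    for crit in ("loglik", "weights_means", "variance"):
        fit = fit_gmm_1d(los, k=2, criterion=crit)
        print(crit, "stopped after", fit["iterations"], "iterations")
```

Running the three criteria with the same tolerance on the same data shows how differently they can behave. Note that a fixed `tol` means something different for each quantity: a log-likelihood summed over all observations and a variance on the data's squared scale are not directly comparable. That is one reason separate threshold values per criterion are needed.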