Abstract

Clustering is a major tool in data analysis that divides objects into groups through unsupervised training procedures. Clustering algorithms attempt to group a set of objects into well-defined subgroups, based on some similarity between them. However, the results of the clustering process may not be confirmed by our knowledge of the data. The self-organizing map (SOM) neural network is an excellent tool for recognizing clusters of data, relating similar classes to each other in an unsupervised manner. Basically, SOM is used when the training dataset contains cases featuring input variables without the associated outputs. SOM can also be used for classification when output classes are immediately available; the advantage in this case is its ability to highlight similarities between classes, and thus to assess different previous classification approaches. This paper explores this ability of SOM to validate length of stay (LOS)-based clustering results obtained using a Gaussian mixture modeling (GMM) approach, by comparing the classification accuracy (percentage of samples correctly classified) of the different results. The idea behind this attempt is the following. In the first step, each GMM approach provides its own scheme for grouping LOS, and different classes are thus recognized and labeled; in this step, we have considered GMM with different LOS intervals. In the second step, SOM first learns to recognize clusters of data and then compares its cluster map with the previously labeled clusters provided by GMM. A closer similarity between a previous clustering scheme and the SOM cluster map results in a better accuracy for clustering LOS data. Ultimately, by comparing different GMM component models, the SOM application leads to an optimal number of patient groups. An application to a surgical dataset showed the effectiveness of this methodology in determining the LOS intervals.