Data mining in the health care industry

Abstract

This dissertation examines data mining in health care through two manuscripts. The first is a case study of a pharmaceutical company's launch of a new drug. The second studies falls within a hospital.

The case study, which examines a pharmaceutical company's failed attempt to reach overly optimistic sales goals for its new drug launch, produced several related findings. First, data mining is not only a useful tool for performing post-hoc market segmentation on data collected in large-scale segmentation studies; it also shows potential for making post-launch adjustments to a chosen segmentation strategy. That is, sales and related data collected soon after a product's launch can be analyzed to show the degree to which the chosen segmentation method is delivering the expected results. If performance is not meeting expectations, data mining provides a powerful means of identifying alternative segmentation variables that can improve performance by better utilizing resources, lowering sales costs, and achieving greater market demand. The case study demonstrates that a decision-making strategy built on hierarchical data mining algorithms was an extremely powerful tool for understanding the nature of the drug's market.

The second manuscript, on falls within a hospital, used data from Sarasota Memorial Hospital (SMH). We found that logistic regression had a lower prediction error than classification trees. Hierarchical data mining models confirmed this ordering: a self-organizing map (SOM) combined with logistic regression had a significantly lower prediction error than a SOM combined with a classification tree. However, there was no significant difference between the single logistic regression model and the hierarchical data mining model using SOM and logistic regression.
We introduced a novel recursive data mining ensemble that outperformed both the single logistic regression model and the hierarchical model combining SOM and logistic regression. The ensemble was created by applying SOM and classification trees recursively and alternating nuggets between the two models.