Medicine has traditionally relied on heuristic approaches in which knowledge is acquired through experience and self-learning. Pathology is information-rich, with quantitative and qualitative measurements such as history, images, and physiological data from which diagnosis and treatment decisions are made. This information is readily linked to patient outcome data and is therefore potentially invaluable in improving treatment. Thus, pathology is ripe for the application of tools that can effectively turn this information into wisdom.
Machine learning (ML), a core branch of artificial intelligence, is one such tool: it can combine information from many sources to aid decision-making. An ML system takes in data, finds patterns in them, and uses those patterns to produce an output. It relies on algorithms and statistical models to perform a specific task without explicit instructions, depending on patterns and inference instead. Essentially, an ML algorithm is used to create a model from a data set, and predictions are then made from that model. The data used to develop the models are divided into a training set, a model selection (validation) set, and a testing set. The training set is used to learn different predictive models, each characterized by a set of parameters, including a penalty/complexity parameter. The purpose of the model selection set is to assess the predictive accuracy of each model on blinded data, so that the model with the lowest estimated generalization error can be selected. The testing set is held back to estimate how the selected model performs on unseen data. The selected model is the one expected to generalize best; it is not guaranteed to be the true model.
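The workflow described above, with training data, a model selection set, and a testing set, can be illustrated with a minimal sketch. It makes several assumptions that are not from the text: the data are synthetic, the split fractions (60/20/20) are arbitrary, and a one-feature ridge regression stands in for "a predictive model with a penalty parameter". The function names (`fit_ridge_1d`, `split`, `select_model`) are hypothetical and chosen only for this example.

```python
import random


def fit_ridge_1d(xs, ys, penalty):
    """Closed-form ridge fit for y ~ w * x (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def mse(w, xs, ys):
    """Mean squared prediction error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def split(data, fractions=(0.6, 0.2), seed=0):
    """Shuffle and split into training, model selection, and testing sets.

    The fractions are an assumption of this sketch; the remainder
    after the first two parts becomes the testing set.
    """
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_select = int(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    select = shuffled[n_train:n_train + n_select]
    test = shuffled[n_train + n_select:]
    return train, select, test


def select_model(train, select, penalties):
    """Fit one model per penalty on the training set and keep the one
    with the lowest error on the model selection set."""
    xs_tr, ys_tr = zip(*train)
    xs_se, ys_se = zip(*select)
    candidates = []
    for penalty in penalties:
        w = fit_ridge_1d(xs_tr, ys_tr, penalty)
        candidates.append((mse(w, xs_se, ys_se), penalty, w))
    return min(candidates)  # tuple ordering: lowest selection error first


# Synthetic data (assumption): y = 2x plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(-3, 3)
    data.append((x, 2.0 * x + rng.gauss(0, 0.5)))

penalties = [0.0, 0.1, 1.0, 10.0, 100.0]
train, select, test = split(data)
select_err, best_penalty, best_w = select_model(train, select, penalties)

# The testing set is used only once, after the model has been chosen.
xs_te, ys_te = zip(*test)
test_err = mse(best_w, xs_te, ys_te)
print(f"penalty={best_penalty}, w={best_w:.3f}, "
      f"selection MSE={select_err:.3f}, test MSE={test_err:.3f}")
```

The design point is that the testing data play no part in fitting or in choosing the penalty, so the test error is an estimate of performance on unseen data rather than a quantity the model was tuned against.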
Data mining uses many of the same methods as ML and overlaps with it substantially, but the goals differ. ML focuses on prediction, based on known properties learned from the training data, whereas data mining focuses on the discovery of previously unknown properties in the data (the analysis step of knowledge discovery in databases). The algorithms …