Abstract

Some classification algorithms in machine learning are based on induction and deduction: given a batch of data, they try to extract a general classification model from it and expect that model to achieve high predictive performance. In high-risk fields with small samples, however, such a model has two limitations. First, because so few samples are available, there is not enough data to extract a general rule, and the model cannot make full use of all the information in the sample data. Second, the model returns only a bare classification, with no measure of confidence or credibility attached to it. The research and application of transduction-based conformal predictors in this thesis are aimed at these shortcomings. Transductive inference classifies test data directly, using the information of the whole sample set, without building a classification model along the way; its aim is to explore the inner relationship between a test example and the sequence of training examples. A conformal predictor builds its confidence mechanism on Kolmogorov's theory of algorithmic randomness. It uses a randomness test, whose outputs (p-values) measure the reliability of a prediction, to quantify confidence. At a chosen confidence level its output is a prediction region (a set of candidate labels) rather than a single label. The central part of a conformal predictor is the design of its nonconformity measure, into which classification algorithms from traditional machine learning can be integrated.
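The mechanism described above (try each candidate label, score every example for nonconformity, turn the scores into p-values, and keep the labels whose p-value exceeds the significance level) can be sketched as follows. This is a minimal illustration, not the thesis's MATLAB implementation. It assumes a commonly used k-nearest-neighbour nonconformity measure: the ratio of distances to same-label neighbours over distances to different-label neighbours. It also assumes Euclidean distance. The function names `knn_nonconformity`, `conformal_predict` and `confidence_and_credibility`, the parameters `k` and `epsilon`, and the toy data are all hypothetical.

```python
import math


def knn_nonconformity(X, y, i, k=1):
    """kNN nonconformity score of example i (larger = more unusual):
    sum of distances to its k nearest same-label neighbours divided by
    the sum of distances to its k nearest different-label neighbours.
    Assumed measure; Euclidean distance is also an assumption."""
    same, other = [], []
    for j in range(len(X)):
        if j == i:
            continue
        d = math.dist(X[i], X[j])
        if y[j] == y[i]:
            same.append(d)
        else:
            other.append(d)
    den = sum(sorted(other)[:k])
    return sum(sorted(same)[:k]) / den if den > 0 else math.inf


def conformal_predict(X_train, y_train, x_test, labels, k=1, epsilon=0.05):
    """Transductive conformal prediction for one test example.

    Each candidate label is tried in turn: the test example is appended to
    the training sequence with that label, every example is rescored, and
    the p-value is the fraction of examples at least as nonconforming as
    the test example. The region at significance level epsilon keeps the
    labels whose p-value exceeds epsilon (confidence level 1 - epsilon).
    """
    p_values = {}
    for label in labels:
        X = list(X_train) + [x_test]
        y = list(y_train) + [label]
        scores = [knn_nonconformity(X, y, i, k) for i in range(len(X))]
        p_values[label] = sum(s >= scores[-1] for s in scores) / len(scores)
    region = {label for label, p in p_values.items() if p > epsilon}
    return p_values, region


def confidence_and_credibility(p_values):
    """Forced single-label prediction with its confidence (1 minus the
    second-largest p-value) and credibility (the largest p-value)."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0], 1 - ranked[1][1], ranked[0][1]


# Hypothetical toy data: two well-separated clusters.
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["a", "a", "a", "b", "b", "b"]
p_values, region = conformal_predict(X_train, y_train, (0.5, 0.5), ["a", "b"], k=1, epsilon=0.2)
print(p_values, region)
print(confidence_and_credibility(p_values))
```

On this toy data the p-value is 4/7 for label "a" and 1/7 for label "b". At `epsilon=0.2` the region is therefore {"a"}, with confidence 6/7 and credibility 4/7. At `epsilon=0.05` both labels stay in the region. With only seven examples in the sequence, no p-value can fall below 1/7, which illustrates the small-sample setting the abstract is concerned with.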
This thesis first discusses conformal prediction algorithms based on k-nearest neighbours and support vector machines. It improves the k-nearest-neighbour conformal predictor by measuring the similarity between samples with geodesic distance instead of Euclidean distance. It then implements a conformal prediction algorithm based on the logistic regression model. All algorithms in this thesis are programmed in MATLAB R2007b and applied to data sets from UCI and from the tobacco industry. The performance of the conformal prediction algorithms is tested by comparing their prediction accuracy with that of other commonly used algorithms. Finally, the thesis summarizes the research and outlines future work.
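The abstract does not say how the geodesic distance is computed. A common construction, assumed here, is the Isomap-style approach. Each point is linked to its `n_neighbors` nearest points with Euclidean edge weights, and the geodesic distance is approximated by the shortest path through that graph. The sketch below does this with Floyd-Warshall. It then plugs either distance matrix into the same kind of kNN nonconformity score (an assumed ratio measure). `geodesic_distances`, `knn_nonconformity_from_matrix`, `n_neighbors` and the U-shaped toy data are hypothetical illustrations, not the thesis's code.

```python
import math


def geodesic_distances(X, n_neighbors=2):
    """Approximate geodesic distances between the points in X.

    Assumed construction (Isomap-style): link each point to its
    n_neighbors nearest points by Euclidean distance, make the graph
    symmetric, then run Floyd-Warshall for all-pairs shortest paths.
    Pairs in different connected components stay at math.inf.
    """
    n = len(X)
    G = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        G[i][i] = 0.0
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        for j in others[:n_neighbors]:
            G[i][j] = G[j][i] = math.dist(X[i], X[j])
    # Floyd-Warshall: allow every point m in turn as an intermediate stop.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if G[i][m] + G[m][j] < G[i][j]:
                    G[i][j] = G[i][m] + G[m][j]
    return G


def knn_nonconformity_from_matrix(D, y, i, k=1):
    """kNN nonconformity score of example i using any distance matrix D:
    sum of the k smallest same-label distances divided by the sum of the
    k smallest different-label distances (assumed measure)."""
    same = sorted(D[i][j] for j in range(len(y)) if j != i and y[j] == y[i])
    other = sorted(D[i][j] for j in range(len(y)) if y[j] != y[i])
    den = sum(other[:k])
    return sum(same[:k]) / den if den > 0 else math.inf


# Hypothetical "U"-shaped data: the two arm ends are 2.5 apart in a
# straight line but much farther apart when walking along the curve.
X = [(0, 0), (0, 1), (0, 2), (1.25, 2.75), (2.5, 2), (2.5, 1), (2.5, 0)]
y = ["a", "a", "a", "a", "b", "b", "b"]
D_euclid = [[math.dist(p, q) for q in X] for p in X]
D_geo = geodesic_distances(X, n_neighbors=2)
print(D_euclid[0][6], D_geo[0][6])
print(knn_nonconformity_from_matrix(D_euclid, y, 0),
      knn_nonconformity_from_matrix(D_geo, y, 0))
```

In this example the Euclidean distance between the two arm ends is 2.5. The geodesic distance is 4 + 2·sqrt(2.125), about 6.92. As a result, the first point's nonconformity drops from 0.4 under Euclidean distance to about 0.20 under geodesic distance, because the other class is far away along the data's shape. In a transductive setting the graph would be rebuilt for each test example together with the training set, which is one plausible reason the cost of this improvement grows with the sample size.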