Abstract:
Large collections of records and samples in data mining require the detection of outlying observations (Outlier Detection, OD), since such observations carry necessary information. The large size and diversity of the data limit existing outlier detection techniques. The classifiers used in machine learning algorithms degrade OD performance because they are sensitive to noise and irrelevant features. This paper discusses the influence of Triangular Boundary-based Classification (TBC) and Weighing Based Feature Selection and Monotonic Classification (WFSMC) on the Wisconsin Diagnosis Breast Cancer (WDBC) dataset for effective outlier prediction. Imputation, weight computation, and ordinal feature selection performed prior to TBC identify the relevant features. The triangular area derived from the normal distribution function supports boundary region analysis, which helps provide treatment or precautions to patients. Points near the boundary region lead to misclassification; hence, including monotonic constraints in the classification phase improves classification accuracy. A comparative analysis of TBC and WFSMC in terms of accuracy, precision, and recall demonstrates the effectiveness of WFSMC in real-time data mining applications.
Keywords: Data mining, Monotonic classification, Ordinal feature selection, Outlier detection, Weight computation, Wisconsin Diagnosis Breast Cancer (WDBC).