This thesis expands upon an existing noise cleansing technique, polishing, enabling it to be used in the Software Quality Prediction domain, as well as any other domain where the data contains continuous values, as opposed to categorical data for which the technique was originally designed. The procedure is applied to a real world dataset with real (as opposed to injected) noise as determined by an expert in the domain. This, in combination with expert assessment of the changes made to the data, provides not only a more realistic dataset than one in which the noise (or even the entire dataset) is artificial, but also a better understanding of whether the procedure is successful in cleansing the data. Lastly, this thesis provides a more in-depth view of the process than previously available, in that it gives results for different parameters and classifier building techniques. This allows the reader to gain a better understanding of the significance of both model generation and parameter selection.

This thesis presents a noise handling technique that attempts to improve the quality of training data for classification purposes by eliminating instances that are likely to be noise. Our approach uses twenty-five different classification techniques to create an ensemble of classifiers that acts as a noise filter on real-world software measurement datasets. Using a relatively large number of base-level classifiers for the ensemble-classifier filter makes it possible to achieve the desired level of noise removal conservativeness, with several possible levels of filtering. It also provides a higher degree of confidence in the noise elimination procedure, because with twenty-five base-level classifiers the results are less likely to be influenced by the (possibly) inappropriate learning bias of a few algorithms than with a smaller number of base-level classifiers. Empirical case studies of two different high-assurance software projects demonstrate the effectiveness of our noise elimination approach through the significant improvement in classification accuracy achieved at various levels of filtering.
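
The idea of an ensemble-classifier filter with adjustable conservativeness can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how the base classifiers vote, so the sketch assumes each instance is predicted by classifiers trained on the other cross-validation folds and is flagged as noise when at least `min_votes` classifiers misclassify it. The three toy learners, the function names, and the data are all hypothetical stand-ins for the twenty-five classification techniques and the software measurement datasets.

```python
def majority_learner(X, y):
    """Toy learner: always predicts the most common training label."""
    # Ties are broken in favour of the smallest label, so the result is deterministic.
    label = max(sorted(set(y)), key=y.count)
    return lambda x: label


def one_nn_learner(X, y):
    """Toy learner: predicts the label of the nearest training instance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict


def centroid_learner(X, y):
    """Toy learner: predicts the label of the nearest class centroid."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
    return predict


def misclassification_counts(X, y, learners, n_folds=3):
    """For each instance, count how many learners misclassify it when it is held out."""
    counts = [0] * len(X)
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for learner in learners:
            predict = learner(X_train, y_train)
            for i in test_idx:
                if predict(X[i]) != y[i]:
                    counts[i] += 1
    return counts


def ensemble_filter(X, y, learners, min_votes, n_folds=3):
    """Return indices flagged as likely noise (assumed rule: >= min_votes misclassifications)."""
    counts = misclassification_counts(X, y, learners, n_folds)
    return [i for i, c in enumerate(counts) if c >= min_votes]


# Hypothetical data: two well-separated clusters; the last instance sits in the
# first cluster but carries the second cluster's label.
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0],
     [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.2, 5.1],
     [1.0, 1.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
LEARNERS = [majority_learner, one_nn_learner, centroid_learner]

consensus = ensemble_filter(X, y, LEARNERS, min_votes=len(LEARNERS))  # most conservative
aggressive = ensemble_filter(X, y, LEARNERS, min_votes=1)             # least conservative
print("consensus filter flags:", consensus)
print("aggressive filter flags:", aggressive)
```

Raising `min_votes` toward the number of learners makes the filter more conservative, while lowering it removes more instances. On this toy data the consensus setting flags only the deliberately mislabeled instance, whereas `min_votes=1` also flags clean instances that the weak majority learner alone gets wrong. That shows in miniature why a single biased learner has less influence when agreement among many classifiers is required.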