Abstract

Denoeux recently proposed a novel neural network classifier based on the Dempster-Shafer theory of evidence. His preliminary experiments on several typical benchmark problems showed that the classifier performs very well compared with other statistical and machine learning approaches. However, little further work has since been reported on improving or applying it. This research therefore extends the initial work by examining possible improvements to the classifier and its applicability to a new real-world task, protein secondary structure prediction. To reduce the computational cost of training on large protein datasets, an interface was developed that uses a data-parallel approach to parallelize the training phase of the classifier and of accompanying methods such as data clustering algorithms. The parallelized classifier also made it feasible to conduct rigorous experiments on two other benchmark problems of very different dimensionality, in order to identify the classifier's inherent strengths and drawbacks. These experiments showed that although the classifier outperformed some of the best methods, such as Support Vector Machines and Kernel Fisher Discriminants, on the low-dimensional problem (9 dimensions), its performance deteriorated significantly on the higher-dimensional problem (60 dimensions). This posed a substantial challenge, because secondary structure prediction is also a high-dimensional problem. An improved version of the classifier was therefore designed in which Multilayer Perceptrons (MLPs) replace the classifier's distance measure, the component that appeared to be impaired in high dimensions. In secondary structure prediction, the new classifier performed better than the original one. Moreover, at the level of sequence-to-structure prediction, its performance was comparable to that of the PHD (Profile network from Heidelberg) method, one of the best secondary structure prediction schemes.
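To make the role of the distance measure concrete, the sketch below shows a forward pass of a classifier in the style of Denoeux's published formulation, assuming that form rather than reproducing this work's implementation. Each prototype p_i lends support s_i = alpha_i * exp(-gamma_i * d_i^2) to the classes it belongs to, where d_i is the Euclidean distance from the input to p_i. The resulting simple mass functions are combined with Dempster's rule, and a class is chosen by pignistic probability. The function names, the membership layout, and the toy parameters are hypothetical. The exp(-gamma * d^2) term is the distance-based component that the improved classifier replaces with MLPs; how that replacement is wired is not specified here, so it is not shown.

```python
import math


def ds_classify(x, prototypes, memberships, alpha, gamma):
    """Forward pass of a Denoeux-style evidential classifier (hypothetical sketch).

    x           -- input vector (list of floats)
    prototypes  -- list of prototype vectors p_i
    memberships -- memberships[i][q]: membership of prototype i in class q (each row sums to 1)
    alpha       -- per-prototype alpha_i, assumed to lie in (0, 1)
    gamma       -- per-prototype gamma_i > 0
    Returns (normalized masses on the singleton classes, normalized mass on Omega).
    """
    n_classes = len(memberships[0])
    m = [0.0] * n_classes  # start from the vacuous mass function: all belief on Omega
    m_omega = 1.0
    for p, u, a, g in zip(prototypes, memberships, alpha, gamma):
        d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, p))  # squared Euclidean distance
        s = a * math.exp(-g * d2)  # support this prototype lends to its classes
        mi = [uq * s for uq in u]  # mass on each singleton {w_q}
        mi_omega = 1.0 - s  # remaining mass on the whole frame Omega
        # Unnormalized conjunctive combination: products of masses on two
        # different singletons are conflict (empty intersection) and are dropped.
        m = [m[q] * mi[q] + m[q] * mi_omega + m_omega * mi[q] for q in range(n_classes)]
        m_omega *= mi_omega
    # Normalizing once at the end is equivalent to applying Dempster's rule at each step.
    total = sum(m) + m_omega  # equals 1 - total conflict
    return [mq / total for mq in m], m_omega / total


def pignistic_decision(m, m_omega):
    """Return the class maximizing BetP(w_q) = m({w_q}) + m(Omega) / M."""
    n = len(m)
    betp = [mq + m_omega / n for mq in m]
    return max(range(n), key=lambda q: betp[q])


if __name__ == "__main__":
    # Toy example: one prototype per class in 2-D, input close to the class-0 prototype.
    masses, m_omega = ds_classify(
        [0.1, 0.0],
        prototypes=[[0.0, 0.0], [1.0, 1.0]],
        memberships=[[1.0, 0.0], [0.0, 1.0]],
        alpha=[0.9, 0.9],
        gamma=[1.0, 1.0],
    )
    print(masses, m_omega, pignistic_decision(masses, m_omega))
```

Far from every prototype, each s_i shrinks toward zero and almost all mass stays on Omega. The model then reports ignorance rather than a confident wrong class. In high dimensions, however, distances tend to become large and less discriminative, which is consistent with the degradation the experiments observed.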