Ed Ramsden on his winning solution in SIAM SDM’11 Contest

The basis for my model was the Probabilistic Neural Network (PNN), which was originally introduced by Donald Specht in 1990. The PNN is an example-based classifier in which the ‘X’ vector for an unknown case to be classified is compared to all known-class cases used in the training set. A distance metric (typically Euclidean) is passed through a gaussian function to estimate a ‘probability’ of a match with each training case. These individual probabilities are combined for each class in the training set, and the class with the highest composite probability is selected as the most likely class for the unknown case. The evaluation and combination function used was:

Although a PNN can be used with little or no training, this problem posed several difficulties. The first was the high dimensionality of the input data. Because they are example based, PNN classifiers require their input ‘space’ to be reasonably well filled in order to perform. As the number of input features is increased, one would expect their input space to become exponentially sparser. The solution to this was to employ feature selection. Also, another challenge for obtaining good performance is the proper selection of σ, which controls the selectivity of the classifier. If one makes σ too large, the classifier will tend lose the ability to differentiate between different input data. On the other hand, if σ is made too small, the classifier loses the ability to generalize beyond its training set. The problems of both feature and σ selection were solved by using a guided random walk, with the objective of maximizing the Modified Youdon performance on the training set. One feature of this approach is that it does not require the calculation of gradient information, only the value of the metric being maximized. To avoid severe overtraining effects, a leave-one-out scheme was used to evaluate training-set performance.

Because the PNN model developed as described above only sees a small subset of available inputs, I decided to attempt to increase the performance through constructing ensembles of the PNNs, and then taking a simple vote among their outputs to decide the final classification.

As one can see from the following plot, there is substantial variation in both the training and final test Modified Youdon measures for different models, with a degree of correlation between the training metric and the final test metric. This led to the idea of constructing the final voting pool out of a subset of models with superior training performance.

In the end, the submission model consisted of a vote of the best 25 out of 135 candidate PNN models (by training score) constructed using 35 features. This yielded a training score of 0.794, and a final test score of 0.689. Note that while some individual sub-models would have had very similar performance to the ensemble model, there was no obvious way of reliably identifying such high-performing sub models a priori, so the ensemble technique allowed for the combination of a number of good (and not so good) models into a better one.

I developed the model generation code in Visual Basic .NET, and did the final vote taking using a spreadsheet. The generation and tuning of the 135 candidate models required nearly 8 hours on a single processor core of an Intel E5300.

3 Responses to Ed Ramsden on his winning solution in SIAM SDM’11 Contest

I wonder if Ed would share the balanced Youden index he was achieving with the training data. I did not have time to make an entry, but have developed my own approach as practice. I do not know how to judge my model. Did you find a large or small difference between the training result and the final result? Thanks for any info you can share. I know I am not overfitting my data. I am using a Bayesian approach with only a couple of ‘knobs’ to turn.

The final ‘training’ Youden score for my submitted model was 0.794, while the test score was 0.689, so there was a pretty significant difference. Because of the ensembling I did, the final training score came about as a result of the combination of sub-models that were trained to maximize their individual training Youden scores. The plot I put in my post gives a pretty good idea of the spreads between ‘train’ and ‘final test’ Youden scores for submodels of different dimension. I was only able to do this kind of post-analysis after the final result file was made public (finalDecisions.csv) and I knew what the real answers were.
ER