Biography

Terry Windeatt received the BSc degree in Applied Science from the University of Sussex, UK, followed by an MSc in Electronic Engineering from the University of California, a BA (CNAA) in Theology and a PhD from the University of Surrey, UK. After lecturing in Control Engineering at Kingston University, UK, he lived and worked in the USA for eight years. He worked on intelligent systems in the Research and Development Departments of General Motors and Xerox Corporation in Rochester, NY. His industrial R&D experience is in modelling and simulation for intelligent automotive and office-copying applications, and he worked on early versions of closed-loop control systems for car emissions and the xerographic process. He returned from the United States to join the Department of Electrical and Electronic Engineering at the University of Surrey, where he now lectures in Machine Intelligence. He has worked on various research projects in the Centre for Vision, Speech and Signal Processing in the areas of pattern recognition, neural nets and computer vision.

My teaching

eee3005 Control Engineering

eeem005 AI and AI Programming

My publications

Facial action unit (AU) classification is an approach to facial expression recognition that decouples the recognition of expression from individual actions. In this paper, upper-face AUs are classified using an ensemble of multilayer perceptron (MLP) base classifiers with feature ranking based on PCA components. This approach is compared experimentally with other popular feature-ranking methods applied to Gabor features. Experimental results on the Cohn-Kanade database demonstrate that the MLP ensemble is relatively insensitive to the feature-ranking method, but optimized PCA features achieve the lowest error rate. When the task is posed as a multi-class problem using error-correcting output coding (ECOC), error rates are comparable to those of the two-class (one-versus-rest) problems when the number of features and the base classifier are optimized.
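
ECOC recurs throughout these abstracts, so a minimal Python/NumPy sketch of its encoding and decoding steps may help. The 4-class, 7-column code matrix and the names CODE, column_targets and hamming_decode are hypothetical illustrations, not taken from the paper, and the trained base classifiers (MLPs in the paper) are replaced by given binary outputs. Each column relabels the classes into a two-class problem, and a pattern is assigned to the class whose codeword is nearest in Hamming distance.

```python
import numpy as np

# Hypothetical 4-class code matrix: each row is a class codeword and each
# column defines one two-class problem (dichotomy) for a base classifier.
CODE = np.array([
    [0, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0],
])


def column_targets(y, code):
    """Relabel class indices y into one binary target per code-matrix column."""
    return code[np.asarray(y)]  # shape (n_samples, n_columns)


def hamming_decode(outputs, code):
    """Return the class whose codeword is nearest (Hamming distance) to each output row."""
    dist = (np.asarray(outputs)[:, None, :] != code[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)


# Usage: the base classifier for column 2 errs on every pattern, yet decoding
# recovers y because every pair of codewords differs in at least four positions.
y = np.array([0, 1, 2, 3])
outputs = column_targets(y, CODE).copy()
outputs[:, 2] ^= 1
print(hamming_decode(outputs, CODE))  # [0 1 2 3]
```

A minimum inter-row distance of 4 means any single base-classifier error per pattern is corrected, which is the intuition behind the "error-correcting" name.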

By performing experiments on publicly available multi-class datasets, we examine the effect of bootstrapping on the bias/variance behaviour of error-correcting output code ensembles. We present evidence that the general trend is for bootstrapping to reduce variance but to slightly increase bias error. This generally leads to an improvement in the lowest attainable ensemble error; however, this is not always the case, and bootstrapping appears to be most useful on datasets where the non-bootstrapped ensemble classifier is prone to overfitting.

There are a variety of methods for inducing predictive systems from observed data, many of which fall within the field of machine learning. Some of the most effective algorithms in this domain succeed by combining a number of distinct predictive elements to form what can be described as a type of committee. Well-known examples of such algorithms are AdaBoost, bagging and random forests. Stochastic discrimination is a committee-forming algorithm that attempts to combine a large number of relatively simple predictive elements in an effort to achieve a high degree of accuracy. A key element of the success of this technique is that its coverage of the observed feature space should be uniform. We introduce a new uniformity enforcement method which, on benchmark datasets, leads to greater predictive efficiency than the currently published method.

Within the context of facial expression classification using the facial action coding system (FACS), we address the problem of detecting facial action units (AUs). The method adopted is to train a single error-correcting output code (ECOC) multiclass classifier to estimate the probabilities that each one of several commonly occurring AU groups is present in the probe image. Platt scaling is used to calibrate the ECOC outputs to probabilities, and appropriate sums of these probabilities are taken to obtain a separate probability for each AU individually. Feature extraction is performed by generating a large number of local binary pattern (LBP) features and then selecting from these using fast correlation-based filtering (FCBF). The bias and variance properties of the classifier are measured, and we show that both these sources of error can be reduced by enhancing ECOC through the application of bootstrapping and class-separability weighting.
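
The calibration and summation steps can be illustrated with a small NumPy sketch. It assumes the AU groups are mutually exclusive classes whose calibrated probabilities sum to one; the group definitions, the helper names and the plain gradient-descent fit (Platt's original method also smooths the targets, which is omitted here) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit P(y=1 | s) = 1 / (1 + exp(A*s + B)) by gradient descent on the mean log loss.

    labels are 0/1. Platt's target smoothing is omitted in this sketch.
    """
    s = np.asarray(scores, dtype=float)
    t = np.asarray(labels, dtype=float)
    A, B = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * s + B))
        # For this parameterisation, d(loss)/d(A*s + B) = t - p.
        A -= lr * np.mean((t - p) * s)
        B -= lr * np.mean(t - p)
    return A, B


def platt_probability(scores, A, B):
    """Map raw classifier scores to calibrated probabilities."""
    return 1.0 / (1.0 + np.exp(A * np.asarray(scores, dtype=float) + B))


# Hypothetical AU groups: each ECOC class is one commonly co-occurring combination.
AU_GROUPS = [frozenset({1, 2}), frozenset({4}), frozenset({1, 2, 5}), frozenset({6})]


def au_probabilities(group_probs, groups=AU_GROUPS):
    """P(AU k present) = sum of the probabilities of all groups that contain AU k."""
    aus = sorted(set().union(*groups))
    return {au: sum(p for p, g in zip(group_probs, groups) if au in g) for au in aus}


scores = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
labels = np.array([0, 0, 1, 0, 1, 0, 1, 1])
A, B = fit_platt(scores, labels)
print(platt_probability([-1.0, 0.0, 1.0], A, B))  # increases with the score
print(au_probabilities([0.5, 0.2, 0.2, 0.1]))      # AU1 and AU2 each get 0.5 + 0.2
```

Summing over groups is what lets a single multiclass classifier replace one detector per AU: every AU that appears in several groups collects probability mass from each of them.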

The difficulties of tuning the parameters of multilayer perceptron (MLP) classifiers are well known. In this paper, a measure is described that is capable of predicting the number of training epochs needed for optimal performance in an ensemble of MLP classifiers. The measure is computed between pairs of patterns on the training data and is based on a spectral representation of a Boolean function. This representation characterizes the mapping from classifier decisions to target label and allows accuracy and diversity to be incorporated within a single measure. Results on many benchmark problems, including the Olivetti Research Laboratory (ORL) face database, demonstrate that the measure is well correlated with base-classifier test error and may be used to predict the optimal number of training epochs. While its correlation with ensemble test error is not quite as strong, it is shown in this paper that the measure may be used to predict the number of epochs for optimal ensemble performance. Although the technique is only applicable to two-class problems, it is extended here to multiclass problems through output coding. For the output-coding technique, a random code matrix is shown to give better performance than a one-per-class code, even when the base classifier is well tuned.

We outline a design for a FACS-based facial expression recognition system and describe in more detail the implementation of two of its main components. First, we look at how features that are useful from a pattern analysis point of view can be extracted from a raw input image. We show that good results can be obtained by using the method of local binary patterns (LBP) to generate a large number of candidate features and then selecting from them using fast correlation-based filtering (FCBF). Second, we show how Platt scaling can be used to improve the performance of an error-correcting output code (ECOC) classifier.
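
The basic LBP operator is simple enough to sketch directly. The version below makes several assumptions that are not the paper's configuration (a 3×3 neighbourhood with 8 neighbours, no uniform-pattern mapping, a hypothetical 2×2 grid of cells): each interior pixel is coded by thresholding its neighbours against it, and per-cell 256-bin histograms are concatenated into a pool of candidate features. The FCBF selection step is not shown.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left; neighbour k sets bit k.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(image):
    """8-bit LBP code for every interior pixel of a 2-D grey-level array."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # A neighbour at least as bright as the centre contributes a 1 bit.
        codes |= (neighbour >= center).astype(np.uint8) << bit
    return codes


def lbp_histogram(image, grid=(2, 2)):
    """Concatenate 256-bin LBP histograms over a grid of cells into one feature vector."""
    codes = lbp_3x3(image)
    feats = []
    for band in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            feats.append(np.bincount(cell.ravel(), minlength=256))
    return np.concatenate(feats)


print(lbp_histogram(np.random.default_rng(0).random((32, 32))).shape)  # (1024,)
```

Varying the grid, radius and number of neighbours is one way such a method can generate a very large candidate set, which is why a fast filter such as FCBF is needed afterwards.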

One of the methods used to evaluate the performance of ensemble classifiers is bias and variance analysis. In this chapter, we analyse bootstrap aggregating (bagging) and Error Correcting Output Coding (ECOC) ensembles using a bias-variance framework, and make comparisons with single classifiers, using Neural Networks (NNs) as base classifiers. As the performance of the ensembles depends on the individual base classifiers, it is important to understand the overall trends when the parameters of the base classifiers (nodes and epochs for NNs) are changed. We show experimentally on 5 artificial and 4 UCI MLR datasets that there are some clear trends in the analysis that should be taken into consideration when designing NN classifier systems.

The PC and TPDA algorithms are robust, well-known prototype algorithms that incorporate constraint-based approaches to causal discovery. However, neither algorithm can scale up to high-dimensional data, that is, data with more than a few hundred features. This chapter presents hybrid correlation and causal feature selection for ensemble classifiers to deal with this problem. Redundant features are removed by correlation-based feature selection, and irrelevant features are then eliminated by causal feature selection. The number of eliminated features, accuracy, the area under the receiver operating characteristic curve (AUC) and false negative rate (FNR) of the proposed algorithms are compared with those of correlation-based feature selection (FCBF and CFS) and causal-based feature selection algorithms (PC, TPDA, GS, IAMB).

A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems, feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.
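
The out-of-bootstrap estimate can be sketched as follows: each ensemble member is trained on a bootstrap sample, and each training pattern is classified only by the members whose bootstrap sample did not contain it. To keep the sketch short and runnable, a hypothetical nearest-centroid classifier stands in for the MLP base classifier, and neither feature ranking nor ECOC is included. In a stopping criterion, an estimate like this would be tracked (for example, across training epochs) and training stopped where it is lowest.

```python
import numpy as np


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in for an MLP base classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict_nearest_centroid(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def out_of_bootstrap_error(X, y, n_members=25, seed=0):
    """Ensemble error estimated only from votes of members that did not see each pattern."""
    rng = np.random.default_rng(seed)
    n = len(y)
    classes = np.unique(y)
    votes = np.zeros((n, len(classes)))
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)       # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)  # patterns left out of this sample
        if oob.size == 0:
            continue
        model = fit_nearest_centroid(X[idx], y[idx])
        pred = predict_nearest_centroid(model, X[oob])
        votes[oob, np.searchsorted(classes, pred)] += 1
    covered = votes.sum(axis=1) > 0  # patterns that were out-of-bootstrap at least once
    oob_pred = classes[votes[covered].argmax(axis=1)]
    return float(np.mean(oob_pred != y[covered]))
```

Because roughly a third of the patterns are left out of each bootstrap sample, the estimate comes "for free" from the training data, without holding out a separate validation set.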

This paper concentrates on comparing systems used to recognise the expressions generated by six upper-face action units (AUs), using the Facial Action Coding System (FACS). Haar wavelet, Haar-like and Gabor wavelet coefficients are compared, using AdaBoost for feature selection. The binary classification results obtained with Support Vector Machines (SVM) for the upper-face AUs have been observed to be better than the current results in the literature, for example 96.5% for AU2 and 97.6% for AU5. In the multi-class classification case, Error Correcting Output Coding (ECOC) has been applied. Although the results for a large number of classes are not as accurate as in the binary case, ECOC has the advantage of solving all problems simultaneously, and for large numbers of training samples and a small number of classes, error rates are improved.

Error Correcting Output Coding (ECOC) is a multiclass classification technique in which multiple base classifiers (dichotomizers) are trained using subsets of the training data determined by a preset code matrix. While it is one of the best solutions to multiclass problems, ECOC is suboptimal, as the code matrix and the base classifiers are not learned simultaneously. In this paper, we present an iterative update algorithm that reduces this decoupling. We compare the algorithm with the standard ECOC approach, using Neural Networks (NNs) as the base classifiers, and show that it improves the accuracy for some well-known data sets under different settings.

The PC and TPDA algorithms are robust, well-known prototype algorithms that incorporate constraint-based approaches to causal discovery. However, neither algorithm can scale up to high-dimensional data, that is, data with more than a few hundred features. This paper presents hybrid correlation and causal feature selection for ensemble classifiers to deal with this problem. The number of eliminated features, accuracy, the area under the receiver operating characteristic curve (AUC) and false negative rate (FNR) of the proposed algorithms are compared with those of correlation-based feature selection (FCBF and CFS) and causal-based feature selection algorithms (PC, TPDA, GS, IAMB).

Within the context of facial expression classification using the facial action coding system (FACS), we address the problem of detecting facial action units (AUs). Feature extraction is performed by generating a large number of multi-resolution local binary pattern (MLBP) features and then selecting from these using fast correlation-based filtering (FCBF). The need for a classifier per AU is avoided by training a single error-correcting output code (ECOC) multi-class classifier to generate occurrence scores for each of several AU groups. A novel weighted decoding scheme is proposed with the weights computed using first order Walsh coefficients. Platt scaling is used to calibrate the ECOC scores to probabilities and appropriate sums are taken to obtain separate probability estimates for each AU individually. The bias and variance properties of the classifier are measured and we show that both these sources of error can be reduced by enhancing ECOC through bootstrapping and weighted decoding.

There are two approaches to automating the task of facial expression recognition: the first concentrates on the meaning conveyed by facial expression, and the second on categorising deformation and motion into visual classes. The latter approach has the advantage that the interpretation of facial expression is decoupled from individual actions, as in FACS (Facial Action Coding System). In this chapter, upper-face action units (AUs) are classified using an ensemble of MLP base classifiers with feature ranking based on PCA components. When the task is posed as a multi-class problem using Error-Correcting Output Coding (ECOC), experimental results on the Cohn-Kanade database demonstrate that error rates comparable to those of two-class (one-versus-rest) problems may be obtained. The ECOC coding and decoding strategies are discussed in detail, and a novel weighted decoding approach is shown to outperform conventional ECOC decoding. Furthermore, base classifiers are tuned using the ensemble out-of-bootstrap estimate, for which purpose ECOC decoding is modified. The error rates obtained for six upper-face AUs around the eyes are believed to be among the best for this database.

We compare experimentally the performance of three approaches to ensemble-based classification on general multi-class datasets. These are the methods of random forest, error-correcting output codes (ECOC) and ECOC enhanced by the use of bootstrapping and class-separability weighting (ECOC-BW). These experiments suggest that ECOC-BW yields better generalisation performance than either random forest or unmodified ECOC. A bias-variance analysis indicates that ECOC benefits from reduced bias, when compared to random forest, and that ECOC-BW benefits additionally from reduced variance. One disadvantage of ECOC-based algorithms, however, when compared with random forest, is that they impose a greater computational demand leading to longer training times.

Existing ensemble pruning algorithms in the literature have mainly been defined for unweighted or weighted voting ensembles, and their extensions to the Error Correcting Output Coding (ECOC) framework have not been successful. This paper presents a novel algorithm for pruning ECOC ensembles that uses a new accuracy measure together with diversity and Hamming distance information. The results show that the novel method outperforms existing state-of-the-art methods.

Two-class supervised learning in the context of a classifier ensemble may be formulated as learning an incompletely specified Boolean function, and the associated Walsh coefficients can be estimated without knowledge of the unspecified patterns. Using an extended version of the Tumer-Ghosh model, the relationship between Added Classification Error and second-order Walsh coefficients is established. In this paper, the ensemble is composed of Multi-layer Perceptron (MLP) base classifiers, with the number of hidden nodes and epochs systematically varied. Experiments demonstrate that the mean second-order coefficients peak at the same number of training epochs at which the ensemble test error reaches a minimum.

High-dimensional data can lead to low classification accuracy and long computation times because it contains irrelevant and redundant features. To overcome this problem, the dimensionality of the data has to be reduced. Causal feature selection is one method of feature reduction, but it cannot identify redundant features. This paper presents the Parent-Children based for Causal Redundant Feature Identification (PCRF) algorithm to identify and remove redundant features. The classification accuracy and the number of features removed by the PCRF algorithm are compared with those of correlation feature selection. According to the results, the PCRF algorithm can identify redundant features but has lower classification accuracy than correlation feature selection.

An approach to approximating the decision boundary of an ensemble of two-class classifiers is proposed. Spectral coefficients are used to approximate the discrete probability density function of a Boolean function. It is shown that the difference between first- and third-order coefficient approximation is a good indicator of optimal base classifier complexity. A theoretical analysis is supported by experimental results on a variety of artificial and real two-class problems.

A spectral analysis of a Boolean function is proposed for approximating the decision boundary of an ensemble of classifiers, and an intuitive explanation of computing Walsh coefficients for the functional approximation is provided. It is shown that the difference between first- and third-order coefficient approximation is a good indicator of optimal base classifier complexity. When combining Neural Networks, experimental results on a variety of artificial and real two-class problems demonstrate under what circumstances ensemble performance can be improved. For tuned base classifiers, first-order coefficients provide performance similar to majority vote. However, for weak/fast base classifiers, higher-order coefficient approximation may give better performance. It is also shown that higher-order coefficient approximation is superior to the AdaBoost logarithmic weighting rule when boosting weak Decision Tree base classifiers.
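
To make the spectral idea concrete, the sketch below estimates Walsh coefficients of the mapping from base-classifier decisions to target label using only the observed patterns, consistent with the incompletely specified Boolean function view taken in these abstracts. The ±1 encoding, the averaging normalisation and the helper names are assumptions for illustration. For a three-member ensemble whose labels follow majority vote, the first-order coefficients are all 0.5, the second-order ones vanish and the third-order one is −0.5, so the sign of the first-order approximation reproduces majority vote.

```python
import itertools

import numpy as np


def walsh_coefficients(decisions, labels, order):
    """Estimate Walsh coefficients s_w = mean(y * chi_w(x)) over the observed patterns.

    decisions: (m, n) array of 0/1 base-classifier outputs, one row per training pattern.
    labels:    (m,) array of targets in {-1, +1}.
    Returns {w: s_w} for every index tuple w with len(w) == order.
    """
    signs = 1 - 2 * np.asarray(decisions)  # (-1)**x_j: 0 -> +1, 1 -> -1
    coeffs = {}
    for w in itertools.combinations(range(signs.shape[1]), order):
        chi = np.prod(signs[:, list(w)], axis=1)  # Walsh basis function chi_w(x)
        coeffs[w] = float(np.mean(np.asarray(labels) * chi))
    return coeffs


def first_order_predict(decisions, s0, first):
    """Sign of the zero- plus first-order approximation: a weighted vote of the members."""
    signs = 1 - 2 * np.asarray(decisions)
    score = s0 + sum(c * signs[:, j] for (j,), c in first.items())
    return np.where(score >= 0, 1, -1)


# All 8 decision patterns of a 3-member ensemble, labelled by majority vote.
X = np.array(list(itertools.product([0, 1], repeat=3)))
y = np.sign((1 - 2 * X).sum(axis=1))
s0 = walsh_coefficients(X, y, 0)[()]
first = walsh_coefficients(X, y, 1)
print(first)  # {(0,): 0.5, (1,): 0.5, (2,): 0.5}
```

With real, imperfect base classifiers the first-order coefficients differ per member and act as vote weights, while nonzero higher-order coefficients capture interactions that a weighted vote cannot represent.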

To improve the performance of computer-aided systems for breast cancer diagnosis, an ensemble classifier is proposed for classifying the histological structures in breast cancer microscopic images into three region types: positive cancer cells, negative cancer cells and non-cancer cells (stromal cells and lymphocyte cells). The bagging and boosting ensemble techniques are used with the decision tree (DT) learner and are compared with a single DT classifier. The input feature for the classifiers is the fractal dimension (FD) based on 12 color channels. It is computed from image datasets that were manually prepared as small cropped images at three window sizes: 128×128, 192×192 and 256×256 pixels. The results show that the boosting ensemble classifier gives the best accuracy, about 80%, with the 256×256 window, although this window size gives the lowest accuracy when the single DT is used as the classifier. The results indicate that the ensemble method is capable of improving classification accuracy compared with the single classifier. The classification model using FD and the ensemble classifier could be applied to develop computer-aided systems for breast cancer diagnosis in the future.
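
The abstract does not say how the fractal dimension is computed for each color channel. As a hedged illustration only, the sketch below implements the classic box-counting estimate for a binary mask (for example, one obtained by thresholding a single channel): count the boxes of side s that contain foreground, then take the slope of log N(s) against log(1/s). Grey-level variants such as differential box counting would be needed to work directly on intensity channels; this is a simpler, swapped-in technique.

```python
import numpy as np


def box_counting_dimension(mask):
    """Box-counting dimension of a non-empty square binary mask with power-of-two side."""
    mask = np.asarray(mask, dtype=bool)
    size = mask.shape[0]
    if mask.shape != (size, size) or size & (size - 1) or not mask.any():
        raise ValueError("expected a non-empty square mask with power-of-two side length")
    sizes, counts = [], []
    s = size
    while s >= 1:
        # View the mask as (size/s) x (size/s) boxes of side s; count boxes with foreground.
        boxes = mask.reshape(size // s, s, size // s, s).any(axis=(1, 3))
        sizes.append(s)
        counts.append(boxes.sum())
        s //= 2
    # The dimension estimate is the slope of log N(s) against log(1/s).
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(np.array(counts)), 1)
    return float(slope)


full = np.ones((64, 64), dtype=bool)
line = np.zeros((64, 64), dtype=bool)
line[0, :] = True
print(round(box_counting_dimension(full), 6), round(box_counting_dimension(line), 6))  # 2.0 1.0
```

A filled region gives a dimension of 2 and a straight line gives 1; irregular tissue textures fall in between, which is what makes the measure usable as a feature.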

Error Correcting Output Coding (ECOC) is a multi-class classification technique in which multiple binary classifiers are trained according to a preset code matrix such that each one learns a separate dichotomy of the classes. While ECOC is one of the best solutions for multi-class problems, one issue that makes it suboptimal is that the training of the base classifiers is done independently of the generation of the code matrix. In this paper, we propose to modify a given ECOC matrix to improve its performance by reducing this decoupling. The proposed algorithm uses beam search to iteratively modify the original matrix, using validation accuracy as a guide. It does not involve further training of the classifiers and can be applied to any ECOC matrix. We evaluate the accuracy of the proposed algorithm (BeamECOC) using 10-fold cross-validation experiments on 6 UCI datasets, using random code matrices of different sizes and base classifiers of different strengths. Compared to the random ECOC approach, BeamECOC increases the average cross-validation accuracy in 83.3% of the experimental settings involving all datasets, and gives better results than the state-of-the-art in 75% of the scenarios. By employing BeamECOC, it is also possible to reduce the number of columns of a random matrix down to 13% and still obtain comparable, or at times even better, results.
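
A beam search over code-matrix modifications of the kind described can be sketched as below. The specifics are assumptions rather than BeamECOC itself: candidates are generated by flipping a single bit of a 0/1 code matrix, scored by the validation accuracy of Hamming decoding on fixed base-classifier outputs (so no classifier is retrained), and the best `beam_width` candidates are kept at each step. The function names and the toy data are hypothetical.

```python
import numpy as np


def decoding_accuracy(code, outputs, labels):
    """Validation accuracy of Hamming decoding with a candidate 0/1 code matrix."""
    dist = (outputs[:, None, :] != code[None, :, :]).sum(axis=2)
    return float(np.mean(dist.argmin(axis=1) == labels))


def beam_search_code(code, outputs, labels, beam_width=3, n_steps=5):
    """Search single-bit flips of the code matrix, keeping the best-scoring candidates.

    outputs: (n_val, n_cols) fixed binary base-classifier predictions on validation data;
    the classifiers are never retrained, only the decoding matrix changes.
    Returns (best_code, best_accuracy).
    """
    best = (decoding_accuracy(code, outputs, labels), code)
    beam = [best]
    for _ in range(n_steps):
        candidates = []
        for _, matrix in beam:
            for i in range(matrix.shape[0]):
                for j in range(matrix.shape[1]):
                    flipped = matrix.copy()
                    flipped[i, j] ^= 1
                    candidates.append((decoding_accuracy(flipped, outputs, labels), flipped))
        candidates.sort(key=lambda c: c[0], reverse=True)  # sort by accuracy only
        beam = candidates[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best[1], best[0]


# Usage: the validation outputs for class 2 disagree with its codeword in column 1.
code = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
outputs = np.array([[0, 0, 1], [0, 1, 0], [1, 1, 0]])
labels = np.array([0, 1, 2])
new_code, acc = beam_search_code(code, outputs, labels)
print(decoding_accuracy(code, outputs, labels), acc)  # accuracy before and after the search
```

Keeping several candidates per step, rather than only the single best flip, lets the search pass through matrices that are temporarily no better than their parent. In practice, accuracy on a validation set separate from the final test data should guide the search to avoid overfitting the code matrix.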