Abstract

We introduce a new method of functionally classifying genes using gene
expression data from DNA microarray hybridization experiments. The
method is based on the theory of support vector machines (SVMs). SVMs
are a supervised machine learning method: they exploit prior knowledge
of gene function to identify uncharacterized genes of similar function
from expression data. SVMs avoid several problems
associated with unsupervised clustering methods such as hierarchical
clustering and self-organizing maps. SVMs have many mathematical
features that make them attractive for gene expression analysis,
including their flexibility in choosing a similarity function,
sparseness of solution when dealing with large data sets, the ability
to handle large feature spaces, and the ability to identify outliers.
We test several SVMs that use different similarity metrics, as well as
some other supervised learning methods, and find that the SVMs best
identify sets of genes with a common function using expression data.
Finally, we use SVMs to predict functional roles for uncharacterized
yeast ORFs based on their expression data.
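
As an illustration of what "choosing a similarity function" can mean in practice, the sketch below compares two kernel functions commonly used with SVMs, a polynomial kernel and a radial basis kernel, on toy expression profiles. The kernel forms, the parameter values (degree, sigma), and the three gene vectors are assumptions made for illustration only; they are not taken from the experiments summarized here.

```python
import math

def dot(x, y):
    """Inner product of two expression profiles of equal length."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2):
    """Polynomial similarity (x . y + 1)^degree; the degree is an assumed setting."""
    return (dot(x, y) + 1.0) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """Radial basis similarity exp(-||x - y||^2 / (2 sigma^2)); sigma is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical log-ratio expression profiles over three conditions.
gene_a = [0.5, -1.0, 2.0]
gene_b = [0.4, -0.8, 1.5]   # profile similar to gene_a
gene_c = [-1.2, 0.9, -0.3]  # profile roughly opposite to gene_a

for name, kernel in [("polynomial", polynomial_kernel), ("rbf", rbf_kernel)]:
    print(name, "a~b:", kernel(gene_a, gene_b), "a~c:", kernel(gene_a, gene_c))
```

Under either kernel, gene_a scores as more similar to gene_b than to gene_c. An SVM trained with a given kernel uses such pairwise similarities, together with the known functional labels, to separate members of a functional class from non-members.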