Abstract

Background: Several years after the sequencing of the human and
mouse genomes, much remains to be discovered about the functions of
most human and mouse genes. Computational prediction of gene function
promises to help focus limited experimental resources on the most
likely hypotheses. Several algorithms using diverse genomic data have
been applied to this task in model organisms; however, the performance
of such approaches in mammals has not yet been evaluated.

Results: In this study, a standardized collection of mouse
functional genomic data was assembled; nine bioinformatics teams used
this data set to independently train classifiers and generate
predictions of function, as defined by Gene Ontology (GO) terms, for
21,603 mouse genes; and the best-performing submissions were combined
into a single set of predictions. We identified strengths and weaknesses
of current functional genomic data sets and compared the performance
of function prediction algorithms. This analysis inferred functions
for 76% of mouse genes, including 5,000 currently uncharacterized
genes. At a recall rate of 20%, the unified set of predictions averaged
41% precision, with 26% of GO terms achieving a precision better than
90%.
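
For illustration, the "precision at a recall rate of 20%" figure for a single GO term can be computed roughly as sketched below. This is a minimal sketch and not the evaluation code used in the study: the function name `precision_at_recall`, the example scores and labels, and the handling of tied scores (input order) are all assumptions made for the example.

```python
def precision_at_recall(scores, labels, recall=0.20):
    """Precision of the shortest top-ranked gene list that reaches `recall`.

    scores: predicted confidence per gene for one GO term (higher = more likely)
    labels: 1 if the gene is truly annotated to the term, else 0
    Hypothetical helper for illustration only.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive genes for this term")

    # Rank genes from most to least confident (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    true_pos = 0
    for k, (_, is_pos) in enumerate(ranked, start=1):
        true_pos += is_pos
        # Stop at the first cutoff where enough true genes are recovered.
        if true_pos / total_pos >= recall:
            return true_pos / k
    return true_pos / len(ranked)  # only reached if recall > 1


# Made-up example: 10 genes, 5 of which truly carry the term.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
print(precision_at_recall(scores, labels))
```

Averaging this quantity over all evaluated GO terms would give a summary figure of the kind reported above, again assuming that this is how the per-term values are combined.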

Conclusion: We performed a systematic evaluation of diverse, independently
developed computational approaches for predicting gene function from
heterogeneous data sources in mammals. The results show that currently
available data for mammals allow predictions with both breadth and
accuracy. Importantly, many highly novel predictions emerge for the
38% of mouse genes that remain uncharacterized.