Many biological databases now hold enormous numbers of
entries for genes and gene products, along with descriptions and data covering a wide variety of their
functional properties. However, the synonymy and polysemy of the descriptive terms and the lack
of explicit relationships among them hamper consistent, reliable querying of and interoperability
between these databases. In response, the Gene Ontology (GO), a structured controlled
vocabulary of nearly 17,000 terms, has been developed, and continues to be developed, to
describe the functions of the gene products of various organisms, a purpose for which it is becoming the de facto
standard. GO is divided into three subontologies of terms (most of which also have
natural-language definitions) which may be used to annotate gene products in terms of the
molecular functions they possess, the higher-level biological processes in which they are involved,
and the cellular locations in which they are active. Within each subontology, every term is
related to each of its parent terms by either an is-a or a part-of
relationship.
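
The structure just described can be pictured as a graph in which each term points to its parents through labeled edges. The following is a minimal sketch under stated assumptions, not GO's actual data model: the `Term` class, the `ancestors` helper, and the four toy terms are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A hypothetical, simplified stand-in for a GO term."""
    name: str
    subontology: str
    # Each parent link is a (relationship, parent_name) pair,
    # where relationship is "is_a" or "part_of".
    parents: list = field(default_factory=list)


def ancestors(terms, name):
    """Return the names of all terms reachable from `name` via parent links."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for _relationship, parent in terms[current].parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy fragment invented for this example; it is not copied from GO.
terms = {
    "cellular_component": Term("cellular_component", "cellular_component"),
    "cell": Term("cell", "cellular_component", [("is_a", "cellular_component")]),
    "nucleus": Term("nucleus", "cellular_component", [("part_of", "cell")]),
    "nucleolus": Term("nucleolus", "cellular_component", [("part_of", "nucleus")]),
}

print(sorted(ancestors(terms, "nucleolus")))
```

Because a term may have more than one parent, the traversal tracks visited names rather than assuming a strict tree.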

GO has been a success in that its terms are being used to functionally annotate
genes and gene products in a number of prominent biological databases. However, as GO continues
to increase in size, users find it increasingly difficult to locate the terms they wish to use for
annotation. Furthermore, although a large vocabulary is provided, the terms have no links to each
other apart from those relationships that form the three taxonomic/partonomic hierarchies. Thus,
beyond this hierarchical information, there are no constraints within GO that can be used to
indicate which terms should or should not be used together in the annotation of a given gene
product. It is possible (though unlikely) that an annotator, in describing a protein, could deliberately
associate the terms "viral life cycle", "amino-acid biosynthesis", and "extracellular matrix" with
that protein; it is more likely that such a combination would be entered by accident. In either case, the
resulting annotation is likely to be biologically nonsensical. Good annotation relies upon the domain expertise of the annotator
and the usability of the annotation tool. We seek to improve upon the latter in two steps. First, we mine
biological databases to create formal relationships between pairs of GO terms (as well as between GO terms
and gene-product types). Second, we build an application that relies upon these relationships to
dynamically retrieve and present the GO terms most likely to be applicable to a given
gene product, based on the GO terms and the gene-product type that the user has already entered for that
gene product. Thus, if an annotator has already selected “viral life cycle” as a
biological-process term and then indicates that she wants to add a molecular-function term, she
will be presented with those molecular-function terms that have been used in annotations
together with “viral life cycle” (as well as those terms’ descendants).
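
The retrieval step in this example can be sketched as a co-annotation lookup. The code below is only an illustration under stated assumptions: the function names, the gene products, their annotations, and the parent/child link are all invented, and the sketch filters by simple co-occurrence rather than ranking terms by how likely they are to apply.

```python
from collections import defaultdict


def build_co_annotation_index(annotations):
    """Map each term to the terms that share at least one gene product with it."""
    index = defaultdict(set)
    for product_terms in annotations.values():
        for term in product_terms:
            index[term].update(t for t in product_terms if t != term)
    return index


def descendants(children, term):
    """Return every term below `term` in a parent-to-children mapping."""
    seen = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def suggest(selected, target_subontology, index, subontology_of, children):
    """List terms of the target subontology co-annotated with `selected`,
    together with the descendants of those terms."""
    candidates = set()
    for term in index.get(selected, ()):
        if subontology_of[term] == target_subontology:
            candidates.add(term)
            candidates |= descendants(children, term)
    return sorted(candidates)


# Hypothetical annotations, invented purely for illustration.
annotations = {
    "protein_A": {"viral life cycle", "RNA binding"},
    "protein_B": {"viral life cycle", "protease activity", "extracellular matrix"},
    "protein_C": {"amino-acid biosynthesis", "transferase activity"},
}
subontology_of = {
    "viral life cycle": "biological_process",
    "amino-acid biosynthesis": "biological_process",
    "RNA binding": "molecular_function",
    "mRNA binding": "molecular_function",
    "protease activity": "molecular_function",
    "transferase activity": "molecular_function",
    "extracellular matrix": "cellular_component",
}
children = {"RNA binding": ["mRNA binding"]}  # hypothetical parent -> children link

index = build_co_annotation_index(annotations)
suggestions = suggest("viral life cycle", "molecular_function",
                      index, subontology_of, children)
print(suggestions)
```

In this toy data, "transferase activity" is never offered after "viral life cycle" because no gene product carries both terms, which is the kind of constraint the mined relationships are meant to supply.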