SRI International's Bioinformatics Research Group (BRG) previously performed an analysis that determined that no gene or protein sequence is known for more than 1400 enzymatic activities, corresponding to 38% of enzyme classification (E.C.) numbers.

To understand this abundance of "orphan activities" lacking sequence information, the BRG performed a preliminary survey of orphans, with three goals:

Authenticate orphan activities: Are orphans the result of annotation artifacts, or do they reflect an actual lack of sequence data for these activities?

Capture and disseminate data: Is there information in the published literature that enables the identification of genes coding for these orphans?

Submit identified sequence data: When artifacts are found (for example, an orphan turns out to have sequence data available somewhere in the literature), relevant databases such as UniProt should be updated.

Why

Understanding orphan activities: Determining how often orphans are artifacts helps us design a logical approach to the identification of a cognate gene for each orphan activity. In addition, acquiring molecular properties (such as experimentally determined molecular weights and isoelectric points) and other data from the literature associated with each orphan will help develop methods for identifying their genes.

Assisting genome annotation: As long as an activity has no associated sequence data, it will never be predicted in any newly sequenced genome. Associating orphan activities with sequence data will help resolve the continuing problem of many genes in new genomes failing to acquire any functional assignment.

Enhanced pathway prediction: Providing sequence information for orphans will also increase our ability to computationally predict the metabolic complement of new organisms, since such predictions typically rely upon sequence data from known enzymatic activities.

Enhanced metabolic engineering: Sequence data will also enhance the practice of metabolic engineering, another field that depends on sequences from known enzymatic activities.