LOCtree

LOCtree is a novel system of support vector machines (SVMs) that predicts the subcellular localization of proteins, and the DNA-binding propensity of nuclear proteins, by incorporating a hierarchical ontology of localization classes modeled on biological processing pathways. Biological similarities are incorporated from the description of cellular components provided by the Gene Ontology (GO) Consortium. The GO definitions have been simplified and tailored to the problem of protein sorting. Technically, the ontology is implemented as a decision tree with SVMs as the nodes. LOCtree was extremely successful at learning evolutionary similarities among subcellular localization classes and was significantly more accurate than other traditional networks at predicting subcellular localization. Whenever available, LOCtree also reports predictions based on the following:

Nuclear localization signals found by PredictNLS,

Localization inferred using Prosite motifs and Pfam domains found in the protein, and

SWISS-PROT keywords associated with a protein.

Localization is inferred in the last two cases using the entropy-based LOCkey algorithm.
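
As a rough illustration of what "entropy-based" can mean here, the sketch below computes the Shannon entropy of the localization labels observed among proteins that share some annotation (for example, a keyword or motif). It is a generic example and not the LOCkey procedure itself; the function name and the toy label lists are assumptions made for illustration. A low entropy means the annotation points almost exclusively to one compartment, so an inference based on it is more trustworthy.

```python
import math
from collections import Counter


def localization_entropy(labels):
    """Shannon entropy (in bits) of a list of localization labels.

    Hypothetical helper for illustration only; not part of LOCkey.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # Sum of -p * log2(p) over the observed localization classes.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy data: localizations of proteins sharing one (made-up) annotation.
unambiguous = ["nucleus", "nucleus", "nucleus", "nucleus"]
ambiguous = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]

print(localization_entropy(unambiguous))
print(localization_entropy(ambiguous))
```

In this toy case the first annotation always co-occurs with the nucleus and has zero entropy, while the second is split evenly between two compartments and carries one full bit of uncertainty.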

Comprehensive Prediction of Localization

LOCtree can predict the subcellular localization and DNA-binding propensity of non-membrane proteins in non-plant and plant eukaryotes as well as prokaryotes. LOCtree classifies eukaryotic animal proteins into one of five subcellular classes, plant proteins into one of six classes, and prokaryotic proteins into one of three classes. The novel feature of the hierarchical architecture is the ability to predict intermediate localization classes at much higher accuracies. Another source of improvement is the use of 'noisy' training data: 'noisy' predictions from LOCkey (SWISS-PROT keyword-based annotations) and LOCHom (annotations using sequence homology) are used to train the hierarchical SVMs.
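
The idea of a decision tree with a binary classifier at each node, and of reading off intermediate classes along the way, can be sketched as follows. This is a minimal illustration under stated assumptions and not LOCtree's implementation: the tree topology, the class names, the feature names, and the toy scoring functions (which stand in for trained SVM decision values) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Features = Dict[str, float]


@dataclass
class Node:
    """One internal node: a binary decision plus its two branches."""
    name: str                             # intermediate class this node represents
    decide: Callable[[Features], float]   # stand-in for an SVM decision value
    positive: Union["Node", str]          # branch taken when the score is >= 0
    negative: Union["Node", str]          # branch taken when the score is < 0


def predict(root: Node, features: Features) -> Tuple[str, List[Tuple[str, float]]]:
    """Walk the tree; return the final class and the path of (class, score)."""
    path: List[Tuple[str, float]] = []
    current: Union[Node, str] = root
    while isinstance(current, Node):
        score = current.decide(features)
        branch = current.positive if score >= 0 else current.negative
        label = branch.name if isinstance(branch, Node) else branch
        path.append((label, score))
        current = branch
    return current, path


# Hypothetical topology and toy scorers, for illustration only.
cyto_or_mito = Node("cytoplasm/mitochondria", lambda f: f["mito_signal"],
                    "mitochondria", "cytoplasm")
intracellular = Node("intracellular", lambda f: f["nls_score"],
                     "nucleus", cyto_or_mito)
secretory = Node("secretory pathway", lambda f: f["retention"],
                 "organelle", "extracellular")
root = Node("root", lambda f: f["signal_peptide"], secretory, intracellular)

example = {"signal_peptide": -1.0, "nls_score": 0.5,
           "mito_signal": 0.0, "retention": 0.0}
final_class, path = predict(root, example)
print(final_class, path)
```

Each entry in the returned path is an intermediate prediction. Because a coarse split (such as secretory pathway versus intracellular in this toy tree) only has to resolve one decision, it can be reported even when the final, finer-grained class is less certain.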