Outline

Despite recent efforts to develop automated protein structure determination protocols, Structural Genomics projects have failed to generate fold assignments for complete proteomes. Experimentally determined structures remain out of reach for many prokaryotic and eukaryotic proteins. As an alternative, cheap and fast prediction algorithms that assign folds continue to provide valuable structural information for many targets. The development of high-quality prediction algorithms has been boosted in recent years by objective community-wide assessment experiments. One of the main lessons from these experiments is that experts who use diverse sources of information are more successful than groups that rely on a single structure prediction method. Hints influencing the selection of final models may come, for example, from biological expertise or literature searches. Such procedures are difficult to implement in an automated and reproducible fashion. However, comparable diversity can also be obtained by drawing on the growing number of diverse prediction algorithms. A framework for profiting from this diversity was created by Meta Servers, which collect models from many prediction services spread around the globe.

The first successful attempt to benefit from the diversity of models was based on the simple approach of selecting the most abundant fold represented in the set of high-scoring models, a procedure reminiscent of the clustering of simulated structures in ab initio prediction protocols. This procedure was easy to automate and resulted in the first fully automated meta-predictor, Pcons. Several others soon followed. All benchmarking results obtained in the last two years indicate that meta-predictors are more accurate than independent fold recognition methods. Their strength is mainly attributed to the structural clustering of the initial models. Even if many of the initial models are wrong, incorrectly predicted fragments can be expected to have random conformations, so that only fragments corresponding to preferred conformations occur with higher than expected frequency. The positive evaluation results boosted further development of meta-predictors. Currently available versions differ in the way the initial models are compared, in the way the final model is generated, and in the use of the initial scores assigned to the models by the individual servers.
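
The selection step described above, picking the most abundant fold among the high-scoring models, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Pcons implementation: the model records, the server and fold names, and the single score cutoff `min_score` are all hypothetical, and the sketch assumes that scores from different servers are directly comparable, which real meta-predictors cannot take for granted.

```python
from collections import Counter


def most_abundant_fold(models, min_score):
    """Return (fold, votes) for the most frequent fold among confident models.

    `models` is a list of dicts with hypothetical keys "fold" and "score".
    Returns None when no model reaches `min_score`.
    """
    # Keep only models whose server-assigned score passes the cutoff.
    confident = [m for m in models if m["score"] >= min_score]
    if not confident:
        return None
    # Count how often each fold occurs and take the most frequent one.
    fold, votes = Counter(m["fold"] for m in confident).most_common(1)[0]
    return fold, votes


# Hypothetical models collected from five prediction servers.
example_models = [
    {"server": "server_a", "fold": "fold_x", "score": 14.2},
    {"server": "server_b", "fold": "fold_y", "score": 9.8},
    {"server": "server_c", "fold": "fold_x", "score": 7.5},
    {"server": "server_d", "fold": "fold_z", "score": 2.1},
    {"server": "server_e", "fold": "fold_x", "score": 6.0},
]

print(most_abundant_fold(example_models, min_score=5.0))
```

With the toy data, three of the four models above the cutoff agree on `fold_x`, so that fold is selected. Later meta-predictors replace the fold label by a structural comparison between models, which is where the approaches mentioned above start to differ.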

3D-Jury is a fully automated protein structure meta-prediction system accessible via the Meta Server interface (http://BioInfo.PL/Meta). The system is very simple and versatile and can be used to create meta-predictions even from sets of models produced by humans. An additional, very important and novel feature of the system is the high correlation between the reported confidence score and the accuracy of the model. The number of correctly predicted residues can be estimated directly from the prediction score. The high reliability of the method enables any biologist to submit a target of interest to the Meta Server and determine with relatively high confidence whether the target can be predicted by fold recognition methods while being unpredictable using standard approaches such as PSI-BLAST. This can point to interesting relationships that may have been missed in annotations of proteins or genomes and can provide very valuable information for novel scientific discoveries.

We have applied the 3D-Jury meta-predictor to annotate the structure of SARS proteins. One of the most interesting findings obtained during the annotation process is a surprisingly reliable (3D-Jury score >100) assignment of the methyltransferase fold to the nsp13 domain. Standard sequence comparison tools such as PSI-BLAST, or RPS-BLAST applied to the conserved domain database, failed to reliably assign a function to this domain. The domain belongs to the ancient family of AdoMet-dependent ribose 2'-O-methyltransferases, which was adapted by numerous viruses before the three domains of life evolved from the last universal common ancestor. The enzymatic role of the protein was confirmed in silico by the presence of the conserved tetrad of residues K-D-K-E essential for mRNA cap-1 (mGpppNm) formation.
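
A check for the K-D-K-E tetrad can be illustrated with a short Python sketch. It assumes that the four catalytic positions have already been mapped onto the target sequence by an alignment to known methyltransferases; the sequence and positions below are toy values invented for the example and are not taken from nsp13.

```python
def has_tetrad(sequence, positions, expected="KDKE"):
    """Check that `sequence` carries the expected residues at 1-based `positions`.

    `positions` are assumed to come from an alignment to known family members.
    """
    if len(positions) != len(expected):
        raise ValueError("need exactly one position per expected residue")
    for pos, residue in zip(positions, expected):
        # Reject positions outside the sequence and mismatching residues.
        if pos < 1 or pos > len(sequence) or sequence[pos - 1] != residue:
            return False
    return True


# Toy sequence (hypothetical): K, D, K and E sit at positions 3, 6, 9 and 12.
toy_sequence = "MAKTLDGSKAVE"

print(has_tetrad(toy_sequence, (3, 6, 9, 12)))
print(has_tetrad(toy_sequence, (1, 6, 9, 12)))
```

The first call reports that the tetrad is present, while the second fails because position 1 of the toy sequence holds M instead of K.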