The primary goal of this project is to revitalize the Structural Classification of Proteins (SCOP) and ASTRAL databases in order to better serve the needs of both current users and the larger scientific community. Both databases are carefully curated resources that are widely used by biologists to explore remote homologs of proteins of interest, and by computational biologists as a "gold standard" for benchmarking prediction algorithms. However, neither database has changed its basic design since its early releases, which in the case of SCOP were 15 years ago. We will redesign the internals of both databases to account for aspects of protein evolution that were not appreciated when the databases were first created, such as metamorphic proteins and homologous proteins that have evolved different folds. We will also develop a unified interface to both databases that will allow scientists to easily find and focus on proteins or families of interest, because the current hierarchical view grows increasingly unwieldy as more structures are added. Because SCOP curation has become a bottleneck due to the large number of structures being solved today, we plan to build automated tools to assist in classification. We will also create interfaces that allow biologists to submit sequences or structures for automated classification using these tools. In many cases, this would enable structural biologists to gain insight into a protein's evolution or function prior to publication of a newly solved structure.

Public Health Relevance

PROJECT NARRATIVE: The SCOP and ASTRAL databases are carefully curated resources that are widely used by biologists to explore the structure, function, and evolution of protein families of interest, and by computational biologists as a "gold standard" for benchmarking prediction algorithms. Protein structures are essential to many modern studies of pathogens and human diseases, and medical conditions are being rapidly linked to specific mutations. Although the flood of structural information threatens to overwhelm our current capacity for analysis, our proposed changes to the curation procedures for SCOP and ASTRAL, and improvements to the underlying structure of the databases, will allow these resources to continue to yield biological and medical insight for many years to come.