BACKGROUND: The major birch pollen allergen, Bet v 1, is a member of the ubiquitous PR-10 family of plant pathogenesis-related proteins. In recent years, a number of diverse plant proteins with low sequence similarity to Bet v 1 were identified. In addition, determination of the Bet v 1 structure revealed the existence of a large superfamily of structurally related proteins. In this study, we aimed to identify and classify all Bet v 1-related structures from the Protein Data Bank and all Bet v 1-related sequences from the Uniprot database. RESULTS: Structural comparisons of representative members of already known protein families structurally related to Bet v 1 with all entries of the Protein Data Bank yielded 47 structures with non-identical sequences. They were classified into eleven families, five of which were newly identified and not included in the Structural Classification of Proteins database release 1.71. The taxonomic distribution of these families extracted from the Pfam protein family database showed that members of the polyketide cyclase family and the activator of Hsp90 ATPase homologue 1 family were distributed among all three superkingdoms, while members of some bacterial families were confined to a small number of species. Comparison of ligand binding activities of Bet v 1-like superfamily members revealed that their functions were related to binding and metabolism of large, hydrophobic compounds such as lipids, hormones, and antibiotics. Phylogenetic relationships within the Bet v 1 family, defined as the group of proteins with significant sequence similarity to Bet v 1, were determined by aligning 264 Bet v 1-related sequences. A distance-based phylogenetic tree yielded a classification into 11 subfamilies, nine exclusively containing plant sequences and two subfamilies of bacterial proteins. Plant sequences included the pathogenesis-related proteins 10, the major latex proteins/ripening-related proteins subfamily, and polyketide cyclase-like sequences.
CONCLUSION: The ubiquitous distribution of Bet v 1-related proteins among all superkingdoms suggests that a Bet v 1-like protein was already present in the last universal common ancestor. During evolution, this protein diversified into numerous families with low sequence similarity but with a common fold that succeeded as a versatile scaffold for binding of bulky ligands.

The superfamily of leucine-rich repeat proteins can be subdivided into at least six subfamilies, characterised by different lengths and consensus sequences of the repeats. It was proposed that the repeats from different subfamilies retain a similar superhelical fold, but differ in the three-dimensional structures of individual repeats. The sequence-structure relationship of three new subfamilies was examined by molecular modelling. I provide structural models for the repeats of all subfamilies. The models enable me to explain residue conservations within each subfamily. Furthermore, the difference in the packing explains why the repeats from different subfamilies never occur simultaneously in the same protein. Finally, these studies suggest different evolutionary origins for the different subfamilies. The approach used for the prediction of the leucine-rich repeat protein structures can be applied to other proteins containing internal repeats of about 20 to 30 residues in length.

This information is based on a mapping of the SMART genomic protein database to KEGG orthologous groups. Percentages are relative to the number of proteins with an LRR domain that could be assigned to a KEGG orthologous group, not to all proteins containing an LRR domain. Please note that proteins can be included in multiple pathways, i.e. the numbers above will not always add up to 100%.
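
The way these percentages behave can be illustrated with a minimal Python sketch. It is not SMART's actual implementation: the function name `pathway_percentages`, the protein identifiers, and the pathway names are all hypothetical, and the data are invented purely for illustration. The sketch assumes each protein is annotated with the set of pathways reached through its KEGG orthologous group, with an empty set for proteins that could not be assigned.

```python
from collections import Counter


def pathway_percentages(protein_pathways):
    """Return the percentage of assigned proteins found in each pathway.

    protein_pathways maps a (hypothetical) protein ID to the set of pathways
    it belongs to; an empty set means no KEGG orthologous group was assigned.
    """
    # Only proteins with at least one pathway count toward the denominator.
    assigned = {pid: pws for pid, pws in protein_pathways.items() if pws}
    if not assigned:
        return {}
    # A protein in several pathways is counted once per pathway.
    counts = Counter(pw for pws in assigned.values() for pw in pws)
    return {pw: 100.0 * n / len(assigned) for pw, n in counts.items()}


# Invented example data: three LRR proteins, one without any assignment.
example = {
    "protein_1": {"pathway_A", "pathway_B"},
    "protein_2": {"pathway_A"},
    "protein_3": set(),  # unassigned, so excluded from the denominator
}

percentages = pathway_percentages(example)
for pathway, pct in sorted(percentages.items()):
    print(f"{pathway}: {pct:.1f}%")
```

In this toy data set the denominator is two proteins rather than three, and because `protein_1` appears in both pathways the percentages sum to more than 100%.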