Natural selection of protein structural and functional properties: a single nucleotide polymorphism perspective.

Liu J, Zhang Y, Lei X, Zhang Z - Genome Biol. (2008)

Bottom Line:
On the residue level, we found higher degrees of variation for residues that are exposed to solvent, are in a loop conformation, natively disordered regions or low complexity regions, or are in the signal peptides of secreted proteins. Our study suggests that the SNP A/S ratio is a robust measure for selective constraints. The correlations between SNP A/S ratios and other variables provide valuable insights into the natural selection of various structural or functional properties, particularly for human-specific genes and constraints within the human lineage.

Background: The rates of molecular evolution for protein-coding genes depend on the stringency of functional or structural constraints. The Ka/Ks ratio has been commonly used as an indicator of selective constraints and is typically calculated from interspecies alignments. Recent accumulation of single nucleotide polymorphism (SNP) data has enabled the derivation of Ka/Ks ratios for polymorphism (SNP A/S ratios).

Results: Using data from the dbSNP database, we conducted the first large-scale survey of SNP A/S ratios for different structural and functional properties. We confirmed that the SNP A/S ratio is largely correlated with Ka/Ks for divergence. We observed stronger selective constraints for proteins that have high mRNA expression levels or broad expression patterns, have no paralogs, arose earlier in evolution, have natively disordered regions, are located in cytoplasm and nucleus, or are related to human diseases. On the residue level, we found higher degrees of variation for residues that are exposed to solvent, are in a loop conformation, natively disordered regions or low complexity regions, or are in the signal peptides of secreted proteins. Our analysis also revealed that histones and protein kinases are among the protein families that are under the strongest selective constraints, whereas olfactory and taste receptors are among the most variable groups.

Conclusion: Our study suggests that the SNP A/S ratio is a robust measure for selective constraints. The correlations between SNP A/S ratios and other variables provide valuable insights into the natural selection of various structural or functional properties, particularly for human-specific genes and constraints within the human lineage.

Figure 1: The SNP A/S ratio is a good measure for evolutionary constraints. Error bars represent 95th percentile confidence intervals from bootstrap resampling. (a) SNP A/S ratios correlate with Ka/Ks ratios from human-mouse alignments. Proteins were grouped into bins of equal intervals (interval = 0.05) according to their Ka/Ks ratios, and the SNP A/S ratio was calculated for each bin. (b) SNP A/S ratios correlate negatively with residue conservation scores from protein sequence alignments. All residues were grouped into bins of equal intervals (interval = 0.5) according to their position specific alignment information taken from PSI-BLAST alignment profiles, and the SNP A/S ratio was obtained for each bin.

Mentions:
To assess whether SNP A/S ratios from the current large-scale SNP data set provide a good measure for selective constraints, we first compared them with Ka/Ks ratios derived from inter-species alignments. We collected 9,759 human proteins with both validated coding-region SNPs and available human-mouse Ka/Ks data from Ensembl [13], binned them by their Ka/Ks values, and measured the SNP A/S ratio for each group. There is a strong positive correlation between these two measures (Figure 1a; Kendall's rank correlation [14] τ = 0.50, p-value < 1e-04), which is in agreement with the neutral theory of molecular evolution. Analysis of data from chimpanzee and Old World monkey (Macaca mulatta) led to similar conclusions, although the Ka/Ks values may need to be corrected to subtract the contribution of SNPs because of the relatively short evolutionary distance.
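
The binning and rank-correlation procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code. Several things here are assumptions made for illustration: the record fields (ka_ks, a_snps, s_snps, a_sites, s_sites), the per-bin A/S ratio computed from pooled SNP counts normalized by the number of nonsynonymous and synonymous sites, the tau-a variant of Kendall's coefficient (the text does not say which variant was used), and the percentile bootstrap over proteins within a bin. The toy numbers are invented and are not data from the study.

```python
import random
from collections import defaultdict
from itertools import combinations


def bin_index(ka_ks, width=0.05):
    """Index of the equal-width Ka/Ks bin (the epsilon guards float rounding)."""
    return int(ka_ks / width + 1e-9)


def pooled_as_ratio(records):
    """A/S ratio from pooled counts; records are (a_snps, s_snps, a_sites, s_sites).

    Assumed definition: (A SNPs per A site) / (S SNPs per S site).
    Returns None when the ratio is undefined.
    """
    a = sum(r[0] for r in records)
    s = sum(r[1] for r in records)
    a_sites = sum(r[2] for r in records)
    s_sites = sum(r[3] for r in records)
    if s == 0 or a_sites == 0 or s_sites == 0:
        return None
    return (a / a_sites) / (s / s_sites)


def group_by_bin(proteins, width=0.05):
    """Group per-protein count tuples by Ka/Ks bin index."""
    bins = defaultdict(list)
    for p in proteins:
        counts = (p["a_snps"], p["s_snps"], p["a_sites"], p["s_sites"])
        bins[bin_index(p["ka_ks"], width)].append(counts)
    return bins


def as_ratio_by_bin(proteins, width=0.05):
    """Map bin index -> pooled A/S ratio, skipping bins with undefined ratios."""
    result = {}
    for b, records in sorted(group_by_bin(proteins, width).items()):
        ratio = pooled_as_ratio(records)
        if ratio is not None:
            result[b] = ratio
    return result


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


def bootstrap_ci(records, n_boot=1000, seed=0):
    """Percentile 95% interval for the pooled A/S ratio of one bin."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))  # resample with replacement
        ratio = pooled_as_ratio(sample)
        if ratio is not None:
            values.append(ratio)
    if not values:
        return None
    values.sort()
    lo = values[int(0.025 * len(values))]
    hi = values[min(len(values) - 1, int(0.975 * len(values)))]
    return lo, hi


# Toy input (hypothetical values, not data from the study).
toy = [
    {"ka_ks": 0.02, "a_snps": 2, "s_snps": 4, "a_sites": 300, "s_sites": 100},
    {"ka_ks": 0.04, "a_snps": 1, "s_snps": 2, "a_sites": 300, "s_sites": 100},
    {"ka_ks": 0.07, "a_snps": 3, "s_snps": 3, "a_sites": 300, "s_sites": 100},
    {"ka_ks": 0.12, "a_snps": 6, "s_snps": 3, "a_sites": 300, "s_sites": 100},
]
ratios = as_ratio_by_bin(toy)
tau = kendall_tau_a(list(ratios.keys()), list(ratios.values()))
print(ratios, tau)
```

Pooling counts within a bin before taking the ratio, rather than averaging per-protein ratios, avoids undefined values for proteins with no synonymous SNPs; whether the original analysis pooled in this way is not stated in the text above. In practice a library routine such as scipy.stats.kendalltau would also supply the p-value.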
