Abstract/Summary

Background
Efficient gene expression involves a trade-off between (i) premature termination of protein synthesis; and (ii) readthrough, where the ribosome fails to dissociate at the terminal stop. Sense codons that are similar in sequence to stop codons are more susceptible to nonsense mutation, and are also likely to be more susceptible to transcriptional or translational errors causing premature termination. We therefore expect this trade-off to be influenced by the number of stop codons in the genetic code. Although genetic codes are highly constrained, stop codon number appears to be their most volatile feature.
Results
In the human genome, codons readily mutable to stops are underrepresented in coding sequences. We construct a simple mathematical model based on the relative likelihoods of premature termination and readthrough. When readthrough occurs, the resultant protein has a tail of amino acid residues incorrectly added to the C-terminus. Our results depend strongly on the number of stop codons in the genetic code. When the code has more stop codons, premature termination is relatively more likely, particularly for longer genes. When the code has fewer stop codons, the length of the tail added by readthrough will, on average, be longer, and thus more deleterious. Comparative analysis of taxa with a range of stop codon numbers suggests that genomes whose code includes more stop codons have shorter coding sequences.
Conclusions
We suggest that the differing trade-offs presented by alternative genetic codes may result in differences in genome structure. More speculatively, multiple stop codons may mitigate readthrough, counteracting the disadvantage of a higher rate of nonsense mutation. This could help explain the puzzling overrepresentation of stop codons in the canonical genetic code and most variants.