Abstract

While mRNA stability has been shown to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, it cannot adequately explain why codon bias strongly tracks neighboring intergenic GC content. This suggests that the structural dynamics of DNA might also influence codon choice. Because minor groove width is strongly governed by 3-base periodicity in GC content, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an 'accessory' during an early expansion of a primordial genetic code, allowing protein-coding and structural dynamic information to be multiplexed within the same molecular context.

Internal phosphate linkages of codons assigned by the standard genetic code are color-coded according to their fixed levels of intrinsic DNA flexibility. The intrinsic flexibility of external phosphate linkages, located at the center and outer edge of the white circle, is variable across genes and genomes and is determined by adjacent codon usage patterns. Adapted from () and ().

The positive association (r) between entropy-based codon bias (1-Ew) and gene-level deviations from synonymous flexibility for the given protein space for (A) the 27 dapL genes and (B) all 35 234 prokaryotic genes in the ATGC database (24 genomes). Examples of the deviation from average synonymous flexibility given its protein space for single genes are shown in Figure (i.e. the mean devTRXcdn = abs[mean red line - mean black line]/[mean blue line - mean green line]). Also shown are the positive associations of (C) overall gene GC content and (D) overall gene third-position GC content (GC3) with gene-level deviations from synonymous flexibility for the given protein space.
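The quantities named in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `gc3`, `dev_trx_cdn` and `pearson_r` are hypothetical, and the four inputs to `dev_trx_cdn` are labeled only by the line colors used in the caption, since the caption does not say what each line represents. GC3 is taken here as the fraction of G or C bases at third codon positions, and r is assumed to be a Pearson correlation coefficient.

```python
from math import sqrt
from statistics import mean


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    thirds = seq[2:usable:3]              # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def dev_trx_cdn(red, black, blue, green) -> float:
    """abs[mean red - mean black] / [mean blue - mean green], as in the caption.

    Each argument is a sequence of per-position flexibility values for one
    of the colored lines; their biological meaning is not assumed here.
    """
    denom = mean(blue) - mean(green)
    if denom == 0:
        raise ValueError("blue and green lines have the same mean")
    return abs(mean(red) - mean(black)) / denom


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient (assumed meaning of r in the caption)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    if sx == 0 or sy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / sqrt(sx * sy)
```

Under these assumptions, a panel such as (D) would correspond to calling `pearson_r` on a list of per-gene `gc3` values and the matching list of per-gene `dev_trx_cdn` values.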