
Figures

Comparison of two sets of 100 stationary distributions for which Fπ (the normalized difference between the relative frequency of 4FD codons after selection on amino acids and their expected frequency resulting only from a mutation process) takes the highest (red) and the lowest (green) values. Fπ is highest for distributions with a high frequency of thymine and adenine, and lowest for distributions rich in cytosine and guanine.

Relationship between the Fπ value and the combination of two nucleotides, presented as colored wafer maps. The colors correspond to the value of Fπ, which depends on the frequency of the compared nucleotides. Dark green corresponds to the lowest values of Fπ and dark brown to the highest. The highest values occur at a high content of thymine and adenine with a simultaneous decrease in guanine and cytosine frequency. The lowest values occur at a low frequency of A and T and a moderate content of G and C.

Dependence of the median value of Fπ on the stationary frequencies of the four nucleotides π. The median was calculated from values derived from substitution models generating nucleotide stationary distributions with a given fixed frequency of one nucleotide πi and random frequencies of the others. The dots represent exact values of the median, whereas the lines are the best approximation based on generalized additive models with integrated smoothness estimation. The median depends nonlinearly on the stationary frequencies of particular nucleotides. It increases most strongly with growing frequencies of A and T.

Dependence of the median value of Fπ on the stationary content of adenine + thymine (A), guanine + cytosine (B), adenine and thymine with equal frequencies (C), guanine and cytosine with equal frequencies (D), purines (E), and pyrimidines (F). There is a clear nonlinear relationship, with the minimum at equal proportions of purines and pyrimidines.

Dependence of the median value of Fπ for 4FD codon groups (assigned by their coded amino acids) on the stationary frequencies of the four nucleotides π: adenine (A), thymine (B), guanine (C), and cytosine (D). The dots represent exact values of the median, whereas the lines are the best approximation based on generalized additive models with integrated smoothness estimation. The dependence of the median value differs among codon groups and nucleotides.

Dependence of the median value of Fπ for 4FD codon groups (assigned by their coded amino acids) on the stationary frequencies of purines (A) and pyrimidines (B). The dots represent exact values of the median, whereas the lines are the best approximation based on generalized additive models with integrated smoothness estimation. The codon groups respond differently to these frequencies.

Distribution of the deviation from the expected codon usage for all 4FD groups, calculated for protein-coding sequences, starting (randomly selected) nucleotide substitution matrices, and matrices that maximized this measure. The maximized values are of the same order of magnitude as the deviation based on empirical data.

Dependence on the genomic A+T content of the difference in the relative usage of 4FD codons between genes coding for ribosomal and nonribosomal proteins. The difference was calculated based on 4802 genomes with at least 30 genes annotated for ribosomal proteins, separately for the leading and lagging strand. In total, 5124 pairs of genes, with at least 15 ribosomal genes on one strand, were considered. The bars represent the average value for the given class of A+T content, whereas the whiskers represent SD. The difference was calculated from the observed frequency of a 4FD codon si with nucleotide i at the third codon position and the frequency of all codons in the 4FD codon group S. Indices rib and nonrib denote genes for ribosomal and nonribosomal proteins, respectively. The calculated difference decreases with AT%, and is the largest for the moderate AT content.
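
The expression for this difference is not reproduced in the caption. As a minimal sketch, assuming the difference is the relative usage of codon si within its group S in ribosomal genes minus the same ratio in nonribosomal genes, it could be computed as follows. The formula, the function names, and the codon counts are all illustrative assumptions, not the authors' implementation.

```python
# Assumed form: D(s_i) = f_rib(s_i) / f_rib(S) - f_nonrib(s_i) / f_nonrib(S)
# All names and counts below are hypothetical and serve only as illustration.

def relative_usage(counts, group):
    """Return the share of each codon among all codons of one 4FD group."""
    total = sum(counts.get(codon, 0) for codon in group)
    if total == 0:
        raise ValueError("no codons from this group were observed")
    return {codon: counts.get(codon, 0) / total for codon in group}

def usage_difference(rib_counts, nonrib_counts, group):
    """Difference in relative codon usage: ribosomal minus nonribosomal genes."""
    rib = relative_usage(rib_counts, group)
    nonrib = relative_usage(nonrib_counts, group)
    return {codon: rib[codon] - nonrib[codon] for codon in group}

# Glycine codons (GGN) form one fourfold degenerate group.
GLY = ("GGA", "GGC", "GGG", "GGT")
rib_counts = {"GGA": 10, "GGC": 30, "GGG": 10, "GGT": 50}      # made-up counts
nonrib_counts = {"GGA": 25, "GGC": 25, "GGG": 25, "GGT": 25}   # made-up counts

diff = usage_difference(rib_counts, nonrib_counts, GLY)
for codon in GLY:
    print(codon, round(diff[codon], 3))
```

Under this assumed definition the differences within one codon group always sum to zero, so a codon preferred in ribosomal genes is necessarily balanced by codons that are avoided there.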

Additional Files

Supplemental Material for Błażej et al., 2017

Files in this Data Supplement:

Figure S1 - The procedure for assessing the strength of selection, at the amino acid level, on synonymous codon usage. (.pdf)

Figure S2 - Comparison of nucleotide substitution probabilities for the top 5% of matrices that maximized the values of Fπ (the normalized difference between the relative frequency of fourfold degenerate codons after selection on amino acids and their expected frequency resulting only from a mutation process). (.pdf)
