The Ancestral Genetic Code Cube

Based on the available literature, we considered the plausible existence of five or more bases in the earliest DNA molecules. Our algebraic and biological model suggests that a transition from a primeval genetic code with an extended DNA alphabet to the present standard genetic code is plausible; here the symbol D represents one or more hypothetical bases with unspecific pairings. The results suggest that the Watson–Crick base pairings (G≡C and A=U) and the non-specific base pairing of the hypothetical ancestral base D, which are used to define the sum and product operations, are enough to determine the coding constraints of the primeval and the modern genetic codes, as well as the transition from the former to the latter. It is thus algebraically proved that the present genetic code architecture (very well described in the middle of the last century by Crick and other authors) could be derived from the plausible ancient architecture as quotients of three-dimensional vector spaces and as quotient groups.

The coordinate representation of ancient and present codons allows us to insert the set of triplets into the three-dimensional real vector space, where it can be represented as an ordinary cube (or regular hexahedron) with three of its faces contained in the coordinate planes XY, XZ, and YZ. This Demonstration lets you manipulate the algebraic operations over the DNA base triplets. You can visualize the principal algebraic features of the ancient (plausible) and standard genetic codes [1]. In particular, the algebraic operations over the codon subsets formed by vertical planes, lines, and amino acids (or individual codons) can be visualized. The cube can thus be seen as an arrangement of triplets that follows both biological and algebraic criteria.

DETAILS

Biological Background

In the protein-coding regions of DNA molecules, genetic information is encoded in groups of three bases called triplets or codons. Every codon encodes the information for one amino acid, and every amino acid can be encoded by one or more codons. The genetic code is the biochemical system that establishes the rules by which the nucleotide sequence of a gene is transcribed into the mRNA codon sequence, and the mRNA is then translated into the amino acid sequence of the corresponding protein. The genetic code is an extension of the four-letter alphabet found in DNA molecules. These "letters" are the DNA bases adenine, guanine, cytosine, and thymine, usually denoted A, G, C, and T, respectively (in RNA, T is replaced by U, uracil). They are paired according to the following rule (Watson–Crick base pairings): G≡C, A=T. The base G is the complementary base of C, and A is the complementary base of T (or U) in the DNA (or RNA) molecule, and vice versa. The standard genetic code table is:
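The table itself can be regenerated from the usual compact encoding of the standard genetic code. The sketch below is only an illustration: the names `GENETIC_CODE` and `print_table` are hypothetical, the bases are listed in the conventional U, C, A, G order, and one-letter amino acid codes are used, with `*` marking a stop codon.

```python
from itertools import product

BASES = "UCAG"  # conventional order of rows and columns in the codon table
# One-letter amino acid codes for the 64 codons in UCAG order; "*" is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every RNA codon (first, second, third base) to its amino acid.
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def print_table():
    """Print the code with the second base across columns and the third base down rows."""
    for first in BASES:
        for third in BASES:
            cells = [
                f"{first}{second}{third} {GENETIC_CODE[first + second + third]}"
                for second in BASES
            ]
            print("   ".join(cells))
        print()  # blank line between blocks that share the first base


if __name__ == "__main__":
    print_table()
```

Each printed block shares the first base, each column shares the second base, and each row within a block shares the third base, which is the layout most textbooks use.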

In 1968, Nobel Prize winner Francis Crick pointed out the principal partitions of the genetic code [2], showing that the 20 amino acids are not distributed at random among the 64 codons. He summarized the following rules:
• XYU and XYC always code the same amino acid.
• XYA and XYG often code the same amino acid. The rare amino acids methionine and tryptophan, which have only one codon each, appear to be exceptions to this rule.
• In 8 out of 16 cases, the first two bases alone determine the amino acid: XYA, XYC, XYG, and XYU all code the same amino acid.
• In most cases, the codons representing a single amino acid start with the same pair of bases. Thus the two codons for histidine both start with CA. There are three exceptions to this: leucine has CUA, CUC, CUG, CUU, UUA, and UUG; serine has UCA, UCC, UCG, UCU, AGC, and AGU; arginine has CGA, CGC, CGG, CGU, AGA, and AGG.
• If the first two bases consist only of Gs and Cs, then the four codons sharing the same initial doublet all code the same amino acid. That is, the meaning of these codons is independent of the third base. This is in fact true for all codons having C in the second position.
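These rules can be checked mechanically against the standard genetic code. The following sketch is not part of the Demonstration; the helper `same` and the variable names are hypothetical, and stop codons (`*`) are treated like an extra amino acid symbol for the purpose of the comparison.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, one-letter codes, "*" for stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# The 16 possible initial doublets XY.
DOUBLETS = ["".join(d) for d in product(BASES, repeat=2)]


def same(doublet, thirds):
    """True if the doublet followed by each base in `thirds` has a single meaning."""
    return len({CODE[doublet + t] for t in thirds}) == 1


# Rule 1: XYU and XYC always code the same amino acid.
rule_uc = all(same(d, "UC") for d in DOUBLETS)

# Rule 2: XYA and XYG often agree; collect the doublets where they do not.
ag_exceptions = [d for d in DOUBLETS if not same(d, "AG")]

# Rule 3: doublets whose four codons all code the same amino acid.
fourfold = [d for d in DOUBLETS if same(d, "UCAG")]

# Rule 5: doublets made only of G and C, and doublets with C in second position.
gc_only = [d for d in DOUBLETS if set(d) <= set("GC")]
c_second = [d for d in DOUBLETS if d[1] == "C"]

print("XYU/XYC always agree:", rule_uc)
print("XYA/XYG disagree for:", ag_exceptions)
print("fully degenerate doublets:", len(fourfold), "of", len(DOUBLETS))
print("G/C-only doublets all fully degenerate:", all(d in fourfold for d in gc_only))
print("C-second doublets all fully degenerate:", all(d in fourfold for d in c_second))
```

The two doublets reported for the second rule, UG and AU, are exactly the ones that contain the single codons of tryptophan (UGG) and methionine (AUG).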

Algebraic Foundations

If the Watson–Crick base pairings are expressed symbolically by means of a sum operation "+" such that G + C = D and A + U = D (A + T = D in DNA), that is, such that the complementary RNA (or DNA) bases are algebraic complements, then this requirement leads us to define an additive group on the set of five RNA (DNA) bases (see the "sum and times operation tables" in Snapshot 1). Explicitly, bases with the same number of hydrogen bonds in the DNA molecule and different chemical types are required to be algebraic inverses in the additive group defined on the set of DNA bases. This definition also reflects the non-specific pairings of the ancient hypothetical base(s) D, which is taken as the neutral element of the sum operation. Next, once the neutral element of the product operation "·" is fixed, there is only one possible definition of the product that completes a finite (Galois) field structure isomorphic to the field GF(5) defined over the set of integers modulo 5 (see "sum and times operation tables" in Snapshot 2). The sum and times operations over the set of bases and over the integers modulo 5 define two isomorphic finite fields, which implies a one-to-one correspondence between the five bases and the integers 0, 1, 2, 3, 4, with D corresponding to 0.
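The field structure on the five bases can be sketched by carrying the arithmetic of the integers modulo 5 over to the base symbols. The assignment D, A, C, G, U to 0, 1, 2, 3, 4 used below is an assumption made for illustration: it sends the hypothetical base D to 0 and makes each Watson–Crick pair sum to D, but it is not confirmed here as the exact correspondence used in the Demonstration. The function names `add`, `mul`, and `print_table` are likewise hypothetical.

```python
# ASSUMPTION: illustrative base-to-integer assignment D, A, C, G, U -> 0..4.
# It satisfies G + C = D and A + U = D; the Demonstration's own assignment may differ.
BASES = "DACGU"
TO_INT = {base: i for i, base in enumerate(BASES)}


def add(x, y):
    """Sum of two bases, computed as integer addition modulo 5."""
    return BASES[(TO_INT[x] + TO_INT[y]) % 5]


def mul(x, y):
    """Product of two bases, computed as integer multiplication modulo 5."""
    return BASES[(TO_INT[x] * TO_INT[y]) % 5]


def print_table(symbol, op):
    """Print the 5 x 5 operation table for `op`."""
    print(symbol + " | " + " ".join(BASES))
    print("--+" + "-" * 10)
    for x in BASES:
        print(x + " | " + " ".join(op(x, y) for y in BASES))
    print()


print_table("+", add)
print_table("*", mul)

# Complementary bases are additive inverses, and D is the neutral element of the sum.
assert add("G", "C") == "D" and add("A", "U") == "D"
assert all(add("D", b) == b for b in BASES)

# Under this assignment A is the neutral element of the product,
# and every base other than D has a multiplicative inverse.
assert all(mul("A", b) == b for b in BASES)
assert all(any(mul(x, y) == "A" for y in BASES[1:]) for x in BASES[1:])
```

Because 5 is prime, every nonzero element has a multiplicative inverse, which is what makes the structure a field rather than merely a ring.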

The plausible ancient genetic code contains the standard genetic code. In Snapshot 1, all triplets are displayed because, in "sum and times operation tables", the "Ancestral Triplet Subset" was selected in member 1 and the "Standard Genetic Code" in member 2. The triplets that belong to the standard genetic code (also called codons) are yellow because yellow was chosen in "mcolor 1". Black triplets (extended triplets) correspond to the extinct plausible codons. According to Crick's rules, amino acids with similar physicochemical properties are encoded by extended codons belonging to the same vertical plane in the cube (the same affine subspace), and extended codons encoding the same or very similar amino acids are located in the same vertical line. For instance, codons that encode hydrophilic amino acids are found in the blue vertical plane (see Snapshot 1). The color of the five principal vertical planes XDZ, XAZ, XCZ, XGZ, and XUZ can be changed in "vertical planes handling". Notice that, biologically speaking, neither the ancient nor the present genetic code is a cube, but both can be represented, according to their biological and algebraic features, as three-dimensional cubes.

Algebraic Cube Handling

The principal partitions of the standard genetic code, pointed out by Crick, can be algebraically derived by manipulating the sum and times operations defined on the cube. The cube presented in this Demonstration can be seen as a building or a hotel whose rooms (placed at the cube's nodes) are filled according to biological and algebraic criteria. Thus, as Crick pointed out (see above), the base triplets are not located in the cube at random but hold algebraic relationships. The empty hotel rooms are then filled in, starting from some occupied rooms and using the sum and times operations.

In the vector space of the extended triplets, the vertical plane XDZ forms a two-dimensional vector subspace, which is contained in one of the coordinate planes (see Snapshot 2), and the rest of the vertical planes are affine subspaces. This is why any other vertical plane can be obtained from the plane XDZ by adding to the latter any codon of the target plane. In Snapshot 2, for instance, such a sum is derived by selecting "Vertical Plane XDZ" in member 1, selecting a codon of the target plane in member 2, and clicking the sum checkbox of "sum and times operations". Triplets belonging to the plane XDZ are yellow and triplets of the resulting plane are white. Likewise, the vertical plane XGZ can be obtained from the plane XDZ by adding to the latter any of the codons that belong to the plane XGZ, for instance, codon CGA. In Snapshot 3, the sum XDZ + CGA is derived by selecting "Vertical Plane XDZ" in member 1 and CGA in member 2 of "sum and times operations". In particular, in Snapshot 4, we can see codon .
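The relationship between the subspace XDZ and the other vertical planes can be illustrated with componentwise addition of triplets. This is a minimal sketch under stated assumptions: the base-to-integer assignment D, A, C, G, U to 0, 1, 2, 3, 4 is illustrative rather than taken from the Demonstration, and the helpers `add` and `plane` are hypothetical. The outcome does not depend on how the four standard bases are numbered, only on D being the neutral element of the sum.

```python
from itertools import product

# ASSUMPTION: illustrative assignment D, A, C, G, U -> 0..4 (D is the additive neutral element).
BASES = "DACGU"
TO_INT = {base: i for i, base in enumerate(BASES)}


def add(t1, t2):
    """Componentwise sum of two triplets, each coordinate taken modulo 5."""
    return "".join(BASES[(TO_INT[a] + TO_INT[b]) % 5] for a, b in zip(t1, t2))


def plane(second):
    """All 25 triplets whose second base is `second` (a vertical plane of the cube)."""
    return {x + second + z for x, z in product(BASES, repeat=2)}


XDZ = plane("D")

# Over a prime field, a nonempty set closed under the sum is a vector subspace,
# because every scalar multiple is a repeated sum.
assert "DDD" in XDZ
assert all(add(s, t) in XDZ for s in XDZ for t in XDZ)

# Adding the codon CGA to every triplet of XDZ yields the whole plane XGZ.
shifted = {add(t, "CGA") for t in XDZ}
assert shifted == plane("G")

# The same plane results from any other triplet of XGZ, so the choice does not matter.
assert all({add(t, c) for t in XDZ} == plane("G") for c in plane("G"))

print(len(XDZ), "triplets in XDZ;", len(shifted), "triplets in XDZ + CGA")
```

In algebraic terms, XGZ is the coset of the subspace XDZ that contains CGA, which is why every representative of that coset produces the same plane.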

The cube edge lying on the vertical coordinate axis is a one-dimensional vector subspace (see Snapshot 5). The cube contains another 24 vertical lines of four segments each, which are affine subspaces (also called cosets); every one of them can be obtained from this vector line by adding to it any triplet of the target line. Snapshot 5 shows one such sum. In most cases, codons encoding the same or very similar amino acids (with similar physicochemical properties) belong to the same vertical line [1]. This can be easily observed using, for instance, "sum and times operations", selecting any of the amino acids that appear in the popup menu of member 1 and "None" in member 2.
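The count of vertical lines can be reproduced by enumerating the cosets of the line through the origin. The sketch below assumes that the vertical direction corresponds to the third base, so that the vector line consists of the triplets DDZ, and it reuses the illustrative assignment D, A, C, G, U to 0, 1, 2, 3, 4; both choices, and the names `add`, `LINE`, and `cosets`, are assumptions made for this example.

```python
from itertools import product

# ASSUMPTION: illustrative assignment D, A, C, G, U -> 0..4 (D is the additive neutral element).
BASES = "DACGU"
TO_INT = {base: i for i, base in enumerate(BASES)}


def add(t1, t2):
    """Componentwise sum of two triplets, each coordinate taken modulo 5."""
    return "".join(BASES[(TO_INT[a] + TO_INT[b]) % 5] for a, b in zip(t1, t2))


# ASSUMPTION: the vertical axis is the third position, so the vector line is DDZ.
LINE = {"DD" + z for z in BASES}

# All 125 extended triplets of the cube.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

# Each coset is the line translated by one triplet; duplicates collapse in the set.
cosets = {frozenset(add(t, v) for v in LINE) for t in TRIPLETS}

# Five points per line means four segments between consecutive points.
assert all(len(c) == 5 for c in cosets)

# The vector line itself plus the affine lines obtained from it.
affine_lines = cosets - {frozenset(LINE)}
print(len(cosets), "vertical lines in total,", len(affine_lines), "of them affine")

# Any triplet of a target line reproduces that line when added to DDZ.
target = {"GC" + z for z in BASES}
assert all({add(v, t) for v in LINE} == target for t in target)
```

The 25 cosets partition the 125 triplets, and the target line used in the last check, GCZ, contains the four alanine codons together with the extended triplet GCD.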

The product operation between two triplets and is defined by , according to the rules: , for all , in , , , , , , and .

The subset of all codons (the subset of triplets that form the standard genetic code) is closed under products; that is, the standard genetic code with this product operation is a multiplicative group. In other words, the product of any two codons is always a codon and never an extended triplet. For example, in Snapshot 6, the products of the codon that encodes the amino acid Trp (UGG, yellow codon) and the codons that encode the amino acid Ala (green codons) are, according to the standard genetic code table, the codons that encode the amino acids Glu (with codons GAA and GAG) and Asp (with codons GAC and GAU) (white codons).

The subset of codon (inserted in the vertical plane ) is a subgroup of the multiplicative group defined over the standard genetic code. This means that any of the codon subsets: , , and (with ) inserted, respectively, in the vertical planes , , and , can be obtained from the subset . In simpler terms, for instance, in Snapshot 7, the subset of codons (white) can be obtained by selecting the "Code Plane XAZ" (white, this is a plane of the standard genetic code) in the popup menu of "member 1" and the amino acid Gly (glycine) in "member 2" of "sum and times operations" and by clicking the checkbox "times". Note that the same result can be obtained just by using one of the codons that encode amino acid Gly or any of the codons from the subset .

In Snapshot 8, the "Vertical Plane XDZ" (yellow triplets) and the "GC Random Sample" (white codons) have been selected in the popup menus of "member 1" and "member 2", respectively. Every time "GC Random Sample" is selected (it must be deselected first) or the corresponding triplet color is changed, a new sample of four codons is picked. Next, in Snapshot 9, the whole ancient genetic code is algebraically obtained by keeping the triplet subsets selected in Snapshot 8 and clicking the sum checkbox of "sum and times operations". Notice that the result is the same no matter which codons are picked with "GC Random Sample". Likewise, in Snapshot 10, the present standard genetic code is algebraically derived by choosing "Code Plane XAZ" in the popup menu of "member 1" and "GC Random Sample" in "member 2", and clicking the times checkbox of "sum and times operations".

This Demonstration visualizes the biological and abstract algebraic features of the genetic code described in [1]. In general, users can operate on different subsets of codons and derive new subsets by clicking the sum or times checkboxes of "sum and times operations". Mathematically speaking, this means that if the genetic code evolved so as to minimize transcription and replication errors, then both the ancient and the present standard genetic codes are mathematically determined.