Genetic Code

By the 1960s it had long been apparent that at least three nucleotide residues of DNA are necessary to encode each amino acid. The four code letters of DNA (A, T, G, and C) in groups of two can yield only 4² = 16 different combinations, insufficient to encode 20 amino acids. Groups of three, however, yield 4³ = 64 different combinations.
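The counting argument can be checked by enumeration. The following is a minimal sketch; the function name `n_words` is illustrative and not part of the source.

```python
from itertools import product

BASES = "ATGC"        # the four code letters of DNA
N_AMINO_ACIDS = 20    # number of amino acids that must be encoded

def n_words(length: int) -> int:
    """Count the distinct nucleotide words of the given length."""
    return sum(1 for _ in product(BASES, repeat=length))

for length in (1, 2, 3):
    count = n_words(length)
    verdict = "sufficient" if count >= N_AMINO_ACIDS else "insufficient"
    print(f"word length {length}: {count} combinations ({verdict})")
```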

A codon is a triplet of nucleotides that codes for a specific amino acid. Translation occurs in such a way that these nucleotide triplets are read in a successive, non-overlapping fashion. A specific first codon in the sequence establishes the reading frame, in which a new codon begins every three nucleotide residues. There is no punctuation between codons for successive amino acid residues. The amino acid sequence of a protein is defined by a linear sequence of contiguous triplets. In principle, any given single-stranded DNA or mRNA sequence has three possible reading frames. Each reading frame gives a different sequence of codons (Fig. 27–5), but only one is likely to encode a given protein.
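The three reading frames of a sequence can be illustrated with a short sketch. The sequence `mrna` below is a made-up example, and `reading_frames` is a hypothetical helper name, not something defined in the source.

```python
def reading_frames(seq: str) -> list[list[str]]:
    """Split a sequence into codons for each of its three reading frames.

    Trailing bases that do not fill a complete triplet are dropped.
    """
    frames = []
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(codons)
    return frames

mrna = "AUGGCUUCGAAUUAG"  # hypothetical example sequence
for number, codons in enumerate(reading_frames(mrna), start=1):
    print(f"frame {number}:", " ".join(codons))
```

Each frame is read as successive, non-overlapping triplets with no punctuation between them; shifting the starting point by one or two bases changes every codon in the message.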

In 1961 Marshall Nirenberg and Heinrich Matthaei reported the first breakthrough. They incubated synthetic polyuridylate, poly(U), with an E. coli extract, GTP, ATP, and a mixture of the 20 amino acids in 20 different tubes, each tube containing a different radioactively labeled amino acid. Because poly(U) mRNA is made up of many successive UUU triplets, it should promote the synthesis of a polypeptide containing only the amino acid encoded by the triplet UUU. A radioactive polypeptide was indeed formed in only one of the 20 tubes, the one containing radioactive phenylalanine. Nirenberg and Matthaei therefore concluded that the triplet codon UUU encodes phenylalanine. The same approach revealed that polycytidylate, poly(C), encodes a polypeptide containing only proline (polyproline), and polyadenylate, poly(A), encodes polylysine. Polyguanylate did not generate any polypeptide in this experiment because it spontaneously forms tetraplexes that cannot be bound by ribosomes.

The synthetic polynucleotides used in such experiments were prepared with polynucleotide phosphorylase, which catalyzes the formation of RNA polymers starting from ADP, UDP, CDP, and GDP. This enzyme requires no template and makes polymers with a base composition that directly reflects the relative concentrations of the nucleoside 5′-diphosphate precursors in the medium. If polynucleotide phosphorylase is presented with UDP only, it makes only poly(U). If it is presented with a mixture of five parts ADP and one part CDP, it makes a polymer in which about five-sixths of the residues are adenylate and one-sixth are cytidylate. This random polymer is likely to have many triplets of the sequence AAA, smaller numbers of AAC, ACA, and CAA triplets, relatively few ACC, CCA, and CAC triplets, and very few CCC triplets (Table 27–1). Using a variety of artificial mRNAs made by polynucleotide phosphorylase from different starting mixtures of ADP, GDP, UDP, and CDP, investigators soon identified the base compositions of the triplets coding for almost all the amino acids. Although these experiments revealed the base composition of the coding triplets, they could not reveal the sequence of the bases.
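The expected triplet frequencies in the 5:1 ADP:CDP polymer can be worked out with a short calculation. This sketch assumes that each position in the polymer is filled independently, with probability equal to the fraction of that nucleotide in the precursor mixture; the function name `triplet_frequencies` is illustrative.

```python
from fractions import Fraction
from itertools import product

def triplet_frequencies(base_fractions: dict) -> dict:
    """Expected frequency of each triplet in a random copolymer.

    Assumption: positions are independent, so a triplet's frequency is
    the product of the fractions of its three bases.
    """
    freqs = {}
    for triplet in product(base_fractions, repeat=3):
        p = Fraction(1)
        for base in triplet:
            p *= base_fractions[base]
        freqs["".join(triplet)] = p
    return freqs

# Five parts ADP to one part CDP, as in the text.
mix = {"A": Fraction(5, 6), "C": Fraction(1, 6)}
freqs = triplet_frequencies(mix)
for triplet, p in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{triplet}: {p} = {float(p):.3f}")
```

Under this assumption AAA accounts for 125/216 of all triplets, each triplet with two A and one C for 25/216, each triplet with one A and two C for 5/216, and CCC for 1/216. Triplets of the same base composition have identical frequencies, which is why these experiments could reveal composition but not base order.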

In 1964 Nirenberg and Philip Leder achieved another experimental breakthrough. Isolated E. coli ribosomes would bind a specific aminoacyl-tRNA in the presence of the corresponding synthetic polynucleotide messenger. For example, ribosomes incubated with poly(U) and phenylalanyl-tRNAPhe (Phe-tRNAPhe) bind both RNAs, but if the ribosomes are incubated with poly(U) and some other aminoacyl-tRNA, the aminoacyl-tRNA is not bound, because it does not recognize the UUU triplets in poly(U). Even trinucleotides could promote specific binding of appropriate tRNAs, so these experiments could be carried out with chemically synthesized small oligonucleotides. With this technique researchers determined which aminoacyl-tRNA bound to about 50 of the 64 possible triplet codons. For some codons, either no aminoacyl-tRNA or more than one would bind. Another method was needed to complete and confirm the entire genetic code.

H. Gobind Khorana developed chemical methods to synthesize polyribonucleotides with defined, repeating sequences of two to four bases. The polypeptides produced by these mRNAs had one or a few amino acids in repeating patterns. These patterns, when combined with information from the random polymers used by Nirenberg and colleagues, permitted unambiguous codon assignments. The copolymer (AC)n, for example, has alternating ACA and CAC codons: ACACACACACACACA. The polypeptide synthesized on this messenger contained equal amounts of threonine and histidine. Given that a histidine codon has one A and two Cs (Table 27–1), CAC must code for histidine and ACA for threonine.
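The alternating codon pattern of the (AC)n copolymer holds in whichever frame translation begins, as the following sketch illustrates. The repeat count of 12 is arbitrary, and `codons_in_frame` is a hypothetical helper name.

```python
def codons_in_frame(seq: str, offset: int) -> list[str]:
    """Codons read from seq starting at the given offset (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

copolymer = "AC" * 12  # (AC)n with n = 12, chosen arbitrarily

distinct_per_frame = []
for offset in range(3):
    codons = codons_in_frame(copolymer, offset)
    distinct_per_frame.append(set(codons))
    print(f"frame {offset + 1}:", " ".join(codons))
```

Every frame contains only ACA and CAC in alternation, so the messenger directs synthesis of a polypeptide with two alternating amino acids regardless of where the ribosome starts.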

Consolidation of the results from many experiments permitted the assignment of 61 of the 64 possible codons. The other three were identified as termination codons, in part because they disrupted amino acid coding patterns when they occurred in a synthetic RNA polymer (Fig. 27–6). Meanings for all the triplet codons (tabulated in Fig. 27–7) were established by 1966 and have been verified in many different ways. The cracking of the genetic code is regarded as one of the most important scientific discoveries of the twentieth century.

Several codons serve special functions. The initiation codon AUG is the most common signal for the beginning of a polypeptide in all cells, in addition to coding for Met residues in internal positions of polypeptides. The termination codons (UAA, UAG, and UGA), also called stop codons or nonsense codons, normally signal the end of polypeptide synthesis and do not code for any known amino acids. In a random sequence of nucleotides, about 1 in every 20 codons in each reading frame is, on average, a termination codon. In general, a reading frame without a termination codon among 50 or more codons is referred to as an open reading frame (ORF). Long open reading frames usually correspond to genes that encode proteins. In the analysis of sequence databases, sophisticated programs are used to search for open reading frames in order to find genes among the often huge background of nongenic DNA. An uninterrupted gene coding for a typical protein with a molecular weight of 60,000 would require an open reading frame with 500 or more codons.
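The definition of an open reading frame given above can be turned into a very simple scanner. The sketch below is not one of the sophisticated gene-finding programs mentioned in the text; it only measures the longest stop-free stretch in each frame, and the function names and the `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def longest_open_run(seq: str, offset: int) -> int:
    """Length, in codons, of the longest stop-free stretch in one frame."""
    best = current = 0
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0          # a stop codon ends the current stretch
        else:
            current += 1
            best = max(best, current)
    return best

def open_reading_frames(seq: str, min_codons: int = 50) -> list[int]:
    """Frames (numbered 1-3) with a stop-free stretch of at least min_codons."""
    return [offset + 1 for offset in range(3)
            if longest_open_run(seq, offset) >= min_codons]

# Chance that a random codon is a stop codon: 3 of the 64 triplets.
print(f"stop-codon frequency: {len(STOP_CODONS) / 64:.3f}")
```

Three of the 64 triplets are stop codons, so a random frame is interrupted roughly once every 21 codons, which is why a stop-free run of 50 or more codons is unlikely to arise by chance.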

A striking feature of the genetic code is that an amino acid may be specified by more than one codon, so the code is described as degenerate. The degeneracy of the code is not uniform. Whereas methionine and tryptophan have single codons, for example, three amino acids (Leu, Ser, Arg) have six codons, five amino acids have four, isoleucine has three, and nine amino acids have two (Table 27–3).
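These counts can be reproduced from the standard codon table. The sketch below encodes the standard genetic code as a 64-character string of one-letter amino acid symbols (with `*` for termination), a compact layout chosen here for convenience rather than taken from the source.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid symbols; '*' marks a termination codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons for each amino acid, ignoring termination codons.
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Group amino acids by how many codons specify them.
by_degeneracy = {}
for aa, n in codons_per_aa.items():
    by_degeneracy.setdefault(n, []).append(aa)

for n in sorted(by_degeneracy, reverse=True):
    print(f"{n} codons: {' '.join(sorted(by_degeneracy[n]))}")
```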

The genetic code is nearly universal; that is, prokaryotic and eukaryotic organisms use the same codons to specify each amino acid. Rare exceptions include codon use in yeast mitochondria and in Mycoplasma. This suggests that all life forms have a common evolutionary ancestor, whose genetic code has been preserved throughout biological evolution.

Wobble Allows Some tRNAs to Recognize More than One Codon

When several different codons specify one amino acid, the difference between them usually lies at the third base position (at the 3′ end). For example, alanine is coded by the triplets GCU, GCC, GCA, and GCG. The codons for most amino acids can be symbolized by XY(A/G) or XY(U/C), where the third base is either of the two nucleotides shown in parentheses. The first two letters of each codon are the primary determinants of specificity, a feature that has some interesting consequences.

Transfer RNAs base-pair with mRNA codons at a three-base sequence on the tRNA called the anticodon. The first base of the codon in mRNA (read in the 5′→3′ direction) pairs with the third base of the anticodon. If the anticodon triplet of a tRNA recognized only one codon triplet through Watson-Crick base pairing at all three positions, cells would have a different tRNA for each amino acid codon. This is not the case, however, because the anticodons in some tRNAs include the nucleotide inosinate (designated I), which contains the uncommon base hypoxanthine. Inosinate can form hydrogen bonds with three different nucleotides (U, C, and A), although these pairings are much weaker than the hydrogen bonds of Watson-Crick base pairs (G≡C and A=U). In yeast, one tRNAArg has the anticodon (5′)ICG, which recognizes three arginine codons: (5′)CGA, (5′)CGU, and (5′)CGC. The first two bases are identical (CG) and form strong Watson-Crick base pairs with the corresponding bases of the anticodon, but the third base (A, U, or C) forms rather weak hydrogen bonds with the I residue at the first position of the anticodon.

Examination of these and other codon-anticodon pairings led Crick to conclude that the third base of most codons pairs rather loosely with the corresponding base of its anticodon; to use his picturesque word, the third base of such codons (and the first base of their corresponding anticodons) “wobbles.” Crick proposed a set of four relationships called the wobble hypothesis:

1. The first two bases of an mRNA codon always form strong Watson-Crick base pairs with the corresponding bases of the tRNA anticodon and confer most of the coding specificity.

2. The first base of the anticodon (reading in the 5′→3′ direction; this pairs with the third base of the codon) determines the number of codons recognized by the tRNA. When the first base of the anticodon is C or A, base pairing is specific and only one codon is recognized by that tRNA. When the first base is U or G, binding is less specific and two different codons may be read. When inosine (I) is the first (wobble) nucleotide of an anticodon, three different codons can be recognized, the maximum number for any tRNA.

3. When an amino acid is specified by several different codons, the codons that differ in either of the first two bases require different tRNAs.

4. A minimum of 32 tRNAs are required to translate all 61 codons (31 to encode the amino acids and 1 for initiation).

The wobble (or third) base of the codon contributes to specificity, but, because it pairs only loosely with its corresponding base in the anticodon, it permits rapid dissociation of the tRNA from its codon during protein synthesis. If all three bases of a codon engaged in strong Watson-Crick pairing with the three bases of the anticodon, tRNAs would dissociate too slowly and this would severely limit the rate of protein synthesis. Codon-anticodon interactions balance the requirements for accuracy and speed.

The genetic code tells us how protein sequence information is stored in nucleic acids and provides some clues about how that information is translated into protein.
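The wobble rules can be sketched as a lookup that lists the codons a given anticodon can read. The text states how many codons each wobble base recognizes; the specific partners used below for U (A or G) and G (C or U) are the standard wobble pairings and are an addition to what the text spells out. The function name `codons_recognized` is illustrative.

```python
# Watson-Crick partners for the first two codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-base partners for each anticodon first (wobble) base.
# C and A pair with one base, U and G with two, inosine (I) with three.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}

def codons_recognized(anticodon: str) -> list[str]:
    """Codons (written 5'->3') read by an anticodon written 5'->3'.

    Codon position 1 pairs with anticodon position 3, position 2 with
    position 2, and the codon's third base with the anticodon's first
    (wobble) base.
    """
    wobble_base, middle, last = anticodon
    first_two = WATSON_CRICK[last] + WATSON_CRICK[middle]
    return [first_two + third for third in WOBBLE[wobble_base]]

print(codons_recognized("ICG"))  # the yeast tRNAArg anticodon from the text
print(codons_recognized("CAU"))
```

For the anticodon ICG this yields the three arginine codons CGU, CGC, and CGA described in the text, while an anticodon beginning with C, such as CAU, reads a single codon (AUG).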

Brief generalizations about the genetic code:

The genetic code is nearly universal; that is, prokaryotic and eukaryotic organisms use the same codons to specify each amino acid. Rare exceptions include codon use in yeast mitochondria and in Mycoplasma. This suggests that all life forms have a common evolutionary ancestor, whose genetic code has been preserved throughout biological evolution.

The code is degenerate; that is, more than one nucleotide triplet can specify the same amino acid. For example, UUA, UUG, CUU, CUC, CUA, and CUG all code for leucine.

For a given amino acid, the first two bases are limited to one or two combinations, whereas the third base can often be any of several (up to four) nucleotides. A mutation that changes the third base may therefore still allow the correct amino acid to be incorporated into the protein.

The degeneracy of the code is not uniform. Whereas methionine and tryptophan each have a single codon, Leu, Ser, and Arg have six codons, five amino acids have four, isoleucine has three, and nine amino acids have two.

The code is non-overlapping; that is, adjacent codons do not share nucleotides.

The code is commaless; that is, there are no specific signals or commas between codons.

Of the 64 codons, 61 are used to encode amino acids. The remaining three (UAA, UAG, and UGA) are termination codons.

AUG serves both as the initiation codon and as the codon for internal methionine residues.

In general, amino acids with hydrocarbon side chains have U or C as the second base, and those with branched methyl groups have U as the second base. Basic and acidic amino acids have A or G as the second base.

Polarity: The genetic code has polarity; that is, the code is always read in a fixed direction, 5′→3′. If the same message were read in the opposite direction (3′→5′), it would specify a different polypeptide, since every codon would have a reversed base sequence. The two directions of reading therefore give two different polypeptides:

Codon       :   UUG AUC GUC UCG CCA ACA AGG
Polypeptide : → Leu Ile Val Ser Pro Thr Arg
                Val Leu Leu Ala Thr Thr Gly ←
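The effect of reading direction in the example above can be checked with a short sketch. The compact 64-character encoding of the standard codon table and the helper name `translate` are choices made for this illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

def translate(codons: list[str]) -> list[str]:
    """Translate a list of codons into three-letter amino acid names."""
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in codons]

codons = "UUG AUC GUC UCG CCA ACA AGG".split()
forward = translate(codons)
# Reading 3'->5' reverses the codon order and the bases within each codon.
backward = translate([codon[::-1] for codon in reversed(codons)])

print("5'->3':", " ".join(forward))
print("3'->5':", " ".join(backward))
```

The backward reading lists the residues in the order they would be encountered from the 3′ end, so it is the second row of the example read from right to left.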

Non-ambiguity: While the same amino acid can be coded by more than one codon (the code is degenerate), the same codon does not code for two or more different amino acids (the code is non-ambiguous).
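Among the generalizations above, the observation that a change in the third base often leaves the amino acid unchanged can be quantified. The sketch below counts, for each codon position, the fraction of single-base substitutions in sense codons that are synonymous. The compact table encoding and the function name `synonymous_fraction` are choices made for this illustration, and substitutions that create a termination codon are counted as non-synonymous.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at a codon position (0, 1, or 2)
    that leave the encoded amino acid unchanged.

    Only the 61 sense codons are used as starting points.
    """
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total

fractions = [synonymous_fraction(p) for p in range(3)]
for position, fraction in enumerate(fractions, start=1):
    print(f"position {position}: {fraction:.1%} of substitutions are synonymous")
```

Most third-position substitutions are synonymous, a few first-position ones are (within the six-codon families of Leu and Arg), and no second-position substitution between sense codons preserves the amino acid.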