The canonical genetic code comprises 64 codons that encode 20 amino acids and three stop signals, and it is conserved across the three kingdoms of life. The origin of the genetic code, whether a “frozen accident” or an expansion from a primordial code with fewer amino acids, remains an enigma (1, 2). Although proteins carry out most of the complex processes of life, there is clearly a need for additional building blocks: Some functions depend on posttranslational modifications or cofactors, and many important peptides containing unusual amino acids are synthesized nonribosomally (3). Why only 20, and why were these 20 amino acids in particular chosen for the code? Nature does encode two additional amino acids, selenocysteine (Sec) and pyrrolysine (Pyl), in a limited number of proteins, but in a distinctive way: Sec-tRNASec is produced by converting a preloaded Ser-tRNASec, and special mRNA elements and elongation factors are also required for Sec incorporation into proteins (4). Pyl is likely incorporated similarly (5). Why does nature not simply employ the standard mechanism of directly loading amino acids onto their cognate tRNAs to incorporate Sec and Pyl?
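The codon arithmetic in the opening sentence can be checked with a short sketch. It builds the standard codon table from the compact 64-letter string commonly used for this code (bases ordered U, C, A, G; `*` marks a stop). The variable names are illustrative and not part of the original work.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated with bases in the
# order U, C, A, G (first base varies slowest); '*' marks a stop signal.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Separate the stop codons from the sense codons.
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE), "codons")
print(len(amino_acids), "amino acids")
print("stop codons:", stop_codons)
```

The amber codon UAG, which the work described here reassigns, is one of the three stop codons listed.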

In my graduate research, I explored whether the genetic code can be expanded to accommodate additional amino acids, using a strategy that mimics the way that the common amino acids are encoded. To do this, a novel tRNA-codon pair and an aminoacyl-tRNA synthetase (aaRS) need to be generated that uniquely incorporate an unnatural amino acid. The new components should be orthogonal to the endogenous ones to avoid crosstalk and should function efficiently with the translational apparatus (see the figure) (6).

A general method for genetically encoding unnatural amino acids into proteins.

CREDIT: PRESTON HUEY/SCIENCE

Designing a new codon-tRNA-aaRS set from scratch would be nearly impossible, given the delicate interactions among these components that have evolved to ensure translational accuracy. My approach was therefore to borrow and engineer existing components. Escherichia coli was chosen as the host organism, and the amber nonsense codon (UAG) was hijacked to encode an unnatural amino acid. After testing various tRNA/aaRS pairs from different organisms, I first generated an orthogonal amber suppressor tRNATyrCUA/TyrRS pair in E. coli by importing a tRNATyr/TyrRS pair from the archaebacterium Methanococcus jannaschii (Mj) (7). To optimize this pair, I then developed a general strategy consisting of negative and positive selections of a mutant suppressor tRNA library (8). Eleven nucleotides of Mj-tRNATyrCUA were randomly mutated, and from the resulting library a mutant (mutRNATyrCUA) was identified that has almost no affinity for E. coli synthetases yet is still charged efficiently with tyrosine by the orthogonal Mj-TyrRS. The next step was to alter the amino acid specificity of the Mj-TyrRS so that it aminoacylated the mutRNATyrCUA with an unnatural amino acid only. A combinatorial approach was pursued, in which a pool of mutant synthetases was generated from the framework of the wild-type synthetase, and mutants were then selected based on their specificity for an unnatural amino acid relative to the common 20.
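As a toy illustration of what reassigning the amber codon means at the level of translation, the sketch below reads the same mRNA with the standard table and with a table in which UAG is decoded as an unnatural amino acid. The symbol `X`, the example mRNA, and the function names are assumptions made for illustration. The model also ignores competition between the suppressor tRNA and the release factor that normally recognizes UAG, so it shows the decoding logic only, not efficiencies.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard codon table; '*' marks a stop signal.
STANDARD = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Expanded table: the amber codon UAG now encodes a hypothetical
# unnatural amino acid, written here as 'X'.
EXPANDED = {**STANDARD, "UAG": "X"}


def translate(mrna: str, table: dict) -> str:
    """Translate an mRNA in frame from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":
            break  # stop signal: release the chain
        protein.append(residue)
    return "".join(protein)


# Hypothetical message: Met-Ala-(amber)-Lys-stop
mrna = "AUGGCUUAGAAAUAA"
print(translate(mrna, STANDARD))  # truncated at the amber codon
print(translate(mrna, EXPANDED))  # amber codon read through as 'X'
```

With the standard table the chain terminates at UAG. With the expanded table the same message yields a full-length product carrying the new residue at the amber position, which is the readout that the selections described above rely on.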

Five active-site residues of Mj-TyrRS were randomly mutated to generate the synthetase library. After two rounds of selection, a synthetase was evolved that, when coexpressed with the mutRNATyrCUA, incorporates O-methyl-L-tyrosine into proteins in response to the amber codon with translational fidelity and yield rivaling those of natural amino acids (9). Thus, the genetic code of E. coli was expanded for the first time. I subsequently evolved a second mutant synthetase that is capable of selectively inserting L-3-(2-naphthyl)-alanine, an amino acid structurally distinct from tyrosine, suggesting that this methodology should be generalizable to various unnatural amino acids (10).
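To give a sense of the search spaces involved, the sketch below computes the theoretical maximum diversity of the two libraries, assuming every randomized position can take any value independently. The essay does not state the actual library sizes or the randomization scheme, so these numbers are upper bounds under that assumption. The NNK figure further assumes a commonly used degenerate-codon scheme that the source does not mention.

```python
def library_size(positions: int, alphabet_size: int) -> int:
    """Number of distinct sequences when each position varies independently."""
    return alphabet_size ** positions


# tRNA library: eleven nucleotides, four possible bases each.
trna_variants = library_size(11, 4)

# Synthetase library: five active-site residues, 20 amino acids each.
synthetase_protein_variants = library_size(5, 20)

# Assumption (not from the essay): if each residue were randomized with an
# NNK codon (32 codons), this is the DNA-level diversity to be covered.
synthetase_nnk_variants = library_size(5, 32)

print(f"tRNA library:        {trna_variants:,}")
print(f"synthetase proteins: {synthetase_protein_variants:,}")
print(f"synthetase NNK DNA:  {synthetase_nnk_variants:,}")
```

Spaces of a few million variants are too large to screen one clone at a time, which is consistent with the use of growth-based selections rather than individual assays.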

These results show that the genetic code can indeed be expanded further using nature's own technique, the loading of an amino acid onto its cognate tRNA by a synthetase, although nature did not repeat this method for a 21st amino acid. This expansion may recapitulate how some of the common amino acids were added to the genetic code, suggesting that an incremental expansion was involved in the code's origin. The fact that no toxic side effects were observed in E. coli cells with UAG encoding an unnatural amino acid supports the codon reassignment hypothesis (2). To investigate the evolutionary consequences of adding novel amino acids to the genetic repertoire, a completely autonomous 21-amino acid bacterium was generated, which biosynthesizes p-amino-L-phenylalanine from basic carbon sources and incorporates it in response to the UAG codon (11). Directed evolution of such organisms under selective pressure is under way and may shed light on whether additional amino acids confer an evolutionary advantage.

Genetically encoding new amino acids makes it possible to tailor changes in proteins in live cells, so that protein structure and function can be studied directly in vivo as well as in vitro. Using the same method and system, we subsequently encoded more than 13 unnatural amino acids with novel functionalities in E. coli (12). For instance, the versatile keto group was genetically encoded in the form of p-acetyl-L-phenylalanine (13). It served as a unique chemical handle through which proteins were selectively labeled with fluorophores for imaging, with biotin for detection, and with carbohydrates for the generation of homogeneous glycoprotein mimetics (14). Other agents such as spin labels, metal chelators, cross-linking agents, polyethers, fatty acids, and toxins can be attached similarly. Two heavy atom-containing amino acids (p-bromo- and p-iodo-L-phenylalanine) were site-specifically incorporated into proteins, providing a reliable method for preparing isomorphous heavy-atom derivatives of proteins for crystallography (12).

The availability of novel building blocks may lead to protein properties that never existed before. In an initial test, Tyr66 of the green fluorescent protein (GFP) was substituted with several tyrosine analogs, resulting in mutant GFPs with emissions ranging from blue to cyan to green, as well as other new spectral properties (15). In vivo unnatural amino acid mutagenesis by rational design or directed protein evolution should greatly expand the scope and power of protein engineering.

In summary, my thesis research demonstrated that the genetic code can be expanded to include new amino acids. The methodology is generalizable to different amino acids as well as cell types (16). It provides a new means for evolutionary study of the genetic code, and powerful tools for molecular and cellular biologists to dissect protein and cellular function both in vitro and in vivo. With additional building blocks genetically encoded, proteins and even organisms with enhanced or novel properties may be evolved.